
Public Member Functions | |
| HtmlParse (HtmlLex src) | |
| void | Parse () |
| Starts the parser, reads tokens from the source, and raises the parsing events. | |
Protected Member Functions | |
| virtual bool | OnAttribute (string attributeName, string value, IDictionary attributes) |
| This will be called on defining an attribute of an element. | |
| virtual void | OnClosingElement (string currentElement, string tag) |
| This will be called whenever the stack decreases. | |
| virtual void | OnDefaultEvent (string token) |
| This will be called for every token that is not part of a more specific event. | |
| virtual void | OnEndElementTag (string currentElement, string tagString, IDictionary attributes) |
| Override this to react to the start of an element. | |
| virtual void | OnRemark (string remark) |
| This is called whenever the parser has read a remark. | |
| virtual void | OnStartElementTag (string currentElement) |
| This will be called before any other event when a new element tag starts. | |
Properties | |
| int | Depth [get] |
| This is the depth of the internal stack representing the nested structure of tags. | |
| StackEntry | this [int depth] [get] |
| This will access the internal stack element according to the given depth. | |
Classes | |
| class | StackEntry |
| This class represents an entry in the element stack. | |
This class uses HtmlLex to scan for tokens in an HTML text stream. It calls virtual methods such as OnStartElementTag() on certain parsing events. Additionally, specializations can store a stack of state information that grows as the parser enters nested tags and shrinks as it leaves them, either by parsing the corresponding end tag or by parsing the end tag of an element deeper in the stack.
This procedure is optimized for robustness rather than for standards compliance. In fact, nearly everything defined by the W3C is ignored here. The only tested purpose is adapting doxygen output to the wx.NET help viewer.
Definition at line 28 of file HtmlParse.cs.
Contrib.Html.HtmlParse.HtmlParse (HtmlLex src)
virtual bool Contrib.Html.HtmlParse.OnAttribute (string attributeName, string value, IDictionary attributes) [protected, virtual]
This will be called on defining an attribute of an element.
this[0] will always be the current element. OnEndElementTag() has not yet been called. The value will be stripped of quotes if necessary.
| attributeName | is the name of the attribute | |
| value | is the value (without quotes), or empty if attributeName is an attribute without a value. | |
| attributes | maps the names of the parsed attributes to their values. You may extend this to inline new attributes that have not been parsed but that shall be processed by OnEndElementTag(). |
Returns true to tell the parser to add the received attribute to the list of attributes to be passed to OnEndElementTag().
Reimplemented in wx.ZipRC.DoxygenHtbConverter.
Definition at line 132 of file HtmlParse.cs.
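The quote handling mentioned above can be sketched in isolation. This is a minimal sketch modelled on the Parse() listing on this page, not code taken from HtmlParse itself; the class name AttributeValueSketch and the helper StripQuotes are hypothetical.

```csharp
using System;

// Hypothetical stand-alone helper, not part of Contrib.Html.
static class AttributeValueSketch
{
    // Removes one pair of surrounding double quotes if the value both starts
    // and ends with a quote and is longer than one character. Any other
    // value is returned unchanged.
    public static string StripQuotes(string value)
    {
        if (value.Length > 1 && value.StartsWith("\"") && value.EndsWith("\""))
            return value.Substring(1, value.Length - 2);
        return value;
    }
}
```

Under these assumptions, StripQuotes("\"index.html\"") yields index.html, while an unquoted value or a lone quote character passes through untouched.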
virtual void Contrib.Html.HtmlParse.OnClosingElement (string currentElement, string tag) [protected, virtual]
This will be called whenever the stack decreases.
this[0] is always the currentElement.
Reimplemented in wx.ZipRC.DoxygenHtbConverter.
Definition at line 140 of file HtmlParse.cs.
virtual void Contrib.Html.HtmlParse.OnDefaultEvent (string token) [protected, virtual]
This will be called for every token that is not part of a more specific event.
Reimplemented in wx.ZipRC.DoxygenHtbConverter.
Definition at line 146 of file HtmlParse.cs.
virtual void Contrib.Html.HtmlParse.OnEndElementTag (string currentElement, string tagString, IDictionary attributes) [protected, virtual]
Override this to react to the start of an element.
this[0] will always be this element.
| currentElement | is the current element's name in lower-case letters (e.g. "ul"). | |
| tagString | is the full tag string including attributes introduced by OnAttribute(). | |
| attributes | maps the names of the parsed attributes to their values. |
Reimplemented in wx.ZipRC.DoxygenHtbConverter.
Definition at line 117 of file HtmlParse.cs.
virtual void Contrib.Html.HtmlParse.OnRemark (string remark) [protected, virtual]
This is called whenever the parser has read a remark.
| remark | is the remark text without surrounding tags. |
Definition at line 153 of file HtmlParse.cs.
virtual void Contrib.Html.HtmlParse.OnStartElementTag (string currentElement) [protected, virtual]
This will be called before any other event when a new element tag starts.
Reimplemented in wx.ZipRC.DoxygenHtbConverter.
Definition at line 105 of file HtmlParse.cs.
void Contrib.Html.HtmlParse.Parse ()
Starts the parser, reads tokens from the source, and raises the parsing events.
Definition at line 159 of file HtmlParse.cs.
References Contrib.Html.HtmlParse.StackEntry.Element.
Referenced by wx.ZipRC.ZipResourceCompiler.Main().
{
    IDictionary attributes=null;
    string attributeName=null;
    bool readingAttributeValue=false;
    StringBuilder fullElementString = null;
    for (string token = this._src.NextToken();
         token != null;
         token = this._src.NextToken())
    {
        if (token.StartsWith("<!--"))
        {
            // processing remarks
            token = token.Substring(4, token.Length - 7);
            token = token.Trim();
            this.OnRemark(token);
            token = this._src.NextToken();
            if (token == null) break;
        }
        if (token.TrimStart().StartsWith("</"))
        {
            token = token.Trim();
            // end tag. decrease stack.
            string element = token.Substring(2).ToLower();
            token+=this._src.NextToken().Trim();
            // ignore end tags without start. doxygen seems to add some
            // end tags without start. So, first search for a start.
            bool foundTag = false;
            foreach (StackEntry entry in this._stack)
            {
                if (entry.Element == element)
                {
                    foundTag = true;
                    break;
                }
            }
            if (foundTag)
            {
                // If we found a possible start, remove stack elements until
                // start reached.
                while (this._stack.Count > 0)
                {
                    StackEntry current = (StackEntry)this._stack[0];
                    if (current.Element == element)
                    {
                        this.OnClosingElement(current.Element, token);
                        this._stack.RemoveAt(0);
                        break;
                    }
                    else
                        this._stack.RemoveAt(0);
                }
            }
        }
        else if (token.TrimStart().StartsWith("<"))
        {
            // start tag. stack grows.
            token = token.Trim();
            string element = token.Substring(1).ToLower();
            if (this.Depth > 0 && this[0].Element == element)
            {
                // the new element is equal to the old element.
                // act as if the old element has been closed explicitely.
                this.OnClosingElement(this[0].Element, string.Format("</{0}>", this[0].Element));
                this._stack.RemoveAt(0);
            }
            this._stack.Insert(0, new StackEntry(element));
            if (this.Depth > 1)
                this[0].State = (ICloneable)this[1].State.Clone(); // State inheritance
            this.OnStartElementTag(element);
            attributes = new Hashtable(); // Start reading attributes
            fullElementString = new StringBuilder();
            fullElementString.AppendFormat("{0} ", token);
        }
        else if (token.Trim() == ">")
        {
            token = token.Trim();
            // closing the element definition. Only interesting if reading attributes.
            if (attributes != null && fullElementString != null)
            {
                this.OnEndElementTag(this[0].Element, fullElementString.ToString()+token, attributes);
            }
            fullElementString = null;
            attributes = null;
            attributeName = null;
        }
        else if (attributes!=null && attributeName != null && readingAttributeValue)
        {
            token = token.Trim();
            // the token is an attribute value.
            string attributeValue = token;
            if (attributeValue.StartsWith("\"") && attributeValue.EndsWith("\"") && attributeValue.Length > 1)
                attributeValue = attributeValue.Substring(1, attributeValue.Length - 2);
            IDictionary newAttributes = new Hashtable();
            bool addThis=this.OnAttribute(attributeName, attributeValue, newAttributes);
            foreach (DictionaryEntry entry in newAttributes)
            {
                attributes.Add(entry.Key, entry.Value);
                if (entry.Value==null || entry.Value.ToString().Length > 0)
                    fullElementString.AppendFormat("{0}=\"{1}\" ", entry.Key, entry.Value);
                else
                    fullElementString.AppendFormat("{0} ", entry.Key);
            }
            if (addThis)
            {
                fullElementString.AppendFormat("{0}=\"{1}\" ", attributeName, attributeValue);
            }
            readingAttributeValue = false;
            attributeName = null;
        }
        else if (attributes != null && attributeName == null)
        {
            // the token is an attribute name
            token = token.Trim();
            attributeName = token;
        }
        else if (attributes != null)
        {
            // this branch expects an equal sign.
            token = token.Trim();
            if (token == "=")
            {
                readingAttributeValue = true;
            }
            else
            {
                IDictionary newAttributes=new Hashtable();
                bool addThis=this.OnAttribute(attributeName, "", newAttributes);
                foreach (DictionaryEntry entry in newAttributes)
                {
                    attributes.Add(entry.Key, entry.Value);
                    if (entry.Value == null || entry.Value.ToString().Length > 0)
                        fullElementString.AppendFormat("{0}=\"{1}\" ", entry.Key, entry.Value);
                    else
                        fullElementString.AppendFormat("{0} ", entry.Key);
                }
                if (addThis)
                {
                    fullElementString.AppendFormat("{0} ", attributeName);
                }
                readingAttributeValue = false;
                attributeName = null;
            }
        }
        else
        {
            this.OnDefaultEvent(token);
        }
    }
}
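The end-tag branch of the listing is the part that makes the parser tolerant of stray end tags. The following is a simplified, self-contained sketch of that branch only, assuming a plain list of element names in place of the StackEntry stack (index 0 is the innermost element). The names EndTagSketch and CloseElement are hypothetical and do not exist in wx.NET.

```csharp
using System.Collections.Generic;

// Hypothetical stand-alone model of the end-tag handling in Parse().
static class EndTagSketch
{
    // Returns true if a matching start tag was on the stack and the stack
    // was unwound down to and including it. Returns false, leaving the
    // stack untouched, for an end tag that has no matching start tag.
    public static bool CloseElement(List<string> stack, string element)
    {
        // First search for a start tag, so stray end tags are ignored.
        if (!stack.Contains(element))
            return false;

        // Remove entries from the innermost outwards until the match is gone.
        while (stack.Count > 0)
        {
            string current = stack[0];
            stack.RemoveAt(0);
            if (current == element)
                return true; // the listing calls OnClosingElement() at this point
        }
        return false; // not reached, because Contains() succeeded above
    }
}
```

With a stack of b, p, body (innermost first), closing p removes both b and p and leaves body. Closing table on the same stack changes nothing. As in the listing, only the matching element would receive a closing event; the elements nested inside it are dropped silently.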
int Contrib.Html.HtmlParse.Depth [get, protected]
This is the depth of the internal stack representing the nested structure of tags.
Definition at line 61 of file HtmlParse.cs.
StackEntry Contrib.Html.HtmlParse.this[int depth] [get, protected]
This will access the internal stack element according to the given depth.
this[0] returns the entry for the current element. this[1] returns the entry for the element containing the current element, or null if there is no containing element. All elements without an explicit end tag are considered to contain everything that their predecessor contains. So,
<p>
A paragraph.
<p>
Another paragraph.
<p>
A paragraph.
<p>
Another paragraph.
</p>
</p>
<p>
A paragraph.
</p>
<p>
Another paragraph.
</p>
Definition at line 94 of file HtmlParse.cs.
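How the stack behind this indexer grows can be sketched from the start-tag branch of the Parse() listing on this page. The sketch below assumes a plain list of element names with index 0 as the current element. StartTagSketch and OpenElement are hypothetical names, and state inheritance between entries is left out.

```csharp
using System.Collections.Generic;

// Hypothetical stand-alone model of the start-tag handling in Parse().
static class StartTagSketch
{
    // Pushes a new element onto the stack. If the current element has the
    // same name, it is removed first, as if it had been closed explicitly.
    public static void OpenElement(List<string> stack, string element)
    {
        if (stack.Count > 0 && stack[0] == element)
            stack.RemoveAt(0); // the listing calls OnClosingElement() before this

        stack.Insert(0, element); // index 0 is always the current element
    }
}
```

In this model, opening p twice in a row leaves a single p on the stack, whereas opening b after p produces a depth of two with b at index 0 and p at index 1.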
(c) 2003-2010 the wx.NET project