I was writing a parser for a reasonable subset of HTML5 (a full parser would also need to include JavaScript parsing), and there are certain states where the same character must be reprocessed under a different parser state. [1]
I wrote my parser to iterate character by character (foreach). On reaching one of the states that need reprocessing, I found it nicer to update the state variable and then jump to a goto label at the start of the foreach.
If anyone is aware of a purely functional HTML5 compliant parser let me know, as I'd love to steal some ideas.
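To make the reprocessing idea concrete, here is a minimal sketch in Python. It is not my actual parser: Python has no goto, so an inner `while True` loop with `continue` stands in for the jump back to the label. The state names, the `tokenize` function, and the token tuples are all made up for illustration, and it only covers a tiny data / tag open / tag name slice of the tokenizer.

```python
def tokenize(text):
    """Tokenize a tiny tag/text subset into (kind, value) tuples.

    Hypothetical sketch: only three states, no attributes, no end tags,
    and unfinished tags at end of input are silently dropped.
    """
    tokens = []
    state = "data"
    name = []
    for ch in text:
        # The inner loop plays the role of the goto label: `continue`
        # re-runs the dispatch for the SAME character under the new state.
        while True:
            if state == "data":
                if ch == "<":
                    state = "tag_open"
                else:
                    tokens.append(("char", ch))
            elif state == "tag_open":
                if ch.isascii() and ch.isalpha():
                    name = []
                    state = "tag_name"
                    continue  # reprocess ch as the first letter of the name
                # Not a tag after all: emit the "<" and reprocess ch as data.
                tokens.append(("char", "<"))
                state = "data"
                continue
            elif state == "tag_name":
                if ch == ">":
                    tokens.append(("tag", "".join(name)))
                    state = "data"
                else:
                    name.append(ch.lower())
            break  # character fully consumed, move on to the next one
    return tokens
```

For example, `tokenize("a<1")` sees `1` while in the tag open state, emits the pending `<` as text, and then handles `1` again in the data state rather than skipping it.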
A good first smoke test for parsers that claim HTML5 support is the following HTML fragment:
<input disabled>
XML-like parsers (masquerading as having "HTML5 support") will fail if tags aren't closed or self-closing, and if there are attributes without an explicit value.
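A rough way to run that smoke test, sketched with the Python standard library only. `html.parser.HTMLParser` is a lenient HTML tokenizer, not a spec-compliant HTML5 parser, so it is used here purely as a stand-in for "something that tolerates HTML syntax", while `xml.etree.ElementTree` plays the XML-like parser. The helper names `html_start_tags` and `xml_accepts` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StartTagCollector(HTMLParser):
    """Records every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def html_start_tags(fragment):
    """Return the (tag, attrs) pairs the lenient HTML parser sees."""
    parser = StartTagCollector()
    parser.feed(fragment)
    parser.close()
    return parser.start_tags


def xml_accepts(fragment):
    """Return True if the strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True
```

With the fragment above, `html_start_tags("<input disabled>")` reports the `input` tag with a valueless `disabled` attribute, while `xml_accepts("<input disabled>")` is false. The XML parser only accepts the XML-style spelling `<input disabled="disabled"/>`.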
[1] https://www.w3.org/TR/2011/WD-html5-20110113/tokenization.ht... ctrl-f reconsume