
I was writing a parser for a reasonable subset of HTML5 (a full parser would actually need to include JavaScript parsing as well) and there are certain states where the same character must be reprocessed under a different parser state. [1]

I wrote my parser by iterating character by character (foreach). When it reaches one of the states that need reprocessing, I found it nicer to update the state variable and then jump to a goto label at the top of the foreach body, so the same character is handled again under the new state.
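
For anyone who wants to see the shape of that, here is a minimal sketch, not my actual parser. Python has no goto, so an inner `while True` with `continue` stands in for the jump back to the label. The three states and the `tokenize` name are made up for illustration and cover only a tiny slice of the real tokenizer.

```python
def tokenize(text):
    """Toy tokenizer with three simplified states: data, tag_open, tag_name."""
    tokens = []
    state = "data"
    name = []
    for ch in text:
        # The inner loop stands in for the goto label: `continue` re-runs
        # the dispatch on the SAME character under the updated state.
        while True:
            if state == "data":
                if ch == "<":
                    state = "tag_open"
                else:
                    tokens.append(("char", ch))
            elif state == "tag_open":
                if ch.isascii() and ch.isalpha():
                    name = []
                    state = "tag_name"
                    continue  # reconsume ch in the tag_name state
                # Not a tag after all: emit the pending "<" as text.
                tokens.append(("char", "<"))
                state = "data"
                continue  # reconsume ch in the data state
            elif state == "tag_name":
                if ch == ">":
                    tokens.append(("tag", "".join(name)))
                    state = "data"
                else:
                    name.append(ch.lower())
            break  # character fully consumed, move to the next one
    return tokens
```

With this sketch, `tokenize("1<2")` treats the `<` as plain text because the `2` gets reconsumed in the data state, while `tokenize("a<b>c")` yields a tag token for `b`.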

If anyone is aware of a purely functional HTML5 compliant parser let me know, as I'd love to steal some ideas.

[1] https://www.w3.org/TR/2011/WD-html5-20110113/tokenization.ht... (Ctrl-F "reconsume")




That is an HTML generator library.

And as I've said, HTML5 compliance also requires parsing JavaScript; see the second sentence here: https://dev.w3.org/html5/spec-LC/parsing.html#overview-of-th...

A good first smoke test for parsers that claim HTML5 support is the following HTML fragment:

   <input disabled>

XML-like parsers (masquerading as having "HTML5 support") will fail if tags aren't closed or self-closing, or if there are attributes without an explicit value.
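
A quick way to run that smoke test, sketched with the Python standard library only. `xml.etree.ElementTree` plays the XML-like parser and `html.parser.HTMLParser` the lenient one. Note that `html.parser` is a tolerant tokenizer, not a fully HTML5-compliant parser, so passing here is necessary but nowhere near sufficient. The helper names `xml_accepts` and `html_start_tags` are mine.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

FRAGMENT = "<input disabled>"


class StartTagCollector(HTMLParser):
    """Records every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def xml_accepts(fragment):
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def html_start_tags(fragment):
    """Return the start tags an HTML-aware tokenizer sees in the fragment."""
    collector = StartTagCollector()
    collector.feed(fragment)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print("XML parser accepts it:", xml_accepts(FRAGMENT))
    print("HTML parser start tags:", html_start_tags(FRAGMENT))
```

The XML parser rejects the fragment (valueless attribute, unclosed tag), while the HTML one reports an `input` start tag whose `disabled` attribute has the value `None`.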



