
From a quick glance, it looks like you'd need to make Gumbo a backend for lxml. Is that even possible?


https://github.com/html5lib/html5lib-python/issues/105 seems to imply that such a thing is possible. I am unsure about the requirement for lxml. I was under the impression that lxml is an optional walker and that the default is the slower pure-Python walker.


You're misunderstanding the level at which html5lib operates: it merely parses to a tree, using a "tree builder" to give the parser a common API for building that tree, which can be a DOM tree, an ElementTree, an lxml tree, whatever. It also provides a generic "tree walker" API that walks over one of those tree formats and produces a common stream of events (start tag, end tag, text, comment, etc.), which can then be used, e.g., in the serialiser.

This can therefore be used with Gumbo by passing the lxml tree builder into its html5lib.parse-like method.
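To make the "tree walker" idea concrete, here is a rough stdlib-only sketch. It is not html5lib's actual code or API: the names walk and serialize and the event tuples are made up for illustration, it ignores attributes and comments, and it does no escaping. It only shows the shape of the thing: walk one tree format, emit a flat event stream, and let a serialiser consume that stream without knowing what kind of tree it came from.

```python
import xml.etree.ElementTree as ET


def walk(element):
    """Yield a flat stream of (event, data) tuples for an ElementTree element.

    Hypothetical event names, loosely modelled on the start tag / end tag /
    text events described above. Attributes and comments are ignored.
    """
    yield ("StartTag", element.tag)
    if element.text:
        yield ("Characters", element.text)
    for child in element:
        yield from walk(child)
        # ElementTree stores text that follows a child on the child's .tail
        if child.tail:
            yield ("Characters", child.tail)
    yield ("EndTag", element.tag)


def serialize(events):
    """Turn the event stream back into markup (no escaping, sketch only)."""
    parts = []
    for kind, data in events:
        if kind == "StartTag":
            parts.append(f"<{data}>")
        elif kind == "EndTag":
            parts.append(f"</{data}>")
        else:
            parts.append(data)
    return "".join(parts)


tree = ET.fromstring("<p>Hello <b>world</b>!</p>")
events = list(walk(tree))
print(serialize(events))
```

A second walk function written for a different tree type (say, an lxml tree) could feed the same serialize unchanged, which is the point of having a common event stream.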



