I've found it works better to chunk by the document's logical sections, e.g. headings (h2, h3, h4, etc.) or numbered sections (1.1, 1.1.1, ...), and to be able to ignore some content (headers and footers), plus other customizations.
At least for use cases where there are clusters of many similarly formatted documents, it would be cool to have a way of easily customizing chunking.
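Here's a rough sketch of the kind of customization I mean. It assumes plain text where each heading sits on its own line. `ChunkConfig`, `chunk_by_sections` and the default regexes are made up for illustration and are not any existing library's API. For a given cluster of similar documents you would only swap in a different set of heading and ignore patterns:

```python
import re
from dataclasses import dataclass, field


@dataclass
class ChunkConfig:
    # Hypothetical defaults: regexes marking the start of a new section.
    heading_patterns: list[str] = field(default_factory=lambda: [
        r"^#{2,4}\s+\S",        # markdown-style h2-h4 headings
        r"^\d+(\.\d+)+\s+\S",   # numbered headings such as "1.1" or "1.1.1"
    ])
    # Regexes for lines to drop entirely, e.g. running headers/footers.
    ignore_patterns: list[str] = field(default_factory=list)


def chunk_by_sections(text: str, config: ChunkConfig) -> list[str]:
    """Split text into chunks, starting a new chunk at every heading line."""
    headings = [re.compile(p) for p in config.heading_patterns]
    ignores = [re.compile(p) for p in config.ignore_patterns]
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # Skip boilerplate lines such as page headers and footers.
        if any(p.search(line) for p in ignores):
            continue
        # A heading closes the chunk collected so far and opens a new one.
        if any(p.match(line) for p in headings) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty after stripping.
    return [c for c in chunks if c]


# Example: one document from a cluster that shares the same header/footer.
doc = """ACME Corp - Confidential
## Overview
Intro text.
1.1 Scope
Scope text.
1.1.1 Details
More details.
Page 3 of 10"""

config = ChunkConfig(ignore_patterns=[r"^ACME Corp - Confidential$", r"^Page \d+ of \d+$"])
chunks = chunk_by_sections(doc, config)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

Any text before the first heading ends up as its own chunk, which may or may not be what you want; that's exactly the kind of thing that would be nice to make configurable per document type.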