Hacker News | tompec's comments

It doesn't embed images, no. But that's a great idea for the roadmap!


Great. I really want a feature like that! I'd like to query my knowledge base about images as well!


I chunk pages and generate embeddings for each chunk. So there's no real size limit per page.


The more detail, the better. If `<section>` elements are found you chunk those? Do you do it recursively or do you stop after a certain level? And when section elements don't exist, you use `<h1>`, `<h2>`, etc. to infer logical chunks?


Having looked at a lot of HTML pages, I noticed that sections are not really the default. I rely on headings (h1, h2, ...) to chunk each page. Each chunk has its heading hierarchy attached to it. There are a lot of optimizations that could be done at that level.
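
To make the heading-based approach concrete, here is a minimal sketch using only Python's standard library. It is not the actual implementation behind the product; the names `HeadingChunker` and `chunk_html` are hypothetical, and it ignores details a real crawler would need (skipping `<script>`/`<style>` text, splitting oversized chunks, and so on).

```python
from html.parser import HTMLParser

# Map heading tags to their nesting level.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingChunker(HTMLParser):
    """Hypothetical helper: splits a page at headings and records
    the heading hierarchy that applies to each chunk of text."""

    def __init__(self):
        super().__init__()
        self.chunks = []      # [{"headings": [...], "text": "..."}]
        self._path = []       # current hierarchy as (level, title) pairs
        self._text = []       # text fragments of the chunk being built
        self._heading = None  # (level, fragments) while inside a heading tag

    def flush(self):
        # Collapse whitespace and emit the current chunk if it has any text.
        text = " ".join(" ".join(self._text).split())
        if text:
            headings = [title for _, title in self._path]
            self.chunks.append({"headings": headings, "text": text})
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.flush()  # a new heading closes the previous chunk
            self._heading = (HEADINGS[tag], [])

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._heading is not None:
            level, parts = self._heading
            title = " ".join("".join(parts).split())
            # Drop headings at the same or a deeper level, then push this one.
            while self._path and self._path[-1][0] >= level:
                self._path.pop()
            self._path.append((level, title))
            self._heading = None

    def handle_data(self, data):
        if self._heading is not None:
            self._heading[1].append(data)
        else:
            self._text.append(data)


def chunk_html(html):
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit the trailing chunk
    return parser.chunks
```

Each returned chunk carries its full heading path (for example `["Guide", "Install"]`), which can be prepended to the chunk text before it is embedded.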


I'm just guessing, but I would think that whatever semantics lead to the highest search rank in Google's algorithm would be what you're most likely to find out in the wild.


Gotta start somewhere :)


Sorry about that, there's a bit too much load at the moment.


Thanks! I'm still figuring things out about pricing, but there will be small plans available.


It does respect robots.txt when crawling. I'll add more details about this in the docs.
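
For readers wondering what such a check can look like, here is a small sketch using Python's standard `urllib.robotparser`. It is only an illustration, not the crawler's actual code; the function name `allowed_urls` and the user agent string `ExampleBot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots.txt permits for this user agent.

    `robots_txt` is the already-downloaded text of the site's robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and crawler name.
robots = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://example.com/docs",
    "https://example.com/private/page",
]
crawlable = allowed_urls(robots, "ExampleBot", candidates)
```

Filtering the URL queue this way before fetching keeps disallowed paths from ever being requested.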


I appreciate the reply. As someone who runs multiple CMSs, it’s painful to deal with the AI crawlers these days. Especially the ones that don’t respect my terms.


Currently just a cloud-toy.


Thanks! The chat demo is actually just a small thing I put together as a preview of what can be done, but the main product is the API. But seeing that most users seem to like that, there's probably something there... If you want to email me at support at embedding.io with some requirements, I can see how to make that work for you.


You can group as many websites as you want into a collection, then query that collection. Not sure what you mean by exporting: would you like to export the vectors themselves, or just the chunks of text from the websites?


It currently tries to find a sitemap on its own. Letting users add their own is on the roadmap.
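
One common way to discover a sitemap automatically is to read the `Sitemap:` lines in robots.txt and fall back to the conventional `/sitemap.xml` path. The sketch below shows that idea under those assumptions; it is not necessarily how the product does it, and `candidate_sitemaps` is a hypothetical name. It needs Python 3.8+ for `RobotFileParser.site_maps()`.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def candidate_sitemaps(base_url, robots_txt):
    """Return sitemap URLs to try for a site.

    Prefers `Sitemap:` entries declared in robots.txt; if there are none,
    falls back to the conventional /sitemap.xml at the site root.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps()  # None when robots.txt declares no sitemap
    if declared:
        return list(declared)
    return [urljoin(base_url, "/sitemap.xml")]
```

A user-supplied sitemap URL could simply be placed ahead of these candidates.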

