The more detail, the better. If `<section>` elements are found, do you chunk those? Do you do it recursively, or do you stop after a certain level? And when section elements don't exist, do you use `<h1>`, `<h2>`, etc. to infer logical chunks?
Having looked at a lot of HTML in the wild, I noticed that sections are not really the default. I rely on headings (h1, h2, ...) to chunk each page. Each chunk has its heading hierarchy attached to it. There are a lot of optimizations that could be done at that level.
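Roughly, the idea looks like this. This is a minimal sketch in Python using BeautifulSoup, not our production code; the chunk structure and function name are just illustrative:

```python
# Sketch of heading-based chunking: walk the document in order,
# start a new chunk at each heading, and attach the current
# heading hierarchy to every chunk. Body text is simplified to
# <p> tags for brevity. (Illustrative only, not production code.)
from bs4 import BeautifulSoup

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def chunk_by_headings(html: str):
    soup = BeautifulSoup(html, "html.parser")
    hierarchy = {}  # heading level (1-6) -> heading text
    chunks = []
    current = {"headings": [], "text": []}

    for el in soup.find_all(sorted(HEADINGS) + ["p"]):
        if el.name in HEADINGS:
            # Flush the previous chunk before starting a new one.
            if current["text"]:
                chunks.append(current)
            level = int(el.name[1])
            hierarchy[level] = el.get_text(strip=True)
            # A new h2 invalidates any h3/h4/... below it.
            for deeper in range(level + 1, 7):
                hierarchy.pop(deeper, None)
            current = {
                "headings": [hierarchy[l] for l in sorted(hierarchy)],
                "text": [],
            }
        else:
            current["text"].append(el.get_text(" ", strip=True))

    if current["text"]:
        chunks.append(current)
    return chunks
```

Each chunk ends up carrying its full breadcrumb (e.g. `["Docs", "API", "Authentication"]`), which is what makes the retrieved text useful without its surrounding page.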
I'm just guessing, but I would think that following whatever semantics leads to the highest search rank in Google's algorithm is what you're most likely to find in the wild.
I appreciate the reply. As someone who runs multiple CMSs, it’s painful to deal with the AI crawlers these days, especially the ones that don’t respect my terms.
Thanks! The chat demo is actually just a small thing I put together as a preview of what can be done; the main product is the API. But seeing that most users seem to like the demo, there's probably something there...
If you want to email me at support at embedding.io with some requirements, I can see how to make that work for you.
You can group as many websites as you want into a collection. Then query that collection.
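In rough pseudo-code terms, the workflow is something like this. This is a purely hypothetical sketch; the endpoint paths and parameters below are made up for illustration, so check the actual API docs:

```python
# Hypothetical sketch of the collection workflow; not the real
# embedding.io API -- endpoints and fields are invented here.
import requests

API = "https://api.embedding.io/v1"  # hypothetical base URL
headers = {"Authorization": "Bearer YOUR_KEY"}

# Group several websites into one collection...
requests.post(f"{API}/collections", headers=headers, json={
    "name": "docs",
    "websites": ["https://example.com", "https://example.org"],
})

# ...then query the collection as a whole.
r = requests.post(f"{API}/collections/docs/query", headers=headers,
                  json={"query": "how do I configure webhooks?"})
print(r.json())
```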
Not sure what you mean by exporting; would you like to export the vectors themselves, or just the chunks of text from the websites?