This blog post makes some good points about using vision models for retrieval, but I do want to call out a few problems:
1. The blog conflates indexing/retrieval with document parsing. Document parsing is the task of converting a document into a structured text representation, whether that's markdown/JSON or (in the case of extraction) an output that conforms to a schema. It has many uses; RAG is one of them, but many are not RAG-related at all.
ColPali is great for retrieval, but you can't use ColPali (at least natively) for pure document parsing tasks. There are a lot of separate benchmarks just for evaluating doc parsing, while the author mostly talks about visual retrieval benchmarks.
2. This whole idea of "You can DIY document parsing by screenshotting a page" is not new at all; lots of people have been talking about it! It's certainly fine as a baseline and does work better than standard OCR in many cases (rough sketch after this list).
a. But from our experience there's still a long tail of accuracy issues.
b. It's missing metadata like confidence scores, bounding boxes, etc. out of the box.
c. Honestly this is underrated, but building a good screenshotting pipeline itself is non-trivial.
3. In general for retrieval, it's helpful to have both text and image representations. Image tokens are obviously much more powerful, but text tokens are way cheaper to store and let you do things like retrieve entire documents (instead of just chunks) and feed them into the LLM.
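On point 2, here's a rough sketch of that DIY baseline: render each page to an image and ask a vision model for markdown. This assumes PyMuPDF for rendering and an OpenAI-style vision endpoint; the model name, prompt, and file path are placeholders, not any particular product's pipeline:

```python
# Rough sketch of the "screenshot a page and ask a VLM for markdown" baseline.
# Assumes PyMuPDF for rendering and an OpenAI-style vision-capable chat model;
# the model name, prompt, and file path are illustrative only.
import base64
import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()

doc = fitz.open("report.pdf")
pages_md = []
for page in doc:
    pix = page.get_pixmap(dpi=200)  # render the page to an image
    b64 = base64.b64encode(pix.tobytes("png")).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page to markdown. Preserve tables."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    pages_md.append(resp.choices[0].message.content)

print("\n\n".join(pages_md))
```

This is exactly where points 2a-2c bite: it can look great on clean pages while still missing confidence scores/bounding boxes and hiding a long tail of transcription errors.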
(disclaimer: I am CEO of LlamaIndex, and we have worked on both document parsing and retrieval with LlamaCloud, but I hope my point stands in a general sense)
(Disclaimer: I am CEO of LlamaIndex, which includes LlamaParse.)
Nice article! We're actively benchmarking Gemini 2.0 right now and if the results are as good as implied by this article, heck we'll adapt and improve upon it. Our goal (and in fact the reason our parser works so well) is to always use and stay on top of the latest SOTA models and tech :) - we blend LLM/VLM tech with best-in-class heuristic techniques.
Some quick notes:
1. I'm glad that LlamaParse is mentioned in the article, but it's not included in the performance benchmarks. I'm pretty confident that our most accurate modes would be at the top of that benchmark table - our stuff is pretty good.
2. There's a long tail of issues beyond just tables - this includes fonts, headers/footers, the ability to recognize charts/images/form fields, and, as other posters said, the ability to produce fine-grained bounding boxes on the source elements. We've optimized our parser to tackle all of these cases, and we need proper benchmarks for that.
3. DIY'ing your own pipeline to run a VLM at scale to parse docs is surprisingly challenging. You need to orchestrate a robust system that can screenshot a bunch of pages at the right resolution (which can be quite slow), tune the prompts, and make sure you're obeying rate limits and can retry on failure (rough sketch below).
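For point 3, this is roughly the operational wrapper you end up writing around the per-page VLM call: bounded concurrency plus retry with backoff. `parse_page` is a hypothetical stand-in for whatever screenshot-to-markdown call you use:

```python
# Illustrative sketch of the orchestration you need at scale: a global
# concurrency cap plus retries with exponential backoff for rate limits
# and transient failures. parse_page() is a hypothetical per-page VLM call.
import asyncio
import random

MAX_CONCURRENCY = 8
semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

async def parse_page(page_image: bytes) -> str:
    ...  # your per-page VLM call, e.g. the screenshot-to-markdown sketch earlier

async def parse_page_with_retry(page_image: bytes, attempts: int = 5) -> str:
    async with semaphore:  # obey a global concurrency cap
        for attempt in range(attempts):
            try:
                return await parse_page(page_image)
            except Exception:
                if attempt == attempts - 1:
                    raise
                # exponential backoff with jitter for rate limits / flaky calls
                await asyncio.sleep(2 ** attempt + random.random())

async def parse_document(page_images: list[bytes]) -> list[str]:
    return await asyncio.gather(*(parse_page_with_retry(img) for img in page_images))
```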
The very first (and probably hand-picked & checked) example on your website [0] suffers from the very problem people are talking about here - in the "Fiscal 2024" row there's an error in the CEO CAP column. The image says "$234.1" but the parsed result says "$234.4". A small error, but an error nonetheless. I wonder if we can ever fix this kind of error with LLM parsing.
I'm a happy customer. I wrote a Ruby client for your API and have been parsing thousands of different types of PDFs through it with great results. I tested almost everything out there at the time and I couldn't find anything that came close to being as good as LlamaParse.
Indeed, this is also my experience. I have tried a lot of things, and where quality is more important than quantity, I doubt there are many tools that come close to LlamaParse.
All your examples are exquisitely clean digital renders of digital documents. How does it fare with real scans (noise, folds) or photos? Receipts?
Or is there a use case for digital non-text pdfs? Are people really generating image and not text-based PDFs? Or is the primary use case extracting structure, rather than text?
How well does llamaparse work on foreign-language documents?
I have a pipeline for Arabic-language docs that uses Azure for OCR and GPT-4o-mini to extract structured information. Would it be worth trying LlamaParse to replace part of the pipeline, or the whole thing?
Wait, do you have specific examples of "overengineering and overabstracting" from LlamaIndex? Very open to feedback and suggestions on improvement - we've put a lot of work into making sure everything is customizable.
Thanks for running through the benchmark! Just to clarify some things:
(1) The idea is that LlamaParse's markdown representation lends itself to the rest of LlamaIndex's advanced indexing/retrieval abstractions. Recursive retrieval is a fancy retrieval method designed to model documents with embedded objects, but it depends on good PDF parsing - naive PyPDF parsing can't be used with recursive retrieval (rough sketch after these notes). Our goal is to demonstrate the e2e RAG capabilities of LlamaParse + advanced retrieval vs. what you can build with a naive PDF parser.
(2) Since we use LLM-based evals, your correctness and relevancy metrics look to be consistent and within the margin of error (and lower than our LlamaParse metrics). The faithfulness score seems way off though, and quite high on your side, so I'm not sure what's going on there. Maybe hop into our Discord and share the results in our channel?
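To make (1) concrete, here's a hand-wavy sketch of the recursive retrieval pattern over a parsed document: table chunks are represented by `IndexNode` summaries that point at per-table query engines, so a hit on a summary recurses into the underlying table. Imports follow a recent LlamaIndex release (they have moved between versions), and `text_nodes`, `table_summaries`, and `table_query_engines` are assumed to come from your parsing step:

```python
# Sketch of recursive retrieval over a parsed document. Assumes you already
# have: text_nodes (regular parsed-markdown chunks), table_summaries (one
# summary string per extracted table), and table_query_engines (one query
# engine per table) - all produced upstream from a good parse.
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import IndexNode
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

# IndexNodes act as pointers: retrieving the summary recurses into the table.
index_nodes = [
    IndexNode(text=summary, index_id=f"table-{i}")
    for i, summary in enumerate(table_summaries)
]
vector_index = VectorStoreIndex(text_nodes + index_nodes)

retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_index.as_retriever(similarity_top_k=5)},
    query_engine_dict={f"table-{i}": qe for i, qe in enumerate(table_query_engines)},
)
query_engine = RetrieverQueryEngine.from_args(retriever)
```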
We love the feedback, and one main point especially seems to be around making the docs better:
- Improve the organization to better expose both our basic and our advanced capabilities
- Improve the documentation around customization (from LLMs to retrievers, etc.)
- Improve the clarity of our examples/notebooks.
100%, if the API itself can choose to call a function or an LLM, then it's way easier to build any agent loop without extensive prompt engineering + worrying about errors.
You still have to worry about errors. You will probably have to add an error-handler function that it can call out to; otherwise the LLM will hallucinate a valid-looking output regardless of the input. You want it to be able to throw an error and say it couldn't produce the output for the given input (sketch below).
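One hedged sketch of what that looks like with an OpenAI-style tool-calling API: expose an explicit `report_error` tool so the model has a sanctioned way to refuse. The tool names, model string, and example task are all placeholders:

```python
# Sketch: give the model an explicit "report_error" tool so it can decline
# instead of hallucinating a plausible-looking answer. Tool names and the
# model string are placeholders, not any specific product's schema.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_invoice_total",
            "description": "Look up the total amount for an invoice ID.",
            "parameters": {
                "type": "object",
                "properties": {"invoice_id": {"type": "string"}},
                "required": ["invoice_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "report_error",
            "description": "Call this if the request cannot be satisfied with the available tools or input.",
            "parameters": {
                "type": "object",
                "properties": {"reason": {"type": "string"}},
                "required": ["reason"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the total for invoice ABC-123?"}],
    tools=tools,
    tool_choice="auto",
)

msg = resp.choices[0].message
if msg.tool_calls and msg.tool_calls[0].function.name == "report_error":
    print("Model declined:", msg.tool_calls[0].function.arguments)
```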
Hi all - Jerry (co-founder/CEO) here, happy to answer any questions you might have!
We're building a data framework to unlock the full capabilities of LLMs on top of your private data. We can't wait for the future - this space is moving so rapidly and there are so many things we want to do on both the open-source and enterprise side.
Feel free to shoot me a personal note on Twitter/Discord as well.
Depending on what questions you're asking, you could check out LlamaIndex query capabilities - you can define different index structures for different queries and plug into your LangChain workflow: https://gpt-index.readthedocs.io/en/latest/use_cases/queries...
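A minimal sketch of that query path with a recent LlamaIndex release (imports have moved around between versions, so defer to the docs link); the data directory and question are placeholders:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# load your private docs and build one of several possible index structures
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What does the Q3 report say about revenue?"))
```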