
The absolute worst place to be right now is a B2B tech startup. Not only do you need to build some kind of app or product, you also need to bolt some kind of AI feature onto it. The users don't want it and never asked for it. It sucks all the resources out of the actual product you should be focusing on, doesn't actually work or works non-deterministically, yet you're held to the same standards as if it were any other kind of software. And the only lever you have to pull is a lengthy model re-training or fine-tuning/development cycle. The suits don't understand AI or what it takes to make it successful. They were sold on the hype that AI is going to save money, and forgot to budget for the team of AI engineers you'll need, the infrastructure for training, the extensive data annotation, and the reams of data that most startups don't have.

Tell me again how this isn't pure hell and the cuck chair?



> And the only lever you have to pull is a lengthy model re-training or fine tuning/development cycle.

Is this really how professionals work on such a problem today?

The times I've had to tune the responses, we'd gather bad/good examples, chuck them into a .csv/directory, then create an automated pipeline that gives us a success rate against what we expect, then start tuning the prompt, inference parameters, and other things in an automated manner. As we discover more bad cases, we add them to the testing pipeline.
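Roughly, the loop looks like this (a minimal sketch; the CSV columns, the model name, and the substring pass check are all placeholders for whatever your stack actually uses):

    import csv
    from openai import OpenAI

    client = OpenAI()

    def run_model(prompt: str, case_input: str) -> str:
        # one inference call per test case; swap in your own endpoint
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            temperature=0,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": case_input},
            ],
        )
        return resp.choices[0].message.content

    def success_rate(prompt: str, cases: list[dict]) -> float:
        # a case passes if the expected answer shows up in the output
        passed = sum(
            1 for c in cases
            if c["expected"].lower() in run_model(prompt, c["input"]).lower()
        )
        return passed / len(cases)

    with open("cases.csv", newline="") as f:
        cases = list(csv.DictReader(f))  # columns: input, expected

    for prompt in ["You are a terse assistant...", "You are a careful assistant..."]:
        print(f"{success_rate(prompt, cases):.0%}  {prompt[:50]}")

New bad cases just become new rows in cases.csv.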

Only if something was very badly wrong would you reach for model re-training or fine-tuning, or when you knew up front the model wouldn't be up to the exact task you have in mind.


Got it, professionals don't fine-tune their models, and you can do everything via prompt engineering and some script called optimize.py that fiddles with API parameters for your call to OpenAI. So simple!


It depends. Fine-tuning is a significant productivity drag over in-context learning, so you shouldn't attempt it lightly. If you are working on low-latency tasks or need lower marginal costs, then fine-tuning a small model might be the only way to achieve your goals.
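For a sense of the trade-off: in-context learning here just means shipping a handful of labeled examples in the prompt, which you can change with a deploy rather than a training run. A sketch, assuming the OpenAI chat API (the model name, categories, and examples are all made up):

    from openai import OpenAI

    client = OpenAI()

    # labeled examples live in the prompt, not in model weights,
    # so fixing a bad case is an edit, not a fine-tuning job
    FEW_SHOT = [
        {"role": "user", "content": "Ticket: 'Refund my order' -> category?"},
        {"role": "assistant", "content": "billing"},
        {"role": "user", "content": "Ticket: 'App crashes on login' -> category?"},
        {"role": "assistant", "content": "bug"},
    ]

    def classify(ticket: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            temperature=0,
            messages=[{"role": "system", "content": "Answer with one category word."}]
            + FEW_SHOT
            + [{"role": "user", "content": f"Ticket: {ticket!r} -> category?"}],
        )
        return resp.choices[0].message.content.strip()

The fine-tuned small model wins once you've squeezed in-context learning dry and the remaining problem is latency or per-token cost, not quality.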


Agree for the most part, but at the SaaS company I'm at, we've built a feature that uses LLMs to extract structured data from large unstructured documents. It's not something that's been done well in this domain, and this solution works better than any other we've tried.

We've kept the LLM constrained to just extracting values with context, and we show the values to end users in a review UI that displays the source doc and lets them navigate to exactly the place in the doc where a given value was extracted. These are mostly numbers, but occasionally the LLM needs to do a bit of reasoning to determine a value (e.g., is this an X, Y, or Z type of transaction, where the exact words X, Y, or Z won't necessarily appear). Any calculations that can be performed deterministically are done in a later step using a very detailed, domain-specific financial model.
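The shape of the extraction step, as a hedged sketch (assuming the OpenAI client; the field schema and the verbatim-quote check are illustrative, not our exact implementation):

    import json
    from openai import OpenAI

    client = OpenAI()

    PROMPT = (
        "Extract each field from the document as JSON: "
        '{"fields": [{"name": "...", "value": "...", "source_quote": "..."}]}. '
        "source_quote must be copied verbatim from the document so a reviewer "
        "can jump to it. Extract only; do not compute derived numbers."
    )

    def extract(document_text: str) -> list[dict]:
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder
            temperature=0,
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": PROMPT},
                {"role": "user", "content": document_text},
            ],
        )
        fields = json.loads(resp.choices[0].message.content)["fields"]
        # drop anything whose quote isn't actually in the doc; that check is
        # what makes the review UI's jump-to-source trustworthy
        return [f for f in fields if f["source_quote"] in document_text]

Everything numeric that can be derived is then computed deterministically downstream, outside the LLM.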

This is not a chatbot or other crap shoehorned into the app. Users are very excited about this - it automates painful data entry and allows them to check the source - which they actually do, because they understand the cost of getting the numbers wrong.



