arkmm · 2026-03-11T17:44:42 1773251082

Didn't know this technique had a name, but I would think a modern compiler could make this optimization on its own, no?

Sesse__ · 2026-03-11T18:07:31 1773252451

No, it's not equivalent for floating point, so a compiler won't do it unless you do -fassociative-math (or a superset, such as -ffast-math), in which case all correctness bets are off.
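
A minimal Python sketch of why regrouping is not equivalent in floating point. The values are illustrative assumptions, chosen so that the small term is lost when it is added to the large one first:

```python
# Floating-point addition is not associative: regrouping a sum can change
# the result. Values here are illustrative, not from the thread.
a, b, c = 1e17, -1e17, 1.0

left = (a + b) + c   # the large terms cancel first, so c survives
right = a + (b + c)  # c is absorbed into b (below its rounding step), then cancelled

print(left, right, left == right)  # 1.0 0.0 False
```

A compiler that reassociated the sum on its own could silently turn one of these results into the other, which is why it needs an explicit opt-in flag.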

arkmm · 2026-03-04T19:21:16 1772652076

You can fine tune a small LLM with a few thousand examples in just a few hours for a few dollars. It can be a bit tricky to host, but if you share a rough idea of the volume and whether this needs to be real-time or batched, I could list some of the tradeoffs you'd think about.

Source: Consulted for a few companies to help them finetune a bunch of LLMs. Typical categorical / data extraction use cases would have ~10x fewer errors at 100x lower inference cost than using the OpenAI models at the time.

azath92 · 2026-03-05T09:03:23 1772701403

OK, even that "few thousand examples" heuristic is useful. The use case would be to run this task over, I'd say, on the order of 100k extractions per run, batched rather than real time, and we'd be interested in (and already do) regular reruns with minor tweaks to the extracted blob (1-10 simple fields, nothing complex).

My interest in fine tuning at all comes from an adjacent interest in self hosting small models, although I tested this on AWS Bedrock for ease of comparison. My hope is that, given we are self hosting anyway, fine tuning and hosting our own tuned model shouldn't be terribly difficult, at least compared to managed fine-tuning solutions on cloud providers, which I'm generally wary of. Happy for those assumptions to be challenged.

arkmm · 2026-03-04T19:02:03 1772650923

Can you share more details about your use case? The good applications of fine tuning are usually pretty niche, which tends to make people feel like others might not be interested in hearing the details.

As a result it's really hard to read about real-world use cases online. I think a lot of people would love to hear more details - at least I know I would!

faxmeyourcode · 2026-03-10T15:15:44 1773155744

If you treat LLMs as generic transformers, you can fine tune with a ton of examples of input output pairs. For messy input data with lots of examples already built, this is ideal.

At my day job we have experimented with fine tuned transformers for our receipt processing workflow. We take images of receipts, run them through OCR (this step might not even be necessary, but we do it at scale already anyways), and then take the OCR output text blobs and "transform" them into structured receipts with retailer, details like zip code, transaction timestamps, line items, sales taxes, sales, etc.

I trained a small LLM (mistral-7b) via SFT with 1000 (maybe 10,000? I don't remember) examples from receipts in our database from 2019. When I tested the model on receipts from 2020 it hit something like 98% accuracy.

The key that made this work so well is that we had a ton of data (potentially billions of example input/output pairs) and we could easily evaluate the correctness by unpacking the json output and comparing with our source tables.
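
A rough sketch of that kind of check, assuming the model emits one JSON object per receipt and the source tables can be read as plain dicts. The function name `field_accuracy`, the field names, and the sample rows are all hypothetical, not our actual schema:

```python
import json

def field_accuracy(model_outputs, reference_rows, fields):
    """Fraction of (receipt, field) pairs where the parsed model JSON matches the reference."""
    correct = 0
    total = 0
    for raw, ref in zip(model_outputs, reference_rows):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = {}  # unparseable output counts as wrong on every field
        if not isinstance(parsed, dict):
            parsed = {}
        for name in fields:
            total += 1
            if name in parsed and parsed[name] == ref.get(name):
                correct += 1
    return correct / total if total else 0.0

# Hypothetical sample data: three model outputs and their reference rows.
outputs = [
    '{"retailer": "Acme", "zip_code": "94105", "total": 12.5}',
    '{"retailer": "Acme", "zip_code": "00000", "total": 3.0}',
    "not json",
]
references = [
    {"retailer": "Acme", "zip_code": "94105", "total": 12.5},
    {"retailer": "Acme", "zip_code": "94107", "total": 3.0},
    {"retailer": "Bolt", "zip_code": "10001", "total": 7.0},
]

accuracy = field_accuracy(outputs, references, ["retailer", "zip_code", "total"])
print(f"{accuracy:.2f}")  # 5 of 9 fields match
```

Exact equality is the simplest possible comparison; real receipts would likely need normalization (whitespace, number formats, timestamps) before comparing.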

Note that this isn't running in production; it was an experiment. There are edge cases I didn't consider, and there's a lot more to it in terms of evaluating accurately, deciding when to re-train, and dealing with net new receipt types, retailers, new languages (we're doing global expansion right now, so it's top of mind), general diversity of edge cases in your training data, etc.

arkmm · 2026-03-04T00:43:11 1772584991

Payment fees are crazy when you think about them from the perspective of a merchant in a low margin business. E.g. in retail or restaurants, margins aren't much better than ~10%. If they didn't have to pay ~3% credit card fees, they'd have 30% more profit!
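
The arithmetic, as a quick sketch. It assumes the ~10% margin is what's left after the ~3% fee has already been paid, and uses a made-up revenue of 100:

```python
# Illustrative numbers only: 100 in revenue, 10% margin after fees, 3% card fee.
revenue = 100.0
margin_with_fees = 0.10
fee_rate = 0.03

profit_with_fees = revenue * margin_with_fees   # profit the merchant keeps today
fees = revenue * fee_rate                       # amount paid in card fees
profit_without_fees = profit_with_fees + fees   # profit if the fee disappeared

increase = (profit_without_fees - profit_with_fees) / profit_with_fees
print(f"{increase:.0%}")  # 30%
```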

arkmm · 2026-02-26T19:22:07 1772133727

I used to also have this optimistic take, but over time I think the reality is that most people will instead just distrust unknown online sources and fall into the mental shortcuts of confirmation bias and social proof. Net effect will be even more polarization and groupthink.

arkmm · 2026-02-26T00:06:37 1772064397

Get ready for the acquisition offers.

arkmm · 2026-02-11T21:50:28 1770846628

Sorry Ploum, just getting a chance to read this now and comment. Great insights!

arkmm · 2026-02-03T20:13:55 1770149635

This is a really cool insight, going to use this on my team from now on!

arkmm · 2026-01-15T14:01:20 1768485680

They're still very good for finetuned classification, often 10-100x cheaper to run at accuracy similar to or higher than a large model's - but I think most people just prompt the large model unless they have high volume needs or need to self host.

arkmm · 2026-01-07T21:35:39 1767821739

Maybe a bit off-topic, but how'd you meet your partner while on your adventures?