
Using grammar-constrained output in llama.cpp - which has been available for ages, and I think is a different implementation from the one described here - does slow down generation quite a bit. I expect it has a naive implementation.
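For concreteness, here's roughly what that path looks like through the llama-cpp-python bindings (a sketch, not the implementation discussed in the article; the model path is a placeholder and the exact API may differ between versions):

    # Sketch of grammar-constrained generation via llama-cpp-python
    # (model path is a placeholder; API details may vary by version).
    from llama_cpp import Llama, LlamaGrammar

    # Tiny one-rule GBNF grammar: the model may only answer "yes" or "no".
    GBNF = 'root ::= "yes" | "no"'

    llm = Llama(model_path="model.gguf")        # placeholder path
    grammar = LlamaGrammar.from_string(GBNF)    # parsed once, reused per call

    out = llm(
        "Is the sky blue? Answer yes or no: ",
        grammar=grammar,   # tokens that would break the grammar get masked out
        max_tokens=4,
    )
    print(out["choices"][0]["text"])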

As to why providers don't give you a nice API, maybe it's hard to implement efficiently.

It's not too bad if inference happens token by token, returning to the CPU every time, but I understand high-performance LLM inference uses speculative decoding, with a smaller model guessing multiple tokens in advance and the main model doing verification. Doing grammar constraints across multiple tokens is tougher: there's an exponential number of states that need precomputing.
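To illustrate (toy three-state automaton made up for the example, nothing to do with llama.cpp's actual parser): verifying a k-token draft means you need the grammar's allowed-token mask at every draft position, not just the next one, so something has to walk the parser along each prefix - or precompute masks for every possible prefix, which blows up as |vocab|^k.

    # Toy sketch (made-up 3-state automaton, not llama.cpp's implementation).
    # With speculative decoding the big model scores a whole k-token draft in
    # one batch, so a grammar mask is needed at every draft position. Walking
    # the parser along the draft like this is inherently sequential CPU work.

    # Regular grammar ("yes" | "no") "." over a toy vocabulary.
    ALLOWED = {0: {"yes", "no"}, 1: {"."}, 2: set()}     # state -> legal tokens
    STEP = {(0, "yes"): 1, (0, "no"): 1, (1, "."): 2}    # state transitions

    def masks_for_draft(draft, state=0):
        """Allowed-token set for each position of a speculative draft."""
        masks = []
        for tok in draft:
            masks.append(ALLOWED[state])     # mask the big model needs here
            if tok not in ALLOWED[state]:    # draft already broke the grammar;
                break                        # later positions will be rejected
            state = STEP[(state, tok)]
        return masks

    print(masks_for_draft(["yes", ".", "maybe"]))   # third mask is empty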

So you'd need to think about putting the parser automaton onto the GPU/TPU and using it during inference without stalling the pipeline by going back to the CPU.

And then you start thinking about how big that automaton is going to be: how many states, whether it needs a pushdown stack. You're basically taking code from the API call and running it on your hardware. There are dragons here, around fair use, denial of service, etc.
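On the pushdown part (a made-up toy recognizer, not llama.cpp's parser): for anything context-free - JSON, code - the parser state includes a stack, so there's no fixed, finite state set you can enumerate up front and turn into a precomputed mask table.

    # Toy pushdown recognizer for nested braces (made up for illustration).
    # The effective state is (rule position, stack contents), and the stack
    # grows with nesting depth, so the reachable state set isn't bounded and
    # a per-state mask table can't simply be precomputed and shipped to the GPU.

    def allowed_next(stack):
        # Inside at least one "{" you may open, close, or emit an item;
        # at the top level you may only open.
        return {"{", "x", "}"} if stack else {"{"}

    def step(stack, tok):
        if tok == "{":
            return stack + ["{"]
        if tok == "}":
            return stack[:-1]
        return stack

    stack = []
    for tok in ["{", "{", "x", "}", "}"]:
        assert tok in allowed_next(stack)
        stack = step(stack, tok)
    print("balanced:", stack == [])   # True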



A guide on llama.cpp's grammars (nine hours and not a single mention of "GBNF"? HN is slipping) is here:

https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...

There's also a grammar validation tool in the default llama.cpp build, which is much easier to reason about for debugging grammars than having them bounce off the server.


If your masking is fast enough, you can easily make it work with spec dec too :). We manage to keep this on CPU. Some details here: https://github.com/guidance-ai/llguidance/blob/main/docs/opt...
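For anyone wondering what "fast enough" has to cover, a sketch of the general shape (not llguidance's actual code): once the grammar has produced the allowed-token set for the current parser state, applying it to the logits is a single O(vocab) vector op per step, and that whole step can stay on the CPU.

    # Sketch of CPU-side mask application (not llguidance's actual code).
    # Once the grammar has produced an allowed-token bitmask for the current
    # parser state, applying it before sampling is one O(vocab) vector op.
    import numpy as np

    VOCAB_SIZE = 32000
    logits = np.random.randn(VOCAB_SIZE).astype(np.float32)  # from the model
    allowed = np.zeros(VOCAB_SIZE, dtype=bool)                # from the grammar
    allowed[[42, 1337, 2023]] = True                          # toy: 3 legal tokens

    masked = np.where(allowed, logits, -np.inf)               # ban illegal tokens
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    next_token = int(np.random.choice(VOCAB_SIZE, p=probs))   # sample a legal token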



