The "magic" is done via the JSON schemas that are passed in along with the definition of the tool.
Structured Output APIs (inc. the Tool API) take the schema and build a Context-free Grammar, which is then used during generation to mask which tokens can be output.
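For a rough idea of what actually gets passed across the wire, here's a minimal sketch assuming an OpenAI-style chat completions client (field names vary by provider; the tool name and model are placeholders). The `parameters` block is the JSON Schema that the server compiles into a grammar for constrained decoding:

```python
# Minimal sketch of a tool definition whose JSON Schema drives constrained decoding.
# Assumes an OpenAI-style chat completions API; exact field names vary by provider.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {  # <- this JSON Schema is what the grammar is built from
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# Because generation is token-masked against the grammar, the returned arguments
# string should parse as JSON conforming to the schema above (modulo provider caveats).
print(resp.choices[0].message.tool_calls[0].function.arguments)
```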
FWIW, not saying it's right (as a hunter I wouldn't ever do this myself), but most of the biologists that build the population models, including the ones used to set the number of hunting licenses or tags sold, build a certain amount of poaching into their models.
It's a particularly hard problem to solve - the hobby is usually spread through traditional means (you do it if your parents did it), and in certain communities, going all the way back, this was the main way to get meat, even before it became regulated. It's difficult to stop something that not only puts food on the table for your family, but has been done that way for generations.
This was one of the main contributors to the decline of the turkey population in the lower 48. In the early 1900s, a lot of folks thought turkeys were extinct because of overhunting and poaching, and the National Wild Turkey Federation led efforts to restore the population for hunting.
> In the early 1900s, a lot of folks thought turkeys were extinct because of overhunting and poaching, and the National Wild Turkey Federation led efforts to restore the population for hunting.
Well they've definitely recovered in NW Wisconsin. They're everywhere and the males won't even move out of the way of cars.
This is one of those joyful concepts you learn about as a homeowner, especially on older homes.
If you have plumbing done in different metals (copper, steel, lead, etc.) and any of your pipes touch, you have to perform regular maintenance and apply dielectric grease (another one of those single-use materials you have to buy and store away) or your pipes could corrode and cause a ton of damage.
We think we stand out from our competitors in the space because we built first for the enterprise case, with consideration for things like data governance, acceptable use, data privacy, and information security, and with a product that can be deployed easily and reliably in customer-managed environments.
A lot of the products today have similar evaluations and metrics, but they either offer a SaaS solution or require some onerous integration into your application stack.
Because we started with the enterprise first, our goal was to get to value as quickly and as easily as possible (to avoid shoulder-surfing over Zoom calls because we don't have access to the service), and we think this plays out well in our product.
We based our hallucination detection on "groundedness", evaluated on a claim-by-claim basis: whether each claim in the LLM response can be supported by the provided context (eg: message history, tool calls, retrieved context from a vector DB, etc.)
We split the response into multiple claims, determine if a claim needs to be evaluated (eg: it isn't just boilerplate), and then check to see if the claim is referenced in the context.
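To make that flow concrete, here's a toy sketch (not our actual implementation) where the three steps are stubbed out with naive string heuristics; in practice each helper would be backed by an LLM or NLI model:

```python
# Toy sketch of claim-by-claim groundedness checking; not a production implementation.
# A real system would back these helpers with an LLM or NLI model, not string heuristics.
import re

def split_into_claims(response: str) -> list[str]:
    # Naive: treat each sentence as a claim.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]

def needs_evaluation(claim: str) -> bool:
    # Skip boilerplate like greetings/hedges; a real system would classify this with a model.
    boilerplate = ("sure", "certainly", "let me know", "happy to help")
    return not any(claim.lower().startswith(b) for b in boilerplate)

def is_grounded(claim: str, context_chunk: str, threshold: float = 0.5) -> bool:
    # Naive lexical-overlap stand-in for a real entailment/citation check.
    claim_tokens = set(claim.lower().split())
    overlap = claim_tokens & set(context_chunk.lower().split())
    return len(overlap) / max(len(claim_tokens), 1) >= threshold

def check_groundedness(response: str, context: list[str]) -> dict[str, bool | None]:
    # None => claim skipped (boilerplate); True/False => supported by context or not.
    results: dict[str, bool | None] = {}
    for claim in split_into_claims(response):
        if not needs_evaluation(claim):
            results[claim] = None
            continue
        results[claim] = any(is_grounded(claim, chunk) for chunk in context)
    return results
```

A response gets flagged as a potential hallucination when any evaluated claim has no support in the provided context (messages, tool calls, retrieved chunks).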
You're not wrong, but suffering isn't comparative. Just because it's easier for someone to bounce back or they have support in the transition doesn't mean it doesn't still suck.
This would be a shocking opinion, if we weren't in unprecedented times.
I wish we had more empathy and were kinder to people going through rough times, regardless of their wealth or position, or even the duopoly they work for, but it's also hard to completely ignore when the effect and impact is so huge.
Now, if it makes you physically ill, I also hope you either find help or can get out of the situation you're in. Sincerely.
Anthropic does a good job of breaking down some common architecture around using these components [1] (good outline of this if you prefer video [2]).
"Agent" is definitely an overloaded term - the best framing of this I've seen is aligns more closely with the Anthropic definition. Specifically, an "agent" is a GenAI system that dynamically identifies the tasks ("steps" from the parent comment) without having to be instructed that those are the steps. There are obvious parallels to the reasoning capabilities that we've seen released in the latest cut of the foundation models.
So for example, the "Agent" would first build a plan for how to address the query, dynamically farm out the steps in that plan to other LLM calls, and then evaluate execution for correctness/success.
This sums up as ranging from multiple LLM calls to build smart features, to letting the LLM decide what to do next. I think you can go very far with the former, but the latter is more autonomous in unconstrained environments (like chatting with a human, etc.)
Neat article - I know the author mentioned this in the post, but I only see this working as long as a few assumptions hold:
* avg tenure / skill level of team is relatively uniform
* team is small with high-touch comms (eg: same/near timezone)
* most importantly - everyone feels accountable and has agency for work others do (eg: codebase is small, relatively simple, etc)
Where I would expect to see this fall apart is when these assumptions drift and holding accountability becomes harder. When folks start to specialize, something becomes complex, or work quality is sacrificed for short-term deliverables, the folks that feel the pain are the defense folks, and they don't have agency to drive the improvements.
The incentives for folks on defense are completely different than folks on offense, which can make conversations about what to prioritize difficult in the long term.
These assumptions are most likely important, and they hold true in our case: we work out of the same room (in fact, we all live together) and 3/4 of us are equally skilled (I am not as technical).