Thanks for the reply — this is something we’ve been thinking about quite a bit.
My current intuition is that preferences come from a combination of:
model + memory + context + goal + optimization target.
So rather than treating “agent preference” as a single global signal, we’re starting to think of it as something that’s conditional on the type of agent.
On the aggregation side, I agree this is a hard problem.
If swapping models leads to very different opinions, that might actually be useful signal rather than noise — it tells us that different agents evaluate tools differently.
Long term, we’d like to make agent identity explicit (model, setup, constraints, etc.), so that instead of a single aggregated ranking, you can look at:
→ what GPT-based coding agents prefer
→ what cost-sensitive agents prefer
→ what retrieval-heavy agents prefer
and interpret the data in context.
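To make the idea concrete, here's a minimal sketch of that conditional view. The record fields (`model`, `cost_sensitive`, `preferred_tool`) are hypothetical stand-ins for whatever identity signals agents actually report:

```python
from collections import defaultdict

# Hypothetical preference records: each agent run reports identity
# dimensions alongside the tool it preferred.
records = [
    {"model": "gpt-4",  "cost_sensitive": False, "preferred_tool": "tool_a"},
    {"model": "gpt-4",  "cost_sensitive": True,  "preferred_tool": "tool_b"},
    {"model": "claude", "cost_sensitive": True,  "preferred_tool": "tool_b"},
]

def rankings_by_identity(records, key_fields):
    """Count tool preferences, grouped by the chosen identity dimensions."""
    counts = defaultdict(lambda: defaultdict(int))
    for r in records:
        identity = tuple(r[f] for f in key_fields)
        counts[identity][r["preferred_tool"]] += 1
    return {k: dict(v) for k, v in counts.items()}

# Conditional view: what cost-sensitive agents prefer vs. everyone else
print(rankings_by_identity(records, ["cost_sensitive"]))
```

Swapping `key_fields` for `["model"]` (or several fields at once) gives each of the sliced rankings above from the same underlying data, rather than collapsing everything into one global score.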