Re classifier routing: text-shape signals (token count, syntactic markers) underspecify the boundary, especially for agent-generated queries. The signal that worked better in our policy-gated tool-call setting was the surrounding intent context the agent was operating under, not the query string itself. An agent in a "fact-check" context emits long, well-formed sentences that actually want exact-match retrieval; an agent in an "open research" context emits surprisingly short queries that need narrative retrieval. If the runtime can read the tool or skill context at query time, routing on that is less ambiguous than text shape. Doesn't help if the wiki is a black-box MCP server with no caller-side context, but it's worth offering an optional context hint in the lookup payload.
The gateway approach (OAuth + RBAC) solves the perimeter problem — who can connect. protect-mcp solves a different layer — what can they do once connected, and how do you prove it.
It wraps any MCP server as a stdio proxy. Per-tool policies (block, rate-limit, require human approval). Every decision gets an Ed25519-signed receipt that's verifiable offline — no callbacks, no accounts.
The two layers stack: your gateway authenticates the caller, protect-mcp constrains which tools they can call and signs the evidence.
The staged autonomy pattern ("trust is earnable") maps directly to what we built with protect-mcp — shadow mode first (log everything, block nothing), then enforce when you've seen enough data to trust the policies.
For the prompt injection concern: protect-mcp wraps MCP tool calls with per-tool policies. Even if the agent gets injected, it can't call tools outside the policy. Every decision is optionally Ed25519-signed and verifiable offline.
This is exactly right. We implemented delegation receipts — Agent A grants scoped authority to Agent B, producing a signed receipt. B's subsequent actions reference A's delegation receipt. An auditor can trace the full chain from human principal to agent action.
The fiduciary analogy is spot on. Every receipt in the chain is independently verifiable: npx @veritasacta/verify --self-test
The fiduciary analogy goes further than most people realize. Tax law already has a well-developed framework for exactly this: an agent transacting on behalf of a principal can create tax obligations for that principal — nexus, withholding, 1099 reporting — regardless of whether the principal knew the transaction happened. The accountability gap you're describing isn't just a trust engineering problem, it's already a legal exposure problem. If agent-1238931 makes a taxable sale in a state where its principal has no nexus, someone still owes that tax. We haven't figured out who yet.