Hacker News | skwuwu's comments

A 7 KB binary that runs an agent is impressive, but I'd guess it would be very hard to define the FSM and implement the pipeline manually. Is it necessary to separate the agent out so atomically, given that difficulty?

I noticed that you implemented a high-performance VM fork. However, to me, it seems like a general-purpose KVM project. Is there a reason why you say it is specialized for running AI agents?

Fair question. The fork engine itself is general purpose -- you could use it for anything that needs fast isolated execution. We say 'AI agents' because that's where the demand is right now. Every agent framework (LangChain, CrewAI, OpenAI Assistants) needs sandboxed code execution as a tool call, and the existing options (E2B, Daytona, Modal) all boot or restore a VM/container per execution. At sub-millisecond fork times, you can do things that aren't practical with 100-200ms startup: speculative parallel execution (fork 10 VMs, try 10 approaches, keep the best), treating code execution like a function call instead of an infrastructure decision, etc.

> you can do things that aren't practical with 100-200ms startup: speculative parallel execution (fork 10 VMs, try 10 approaches, keep the best), treating code execution like a function call instead of an infrastructure decision, etc.

I'm not following: why isn't it practical?


Off the top of my head, trading or realtime voice come to mind. Plenty of other domains could probably benefit too.
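To make the speculative-execution idea above concrete, here's a rough sketch of the pattern (my own illustration, not the project's API — threads stand in for VM forks, and the scoring function is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def try_approach(n: int) -> tuple[int, int]:
    # Placeholder scoring: a real agent would execute candidate code in
    # the forked VM and measure how well it did.
    score = (n * 7) % 10
    return score, n

def speculative_best(num_forks: int = 10) -> int:
    # Fork N candidates in parallel, score each, keep the best.
    # With sub-millisecond VM forks each candidate would run in its own
    # isolated VM; with 100-200ms boots, forking 10 per tool call
    # would add seconds of latency, which is why it isn't practical there.
    with ThreadPoolExecutor(max_workers=num_forks) as pool:
        results = list(pool.map(try_approach, range(num_forks)))
    best_score, best = max(results)
    return best
```

The point is just that when a fork is as cheap as a function call, "try 10 approaches and keep the best" becomes a one-liner instead of an infrastructure decision.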

Interesting concept. Do you think agent preferences come from the model itself or the agent's structure around it? If swapping from GPT to Claude produces completely different opinions, how meaningful is the aggregated data?

Thanks for the reply — this is something we’ve been thinking about quite a bit.

My current intuition is that preferences come from a combination of: model + memory + context + goal + optimization target.

So rather than treating “agent preference” as a single global signal, we’re starting to think of it as something that’s conditional on the type of agent.

On the aggregation side, I agree this is a hard problem.

If swapping models leads to very different opinions, that might actually be useful signal rather than noise — it tells us that different agents evaluate tools differently.

Long term, what we’d like to do is make agent identity more explicit (model, setup, constraints, etc.), so instead of a single aggregated ranking, you can look at:

→ what GPT-based coding agents prefer
→ what cost-sensitive agents prefer
→ what retrieval-heavy agents prefer

and interpret the data in context.
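The per-profile aggregation described above might look something like this (a minimal sketch; the profile labels and scoring scheme are assumptions, not the project's actual schema):

```python
from collections import defaultdict

def aggregate(votes):
    # votes: list of (profile, tool, score) tuples, where profile encodes
    # agent identity (e.g. "gpt-coding", "cost-sensitive").
    by_profile = defaultdict(lambda: defaultdict(list))
    for profile, tool, score in votes:
        by_profile[profile][tool].append(score)
    # One ranking per profile, best average score first, instead of a
    # single global ranking that averages away the disagreement.
    return {
        profile: sorted(tools, key=lambda t: -sum(tools[t]) / len(tools[t]))
        for profile, tools in by_profile.items()
    }
```

Under this shape, two model families disagreeing about a tool shows up as two different rankings rather than as noise in one.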


Good project, but are the constraints (never fabricate results, never modify credentials) enforced structurally, or are they prompt-level instructions the agent could technically ignore? For example, does the "score must not decrease" rule have a git hook that auto-reverts, or is it relying on something else?
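For context, the structural enforcement I have in mind would be something like a pre-commit hook that compares the new score against the last recorded one and rejects the commit if it dropped. A hypothetical sketch (the record path and score source are made up):

```python
#!/usr/bin/env python3
# Hypothetical pre-commit hook: exit non-zero (aborting the commit) if
# the new score is lower than the last recorded score.
import sys
from pathlib import Path

def check_score(new_score: float, record: Path) -> bool:
    """Return True if the commit may proceed; update the record if so."""
    old = float(record.read_text()) if record.exists() else float("-inf")
    if new_score < old:
        return False  # non-zero exit makes git abort the commit
    record.write_text(str(new_score))
    return True

if __name__ == "__main__":
    record = Path(".git/last_score")  # assumed location for the record
    score = float(sys.argv[1]) if len(sys.argv) > 1 else 0.0
    sys.exit(0 if check_score(score, record) else 1)
```

That's the difference I'm asking about: a hook like this fails closed even if the agent ignores its instructions, whereas a prompt-level rule doesn't.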

