The idea of stateful models/interactions in an enterprise is extremely powerful. Is anyone aware of open source projects that have a similar goal? I'm looking for stateful conversations, with collaborative agent/skill refinement.
To head off the semantics debate: I don't mean a model rewriting its own source code. I'm asking about 'process recursion'—systems that analyze completed work to autonomously generate new agents or heuristics for future tasks.
-ish. I often keep md files around: after a successful task, I ask Codex to write the important bits down. Then, when I come across a similar task in the future, I have it start from the md file. It's like context that grows and is very localized. It helps when I'm working across multiple repos at multiple levels.
I’m also doing something similar with fairly decent results. AGENTS.md grows after each session that produced knowledge worth passing on to future sessions. At some point I assume it will get too big, and then it’s back to the Stone Age for the new agents, in order to free up some context for the actual work.
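A minimal sketch of that "growing context file" workflow, assuming a plain AGENTS.md in the repo root; the filename and the summary-appending step are my own illustration, not a fixed convention:

```shell
#!/bin/sh
# Append a dated session summary to AGENTS.md so future agent sessions
# can start from accumulated project knowledge. The summary text would
# typically be written by the coding agent at the end of a task.
NOTES_FILE="AGENTS.md"

append_session_notes() {
  # $1: short summary of what this session learned
  {
    echo ""
    echo "## Session notes - $(date +%Y-%m-%d)"
    echo "$1"
  } >> "$NOTES_FILE"
}

append_session_notes "Build requires Node 20; integration tests need the local Postgres container."
```

Trimming or summarizing the oldest sections when the file grows too large is the obvious next step, but that compaction is exactly where context for new agents gets lost.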
"We demonstrate that our IsoDDE more than doubles the accuracy of AlphaFold 3 on a challenging protein-ligand structure prediction generalisation benchmark, predicts small molecule binding-affinities with accuracies that exceed gold-standard physics-based methods at a fraction of the time and cost, and is able to accurately identify novel binding pockets on target proteins using only the amino acid sequence as input."
It seems like a key challenge here is not just creating a protein that will bind to a specific site, but also ensuring that off-target binding won't happen. Is this feasible? I'm not familiar with this space, but RefSeq [1] shows 442M proteins and the Human Protein Atlas seems to only cover 17.4k [2]. Do we have comprehensive enough knowledge of human proteins to identify off-target affinities?
For the impatient, here's a transcript summary (from Gemini):
The speaker describes creating a "virtual employee" (dubbed a "replicant") running on a local server with unrestricted, authenticated access to a real productivity stack—including Gmail, Notion, Slack, and WhatsApp. Tasked with podcast production, the agent autonomously researched guests, "vibe coded" its own custom CRM to manage data, sent email invitations, and maintained a work log on a shared calendar. The experiment highlights the agent's ability to build its own internal tools to solve problems and interact with humans via email and LinkedIn without being detected as AI.
He ultimately concludes that for some roles, OpenClaw can do 90%+ of the work autonomously. Jason controversially mentions buying Macs to run Kimi 2.5 locally so they can save on costs. Others argue that hosting an open model on inference-optimized hardware in the cloud is a better option, but doing so requires sharing potentially sensitive data.
> The investor Jason Calacanis stayed in touch with Mr. Epstein after his 2008 conviction and three years later helped the financier contact a pair of Bitcoin developers, according to emails included in the documents.
Did Jason ever mention this in the episode? Can you ask Gemini?
I mean... If Jason Calacanis told me the sky was blue, I would be _checking_.
(At some point he seems to have gone from professionally-wrong-about-everything blogger to magical-podcast-thought-leader. I have no idea how this happened.)
The landing page for the demo game "Voxel Velocity" mentions "<Enter> start" at the bottom, but <Enter> actually changes selection. One would think that after 7MM tokens and use of a QA agent, they would catch something like this.
It's interesting, isn't it? On the one hand, the game is quite impressive. Although it doesn't have anything particularly novel (and it shouldn't, given the prompt), it still would have taken me several days, probably a week, working nonstop. On the other hand, there are plenty of paper cuts.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
It's also interesting how the functionality of the game barely changes between 60k tokens, 800k tokens, and 7MM tokens. It seems like the additional tokens made the game look more finished, but it plays almost exactly the same in all of them.
I'd bet the initial token usage is all net new while the later token usage probably has reading+regenerating significant portions of the project for individual minor changes/fixes.
E.g. I wouldn't be surprised if identifying the lack of touch screen support on the menu, feeding it in, and then regenerating the menu code sometime between 800k and 7MM took a lot of tokens.
Is this from 2024? It mentions "With global data center demand at 60 GW in 2024"
Also, there is no mention of the latest-gen NVDA chips. Per the benchmark cited: 5 RNGD servers generate tokens at 3.5x the rate of a single H100 SXM at 15 kW; this drops to 1.5x if you instead benchmark against 3 H100 PCIe servers.
Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models.