I worry most about blind spots with this kind of approach. Let's say this repository goes on for years, at which point the docs folder is several MB in size. Would Codex be able to think outside the box? Or would the aggregate of the Markdown content cover so much ground that it prevents it from thinking of novel approaches to existing problems?
You tell it to update the docs, not append to them. I've done the same thing with a README in the root that links to the docs. After every commit, before the push, I have my agent "update all relevant and related docs, add or remove what's needed" or something to that effect. And it works remarkably well. I also have an append-only changelog it's supposed to add to. Between that, good commit messages, and comprehensive testing, I've built a homebrew OS, and updating it is remarkably smooth. It runs a homebrew FTP and HTTP server and can run Wolfenstein. Working on DOOM right now. Close, but sound has been difficult.
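For anyone wanting to enforce the append-only part rather than trust the agent, here is a minimal sketch of a check you could run before pushing. It assumes you can get the previous and current changelog text yourself (for example from `git show HEAD:CHANGELOG.md` and the working copy); the function name and the file name are made up for illustration, not something from the setup described above.

```python
def is_append_only(old: str, new: str) -> bool:
    """Return True if `new` keeps all of `old` unchanged at the start.

    Hypothetical helper: `old` is the last committed changelog text,
    `new` is the text about to be pushed.
    """
    # Append-only means the old content survives as an exact prefix.
    return new.startswith(old)


if __name__ == "__main__":
    before = "## 1.0\n- initial release\n"
    after = before + "## 1.1\n- fixed sound init\n"
    print(is_append_only(before, after))      # appended entry
    print(is_append_only(before, "## 1.1\n"))  # history rewritten
```

This is deliberately strict: even a typo fix in an old entry counts as a rewrite, which may or may not be what you want.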
Someone else in the comments said to have it make a static website with the info instead, with clickable pages and sections, so it reads only the content it needs rather than dumping a long file into the context window. Although I suppose you can have a ToC in the README too, with multiple smaller Markdown files as references.
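The ToC idea is easy to script, so the agent doesn't have to maintain the list by hand. A rough sketch, assuming the docs live in a `docs/` folder next to the README and that each file's first `#` line is a usable title; both are assumptions, and `build_toc` and `first_heading` are hypothetical names.

```python
from pathlib import Path


def first_heading(path: Path) -> str:
    """Return the first Markdown heading in the file, or the file stem if none."""
    for line in path.read_text(encoding="utf-8").splitlines():
        # Naive: a '#' comment inside a code fence would also match.
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return path.stem


def build_toc(docs_dir: Path) -> str:
    """Build a Markdown bullet list linking every .md file under docs_dir.

    Links are written relative to the parent of docs_dir, i.e. where the
    README is assumed to sit.
    """
    entries = []
    for md in sorted(docs_dir.rglob("*.md")):
        rel = md.relative_to(docs_dir.parent).as_posix()
        entries.append(f"- [{first_heading(md)}]({rel})")
    return "\n".join(entries)
```

Paste the result of `build_toc(Path("docs"))` into the README and the agent can follow links to the one small file it needs instead of loading everything.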
Yep. You’ve got to have it update the docs. If I forget to request this, then after a few sessions Opus starts rehashing the same tasks, only to find that they are already complete, and sometimes it still won’t update those docs unless I ask.
Another tip is to condense the doc files to the minimum required. Sometimes I’ll end up with 5 or 6 floating around in various states of staleness. Condensing to 2-3 and removing completed tasks seems to help a lot.
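If the task lists are kept as Markdown checkboxes, the "removing completed tasks" step can be mechanical. A small sketch under that assumption (the comment above doesn't say how tasks are tracked, and `prune_completed` is a made-up name):

```python
import re

# Matches checked task-list items such as "- [x] done" or "* [X] done".
DONE = re.compile(r"^\s*[-*]\s+\[[xX]\]\s")


def prune_completed(markdown: str) -> str:
    """Drop checked task-list items and keep every other line as-is."""
    kept = [line for line in markdown.splitlines() if not DONE.match(line)]
    return "\n".join(kept)
```

It only drops the checked line itself, so any indented notes under a finished task would be left behind and need a manual pass.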
It’s not a self-coding machine. There is a human in the loop; they even added MORE engineers to this project's team! 7 engineers should be able to collaborate with the AI to find good solutions to problems.
Right — well-suited for what they do: scripts Claude generates to orchestrate subagents for a specific task at scale (audits, migrations, research). Each run completes; that's the design.
The schema here defines the team itself — roles, supervisor/sub-agent hierarchy, access policies, human gates, schedules — as a portable, reusable declaration. Run it once, on a schedule, or keep it deployed. Closer to a Dockerfile than a script.
Vendor-agnostic is part of it. Nothing in the spec ties to Claude or any specific runtime — Apache 2.0, anyone can implement a compatible runtime.
I feel that AI doesn't necessarily replace my thinking, but actually helps me explore alternative approaches to a problem more deeply, on my behalf, which in turn better informs my thinking.
I asked it to build me a weather machine and it's responding to me like a teenager.
AI:
Authentication successful! Go back to the webpage to continue.
Me:
Hey!
AI:
yooo whats good!! u got any weather stuff u wanna chat about or just saying hi?
Me:
how’s the weather in New York City
AI:
ohh nah i don't have real-time data so i can't tell u the actual conditions rn! but if u check like weather.com or just search "nyc weather" u'll get the live stuff instantly
tho if u tell me whats goin on there rn i can get HYPED abt the atmospheric conditions!! like is it humid?? what's the pressure doin??