I’m working on a project that blocks agents from breaking rules. The rules are enforced through hooks and work across Claude Code, Codex, and GitHub Copilot.
Exactly! I have said this a couple of times but it was taken literally as in no capital letters or strong language. Glad to see someone else who shares this perspective.
Not throwing shade at anyone here, but looking at some of the orchestration setups out there, the thought has definitely crossed my mind that we are recreating SAFe for agents. Rather than forcing onto agents the same hierarchical processes that worked for humans in large organizations, I think it is better to look at what agents need to give better results and what their failure modes look like.
I fully agree. I also started with husky before expanding further and creating my own hooks. I can’t imagine using agents today without them; it would require a lot of babysitting.
How do you verify the work that was just done in the current stage? Verify it against the output artifacts from the previous stages. For example, if you have a requirements doc, you can analyse the codebase for its current state and store that as a doc, then generate the implementation plan from the delta between the requirements and the current state. After implementation, create an implementation summary doc. To verify the implementation in the next stage, compare the implementation summary against the implementation plan, the earlier codebase analysis, and the original requirements doc, as well as the codebase diffs.
So every stage outputs a source of truth for that stage, which later stages can use for verification, alone or together with other artifacts. If you want to read more, here's the recursive-mode development workflow I built: https://recursive-mode.dev/introduction
Exactly! I don’t babysit TDD anymore. I have another agent that does it for me, and honestly it sometimes catches things I would have missed if I were the one babysitting.
Hooks do wonders here. The payload contains a lot of information about the pending action the agent wants to take. Combine that with the most recent n events from the agent’s session history and you have rich enough context to pass to another agent to validate the action through the SDK.
This way the validation uses the same subscription you’re logged in to, whether you’re using Claude Code, Codex, or Copilot. The validation agent responds in a JSON format that you can easily parse and return, letting you allow the action through or block it with direction and guidance. I’m genuinely impressed by how well this works considering how simple it is.
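A rough sketch of that hook-plus-validator loop, assuming a Claude Code-style PreToolUse hook that receives the pending action as JSON on stdin and blocks it via exit code 2 with guidance on stderr. The validator below is a stub standing in for the real SDK call, and the `recent_events` field name is an assumption:

```python
import json
import sys

def validate_action(action: dict, recent_events: list) -> dict:
    """Stub for the SDK call to the validation agent. A real version would
    send the pending action plus the last n session events and parse the
    agent's JSON reply. The response shape here is an assumption."""
    command = action.get("tool_input", {}).get("command", "")
    if action.get("tool_name") == "Bash" and "--no-verify" in command:
        return {"allow": False,
                "reason": "Do not skip commit hooks; run the checks instead."}
    return {"allow": True, "reason": ""}

def run_hook(payload: dict) -> int:
    """Return the hook's exit code: 0 lets the action through,
    2 blocks it and feeds the reason back to the agent."""
    history = payload.get("recent_events", [])  # assumed field name
    verdict = validate_action(payload, history)
    if not verdict["allow"]:
        print(verdict["reason"], file=sys.stderr)
        return 2
    return 0

# In the actual hook script you would wire it up with:
#   sys.exit(run_hook(json.load(sys.stdin)))
```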
If you’re interested in such deterministic scaffolding/control flow, check out Probity.
I created it to address this exact issue. It is a vendor-neutral ESLint-style policy engine and currently supports Claude Code, Codex, and Copilot.
It uses the agents’ hook payloads and session history to enforce the policies. It can be set up to block commits if a file has been modified since the checks were last run, to disallow content or commands using string or regex matching, and to enforce TDD without any extra reporter setup; it works with any language.
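To illustrate the string/regex-matching side, here is a toy version of such a rule check. The rule names and structure are made up for illustration and are not Probity's actual configuration format:

```python
import re

# Hypothetical ESLint-style policy rules; not Probity's real format.
POLICIES = [
    {"name": "no-force-push", "pattern": r"git\s+push\s+.*--force"},
    {"name": "no-hardcoded-secrets", "pattern": r"(api[_-]?key|secret)\s*="},
]

def check_command(command: str) -> list:
    """Return the names of all policies the command violates."""
    return [p["name"] for p in POLICIES
            if re.search(p["pattern"], command, re.IGNORECASE)]
```

A hook would run this against the pending command and block the action when the returned list is non-empty.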
Creator of TDD Guard here, thanks for the mention!
TDD Guard was built when Claude Code was the only one to offer hooks. Plugins didn't exist and the models were weaker, so the validation context and instructions took more work to get right. This is why it ended up requiring test reporters for different languages.
I have started a new project that does the same TDD enforcement, also through hooks, but without reporters. It works with any test runner and is vendor-agnostic: it works with Claude Code, Codex, and GitHub Copilot. The validator also sees recent session history, which helps it handle cases like refactoring better.
The TDD instructions are still pretty basic compared to TDD Guard's, which have been dogfooded for a year. One thing I noticed while testing across agents is that some follow TDD a lot better than others; Codex struggled the most with the basic instructions.
I have twice worked on teams that did not use branches (or PRs). Both were already working that way when I joined them.
The first was because we were using svn (and maybe even cvs before that, but I cannot remember), which did not support branching easily. That team did eventually switch to git, which did not go without some struggles and misconceptions, such as: "Never use rebase."
The second team was already working without branches and releasing a new version of the tool (the Bond3D Slicer for 3D printing) every night. It worked very well. Often we were able to implement and release new features within two or three days allowing the users to continue with their experiments.
When, after some years, the organization implemented more 'quality assurance', it demanded that we make monthly releases that were formally tested by the users, so we created branches for each release. The idea was that some of the users would test a release before it was officially released, but that testing would often take more than a month, one time even three, because they were 'too busy' to do the formal review. At the same time, some users kept using the daily builds because those builds had the features they needed. As a result, the quality did not improve and a lot of time was wasted, although the formal quality assurance dictated by some ISO standard was assured.
I have no experience with moving away from using branches. It might be a good idea to point your manager/team lead/scrum master to dora.dev or the YouTube channel: https://www.youtube.com/@ModernSoftwareEngineeringYT
https://github.com/nizos/probity