This was just a small experiment, but as a teammate and I watched our respective agents (each loaded with its owner’s context/preferences/etc) negotiate directly with one another to schedule and agree on lunch plans on our behalf, there was definitely a feeling that what we were watching was the tip of an iceberg.
Parents in my son's hockey league used to track shots on goal with pen and paper during games, to help coaches both during and after the games. It was tedious and the data was hard to use afterward.
Not a developer, so I used AI to build a free web app to replace it. You tap where on the rink each shot came from, tag whether it was a goal or save, and at the end of the game you get a full report with shot maps, heatmaps, and per-player breakdowns. Works on your phone, no account needed.
It's not fancy tech, but it's pretty rewarding to see it actually in use by the team and now even spreading to other teams in the league :)
First impression: love it! Need to study it more.
We need—or at least I need—a better UI/tool to manage the sequence of edits and collaboration, drafting, rubber-ducking, and evaluation that AI tools provide. Including the prompts and edits is a nice feature, though I would also like more of a "where we started" vs. "where we are now" comparison.
I wonder if an animation with the prompts/edits inline and the text morphing might be an interesting UI…fun stuff to think about. Contributions welcome! https://github.com/dvelton/trace
Maybe it’s just personal deja vu, but in current discussions of vibecoding v software engineering I keep seeing parallels to the debates of 20 years ago, when blogs began proliferating: the bloggers v “real journalism” arguments.
Have been curious what it could look like (and whether it might be an interesting new type of “post” people make) if readers could see the human prompts and pivots and steering of the LLM inline within the final polished AI output.
This hits home for me. Lawyer, not developer here. Implementation was never a hard part for me, it was an impossible part. Now that the time/cost needed to experiment with prototypes has dropped to near zero, I've been spending a lot of time doing exactly what you describe (steering, brainstorming). I find it fun, but I do it mainly as a bunch of personal side projects. I can understand how it might feel different for users when the stakes are much higher (like when it's part of the day-to-day in a real job).
Experimented very briefly with a mediation (as opposed to a litigation) framework but it was pre-LLM and it was just a coding/learning experience: https://github.com/dvelton/hotseat-mediator
Cool write-up of your experiment, thanks for sharing. Would be interesting to see how results from one framework (mediation, whose goal is "resolution") differ from the other (litigation, whose goal is, basically, "truth/justice").
That's really cool! That's actually the standpoint we started with. We asked what a collaborative reconciliation of document updates looks like. However, the LLMs seemed to get "swayed" or showed "bias" very easily. This brought up the point about an adversarial element. Even then, context engineering is your best friend.
You kind of have to fine-tune what the objectives are for each persona and how much context each is entitled to; that would ensure an objective court proceeding where debates in both directions carry equal weight!
I love your point about incentivization. That seems to be a make-or-break element for a reasoning framework such as this.
Didn’t expect to see this on HN. Thanks for checking it out, though. Not a developer, just a lawyer at GitHub; this repo was me playing around with the GitHub Spark tool.