Why not just record the conversation?
It contains all that is needed: the initial at-large scoping, the failed attempts at doing x and not y, how that specific line of code solves that specific edge case, etc.
When it’s time to review, review both code and conversation.
200 “user written messages asking why and what”? Likely a good PR.
15 “yes, yeah, ok, whatever”? Well you might want to give that PR some love.
It feels to me that when we commit, we throw away half, if not most, of the work done by not recording it.
I agree with you overall, yet there’s one flow that works for me.
Instead of speccing out a feature, I let PMs vibe code it.
I then have the exact reference I need to build.
Maybe the LLM one-shotted it the right way, maybe it needs fixes, maybe some fundamentals are misunderstood; in any case it's easier for me to know what I need to build, for the PM to be aware of some limitations (the LLM does the job of pushing back and explaining), and overall for us to have to-the-point conversations.
This is somewhat orthogonal to what you said, since you focused on dev seniority, so that part still stands true.
But I think “PMs armed with an LLM” can, when properly used, add a lot of value to the dev process.
> I agree with you overall, yet there’s one flow that works for me. Instead of speccing out a feature, I let PMs vibe code it. I then have the exact reference I need to build.
Like BDD, but with something more accessible than Cucumber. I'm totally here for that.
It would be nice if people also committed their initial prompt and chat session with the LLM into their codebase. From a corporate standpoint, having that would be excellent business logic as code, if the code is coming from a PM or a stakeholder on the business side of the house. From an engineering standpoint, it would be an excellent addendum to the codebase's documentation.
FWIW, BDD and frameworks like Cucumber don't work at all in my experience. The people that'd need to fill these out don't do it properly (they can't) and then we, devs, are stuck with brittle and un-debuggable stuff that's worse than if we just used regular code to encode what we understood from them.
It's the same reason (most) PMs armed with an LLM still won't get anything usable done. They can't do it properly. They still need devs. But the gaps are shrinking. A few PMs could get stuff done with Cucumber, could wireframe UX with previous tools, and can now do so much more, easier and better, with an LLM.
> It would be nice if people also committed their initial prompt and chat session with the LLM into their codebase
I doubt you'd want this. It's a chat session for a reason. It's gonna be a huge wall of text, especially if you mean to include all the internal prompting the LLM did while it was working. You'd also have all my "no dude, stop bullshitting me! I told you to ignore X and use Y and to always double check Z and provide proof".
It would only "work" if every single feature you wrote was 100% written by the LLM from a single, largish, well-defined prompt: the LLM works for a few hours and out comes the feature. And even then you have no reproducibility (even if you turned around and gave the prompt to the exact same model with no retraining; never mind newer models, changed system prompts, etc.).
There are ways to work around the single wall-of-text issue.
Mostly, git lfs.
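Something like this is all it takes (the sessions/*.jsonl pattern is a hypothetical naming convention, pick whatever layout you like):

```
# store chat sessions via git lfs so regular history only carries pointers
git lfs install
git lfs track "sessions/*.jsonl"
git add .gitattributes
```

A checkout without the smudge filter then sees a 3-line pointer file per session instead of the full transcript.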
When it comes to “no dude stop etc etc” … that is valuable information. You can extract that and put down rules for agents so that you stop repeating it each time.
Same can be done at PR, so that you can review not just the code but also how you got there.
It's trivial to go from a session to nicely polished HTML with a side-by-side conversation.
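A minimal sketch of what I mean, assuming a one-JSON-object-per-line session file with role and content fields (that schema is an assumption; adapt it to whatever your agent actually logs):

```python
import html, json, sys

# render a session JSONL as rough side-by-side HTML:
# user messages on the left, everything else on the right
rows = []
for line in open(sys.argv[1], encoding="utf-8"):
    line = line.strip()
    if not line:
        continue
    msg = json.loads(line)  # assumed fields: "role", "content" (hypothetical schema)
    side = "left" if msg.get("role") == "user" else "right"
    body = html.escape(str(msg.get("content", "")))
    rows.append(f'<div class="{side}"><pre>{body}</pre></div>')

print('<style>.left{background:#eef}.right{background:#efe}'
      'pre{white-space:pre-wrap}</style>')
print("\n".join(rows))
```

From there, the "nicely polished" part is just CSS.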
If you want to try, username at gmail, I have a private repo with it running.
I value criticism, sorry for the plug ;)
Oh, on the different-models side, I don't see the advantage of reproducibility; or rather, I don't think I understand what you mean. Can you help me see it?
I don't understand how "wall of text" is related to git large file support. The wall of text is a problem for me, the human. Sure, there are ways, like "be brief", caveman-speak, etc. In a large repo with lots of different people over time, I can't see how it won't just be a wall of text again. It's just too much. TL;DR. And coz DR, the LLM will have buried bullshit in that text, which future sessions might read and "believe".
As for "no dude", no that can't be put down into rules. Not all of it anyway. We have stuff encoded in the repo wide md file, I have my personal one etc. and the various agents still don't do what we tell them to in all cases or a new model comes out and it no longer works. For example, for finding the root cause of a bug, it's very important to have actual proof and references. It's getting there w/ my instructions in the .md but it doesn't always work and I do have to "dude" it from time to time.
Is that back-and-forth valuable to have in files that are going to be part of the repo? I very highly doubt it. Having new rules that came out of the back-and-forth in a checked-in AGENTS.md, sure, that is valuable. Or nowadays in individual "skills", like a "root cause finder" skill that can have very specific instructions about being thorough in proving its "found the smoking gun" BS ;)
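For instance, a hypothetical "root cause finder" skill could be this small (the layout loosely follows Claude Code's SKILL.md convention; the rules themselves are made up for illustration):

```
# .claude/skills/root-cause-finder/SKILL.md (hypothetical)
---
name: root-cause-finder
description: Use when asked to find the root cause of a bug.
---
- Never declare a root cause without a reproduction or a stack trace as proof.
- Quote the exact file and line you blame, with references.
- If the evidence is missing, say so instead of guessing.
```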
I've seen enough PR descriptions created by the agent. Fluffy wall of text that looks good but is factually wrong. Seen it way too many times. Too many people just look at whether it looks good and then pass it off as truth. I'm tired of it and making that into "nice HTML" doesn't make it better. It just makes it look even nicer but not more true.
Re: reproducibility. My parent poster (and I guess you as well) wanted to have the prompt/conversation as "documentation". I don't see why that would be helpful. The only reason I could see would be for "reproducibility", which you won't get with an LLM. I don't see why else, but do tell me.
What I can agree could be valuable are the "why"s. I.e. the stuff that already should have been part of the ticket/requirements document. If you want to store that inside the repo as text files, instead of the original tickets or documents, that's fine of course. But I don't see how a "recording of how the code came to be" is valuable. It's like having a recording of all my IDE keystrokes and intermediate code state in pre-LLM days. Not valuable. What's valuable are the requirements and the outcome (i.e. code). Not "the thing in between".
Now don't get me wrong. Recordings of how people code/use their IDE can be a valuable teaching tool. Both as good and bad examples. And the same can be true for an agent coding session.
I misunderstood "wall of text" (I was thinking about bloating the repo with it); my solution to understanding is just to create ad-hoc tools to parse the JSON.
I coded a web UI with simple toggles: show me what the user said, what the LLM said (nice to see what I was thinking about, nice to see how the LLM came up with solution X; you get tool calls, and maybe it found something I didn't think about, or vice versa).
You can search/grep (i.e.: did I consider idempotency when I built feature X? Open the session, search/grep for "idempotency").
You can, up to a point, resume the conversation (yes, I know, cache busting makes some uses of this impractical, but in general resuming and asking "when we did this, did we think about that?" tends to work... let's say that research is OK; time travel, meh).
Overall, one of the advantages of LLMs is being able to direct them at data for insights: via standard CLI tools, via specific prompts, or by building some mini tools (yeah, vibe coding is fine sometimes).
Whatever my question, if I have the data I can have an answer.
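As an example, the grep-the-session tool above can be a dozen lines of Python (again assuming a hypothetical one-JSON-object-per-line format with role/content fields):

```python
import json, sys

# usage (hypothetical file layout):
#   python session_grep.py idempotency sessions/feature-x.jsonl
term, path = sys.argv[1], sys.argv[2]

for n, line in enumerate(open(path, encoding="utf-8"), 1):
    line = line.strip()
    if not line:
        continue
    msg = json.loads(line)  # assumed fields: "role", "content"
    text = str(msg.get("content", ""))
    if term.lower() in text.lower():
        # print who said it and a snippet, so you can jump back into the session
        print(f"{n}: [{msg.get('role', '?')}] {text[:200]}")
```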
LFS helps with the second aspect (buried bullshit). Unless you smudge, you have a pointer, and that is just 3 lines.
You need to learn some ergonomics, but ok, some of us learnt how to use Jira XD
Taking your position a bit further, yes, committing chat sessions implies that you also need to review them so that bullshit doesn't filter through. Mileage varies based on your personal preferences, which project you are working on, and many more heuristics.
Some will find it boring, some will think it's good project maintenance, all should be able to find a way to handle this based on their preference.
It is also worth pointing out that cleaning up bullshit doesn't need to happen at merge time. With LFS blobs stored separately, you can have side flows helping you out without clogging your CI pipelines.
"no dude"-> rules
you can put down SOME rules
usually this happens to me at PR. I am tired fo saying "you should always check X", so i bolt it down "someplace".
I am running the usual motions i suppose most of us try to adopt: put this down in agents.md, in folder x or y, in path-scoped rules using agents, in memory files (i am exporting/importing those too), in subagents that review code before PRs.
In the end it's an unsolved problem at large, but:
1. Hopefully it will get better; my feeling is that it's just a Cambrian explosion, and the fittest will survive... (also, owning the harness should help, I suppose... I use Claude Code :D)
2. In a team, having personal styles surface is valuable. "Dude, don't do that" is quite often... design. When rules go in the repo, at least we can reach an agreement in person, and at PR time it's not about linking a document you read at onboarding, but finding out why the agent did not respect the rule. To me, that is more grounded in a tool.
3. Rules are... not static? We change our minds? We get better at things? We want to experiment? I am not advocating for a perfect rule system that replaces me, but for a good-enough one that removes cruft from my daily job.
I think my approach is actually helpful when it's time to find root causes (YMMV). Via tools that parse sessions, you can see with better granularity when that specific portion of code was written: during that bit of the conversation the user was worried about X and asked the AI to do Z, and the AI read this and that file and "thought" this and that and wrote that piece of code.
Maybe the user was making wrong assumptions, maybe the LLM did not read the correct files or instructions; in any case you have a better tool for investigation.
It's up to you to decide whether to use it, whether it leads to just solving the bug or also to fixing instructions, etc. I am just saying that it actually helps to have some measure of the context in which this change made sense.
"Fluffy wall of text that looks good but is factually wrong. "
It might be good or bad, right or wrong, but what is in the sessions is the truth of what happened.
PR descriptions are horrible, I share that feeling with you, but having the story of how that thing happened is just not the same as a "final summary of what we did in the past X hours".
As a sideline: LFS doesn't really pollute your repo once you learn its ergonomics. Having the chats in LFS also lets you approach this outside the main flow.
Reproducibility...
To me those conversations are basically the history of the decisions taken while implementing. They are documentation.
The real problem with docs has always been that no one likes writing them, nor was it easy to standardize anything around them.
If you just record/log, there's no extra effort needed, and once the logs are there, tools and LLMs are pretty good at helping us extract insights.
I am also assuming there is a correlation between quality in the conversation and quality in the code. I know, I'm being hand-wavy, but overall I think critical thinking is what makes code better, and being able to see if/where it has been applied can be a good proxy.
I ask forgiveness in advance: I am not going down the rabbit hole of quantifying quality, etc. It's a broad statement that should be taken with a grain of salt.
If you want to go abstract, you can think of coding as going from thoughts to 0s and 1s. We have high(er)-level languages that help us organize thoughts so that we can better keep them within our cognitive flow/load.
LLMs are an upper layer that scrambles the code and makes it more difficult to grasp.
But the reasoning behind the code is now available, and quite easy to parse.
I think this is the core point to me.
Code is an intermediate artifact between thinking and bits.
Now we have a second artifact: the conversation/decision that led to that code.
Why are we not storing it?
Disclaimer:
I am, of course, mildly in love with my own project and ideas, so possibly I like this too much just because I built it. IKEA effect or whatever.
US dependency did bring a lot of value to a lot (albeit not all) of Europeans in the past, specifically 1938-1988.
If you were born, raised, and lived in that timespan, you might have developed a deep-seated and hard-to-break habit of relying on that dependency for security and lifestyle/wealth.
Also, that same lifestyle is based on ignoring externalities applied to the commons and/or events happening "somewhere else", even when factually proven.
Little wonder, and a tiny bit ironic, that the same principle has embedded itself so deeply that it holds true even when the damage is inward, just a few indirections away.
On your side, yes, I think that “people in Europe” intuitively understand that, it just needs time to blossom.
The reputation/trust damage self-inflicted by the current US administration is triggering a pushback that will expand into the future.
As a case in point, it will lead to reconsidering assumptions and habits that many generations of US businesses and diplomats have built.
Many in this thread point at different instances of services that should be decoupled.
Connecting the dots, the larger picture looks painfully obvious to me: Silicon Valley never was a partner to be trusted, and certainly not after they built or bent every business to rely on an ad ecosystem that exploits users.
That original sin, on which a huge portion of Wall Street rests, is now at the center of discussions.
Hence, the EU will build tools to address this because it has to, but consumers will flock to them, especially from the US, since at this point no one can trust SV companies on data privacy (since Snowden at least), no one can trust the US administration to protect citizens (since Trump at least), and about half of the US is scared deeply enough about what's going on (the emotional push needed to break the habit).
They will move their data to the EU (where else? China?).
This will be compounded by the fact that everyone tries to build better LLMs and to get AGI, while forgetting that LLMs work on data pipelines.
> The reputation/trust damage self inflicted by the current US administration is triggering a pushback that will expand into the future.
This barely even seems like the relevant part. If Google was founded in Japan and Apple in Brazil, it would still be foolish to entrench them as a dependency. It would barely even be better to do it with a local company.
> They will move their data to the EU (where else? China?).
This feels like hopium. Network effects are powerful and as long as the internet is actually global, there are really only two options: 1) Centralized megacorps, and then the US ones have both the US apparatus behind them and the incumbency advantage, or 2) open protocols where no corporation of any nation is a gatekeeper.
So for Europeans to get the hooks of the US incumbents out of them, their best chance by far is the second one, and that one is also mostly to the advantage of the Americans who aren't the existing incumbents, which is why it works. Start making phones with open hardware and social networks with open protocols and you can get people outside of your own country to use them, because they don't much like the incumbents either, and that's how you reclaim the network effect. Try to clone the US megacorps without the US apparatus to get them established in other countries and it doesn't work, because people are wary of foreign central control, which in turn means you don't get the network effect and you lose.
But then it's not so much that data ends up in "the EU" as that it's on your own device and then backed up or distributed as encrypted chunks in a distributed network which isn't tied to any specific jurisdiction.
Relying on open protocols to make all the difference is much more potent hopium than what GP wrote.
Open protocols are the kind of thing techies do when in cooperative mode, when industry isn't looking. But this is not that kind of problem - this is an economic, geopolitical problem. It's not about your local school moving off Windows to Linux, it's about the European corporations moving off Azure to some other cloud solution offered by European corporations (do we even have any?).
I'll grant it, the turmoil of such transitions is a perfect moment for pushing for open protocols, federated solutions, etc. - the industry is distracted, there's more space to sneak in a good solution before everyone notices, and the EU has a cultural and political tradition of pushing towards FLOSS (even if largely just as an alternative to Microsoft) and the associated values/memetic complex. But open anything won't save the day - more corporations will.
It's a blind spot for some software folks, because they forget that FLOSS is an exception here; everything else in the real world - including computing hardware and the supporting power and network infrastructure - plays by the rules of the market economy, with proprietary solutions and clear structures of ownership.
It makes no sense to try and fight this here - but it does make sense to go along with the flow and improve things by pushing for more globally optimal solutions, especially as the EU is known to be favorable to using openness in protocols and standards as a policy vehicle, both internally and externally.
> It's not about your local school moving off Windows to Linux, it's about the European corporations moving off Azure to some other cloud solution offered by European corporations (do we even have any?).
But why is it about that? Why isn't it about e.g. governments in Europe funding the development of Linux virtualization so that it's simple to buy some hardware, put it in the back office and have an interface to it which is as easy for people to use as the incumbent cloud providers?
The vast majority of companies don't need "flexible scalability" etc. They have modest and finite loads and only ended up "in the cloud" because for ten seconds it seemed like having 100 VMs in the cloud was going to be a lot cheaper than having 100 physical servers, until it turned out that you can put those 100 VMs on two physical servers in your own possession, that it costs less than what the cloud providers charge, and that you then keep control of your data and infrastructure.
> everything else in the real world - including computing hardware and supporting power and network infrastructure - plays by rules of market economy, with proprietary solutions and clear structures of ownership.
This is pretty wrong. Hardware companies sell hardware. A lot of them will try to lock you into their shitty software if you let them, but that is neither required nor desired. And some of the better ones don't, e.g. there isn't that much lock-in happening with AMD or Intel servers. We just need that to be happening for phones. And smart hardware companies can fully understand "commoditize your complement" as being in their own interest while still making a profit selling the hardware that isn't locked to any particular software.
> It makes no sense to try and fight this here
It's not clear what you're even suggesting.
Suppose you want Europeans to have access to a phone platform that isn't controlled by an American megacorp.
If they release a domestic proprietary one then other countries won't want any part of it. They don't want to be under the heel of a European megacorp any more than an American one, and indeed many will be suspicious of it and actively try to thwart adoption. And then you lose the network effect and can't get traction.
Whereas if you do something like require phone hardware to allow the user to replace the OS, and then fund development of open source phone operating systems and make sure they're required to be supported within your jurisdiction, then they can easily spread outside of your jurisdiction because people aren't nearly as suspicious and oppositional to something where you've precommitted to not putting people on the enshittification treadmill. And then everybody gets out from under the thumb of those corporations.
Great counterpoint! (No, I'm not an LLM; it is actually a crucial perspective.)
I especially agree with:
> But open anything won't save the day - more corporations will.
I am not advocating for a pure "open source will save the world" position.
There are just a few points I'd like you to consider, hopefully giving me insights I can learn from:
* Other than code, open source has also given us governance "experiments" capable of running critical systems. As another poster mentioned, the risk is falling back on "big corps", usually run by "big men", and then we are back to square one. The hope? Expectation? Is that the open source governance ecosystem has tackled this space in enough dimensions to be able to build something on top of it.
I am looking specifically at the area around licenses (MariaDB, Redis, ...) and overall governance frameworks, as in "detach business ownership from ethical frameworks".
* In order to build anything this big/reliable without megacorp budgets, you can just... pay FLOSS? They are one of the two groups majorly screwed by the current SV setup (with PLENTY of caveats, amongst them that SV is a huge open source contributor).
The other one being content creators.
Slogan? "For this to succeed, you need the best coders and the best marketing departments in the world"
Looks to me like incentives are aligned towards them being available.
Talking broadly on a systemic level: details need refinement, and more space than this single message.
* The EU (the political institution) desperately needs this: an innovative tech ecosystem (not a startup, not a product) driven by "European values" that puts them on the spot. Start by redefining it: there are no users, only citizens.
Something effectively out-innovating SV, not just trying to get on par.
The risk of "being bought out/copied" doesn't really apply, since (as I said in my original comment) the discriminator is existential: US companies cannot be trusted because they built the existing system.
Any attempt to block this (stopping users from getting their data back) is going to be challenged by the EU (GDPR violations cannot be brought to court by citizens, only by a nation's data authority, which means a citizen gets big guns and doesn't need to pay).
Also, go on and explain that move to all your other (US and non-US) users.
* An EU cloud provider doesn't have to provide the same services a US one provides. That would hardly be innovative.
You also don't need to focus on corporations. Provide data storage for citizens; that will be the basis for building a privacy-focused cloud, and then businesses might want it too.
There is a possible continuation into "advantages of storage-and-privacy-based vs compute-based", which I'll skip.
But essentially, to me it seems that an open source, truly "give me back my data", business-driven initiative has never been as actionable as it is now.
In short, such a project can make two bold statements:
"We are more innovative than SV"
"We have better freedoms than the US"
> But then it's not so much that data ends up in "the EU" as that it's on your own device and then backed up or distributed as encrypted chunks in a distributed network which isn't tied to any specific jurisdiction.
100%
I launched into a long trajectory from the comment I was originally answering, and stopped short.
I think of? dream of? try to build? exactly what you just said.
my "in the EU" claim is mostly around legislation (EU art 8 vs US CLOUDS act vs vs China approach to citizen's data)
The legislation has been there since GDPR.
It's a matter of tools.
Since corps built the tools, they "forgot" to add the third button on cookie banners: "give me back my data"... (and the fourth: "delete it").
But the legal framework is there, as well as most of the tooling (Google Takeout, and so on from all the other major players).
It's not that pipelines for moving data from US corps to individuals do not exist; it's more that, up to now, whenever I talked about "data rights" to people, even in tech, I got yawns back.
Now we have a "perfect storm": distrust towards the US (the administration, collapsing onto US businesses) + global uncertainty towards AI (where lots of people perceive something happening but lack any tool that gives them control over it).
This is what I perceive as a tectonic shift that can be used innovatively by EU businesses, hopefully leveraging open source.
For completeness, I have indeed cast the "EU" as the spearhead for this, given its incentives to build it; but yes, central authority over this should live inside each nation's framework for its citizens (see Japan and South Korea, both providing legal frameworks for data protection).
Focusing on SaaS rather than B2C.
A clear advantage of getting users to build their own UI is that processes emerge as a consequence.
Specific roles within a team don't use every feature in a UI, and often compose a series of actions into a workflow.
Letting them build the UI to aggregate and automate leads to being able to extract the business knowledge embedded in the UI, as well as the reasoning the user has with the AI about what to build.
Put that in a SaaS for an office and the outcome is a true representation of the work being done in that office, plus clear signals about edge cases (aka "the user is not using his custom-built flow, why?").
Stability etc. can be handled post hoc: once a customized UI proves some benefit (via user adoption, or whatever you think efficiently measures productivity gains), it can be formalized by a human coder, who gets the full picture with all the domain knowledge baked in, as long as you capture not only the UIs but also the reasoning that built them.
Back to the article: smart to think of this in terms of the browser, since that crosses the boundary between SaaS apps.
I too have discovered that feature chats are surely a winner (as well as a prerequisite for parallelization).
In a similar vein, I match GitHub project issues to md files committed to the repo.
Essentially, the GitHub issue content is just a link to the md file in the repo.
Also, epics are folders with links (+ a README that gets updated after each task).
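Concretely, it looks something like this (the names are illustrative, not a fixed convention):

```
docs/epics/billing-revamp/
  README.md              # epic overview, updated after each task
  001-invoice-model.md   # the GitHub issue body is just a link here
  002-payment-retry.md
```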
I am very happy with it too.
It's also very fast and handy to reference from Claude using @.
I.e.: did you consider what has been done @
Other major improvements that worked for me were:
- DOC_INDEX.md, built around the concept of "read this if you are working on X (infra, db, frontend, domain, ...)"
- COMMON_TASKS.md (if you need to do X, read Y; if you need to add a new frontend component, read HOW_TO_ADD_A_COMPONENT.md)
Common tasks tend to increase quality when they are expressed in a checklist format.
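For instance (the content is hypothetical, just to show the shape):

```
# COMMON_TASKS.md (excerpt)
## Add a new frontend component
Read HOW_TO_ADD_A_COMPONENT.md first, then:
- [ ] create the component under src/components/ (illustrative path)
- [ ] register it in the component index
- [ ] add a test/story before wiring it into a page
```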
I suppose: gradually and then suddenly?
each "fix" to incorrect reasoning/solution doesn't just solve the current instance, it also ends up in a rule-based system that will be used in future
initially, being in the loop is necessary, once you find yourself "just approving" you can be relaxed and think back
or, more likely, initially you need fine-grained tasks; as reliability grows, tasks can become more complex
"parallelizing" allows single (sub)agents with ad-hoc responsibilities to rely on separate "institutionalized" context/rules, .ie: architecture-agent and coder-agent can talk to each others and solve a decision-conflict based on wether one is making the decision based on concrete rules you have added, or hallucinating decisions
I have seen a friend build a rule-based system and have been impressed at how well LLMs work within that context.
I'm commenting while agents run in a project trying to achieve something similar to this.
I feel like "we all" are trying to do something similar, in different ways, and in a fast-moving space (I use Claude Code and didn't even know subagents were a thing).
My gut feeling from past experience is that we have git, but not git-flow yet: a standardized approach that is simple to learn and implement across teams.
Once (if?) someone just "gets it right" and has a reliable way to break this down to the point that engineer(s) can efficiently review specs and code against expectations, that will be the moment when being a coder takes on a different meaning, at large.
So far, all the projects I've seen end up building "frameworks" that match each person's internal workflow. That's great and can be very effective for the single person (it is for me), but unless it can be shared across teams, throughput will still be limited (compared to that of a team of engineers with the same tools).
Also, refactoring a project to fully leverage AI workflows might be inefficient compared to rebuilding from scratch to implement it from zero, since building docs-for-context in step with development cannot be backported: that context is likely already lost in time, and accrued as technical debt.