We're also not seeing much difference in real throughput at an agency. Everyone is getting decent results output-wise, but it just doesn't seem to change the outcomes that much. There is also a mixed incentive at an agency, because a reduction in hours spent is a reduction in revenue.
It will be interesting to see how it all plays out, but I suspect that if cost continues to increase and output only improves incrementally from here, cost will be the final decider rather than competence.
I could see it being a thing we use only sometimes, for some things, but ultimately we'll remain reliant on developers to get the work through the pipeline.
Very similar taste in games, and I had heard of it but wrote it off as a simple puzzle game, kind of in the mobile-game-esque throwaway genre. I must have been mistaken.
Personally, there is a limit to how far one needs to abstract.
I don't layer my utensils for example, because a spoon is fit for purpose and reliable.
But if I needed to eat from multiple different bowls at once, maybe I would need to.
For my personal use case, git is fit for purpose and reliable, even for complex refactoring. I don't find myself in any circumstances where I think, gosh, if only I could have many layers of this going on at once.
Even if you're working on a single thread of development, though, jj is easier and more flexible than git. That it works better for super complicated workflows is just a bonus.
jj reduces mental overhead by mapping far more cleanly and intuitively to the way people tend to work.
This is a little weird at first when you’ve been used to a decade and a half of contorting your mental model to fit git. But it genuinely is one of those tools that’s both easier and more powerful. The entire reason people are looking at these new workflows is that jj makes things so much easier and more straightforward that we can explore workflows that remove or reduce complexity in ways that just weren’t even remotely plausible in git.
A huge one for me: successive PRs that roll out something to dev/staging/prod. You can do the work all at once, split it into three commits that progressively roll out, and make a PR for each. This doesn’t sound impressive until you have to fix something in the dev PR. In git, this would be a massive pain in the ass. In jj, it’s basically a no-op. You fix dev, and everything downstream is updated to include the fix automatically. It’s nearly zero effort.
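A minimal sketch of that stacked-rollout workflow in jj. The change messages and bookmark names here are my own illustration, not from the comment, and this assumes an existing jj repo with a `main` bookmark:

```shell
# Build three stacked changes, each progressively rolling the feature out.
jj new main -m "rollout: dev"        # change 1
# ...dev-only work in the working copy...
jj new -m "rollout: staging"         # change 2, on top of dev
# ...staging work...
jj new -m "rollout: prod"            # change 3, on top of staging
# ...prod work...

# Point a bookmark (jj's branch equivalent) at each change,
# then push and open one PR per bookmark.
jj bookmark create rollout-dev -r @--
jj bookmark create rollout-staging -r @-
jj bookmark create rollout-prod -r @

# Review asks for a fix in the dev PR:
jj edit rollout-dev
# ...make the fix directly in that change...
# Staging and prod are descendants of dev, so jj automatically
# rebases them to include the fix. No cherry-picks, no manual rebase.
```

The key difference from git is the last step: amending a mid-stack change rewrites its descendants for you, rather than leaving you to `rebase --onto` each downstream branch by hand.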
Another is when you are working on a feature and in doing so need to add a capability somewhere else and fix two bugs in other places. You could just do all of this in one PR, but now the whole thing has to be reviewed as a larger package. With jj, it’s trivial to pull out the three separate changes into three branches, continue your work on a merge of those three branches, and open PRs for each separate change. When two of them merge cleanly and another needs further changes, you just do it and there’s zero friction from the tool. Meanwhile just the thought of this in git gives me anxiety. It reduces my mental overhead and my effort, and gives overburdened coworkers bite-sized PRs that can be reviewed in seconds instead of a bigger one that needs time set aside. And I don’t ever end up in a situation where I need to stop working on the thing I am trying to do because my team hasn’t had the bandwidth to review and merge my PRs. I’ve been dozens of commits and several stacked branches ahead of what’s been merged and it doesn’t even slightly matter.
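A sketch of carving those incidental changes out and continuing on a merge of them. The bookmark names are illustrative, and `<change-id-N>` stands in for the actual change IDs `jj log` would show you:

```shell
# Split the working-copy change until the two bugfixes and the new
# capability are each their own change. With no paths given, jj split
# opens a diff editor so you can pick the hunks for the first piece.
jj split          # select the hunks for bugfix A
jj split          # repeat for bugfix B
jj split          # and again for the new capability

# Give each extracted change a bookmark so it can go up as its own PR.
jj bookmark create fix-a -r <change-id-1>
jj bookmark create fix-b -r <change-id-2>
jj bookmark create new-capability -r <change-id-3>

# Continue the feature on a merge of all three, without waiting for review.
jj new fix-a fix-b new-capability
```

When one of the small PRs needs follow-up changes, you amend that change in place and the merge your feature sits on is rebased automatically, which is the "zero friction" part.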
Can you fly with stuff like this? I only wonder because of the battery setup. Very cool, I would personally use a regular track pad over the ball as I prefer as little mouse interaction as possible and it would stay out of the way better.
You are right, I think: a lot of clients need a design as a practicality rather than a differentiating factor, and they will be more than happy with a generic design output for next to nothing.
For many companies, web design just isn't a big factor in their business. Some very influential companies in my area look like they last updated their website in 1998.
It's just surprising, since it's objectively better to own the platform, and the company has a mind boggling amount of money, and allegedly coding agents capable of 10xing developer output. Why would they not be able to do it in house? It shouldn't be a capacity or capability issue.
That makes me think it's just another higher level money game, and there will be some weird investments in which neither company does anything of material value in exchange except spin some number wheels.
I definitely find your last point is true for me. The more work I do with AI, the more I expect it to do, similar to how you can expect more over time from a junior you are delegating to and training. However, the model isn't learning or improving the same way, so your trust is quickly broken.
As you note, the developer's input is still driving the model quite a bit so if the developer is contributing less and less as they trust more, the results would get worse.
> However the model isn't learning or improving the same way, so your trust is quickly broken.
One other failure mode that I've seen in my own work while I've been learning: the things that you put into AGENTS.md/CLAUDE.md/local "memories" can improve or degrade performance, depending on the instructions. And unless you're actively and quantitatively reviewing when performance is improving or degrading, you probably won't pick up that two sentences you added to CLAUDE.md two weeks ago are why things seem to have suddenly gotten worse.
> similar to how you can expect more over time from a junior you are delegating to and training
That's the really interesting bit. Both Claude and Codex have learned some of my preferences by me explicitly saying things like "Do not use emojis to indicate task completion in our plan files, stick to ASCII text only". But when you accidentally "teach" them something that has a negative impact on performance, they're not very likely to push back, unlike a junior engineer who will either ignore your dumb instruction or hopefully bring it up.
> As you note, the developer's input is still driving the model quite a bit so if the developer is contributing less and less as they trust more, the results would get worse.
That is definitely a thing too. There have been a few times that I have "let my guard down" so to speak and haven't deeply considered the implications of every commit. Usually this hasn't been a big deal, but there have been a few really ugly architectural decisions that have made it through the gate and had to get cleaned up later. It's largely complacency, like you point out, as well as burnout trying to keep up with reviewing and really contemplating/grokking the large volume of code output that's possible with these tools.
Your version of the last point is a bit softer I think — parent was putting it down to “loss of talent” but yours captures the gaps vs natural human interaction patterns which seems more likely, especially on such short timescales.
Confusingly, I say both. First I say that the ratio of work coming from the model is increasing, and then when clarifying I say “your talent keeps deteriorating”. You correctly point out these are distinct, and maybe the distinction is important, although I personally don’t think so. The resulting code would be the same either way.
Personally I can see the case for both interpretations being true at the same time, and maybe that is precisely why I conflated them so eagerly in my initial post.
I almost exclusively use the royal We. "We are working on a new feature and we need it to meet these requirements...", "it looks like we missed a bug, let's take another look at.."
I also talk this way with people because I feel it makes it clear we're collaborating and fault doesn't really matter. I feel it lets junior members take more ownership of the successes as well. If we ever get juniors again.
I am not convinced that more context will be useful; practical use of current models with 1M-token context windows shows they get less effective as the window grows. Given that model progress is slowing as well, perhaps we end up reaching a balance of context size and competency sooner than expected.
Stuff in more code. Stuff in more system prompt. Stuff in raw UTF-8 characters instead of tokens to fix strawberries. Stuff in WAY more reasoning steps.
Given the current tech, I also doubt there will be practical uses, and I hope we’ll see the opposite of what I wrote. But given the current industry, I fully trust them to somehow fill their hardware.
Market history shows us that when the cost of something goes down, we do more with the same amount, not the same thing with less. But I deeply hope to be wrong here and that the memory market will relax.