Is it the models though? With every release (multimodal etc.) it's just a well-crafted layer of business logic between the user and the LLM. Sometimes I feel like we lose track of what the LLM does and what the API in front of it does.
It's 100% the models. Terminal-Bench is a good indication of this: there the agents get "just a terminal tool", and yet they can still solve lots and lots of tasks. Last year you needed lots of glue, and two years ago you needed monstrosities like LangChain that worked maybe once in a blue moon, and only if you didn't look at them funny.
Check out the exercise from the SWE-agent people, who released a mini agent that's just "a terminal in a loop" and that started to get close to the heavily engineered agents this year.
It's definitely a mix; we have been co-developing better models and frameworks/systems to improve the outputs. Now we have llms.txt, MCP servers, structured outputs, better context management systems, and augmented retrieval through file indexing, search, and documentation indexing.
But these raw models (which I test through direct API calls) are much better. The biggest change with regard to price came from mixture of experts, which kept quality very similar while dropping compute 10x. (This is what allowed DeepSeek-V3 to reach similar quality to GPT-4o at such a lower price.)
The same technique has most likely been applied to these new models, and now we have 1T-100T(?) parameter models at the same cost as 4o through mixture of experts. (This is my guess, at least.)
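For intuition, here's a toy sketch (plain numpy, not any real model's architecture) of the top-k routing that makes this possible: per token only k of the N experts actually run, so compute scales with k while total parameters scale with N.

    import numpy as np

    def moe_layer(x, experts, router_w, k=2):
        # Toy mixture-of-experts routing for a single token vector x.
        # experts: list of (W, b) weight tuples; router_w: routing matrix.
        logits = router_w @ x                  # score every expert
        top = np.argsort(logits)[-k:]          # keep only the k best
        gates = np.exp(logits[top])
        gates /= gates.sum()                   # softmax over the chosen experts
        out = np.zeros_like(x)
        for g, i in zip(gates, top):
            W, b = experts[i]
            out += g * (W @ x + b)             # run only the chosen experts
        return out

    # 8 experts' worth of parameters, but per token we pay for just 2:
    d = 16
    experts = [(np.random.randn(d, d) * 0.1, np.zeros(d)) for _ in range(8)]
    router_w = np.random.randn(8, d) * 0.1
    y = moe_layer(np.random.randn(d), experts, router_w, k=2)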
"A well crafted layer of business logic" just doesn't exist. The amount of "business logic" involved in frontier LLMs is surprisingly low, and mostly comes down to prompting and how tools like search or memory are implemented.
Things like RAG never quite took off in frontier labs, and the agentic scaffolding they use is quite barebones. They bet on improving the model's own capabilities instead, and they're winning on that bet.
So how would you explain how an output of tokens can call a function, or even generate an image, since that requires a whole different kind of compute? There's still a layer in front of the model that acts as a parser to enable these capabilities.
Maybe “business” is a bad term for it, but the actual output of the model still needs to be interpreted.
Maybe I am way out of line here, since this is not my field and I am doing my best to understand these layers. But in your terms, are you maybe speaking of the model as an application?
The logic of all of those things is really, really simple.
An LLM emits a "tool call" token, then it emits the actual tool call as normal text, and then it ends the token stream. The scaffolding sees that a "tool call" token was emitted, parses the call text, runs the tool accordingly, flings the tool output back into the LLM as text, and resumes inference.
It's very simple. You can write basic tool call scaffolding for an LLM in, like, 200 lines. But, of course, you need to train the LLM itself to actually use tools well. Which is the hard part. The AI is what does all the heavy lifting.
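Something like this, as a rough sketch (the llm callable and the message format here are placeholders, not any real vendor API):

    import json

    def run_agent(llm, tools, messages):
        # llm: your completion call; assumed to return either
        # {"text": ...} for a final answer, or
        # {"tool": name, "args": json_string} for a tool call.
        while True:
            reply = llm(messages)
            if "tool" not in reply:                # plain text: we're done
                return reply["text"]
            args = json.loads(reply["args"])       # parse the call the model wrote
            result = tools[reply["tool"]](**args)  # actually run the tool
            messages.append({"role": "assistant", "content": reply["args"]})
            messages.append({"role": "tool", "content": str(result)})  # feed it back, resume

    # e.g. tools = {"read_file": lambda path: open(path).read()}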
Image generation, at the low end, is just another tool call that's prompted by the LLM with text. At the high end, it's a type of multimodal output - the LLM itself is trained to be able to emit non-text tokens that are then converted into image or audio data. In this system, it's AI doing the heavy lifting once again.
This is great! I’ve been diving deep into local models that can run on this kind of hardware. I've been building this exact same thing, but for complete recordings of meetings and such, because why not? I can even run a low-end model with ollama to refine and summarize the transcription, and combine it with smaller embedding models for modern semantic search. It has surprised me how well this works, and how fast it actually is locally.
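Roughly, the local pipeline can look like the sketch below, hitting Ollama's REST endpoints directly (the model names are just whatever you've pulled locally, and /api/embeddings is the older embeddings endpoint; recent Ollama versions also offer /api/embed):

    import requests
    import numpy as np

    OLLAMA = "http://localhost:11434"

    def summarize(transcript, model="llama3.2"):
        # one-shot, non-streaming generation against the local server
        r = requests.post(f"{OLLAMA}/api/generate", json={
            "model": model,
            "prompt": f"Summarize this meeting transcript:\n\n{transcript}",
            "stream": False,
        })
        return r.json()["response"]

    def embed(text, model="nomic-embed-text"):
        r = requests.post(f"{OLLAMA}/api/embeddings",
                          json={"model": model, "prompt": text})
        return np.array(r.json()["embedding"])

    def search(query, chunks):
        # chunks: list of (text, vector) pairs embedded ahead of time;
        # rank by cosine similarity to the query embedding
        q = embed(query)
        scored = [(c, np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
                  for c, v in chunks]
        return max(scored, key=lambda s: s[1])[0]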
Hopefully we will see even more locally run AI models in the future with a complete package.
I’ve used h3 for a game. Since every location resolves to a unique hex, I can ensure the cell grid is anchored to the same place in the world for everyone, so players can compete on the same cells.
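The core of that is tiny with the h3 Python bindings; a sketch using the v4 function names (v3 called these geo_to_h3, h3_to_geo_boundary, and k_ring):

    import h3  # pip install h3

    # Any player at roughly the same spot resolves to the same cell ID,
    # so the grid is anchored to the world, not to the client.
    res = 9  # roughly 0.1 km^2 cells
    cell = h3.latlng_to_cell(59.3293, 18.0686, res)  # Stockholm

    boundary = h3.cell_to_boundary(cell)   # hex corners, for rendering
    neighbors = h3.grid_disk(cell, 1)      # the cell plus its 6 neighbors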
Yes, I totally agree. But portfolios served a great purpose: exposing what you can do and what you know to the world and to recruiters, which would then lead to an opportunity to talk about the projects.
We might be going back to a more old-school approach where talking directly and presenting yourself carry more value again. They have always had higher value, but now it will be somewhat more forced, I believe.
Another route would be that portfolios become more blog-based, talking about different solutions and problems for each project, as you are saying.
I agree. That would also require engineers to become more invested in core domain problems, which would then lead to more specialised skills (deeper, not broader). My guess is that not everyone actually likes this, but for now most signs point in that direction.
I am kinda in the same boat (but I do not write articles), spending most of my free time either learning or developing.
Frankly, I love it. It makes me happy, so why change it? If I feel burnt out, I usually switch to something else for a short time, but mostly I can rotate between reading, coding, and watching tech influencers.
So as long as you do not feel unhealthy (exercise always helps as a natural break anyway), keep on learning and developing :)
I actually called this about 6 months ago, when Lovable was starting to gain traction here in Sweden. It was quite obvious that these kinds of applications are good for a start and a prototype. But if you are going full SaaS, hire an engineer to set the app in stone.
This might be a new branch of software developers on the rise: those who take sloppy AI code and turn it into real applications for customers.
Looks clean and all, but is this “multitasking on steroids” actually something people asked for, or is it just another nice demo of how agents can perform tasks? I just feel stressed having multiple tasks ongoing that I have to context-switch between every time anything needs input or fails to deliver what I fundamentally requested.
This seems to be the wet dream of mgmt leader types. It's not enough that AI/LLM tools make tasks more efficient or quicker - it's that their people can then be kept just as busy with these tools.
My feeling is: buzz off. This is like orgs today where the quickest people just get more work piled on for little gain.
I gladly use tools that give me some breathing room and some time to think about improvements, the future, etc. But I will actively drag my feet if mgmt starts mandating these tools be used to shovel more tasks, tickets, or busywork at me.
Interesting post, but this perspective seems to be the main focus, like, all the time. I find this statement to describe a completely wrong usage of AI:
“This is especially noteworthy because I don’t actually know Python. Yes, with 25+ years of software development experience, I could probably write a few lines of working Python code if pressed — but I don’t truly know the language. I lack the muscle memory and intimate knowledge of its conventions and best practices.”
You should not use AI to just “do” the hard job, since, as many have mentioned, it does it poorly and sloppily. Use AI to quickly learn the advantages and disadvantages of the language; then you do not have to navigate through documentation to learn everything, just validate what the AI outputs. Everything is contextual, and since you know what you want at a high level, use AI to help you understand the language.
This costs speed, yes, but I keep more control and gain knowledge about the language I chose.
I agree 100%, but in this very specific case, I really just wanted a working one-off solution that I'm not going to spend much time on going forward, AND I wanted to use it as an excuse to see how far I can go with AI tooling in a tech stack I don't know.
That being said, using AI as a teacher can be a wonderful experience. For us seniors, but also and probably more importantly, for eager and non-lazy juniors.
I have one such junior on my team who currently speed-runs through the craft because he uses AI to explain EVERYTHING to him: What is this pattern? Why should I use it? What are the downsides? And so on.
Of course I also still tutor him, as this is a main part of my job, but the availability of an AI that knows so much, always has time for him, and never gets tired is just fantastic.
Excellent insight, and that explains a lot of your decisions. Your junior is a prime example of why AI can be such an awesome tool when used correctly. Just awesome!