Those 2019-2020 models are absolute trash. I don’t know what happened. My 2016 MBPro smokes the few we have bouncing around at work. They started falling apart like year 3, and my MBPro was the first iteration of their newer builds with the butterfly keyboard/non-optional Touch Bar!
The problem isn't that it can't write good code. It's that the guy prompting it often doesn't know enough to tell the difference. Way too many vibe coders these days who can generate a PR in 5 seconds, but can’t explain a single line of it.
That’s 100% the trick to it all. I don’t always write code using LLMs. Sometimes I do. The thing that LLMs have unlocked for me is the motivation to put together really solid design documentation for features before implementing them. I’ve been doing this long enough that I’ve usually got a pretty good idea of how I want it to work and where the gotchas are, and pre-LLMs would “vibe code” in the sense that I would write code based on my own gut feeling of how it should be structured and work. Sometimes with some sketches on paper first.
Now… especially for critical functionality/shared plumbing, I’m going to be writing a Markdown spec for it and I’m going to be getting Claude or Codex to review it with me before I pass it around to the team. I’m going to miss details that the LLM is going to catch. The LLM is going to miss details that I’m going to catch. Together, after a few iterations, we end up with a rock solid plan, complete with incremental implementation phases that either I or an LLM can execute on in bite-sized chunks and review.
The LLM isn’t contributing garbage, the user is by (likely) not testing/verifying it meets all requirements. I haven’t yet used an LLM which didn’t require some handholding to get to a good code contribution on projects with any complexity.
Is it inevitable though? Open-weight models large enough to come close to an API model are insanely expensive to run for con/prosumers. I'd put the “expensive” bar at ≥24GB of VRAM, since that's already well into four digits, which buys you many months of a subscription, not counting the power bill for >400W of continuous draw.
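To make that comparison concrete, here's a rough back-of-envelope break-even calculation. Every number in it is an illustrative assumption (GPU price, subscription cost, usage hours, electricity rate), not a quoted figure:

```python
# Break-even estimate: local 24GB GPU vs. a hosted API subscription.
# All values below are assumptions for illustration, not real prices.
gpu_cost_usd = 1200.0           # assumed price of a 24GB consumer GPU
subscription_usd_month = 20.0   # assumed monthly API subscription
power_watts = 400.0             # continuous draw mentioned above
hours_per_day = 4.0             # assumed daily inference time
kwh_price_usd = 0.15            # assumed electricity price per kWh

# Monthly electricity cost of running the card locally:
power_usd_month = power_watts / 1000 * hours_per_day * 30 * kwh_price_usd

# Months until the GPU purchase pays for itself, given that local
# power costs eat into each month's "saved" subscription fee:
months = gpu_cost_usd / (subscription_usd_month - power_usd_month)

print(f"power: ${power_usd_month:.2f}/mo, break-even: {months:.1f} months")
# power: $7.20/mo, break-even: 93.8 months
```

Under these assumptions the hardware takes nearly eight years of use to beat the subscription, before accounting for the quality gap to frontier models, which is the pessimistic point being made.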
Color me pessimistic, but this feels like a pipe dream.
A decent number of software developers and gamers do spend 3000 USD on a PC. That kind of hardware is going to get more and more capable over time wrt genAI models.
Of course there will always be a gap to frontier closed hosted models. It's not an either-or proposition.
In UI, I’m pretty sure that replacement is already here. We’ll be lucky if at least backend stays a place where people still care about the actual source.
I'd say the opposite, the frontend code is so complex these days that you can't escape the source code.
If you stick to tailwind + server side rendered pages you can probably go pretty far with just AI and no code knowledge but once you introduce modern TS tooling, I don't think it's enough anymore.
LLMs really are stunningly good at finding vulnerabilities in code, which is why, with closed-source code, you can and probably will use them to make your code as secure as possible.
But you won't keep the doors open for others to use them against it.
So it is, unfortunately, understandable in a way...
I'm not a security expert, but can't closed-source applications be vulnerable and exploited too? I feel like using closed source as a defense just gives you a false sense of security.
Finding a vulnerability in a black box is drastically different from finding one in a white box. This isn’t about whether there is a vulnerability or not, but about the likelihood of it being found.
It's a meaningful difference for SaaS. Most likely an attacker doesn't have access to your running binary let alone source code, and if they probe it like a pentester would it will be noisy and blocked/flagged by your WAF.
What's being dismissed as obscurity is one legitimate approach to security, as long as you can actually keep the code safe. Your passwords and security keys are just random combinations of strings; the fact that they're obscured from everyone else is what provides the security.
Decompilation and you are back to the level of security you started with. OpenSSH is open for a good reason. Please acknowledge your error. Are you AI?
OTOH, their position seems to be that “many LLMs make all bugs shallow” is unhelpful, just as “many eyes make all bugs shallow” was considered unhelpful.
What the open source economy genuinely needs, to both surface these latent vulns and tamp down finding-slop, is a new https://bughook.github.com/your/repo/ endpoint that the big LLMs (Mythos, etc.) support. Mythos understands when it's been used to find a vuln, and the back end auto-reports verified findings, which the git service can feed to a Dependabot-type tool.
Even better, price Mythos to cover running a background verifier that fetches the project and revalidates the issue before firing that bughook.
Meanwhile, train it on these findings, so its future self doesn't create them.
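If such a bughook existed, the verifier-to-git-service handoff might look something like this sketch. Everything here beyond the endpoint URL quoted above is invented for illustration: the payload schema, field names, and the "Mythos" vendor are hypothetical, not a real API:

```python
import json
import urllib.request

# Hypothetical payload a vendor's background verifier might POST
# after re-validating a finding against a fresh clone of the repo.
# Schema and fields are made up for illustration.
finding = {
    "repo": "your/repo",
    "commit": "deadbeef",             # placeholder commit hash
    "vuln_class": "path-traversal",   # example classification
    "verified": True,                 # verifier reproduced the issue
    "report": "…",                    # details, disclosed to maintainers only
}

req = urllib.request.Request(
    "https://bughook.github.com/your/repo/",
    data=json.dumps(finding).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # not executed: the endpoint is hypothetical
```

The `verified` flag is the important part of the idea: the git service would only surface findings the verifier reproduced, which is what keeps the finding-slop out of maintainers' queues.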
LLMs, like humans, can find vulnerabilities in black boxes. We established 30 years ago that open source is usually more secure than closed source and that security by obscurity doesn't work.
What a nightmare. ‘Mad hot’ even on… just being alive.