It's weird to see the expectation that the result should be perfect.
All said and done, that its even possible is remarkable. Maybe these all go into training the next Opus or Sonnet and we start getting models that can create efficient compilers from scratch. That would be something!
"It's like if a squirrel started playing chess and instead of "holy shit this squirrel can play chess!" most people responded with "But his elo rating sucks""
It's more like "We were promised, over and over again, that the squirrel would be autonomous grand master level. We spent insane amounts of money, labour, and opportunity costs of human progress on this. Now, here's a very expensive squirrel, that still needs guidance from a human grandmaster, and most of it's moves are just replications of existing games. Oh, it also can't move the pieces by itself, so it depends on Piece Mover library."
even a squirrel that needs guidance from a human grandmaster, is heavily inspired by existing games, and who can use Piece Mover library is incredible. 5 years ago the squirrel was just a squirrel. then it was able to make legal moves. now it can play a whole game from start to finish, with help. that is incredible
Any way you slice it: LLMs provide real utility today, right now. Even yesterday, before Opus/Codex were updated. So the money was not all for naught. It seems very plausible given the progress made so far that this new industry will continue to deliver significant productivity gains.
If you want to worry about something, let's worry about what happens to humanity when the world we've become accustomed to is yanked out from underneath us in a span of 10-20 years.
For reference, I use LLMs daily for coding. I do think they are useful.
I am speaking about corporations and sales tactics, because this VERY experiment was done by exactly such a corporation. How about you think about how "this whole thing works", and apply it to their post? What did they not write? How many worse experiments did they not post about to not jeopardize investments?
I don't find this impressive, because it doesn't do anything I'd want, anything I'd need, anything the world needs, and it doesn't do anything new compared to my personal experience. Which, just to reiterate, is that LLMs are useful, just not nowhere close to as world shattering/ending as the CEOs are selling it. Acknowledging that has nothing to do with being a luddite.
To be a bit pedantic, I'm not accusing you of being a Luddite. That would mean that you were fundamentally opposed to a new technology that's obviously more useful.
Instead, in my opinion you are not giving enough grace to what is being demonstrated today.
This is my analogy: you're seeing electrical demonstrations in front of your very eyes, but because the charlatans who are funding the research haven't quite figured out how to harness it, you're dismissing the wonder. "That's all well and good, but my beeswax candles and gas lamps light my apartment just fine."
It is very impressive indeed, but impressiveness is not the same as usefulness.
If important further features can’t get implemented anymore
The usefulness is pretty limited.
And usefulness further needs to be weighed against cost.
This is really questionable outcome. So you'll have your own custom OS riddled with holes that AI won't be capable of fixing because the context and complexity became so high that running any small bug fix would cost thousands of dollars in tokens.
Is this how tech field ends? Overengineered brittle black-box monstrosities that nobody understands because important thing for business was "it does A, B, and C" and it doesn't matter how.
But the Squirrel is only playing chess because someone stuffed the pieces with food and it has learned that the only way to release it is by moving them around in some weird patterns.
But people have been telling us for years that the squirrel was going to improve at chess at an exponential rate and take over the world through sheer chess-mastery.
>It's weird to see the expectation that the result should be perfect.
Given that they spent $20k on it and it's basically just advertising targeted at convincing greedy execs to fire as many of us as they can, yeah it should be fucking perfect.
A symptom of the increasing backlash against generative AI (both in creative industries and in coding) is that any flaw in the resulting product is predicate to call it AI slop, even if it's very explicitly upfront that it's an experimental demo/proof of concept and not the NEXT BIG THING being hyped by influencers. That nuance is dead even outside of social media.
AI companies set that expectation when their CEOs ran around telling anyone who would listen that their product is a generational paradigm shift that will completely restructure both labor markets and human cognition itself. There is no nuance in their own PR, so why should they benefit from any when their product can't meet those expectations?
Because it leads to poor and nonconstructive discourse that doesn't educate anyone about the implications of the tech, which is expected on social media but has annoyingly leaked to Hacker News.
There's been more than enough drive-by comments from new accounts/green names even in this HN submission alone.
It cannot be overstated how absurd the marketing campaign for AI was. OpenAI and Anthropic have convinced half the world that AI is going to become a literal god. They deserve to eat a lot of shit for those outright lies.
Maybe the general population will be willing to have a more constructive discussions about this tech once the trillion dollar companies stop pillaging everything they see in front of them and cease acting like sociopaths whose only objectives seem to be concentrating power, generating dissidence and harvesting wealth.
When we were trying to build our own agents we put quite a bit of effort on evals which was useful.
But switching over to using coding agents we never did the same. Feels like building an eval set will be an important part of what engg orgs do going forward.
Sorry off topic question but has Docker come up with a easy to use dev solution. I always end up with using Devcontainer: it solves the sandboxed, ready to use dev env.
But the actual experience with developing on VSCode with Dev Containers is not great. It's laggy and slow.
My one experience with dev containers put me off of dev containers... but standard `docker compose` is just great for me.
I worked at a company where we were trying to test code with our product and, for a time, everyone on the team was given a mandate to go out and find X number of open source projects to test against, every week.
Independently, every member of the (small) team settled on only trying to test repos where you could do:
get clone repo && cd repo && docker compose up
Everything else was just a nightmare to boot up their environment in a reasonable amount of time.
Really? I work across multiple vscode projects (locally), some use dev-containers and others don't. I have never noticed any difference in experience across the two.
I have also used them remotely (ssh and using tailscale) and noticed a little lag, but nothing really distracting.
No, on Windows it is very quick too. On WSL2 compiling Rust programs are almost as fast as Linux on bare metal. However the files need to live inside the Linux filesystem. Sharing with Windows drives actually compiles slower than native Windows.
If you are building natively, yes. However the original comment is about Dev Containers which runs under WSL2.
If you open a native Windows folder in VSCode and activate the Dev Container, it will use the special drvfs mounts that communicate via Plan9 to host Windows OS to access native Windows files from the Docker distro. Since it is a network layer accross two kernels, it is slow as hell.
First of all it supports containers natively, Windows own ones, and Linux on WSL.
Secondly, because Microsoft did not want to invent their own thing, the OS APIs are exposed the same way as Docker daemon would expect them.
Finally, with the goal to improving Kubernetes support and the ongoing changes for container runtimes in the industry, nowadays it exposes several touch points.
I am often frustrated by PDF issues such as how complicated it is to create one.
But reading the article I realized PDFs have become ubiquitous because of its insistence on backwards compatibility. Maybe for some things it's good to move this slow.
The article is wrong, the PDF spec has introduced breaking changes plenty of times. It’s done slowly and conservatively though, particularly now that the format is an ISO spec.
The PDF format is versioned, and in the past new versions have introduced things like new types of encryption. It’s quite probable that a v1.7 compliant PDF won’t open on a reader app written when v1.3 was the latest standard.
Haha something that I want to try out. I have started using voice input more and more instead of typing and now I am on my second app and second TTS model, namely Handy and Parakeet V3.
Parakeet is pretty good, but there are times it struggles. Would be interesting to see how Qwen compares once Handy has it in.
Now its mostly about models getting better.
reply