Hacker News | dakolli's comments

It's because our culture worships pieces of paper the government tells us are worth something.

Nope, people seek it out because government tells them to pay taxes _or else_.

Money is just a physical representation of the ability to get what you want. The problem is not money. It’s the fact that we live in a “me” society.

You're obviously just a loser trying to rage bait. But China outclasses the West on "brains" and hours worked by an order of magnitude. In under 10 years they've started entire industries where the West had half a century of R&D as a moat, and they have reached parity with the West in almost all cases.

It was a rhetorical question, mate.

Half of the engineers at OpenAI and Anthropic are Asian.


You do understand that China copied the R&D from the West. It cannot be that China invented all the technology in < 10 years when other countries had to research for decades. One example: Germany led research on solar panels, and China was able to replicate and mass-produce them, but without the initial German investment in R&D, it would have taken China decades longer to obtain that technology.

Okay dude, whatever makes you feel good. I don't want to interrupt your cope, you're right, China can't compete with the West.

I don’t care if China can compete with the West or not. I was just stating the facts.

The only real world task benchmark I know of is Scale Labs RLI

https://labs.scale.com/leaderboard/rli

It's clear to me these models are useless on any real world task: a 4% pass rate on $20-30/hr Upwork tasks. This whole trend of agentic engineering is a giant money grab.


Missing some recent models on that list, but I think most crucially, the harness is fixed. One of the major learnings of the last few months is that the harness and eval (“looping” and the support / tooling around it) are really critical. I would guess these numbers are the floor.

For instance, some of these tasks include creating videos, and one of the commonly reported failure modes is truncated videos, or not all videos being created. This sort of failure mode is currently best managed by an outer evaluation loop; no frontier model will, when managed by an eval loop, submit work like this right now.
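To make "outer evaluation loop" concrete, here is a minimal sketch of the idea, not how any particular harness actually does it. The names `run_with_eval_loop`, `generate`, and `find_problems` are all hypothetical: `generate` stands in for a model call, and `find_problems` stands in for a check such as "is the video truncated".

```python
from typing import Callable, List, Optional


def run_with_eval_loop(
    generate: Callable[[str, List[str]], str],   # hypothetical model call
    find_problems: Callable[[str], List[str]],   # hypothetical output checker
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `generate` until `find_problems` reports nothing, or attempts run out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        output = generate(task, feedback)
        problems = find_problems(output)
        if not problems:
            return output        # passed the outer check, safe to submit
        feedback = problems      # feed the failures into the next attempt
    return None                  # never passed, so nothing gets submitted


# Toy stand-ins: the first attempt is "truncated", the retry is complete.
def fake_generate(task: str, feedback: List[str]) -> str:
    return "video: 60s of 60s" if feedback else "video: 12s of 60s"


def fake_check(output: str) -> List[str]:
    return [] if output == "video: 60s of 60s" else ["video is truncated"]


result = run_with_eval_loop(fake_generate, fake_check, "make a 60s video")
print(result)
```

The point is only that the truncated first attempt never leaves the loop; a fixed harness without this kind of check would submit it as-is.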


> these models are useless on any real world task

I beg to differ. They are not perfect but immensely useful today.


It's just lies by OpenAI dude, these people are just trying to IPO so they can buy a 100m yacht.

One shot prompting/tooling is the only reasonable way to use an LLM in my opinion. You should not be having an LLM operating for hours creating thousands of lines of new code that you can never review or maintain. You can actually be highly productive modifying a single file or two at a time, ideally with as focused and as little context as possible, without the LLM being given full permission to add as much context as possible along the way to maximize revenue for the developers of the harness.

The agentic engineering paradigm is just a narrative trend pushed by AI companies to get people to 10x their token consumption per prompt. It plays into people's laziness and addiction to dopamine too, causing addict-like behavior in people who fall prey to this trend.


I disagree fundamentally.

If I do that, I'm literally slower than just making the change myself, without having to specify it sufficiently for the model.

I can see how a junior dev or generally someone that's not particularly knowledgeable about the language or framework they're working with may benefit from such usage, but for experienced people there is very little value in that approach.

I say this because I've just had to face this decision this month with Copilot introducing usage-based billing. I attempted to scale back my usage, first with non-Opus models: output essentially became discardable as it continually hallucinated nonexistent fields in API responses, etc. Then by scoping the changes smaller and smaller, until I ultimately gave up and reduced usage to just generating tests.


I agree. And at work it has been producing some of the worst GUI test cases I have ever seen.

What is tested often makes no sense at all, completely implausible edge cases are tested on internals, while it doesn't create tests for the overall application using user events.

And some things in these test cases are downright ridiculous: instead of instantiating your classes, it sets up some barebones fake objects reimplementing some of the behavior of your actual class, then ignores the TypeScript errors via force cast or similar.

Then it proceeds to slap some test ids on the output, stubs components and dependencies more or less randomly, adds some assertions on test ids and calls it a day.

Apparently that's good enough for many colleagues to open an MR for that garbage.

That said, at home with SOTA models I happily hand large units of work to it, outsource much of the thinking, and get workable results. I think this is the future.


I disagree, fundamentally.

I see little value in throwing a ton of context at an LLM and waiting 10-20 minutes for a coin flip on whether or not it's going to produce junk. I'd rather do quick 60 second turns, get most of the way there and fix the rest myself if I have to. I'd rather honestly just not use them.


Well, the point was that I'd rather spend 30 seconds doing it myself than formulate a prompt with enough context for the model to implement it within 60 seconds. Also, these numbers are unrealistic.

Everyone I've ever interacted with who claims to prompt in "seconds" actually needs multiple minutes to think about the solution they want the model to implement, and then needs twice as long to formulate that into a sentence which provides the model enough context to actually do that.

So the more realistic estimates are "I'd rather spend the 2 minutes just implementing the minor change myself, instead of spending 1.5 minutes thinking about it, then 2.5 minutes writing the prompt and then waiting 1 minute for it to finish"


I would agree with all those points, and my numbers are a little off. I really just don't want to use any of it. I'm more excited about fast FIM autocomplete that works well, something like Cursor Tab without Cursor. If something can increase my wpm and take strain off my fingers that would be nice. At this point latency and accuracy are terrible though.

The trick is to do something else in those 20 minutes (or, ideally, even longer).

That's the main value I've been getting out of coding agents. I have them do (comparatively) simpler tasks or explorative tasks in the background while I'm in a meeting, doing code reviews, or otherwise working on something else.


It won't happen, it's all a money grab.

I think that LLMs will stay, but I also think we've plateaued and that big companies will fail and fall and we will have another years long "halt" of any real advancements coming to the public.

Similar to how ML was all the hype about 12 years ago and then it submerged again for a couple of years.


> we will have another years long "halt" of any real advancements coming to the public

One can hope. Probably an unpopular take here but I'm tired boss.

The software world has a huge backlog of things that can all be done with the tech we currently have, no breakthrough advancements needed, but none of it will get prioritized when we're all forced to run on the new and shiny treadmill. Ever since the LLM hype, it's like the JavaScript culture of a new framework every 10 minutes has infected every other vertical of software development, and I'm exhausted.


Yep, I see it.

This is probably the dumbest possible way to do it. Just buy tokens through OpenRouter and you could run it all month 24/7 at 100 tps for practically nothing. There are tons of ways to pay for things without giving your personal information.

  100/s*month*(.14/million) = $37
$37 for the input tokens for Deepseek V4 Flash if you miss cache all the time.

A decent deal but Flash is quite dumb and you still have to pay for output tokens
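A quick sanity check of that estimate, as a sketch. The 100 tokens/s rate and the $0.14 per million input tokens come from the comment above; the 30-day month and the function name `monthly_input_cost` are assumptions for illustration, and output tokens are not included.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_input_cost(
    tokens_per_second: float,
    usd_per_million_tokens: float,
    days: float = 30.0,  # assumed month length
) -> float:
    """USD cost of streaming input tokens nonstop for `days` days, with no cache hits."""
    total_tokens = tokens_per_second * SECONDS_PER_DAY * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


cost = monthly_input_cost(100, 0.14)
print(f"${cost:.2f}")  # a bit over $36 for a 30-day month
```

That is about 259 million tokens for a 30-day month; a 31-day month lands slightly above $37, so the figure quoted above holds up as a rough upper bound on input cost.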


Cool concept, but...

Vibe coders need to be forced to spend one day learning basic CSS before they're allowed to use an LLM to make a website. The internet would be a lot more pleasant as we move forward with slopification. It doesn't have to be sloppy, and it doesn't take all that much studying to at least be able to steer an LLM in the right direction to make something look nice. At this point everything is just the same 3 colors and a centered flex column with weird spacing.


What subscription?

I meant top-up. They don't have subscriptions.

They'll get so much money, all the 60 year old billionaires in SF are so desperate not to die.
