Hacker News | xvector's comments

Spent a lot of time with "open models." None of them come close. They are benchmaxxed. But you won't hear many of the open model fans on HN admit this.

The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

No one working on a serious project at a serious company is downgrading their agent's intelligence for a marginal cost saving. Downgrading your model is like downgrading the toilet paper on your yacht.


> The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

I agree that people who claim that open models are as good as claude/openai/z are lying, delusional, or not doing very much. I've tried them all, including GLM 5.1.

GLM is not bad but the hardware needed will never recoup the ROI vs just using a commercial provider through its API.

That being said, you're being reductive here. For many use cases, local models offer advantages that can't be obtained through a commercial API: privacy, ownership of the entire stack, predictability. They can't be rugpulled, they can't snitch on you. They will not give you a 503.

Those advantages are very valuable for things like a local assistant, as an agent, for data extraction, for translations, for games (role playing and whatnot), etc.

That being said, I know that many people are like you: they don't give a second thought to privacy. They'd plug Anthropic into their brain if they could. So I understand the sentiment. I just think that you should in turn try to understand why someone would use an open model.


GLM 5.1 getting 5% on ARC-AGI 2 private is all anyone needs to know.

The cost is so small relative to the increase. The cost whining on HN is bizarre to me. Feels like everyone here is on an individual plan and has no understanding of what margins look like for an actual business.

Meta pays $750k+ TC and makes far more profit/eng. Do you think they care about $5k/eng/mo in inference? A 1.1x increase would be so significant that it would justify the cost easily, especially when you can just compress comps to make up for it.
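
For anyone who wants to check the arithmetic behind that claim, here is a rough sketch using only the figures quoted above ($750k TC, $5k/eng/mo, 1.1x). It rests on two assumptions that are mine and not the commenter's: that "1.1x" means a productivity multiplier, and that an engineer's output is worth at least their total comp (a floor, since the comment says profit per engineer is far higher). The function name `inference_break_even` is hypothetical.

```python
def inference_break_even(total_comp_per_year, inference_per_month, productivity_multiplier):
    """Compare yearly inference spend with the value of a productivity gain.

    Assumption: one engineer's output is valued at exactly their total comp,
    which is a conservative floor, not a measured figure.
    """
    annual_inference = inference_per_month * 12
    # Value of the extra output, e.g. 1.1x -> 10% of total comp.
    value_of_gain = total_comp_per_year * (productivity_multiplier - 1)
    # Smallest multiplier at which the gain pays for the inference bill.
    break_even_multiplier = 1 + annual_inference / total_comp_per_year
    return annual_inference, value_of_gain, break_even_multiplier


annual_cost, gain, break_even = inference_break_even(750_000, 5_000, 1.1)
print(f"inference per year:    ${annual_cost:,.0f}")
print(f"value of 1.1x gain:    ${gain:,.0f}")
print(f"break-even multiplier: {break_even:.2f}x")
```

Under those assumptions the inference bill is $60k a year against a gain worth about $75k, so the spend breaks even at roughly a 1.08x multiplier. Whether a real team actually gets 1.1x is the part the numbers cannot settle.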


What? You don't think businesses do financial planning and calculations for profit margins?

Do you really think they go on vibes - "welp, this AI thing seems to improve developer performance, I guess. Heck, what's an extra 5k per developer anyways, amirite".

Well, maybe they really do in your neck of the woods. Explains a lot, I guess.


Yes, most companies do in fact operate like this. There are tens of thousands of companies that will pay more for the best thing and leave it at that, because the cost is dwarfed by what even marginal gains in quality unlock for the business.

I too am finding 4.7 a significant upgrade, it's hard to go back to 4.6 for me. I don't understand everyone calling it a disappointment but clowning on Anthropic is the trendy move these days.

And what's missing in all these token count complaints is that 4.7 is actually cheaper overall anyways because it produces fewer output tokens.


HN is getting ridiculous. You cannot seriously be complaining about Opus token usage on the Pro plan.

Compared to the usage you get on OpenAI's $20 plan tho?

Adaptive thinking is optional

Not when you want extended thinking: you select extended thinking, and Opus decides whether you get it with adaptive thinking.

"With Opus 4.6, extended thinking was a toggle you managed: turn it on for hard stuff, off for quick stuff. If you left it on, every question paid the thinking tax whether it needed to or not. Now, with Opus 4.7, extended thinking becomes adaptive thinking. "

https://claude.com/resources/tutorials/working-with-claude-o...


...are you talking about the app? Come on. The app is for quick queries. You should be using Claude Code or Cowork.

I've gotten quite a bit of work done on claude.ai and the mobile app though. It's been good for code review. The GitHub connector is a bit clunky but it works.

No, it is not.

https://code.claude.com/docs/en/model-config

> Opus 4.7 always uses adaptive reasoning. The fixed thinking budget mode and CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING do not apply to it.


This is just a case of mass social psychosis. Claude is the same as it has ever been. Just look at historical benchmarks: https://marginlab.ai/trackers/claude-code-historical-perform...

This mindset trivializes the immense achievements of "the common man" over the course of millennia.

Many of those achievements were won through physical violence. The 5-day work week, for example. We don't work 7 days because people kept shooting bosses until the bosses agreed to compromise on 5 days.

We'd have never progressed as a species with your mentality. Change is painful and it's part and parcel of progress.

Humans would be suffering far more today if we weren't willing to accept short term pains for progress.


> We'd have never progressed as a species with your mentality.

Please avoid swipes like this on HN. The guidelines make it clear we're trying for something better here. https://news.ycombinator.com/newsguidelines.html


Change and progress like the people of France deciding they had enough of injustice and nobles' impunity, then? A little short-term pain for social progress? We agree.

Look where France is now. Can't afford their own retirement.

If that's the worst problem they have, that still sounds like things worked out pretty well compared to most places.

That sounds suspiciously like an "ends justify the means" argument.

It's easy to say we need to be willing to accept short term pains when it's someone else who has to bear the brunt of them.


Are you willing to stand by this argument and give up your career?

Will you eat your words when major vuln disclosures come out 3-4 months from now?

Will you eat your words when you find out major vuln disclosures have been happening for decades?

They obviously meant on an unprecedented scale.

Sure, and healthy skepticism before proof is a sign of wisdom.

Which makes taking claims from companies at face value…?


And there's a reason they've achieved precisely zero penetration amongst normies.

A chat app is useless if your friends and family won't use it.


I've actually had good luck getting friends and family on Matrix, after they'd previously used either Facebook Messenger or Hangouts.
