With every big new model release, we see scores on benchmarks like ARC and Humanity's Last Exam climbing higher and higher. My question is: how do we know these benchmarks are not part of the training set used for these models? A model could easily have been trained to memorize the answers. Even if the datasets haven't been copy-pasted directly, I'm sure they've leaked onto the internet to some extent.
But I am looking forward to trying it out. I find Gemini to be great at handling large-context tasks, and Google's inference costs seem to be among the cheapest.
Even if the benchmarks themselves are kept secret, the process for creating them is not that difficult, and anyone with a small team of engineers could build a replica in their own lab to train their models on.
Given the nature of how those models work, you don't need exact replicas.
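As a rough sketch of why: decontamination filters typically look for exact n-gram overlap between training documents and benchmark items (the GPT-3 paper describes a 13-gram check along these lines), and a near-replica slips right past them. Hypothetical code, simple whitespace tokens:

    # Simplified n-gram overlap check of the sort used to
    # "decontaminate" training data against benchmarks.
    def ngrams(text: str, n: int = 13) -> set:
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def overlaps(benchmark_item: str, training_doc: str) -> bool:
        return bool(ngrams(benchmark_item) & ngrams(training_doc))

    # A light paraphrase of a benchmark question shares no 13-grams
    # with the original, so it passes this filter untouched: a
    # near-replica leaks the signal without tripping exact matching.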
There's a checkbox in the settings page for whether you want to use it or not; does that not change these settings?
I don't feel opposed to them changing the browser in principle--certainly there have been many improvements to web browsers over the years. Is privacy the concern here?
If the checkbox you're referring to is the "Use AI to suggest tabs and a name for tab groups" one, then I can't see what setting it changes. It's not the browser.ml.enable flag. I tried unchecking it, restarting the browser, and that flag was unaffected. This is in version 144.0.2.
Searching for "AI" shows one other setting: "Quickly access bookmarks, tabs from your phone, AI chatbots, and more without leaving your main view." But apparently I'd already disabled that. Despite that, plenty of the flags mentioned in the article were still enabled.
Last I checked there wasn't, and you still had to fiddle with a few about:config options to actually turn off all the AI stuff. I would be fine with it if it were all on a settings page rather than in hidden settings.
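The prefs I remember fiddling with were roughly these (names may have changed between releases, and I'm less sure about the tab-grouping one, so treat this as a sketch rather than a complete list):

    browser.ml.enable                    false
    browser.ml.chat.enabled              false
    browser.tabs.groups.smart.enabled    false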
One thing I think this article overlooks is that Argentina was a superpower, at least before the Panama Canal was built. Before that, pretty much all shipping between the Atlantic and the Pacific had to go south around Argentina and Chile. Buenos Aires was one of the best stops along that route, and so it became one of the richest places on earth. After the Panama Canal was built, most of this traffic dropped off, and so did Argentina's fortunes. It's just so far away from everywhere that it has never been as geographically significant since.
Seems like Argentina was wealthy until the 1940s, but the Panama Canal was completed in 1914. I visited Buenos Aires twenty years ago and it reminded me of Paris: grand old architecture, big buildings, wide avenues. Something happened in the latter half of the 20th century that caused it to decline and stagnate. I always thought it was dictatorships, civil unrest, and hyperinflation, but maybe those are symptoms and not causes.
Militarily they were powerful, but they bought that power rather than building it (the UK was the primary supplier of their battleships during their arms races with Chile and Brazil), so it was a bit of a glass-hammer situation.
I have a theory: all these people reporting degrading model quality over time aren't actually seeing model quality deteriorate. What they are actually doing is discovering that these models aren't as powerful as they initially thought (i.e., expanding their sample size for judging how good the model is). The probabilistic nature of LLMs produces a lot of confused thinking about how good a model is; just because a model produces nine excellent responses doesn't mean the tenth won't be garbage.
They test specific prompts with temperature 0. It is of course possible that all their test prompts were lucky, but even then, shouldn't you see an immediate drop followed by a flat or increasing line?
Also, from what I understand from the article, it's not a difficult task but an easily machine-checkable one, i.e. whether the output conforms to a specific format.
If it was random luck, wouldn't you expect about half the answers to be better? At chance, the odds of getting all n questions wrong on a T/F test are (1/2)^n, which is vanishingly small for any test of meaningful length. Assuming the OP isn't lying, I don't think there's much room for luck.
With T=0 on the same model you should get the same exact output text. If they are not getting it, other environmental factors invalidate the test result.
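A minimal sketch of that kind of pinned check, assuming the OpenAI Python SDK (the model name and prompt are placeholders; note that even at T=0, serving-side batching and hardware can still introduce some noise):

    # Pinned regression check: same model snapshot, same prompt,
    # temperature 0, compared against a stored baseline.
    from openai import OpenAI

    client = OpenAI()

    def run_once(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # pin a dated snapshot, not an alias
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            seed=42,  # best-effort determinism
        )
        return resp.choices[0].message.content

    # Any diff against the baseline means the model or the serving
    # stack changed underneath you.
    prompt = 'Return exactly the JSON {"ok": true}'
    baseline = run_once(prompt)
    assert run_once(prompt) == baseline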
TFA is about someone running the same test suite, with 0 temperature and fixed inputs and fixtures, on the same model for months on end.
What's missing is the actual evidence, which I would love to see, of course. But assuming they're not actively lying, this is not as subjective as you suggest.
Yes, exactly. My theory is that the novelty of a new generation of LLMs tends to inflate people's perception of the model, with a reversion to a better-calibrated expectation over time. If the developer reported numerical evaluations that drifted over time, I'd be more convinced of model change.
Your theory does not hold up for this specific article: they carefully explained that they send identical inputs to the model each time and observe progressively worse results, with other variables unchanged. (Though to be fair, others have noted they provided no replication details as to how they arrived at these results.)
I see your point, but no, it's getting objectively worse. I have a similar experience from casually using ChatGPT for various use cases: when 5 dropped, I noticed it was very fast but oddly got some details off. As time went on, it became slower and the output deteriorated.
But I use local models, sometimes the same ones for years already, and the consistency there is noteworthy, while I also have doubts about the quality consistency I get from closed models in the cloud. I don't see these kinds of complaints from people using local models, which undermines the idea that people were just wowed three months ago and are less impressed now.
So perhaps it's just a matter of transparency.
But I think there is constant fine-tuning occurring, alongside filters being added and removed in an opaque way in front of the model.
I've been reading a lot of (human-written) books lately, and one thing this has made abundantly clear to me is that AI writing just doesn't stack up. For one, AI writing is often completely wrong about the details. It also just tends to be bland and superficial. If you want a 5-minute summary of something, sure, it can do a passable job. But if I want something substantial and carefully thought out, I'll choose a book written by a human expert every time.
Maybe this will change at some point in the future, but for now there's no way I would substitute AI slop for a well-written book on a subject. These models are trained on human-written material anyway; why not just go straight to the source?
Certainly we are not perfect, but I think overall Canada has done more for the world to uphold human rights and freedoms than otherwise. When the government does act against "individual freedom", it is usually for the good of the larger society. For instance, because of firearm restrictions, Canadian citizens are (or used to be?) largely free from getting shot on the street. Is it a perfectly free society? No, but for the most part people here have it pretty good. I'd wager most of the immigrants moving here are much freer than they were in their home countries.
> When the government does act against "individual freedom", it is usually for the good of larger society.
This line of thinking can be used to justify anything. That's why it's important to protect the individual and their rights, even in the face of what a majority, which can be unjust, wants. And speech in particular is so fundamental to the idea of freedom that it should be almost absolutely protected. A constitutional guarantee of free speech and privacy is critical.
You're posting that reply on a message board that, strictly speaking, does not have free speech. If I started flaming you, my post would get removed pretty quickly. This forum is heavily moderated. Does that make it a better, or worse, place for discussion?
It's one among a large number of forums you can choose from, not a monopoly, with no restriction imposed by the government, which has a monopoly on violence and the ability to take away your time or money.
By that line of thinking, Canada is just one country you can choose to live in. Certainly people here have the choice to move to other countries that they think have more freedom. I have a hard time thinking of any other country that entirely fits that criterion at the moment.
You are arguing in bad faith. People as individuals, including Canadians, deserve freedom of speech without threat of fines or jail time. Most people can't just move to another country. They can, however, move to another website as an alternative to HN.
I have never once received threats of fines or jail time for my speech, nor have any of the Canadians I know. Are you aware that freedom of opinion and expression is very clearly spelled out in the Canadian Charter of Rights and Freedoms?
Canadians have Freedom of Expression, which is a stronger protection than Freedom of Speech, but the Canadian legal interpretation of that freedom allows for limits based on hate and obscenity, plus a few legal constraints that are common across most "Free Speech" jurisdictions (libel, defamation, etc.).
There are cases where people have been charged, fined, and even jailed for "expression", but those are largely limited to cases where people were promoting violence against specific groups (including hate speech, for example teaching Holocaust denialism or promoting antisemitic or racist ideologies that call for violence).
There are certainly cases of government overreach, but that is why we have courts, and in general the courts in Canada tend toward a broad interpretation of Freedom of Expression. Are there specific cases in Canada you can cite where people haven't enjoyed Freedom of Expression (including freedom of speech, which is protected under that broader umbrella)?
This is a nice "just so" explanation, but I don't think it tells the full story, or even most of it. Sure, tax policy probably has an impact, but so do interest rates, AI, tariffs, inflation, geopolitical turmoil, rampant speculation, hype cycles, etc. If this tax policy is so important, why didn't it prevent the dot-com crash? Why are tech industries outside the US seeing similar hiring downturns? It's a boom-and-bust industry, we're in the bust, and it seems unlikely that one bad tax policy is the culprit.
Interesting that there is no mention of how the training data for this was collected. It does sound quite a bit better than Meta's MusicGen, but then again, that model was also trained on only a small licensed dataset.