More

pants2 · 2026-04-26T23:12:13 1777245133

Also Discord - tons of people use Discord as a social network and keep up with friends. I must have 5 friend groups that have their own Discords with some overlap.

pants2 · 2026-04-26T18:04:23 1777226663

So did you disclose this responsibly? Posting about it publicly first is asking for that sensitive data to be leaked. Might as well hack and repost that PII yourself.

g48ywsJk6w48 · 2026-04-26T19:04:11 1777230251

This is not a data leakage. They deliberately included 999 of their customers' email addresses in publicly accessible JavaScript code in order to test certain features on them.

pants2 · 2026-04-27T00:18:32 1777249112

Certainly that wasn't intentional to broadcast to the public? Sounds like a textbook data leak.

> A data leak is the unauthorized, often unintentional exposure of sensitive, confidential, or personal information to an external party, usually resulting from weak infrastructure, human error, or system errors.

pants2 · 2026-04-26T18:00:59 1777226459

Consider medical device software. Often embedded C code, needs to be rigorously documented and tested, has longer development cycles, and certainly no attitudes of "bugs are fine, ship it and we'll patch later."

pants2 · 2026-04-26T14:39:04 1777214344

Have you seen https://www.4dv.ai/

RobotToaster · 2026-04-26T18:13:30 1777227210

Doesn't give much information about how they were generated

pants2 · 2026-04-27T00:19:22 1777249162

https://youtu.be/X8yRlA7jqEQ?t=1190

pants2 · 2026-04-24T18:55:56 1777056956

Is anyone here actually using pro models through the API? I'd be very curious what the use-case is.

chadash · 2026-04-24T19:07:28 1777057648

Yes. High value work where cost (mostly) doesn't matter. For example, if I need to look over a legal doc for possible mistakes (part of a workflow i have), it doesn't matter (in my case) whether it costs $0.01 or $10.00, since it's a somewhat infrequent event. So i'll pay $9.99 more, even if the model is only slightly better.

bogtog · 2026-04-24T19:42:57 1777059777

I'm surprised I never heard people talking about using -Pro variants, even though their rates ($125-175/M?) aren't drastically larger than old Opus ($75/M), which people seemed to use

freedomben · 2026-04-24T19:11:52 1777057912

Indeed, even just Terms of Service and Privacy Policy work. Infrequent enough that cost isn't an issue, but model quality absolutely is

ComputerGuru · 2026-04-24T19:07:17 1777057637

Yes? The same reason you would use it via the tooling.

pants2 · 2026-04-23T21:44:28 1776980668

And "valet" is supposed to rhyme with "ballot" not "ballet" but you'll still sound like an idiot if you say "take your car to the val-it"

gnabgib · 2026-04-24T00:24:37 1776990277

What's your reference? Cambridge: /ˈvæl.eɪ/ https://dictionary.cambridge.org/pronunciation/english/valet

(britannica[0], merriam-webster[1])

[0]: https://www.britannica.com/dictionary/eb/audio?word=va%2Alet...

[1]: https://www.merriam-webster.com/dictionary/valet

pants2 · 2026-04-24T01:06:17 1776992777

Your Merriam Webster source has "val-it" as the first pronunciation (but I think in this case both are correct and valit is less common)

gnabgib · 2026-04-24T01:11:44 1776993104

It does.. and I've never heard anyone say it that way (and I appreciate that you chose the only dictionary that gave anything close to your argument).. but that's still nothing like "ballot".

aksss · 2026-04-23T22:30:23 1776983423

Drink some clarit with the valit over a good filit.

Deebster · 2026-04-24T00:44:01 1776991441

Jeeves (the gentleman's personal gentleman) is a valet that would be pronounced VAL-et.

pants2 · 2026-04-23T20:06:47 1776974807

Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?

tedsanders · 2026-04-24T01:11:48 1776993108

Honest answer is that it isn't done running yet. It takes some human bandwidth and time to run, so results weren't ready by this morning. We don't know what the score will be, but will probably go up on the leaderboard sometime soon. I personally don't put a lot of stock in the ARC-AGI evals, as it's not relevant to most work that people do, but should still be interesting to see as a measure of reasoning ability.

(I work at OpenAI.)

AG25 · 2026-04-23T21:00:54 1776978054

GPT-5.5 was just released and OpenAI didnt mention ARC AGI 3 at all, their score probably sucks.

kilroy123 · 2026-04-23T20:11:27 1776975087

To be fair, there's not much to report. Isn't it pretty much at 0?

pants2 · 2026-04-23T21:39:18 1776980358

Opus-4.6 with 0.5% currently leads GPT-5.4 with 0.2%[1].

Seems meaningful even if the absolute numbers are very low. That's sort of the excitement of it.

2. https://arcprize.org/leaderboard

pants2 · 2026-04-23T06:38:24 1776926304

Especially these days you can SSH to a baremetal server and just tell Claude to set up Postgres. Job done. You don't need autoscaling because you can afford a server that's 5X faster from the start.

i5heu · 2026-04-23T06:41:34 1776926494

You just use docker.

It is like 4 lines of config for Postgres, the only line you need to change is on which path Postgres should store the data.

spockz · 2026-04-23T07:29:39 1776929379

You also probably want the Postgres storage on a different (set) of disks.

Maybe change the filesystem?

pants2 · 2026-04-22T04:11:41 1776831101

For closed-source, I'd expect defenders to have a greater advantage because they can run Mythos on the source code, while attackers only get an opaque API/protocol to try messing with.

ygjb · 2026-04-22T18:08:37 1776881317

There is definitely a closed-source defender advantage where an attacker doesn't have access to the code, binary, or environment that can be instrumented (so basically, running in the cloud), but there have been several very effective technical demonstrations of LLM guided or agentic approaches to assessing the security of closed source tools, and I have had some successes personally using LLMs with tool use to manage binary analysis tools to perform reverse engineering of closed source packages.

For many attack scenarios the boundary is really if you can establish an effective canary or oracle for determining if a change in input results in a change in output, once you have that, it's simply a matter of scaling your testing or attack (for fuzzing, for blind injection, or any other number of attacks that depend on getting signal from a service).

sanxiyn · 2026-04-22T04:39:28 1776832768

To some extent yes, but models are good at reverse engineering such that it isn't as great advantage as you might think.

pants2 · 2026-04-21T20:07:17 1776802037

The second 4K image definitely has a raccoon on the left there! Nice.