
I think that AI can sometimes help a lot. But I think doing it correctly is a tightrope and one misstep can easily have terrible results.

The first issue is a result from reinforcement learning that tells you that you really want to be doing a large fraction of your work on-policy when possible.

It's true of RL agents, but I think it's actually a universal learning result that applies to humans too. Sure, you could ask AI to solve a difficult math problem step by step, and it can expose you to tricks you had no idea about and the general method of solving such a problem.

But there is something about the work you produced without external influence (the on-policy episode) that is irreplaceably important.
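
As a rough sketch of what I mean by on-policy vs off-policy (the helper names are invented, not from any RL library):

  # On-policy: learn from attempts your own current policy produced.
  # Off-policy: learn from trajectories someone (or something) else produced.
  # rollout(), sample_logged() and update() are hypothetical helpers.

  def on_policy_step(policy, env, update):
      trajectory = rollout(env, policy)        # you do the work yourself
      return update(policy, trajectory)

  def off_policy_step(policy, logged_data, update):
      trajectory = sample_logged(logged_data)  # someone else's worked solution
      return update(policy, trajectory)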

The second issue is the speed and conciseness of the information AI presents to you. It seems like a superpower, but there are two problems I have with it.

A) It's too fast. Unless you artificially slow yourself down, reading like one sentence per minute, the sheer speed at which everything you want gets presented to you has a strong in-one-ear-out-the-other effect. You need to slow down. You need to appreciate the details.

B) It's also often too concise. There is something about doing research yourself that lets you stumble upon something new that you might not have thought was helpful. Lots of times I've found amazing nuggets on missteps and tangents.

There are more issues as well, but these are the two major ones I get concerned about. You need to be cognizant of the work that isn't being done when you use AI to do research. And imo it's deeply problematic for young students who have literally never done the hard work of trying to answer questions themselves, because they might not realize the problem.


I feel like devs generally spend someone else's money on tokens: either their employer's, or OpenAI's when they use a Codex subscription.

If I put on my schizo hat: something they might be doing is increasing the losses on their monthly Codex subscriptions to show that the API has a higher margin than before (the Codex account massively in the negative, but the API account now showing huge margins).

I've never seen an OpenAI investor pitch deck, but my guess is that API margins are one of the big things they try to sell people on, since Sama talks about it on Twitter.

I would be interested in hearing the insider stuff, like whether this model is genuinely twice as expensive to serve or something.


You can't build a business on per-seat subscriptions when you advertise making workers obsolete. API pricing with sustainable margins is the only way forward if you genuinely think you're going to cause (or accelerate) a reduction in clients' headcount.

Additionally, the value generated by the best models with high thinking effort and lots of context is way higher than that of the cheap and tiny models, so you need to provide a "gateway drug" that lets people experience the best you offer.


> You can't build a business on per-seat subscriptions when you advertise making workers obsolete.

On the other hand, I would argue that most workers' salaries are more like subscriptions than API-type pricing (which would be more like an hourly contractor).


Yeah, and the increase in operating expenses is going to make managers start asking hard questions - this is good. It means eventually there will be budgets put in place, which will force OAI and Anthropic to innovate harder. Then we will see how things pan out. Ultimately a firm is not going to pay rent to these firms if the benefits don't exceed the costs.

> Ultimately a firm is not going to pay rent to these firms if the benefits don't exceed the costs.

This is also true for the humans. They will need to provide more benefits than the coding agents cost.


Humans are needed to use agents, and these agents are not proving to be fully autonomous; they require constant human review. In fact, all you are getting is a splurge of stuff, people not thinking as deeply anymore, and the creation of more bottlenecks while exacerbating the ones that already exist in an org.

You sound like Elon with "FSD will be here next year." Many cars have the self-driving feature - most drivers don't use it. Oh, why is that, I wonder.


Meaning that you believe they're not trying their "hardest" to innovate? They must be slacking then.

Budgets are already happening

The difference between subscription and API pricing makes it hard to create competitive solutions at the app level.

This was something I worried about after OpenAI started building apps as well as models. Now all of the labs make no secret of the fact that they are going after the whole software industry. It's going to be hard to maintain functioning, fair markets unless governments step in.

The meshes look interesting, but the gameplay is very basic. The tank one seems more sophisticated with the flying ships and whatnot.

What's strange is that this Pietro Schirano dude seems to write incredibly cargo cult prompts.

  Game created by Pietro Schirano, CEO of MagicPath

  Prompt: Create a 3D game using three.js. It should be a UFO shooter where I control a tank and shoot down UFOs flying overhead.
  - Think step by step, take a deep breath. Repeat the question back before answering.
  - Imagine you're writing an instruction message for a junior developer who's going to go build this. Can you write something extremely clear and specific for them, including which files they should look at for the change and which ones need to be fixed?
  - Then write all the code. Make the game low-poly but beautiful.
  - Remember, you are an agent: please keep going until the user's query is completely resolved before ending your turn and yielding back to the user. Decompose the user's query into all required sub-requests and confirm that each one is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure the problem is solved. You must be prepared to answer multiple queries and only finish the call once the user has confirmed they're done.
  - You must plan extensively in accordance with the workflow steps before making subsequent function calls, and reflect extensively on the outcomes of each function call, ensuring the user's query and related sub-requests are completely resolved.

It's weird how people pep talk the AI - if my Jira tickets looked like this, I would throw a fit.

I guess these people think they have special prompt engineering skills, and doing it like this is better than giving the AI a dry list of requirements (fwiw, they might even be right)


It’s not surprising to me that the same crowd that cheers for the demise of software engineering skills invented its own notion of AI prompting skills.

Too bad they can veer sharply into cringe territory pretty fast: “as an accomplished Senior Principal Engineer at a FAANG with 22 years of experience, create a todo list app.” It’s like interactive fanfiction.


That's quite similar to AI Studio's prompt: "You are a world-class frontend engineer..."

Indeed it is so utterly cringe.

Yes, this is cargo cult.

This reminds me of so-called "optimization" hacks that people keep applying years after their languages have improved enough to make them unnecessary or even harmful.

Maybe at one point it helped to write prompts in this weird way, but with all the progress going on in both the models and the harness, if it's not obsolete yet it will be soon. Just cruft that consumes tokens and fills the context window for nothing.


> Think Step By Step

What is this, 2023?

I feel like this was generated by a model tapping into 2023 notions of prompt engineering.


Pietro here, I just published a video of it: https://x.com/skirano/status/2047403025094905964?s=20

It comes across as an elaborate, sparkly motivational cat poster.

*BELIEVE!* https://www.youtube.com/watch?v=D2CRtES2K3E



"take a deep breath"

OMFG


Claude would check to see if it had any breathing skills, and if it didn't find any it would start installing npm modules for breathing.

The prompt did not specify advanced gameplay.

I do not see instructions that assist with task decomposition and agent ~"motivation" to stay aligned over long periods as cargo culting.

See up thread for anecdotes [1].

> Decompose the user's query into all required sub-requests and confirm that each one is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure the problem is solved.

I see this as a portrayal of the strength of 5.5, since it suggests the model can be assigned this clearly important role and ~one-shot requests like this.

I've been using a cli-ai-first task tool I wrote to process complex "parent" or "umbrella" tasks into decomposed subtasks and then execute on them.

This has allowed my workflows to float above the ups and downs of model performance.

That said, having the AI do the planning for a big request like this internally is not good outside a demo.

Because you want the AI's planning to be part of the historical context and available for forensics when there are stalls, unwound details, or other unexpected issues at any point along the way.
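
For what it's worth, the shape of that tool is roughly this (hypothetical names, not the actual code):

  # Hypothetical sketch: decompose a parent task into subtasks and keep the
  # plan itself in the saved record so every step stays auditable later.
  # ask_model() is a made-up helper; assume the first call returns a list of
  # subtask strings and the second returns a text result.
  import json

  def run_parent_task(description):
      plan = ask_model("Decompose into subtasks:\n" + description)
      record = {"parent": description, "plan": plan, "results": []}
      for subtask in plan:
          outcome = ask_model("Execute subtask:\n" + subtask)
          record["results"].append({"subtask": subtask, "outcome": outcome})
      with open("task_log.json", "w") as f:   # plan + outcomes persist on disk
          json.dump(record, f, indent=2)
      return record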

[1] https://news.ycombinator.com/item?id=47879819


I mean Google ain't paying for Chromium development just for the fun of it...

> Will hear it from us, not a screenshot on X or Reddit.

Has this ever been true? You will almost always see some anecdotal screenshot a long time before any company would rat on themselves.

Yes the random screenshots include a lot of false positives. But official comms have a lot of their own problems given how companies behave nowadays.


No, they are taking a massive L. That's why they paused new sign-ups.

Just for context on the insanity: they allow recursive subagents to, I believe, 5 levels deep.

You can write a prompt telling Copilot to dig through a codebase with one subagent per file and one recursive subagent per function, to do some complex codebase-wide audit. If you use Opus 4.7 to do this, it consumes a grand total of 0.5% of a Pro+ plan.

That's why this paragraph is here:

> it’s now common for a handful of requests to incur costs that exceed the plan price
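
To make the fan-out concrete, a back-of-the-envelope sketch (the file and function counts below are invented):

  # One subagent per file, one recursive subagent per function.
  # The counts are made up purely to show the shape of the blow-up.
  files = 200
  functions_per_file = 20

  top_level = 1
  file_agents = files                            # 200 subagents
  function_agents = files * functions_per_file   # 4,000 recursive subagents

  total_agents = top_level + file_agents + function_agents
  print(total_agents)   # 4201 model invocations for one "request"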


I wonder how many of those requests are "necessary" or end up being more correct/efficient than a single agent going through the tasks linearly.

Their VR tech is pretty nice. No one sells anything anywhere near as cheap and good as the Quest 3S.

Speculative decoding doesn't degrade output quality. The distribution it produces is exactly the same if you do it correctly. The original paper on it clearly talks about this. [0]

Speculative decoding is the same idea as speculative execution on CPUs. As long as you walk back an incorrect prediction (i.e. the speculated tokens weren't accepted), everything is mathematically exactly the same. It just uses more parallelism (specifically, higher arithmetic intensity).

[0] https://arxiv.org/abs/2211.17192
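
A rough sketch of the accept/reject step from the paper (simplified; p and q here are the full probability vectors from the target and draft models for one position):

  # Accept drafted token x with probability min(1, p[x]/q[x]); otherwise
  # resample from the normalized residual max(p - q, 0). That "walk back"
  # is what makes the output distribution exactly match the target model's.
  import random

  def verify_token(x, p, q):
      if random.random() < min(1.0, p[x] / q[x]):
          return x                               # speculation accepted
      residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
      return random.choices(range(len(p)), weights=residual)[0]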


The way revenue works on these platforms is fundamentally broken. Revenue should be allocated per user, going to the artists that user listened to each month. This is actually how YouTube Premium works.

Right now the way the revenue split works is that you pool together all the cash from humans and hand it to whoever has the most bots.
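
A toy example of the difference, with invented numbers:

  # Two subscribers pay $10 each. Alice streams an indie artist 10 times;
  # a bot account streams its own "artist" 10,000 times.

  pot = 2 * 10.0
  # Pooled (pro-rata) split: one pot, divided by share of total plays.
  indie_prorata   = pot * 10 / (10 + 10_000)        # ~$0.02
  botfarm_prorata = pot * 10_000 / (10 + 10_000)    # ~$19.98

  # User-centric split: each subscriber's $10 follows only their own plays.
  indie_user_centric   = 10.0   # all of Alice's money
  botfarm_user_centric = 10.0   # capped at the bot account's own $10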


You described the Spotify revenue model. AFAIK Deezer does _user to artists_, based on an article I read 2 years ago.

Slack is a synonym for redundancy but is also a synonym for inefficiency.
