So is this a minimal upgrade before the M6 MacBook Pros w/ OLED & a redesign later this year?
It doesn't even look like they added cellular as an option via their own C1X chip (which would get around the licensing / cost issues, now that the modem is their own chip).
I use a massive OLED monitor as my workhorse, and I'd say money and expectations are better spent on an established OLED manufacturer and a large screen than on a laptop screen. Given the job roles common among HN users, a large OLED main monitor will probably offer more value than a laptop screen that spends most of its time as a side monitor, or just turned off while connected to large monitors. The HDMI 2.1 and other display-output gains matter more, since they raise the pixel counts and framerates you can drive. Just my two cents.
I'm a bit confused by this branding (I never even noticed that there was a 5.2-Instant). It's not a super fast 1000 tok/s Cerebras-based model like the one they have for codex-spark; it's just 5.2 w/out the router / "non-thinking" mode?
I feel like OpenAI is going to get right back to where they were pre-GPT-5: a ton of different options and no one knowing which model to use for what.
Yeah, for a while ChatGPT Plus has been powered by two series of models under the hood.
One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
The second series is the Thinking series, which is more accurate and more tuned to professional knowledge work, but slower (because it uses more reasoning tokens).
We'd also prefer to have a simple experience with just one option, but picking just one would pull back the Pareto frontier for some group of people/preferences. So for now we continue to serve two models, with manual control for people who want to choose and an imperfect auto switcher for people who don't want to be bothered. Could change down the road - we'll see.
By the way, I imagine you know this, but the product split is not obvious, even to my 20-something kids who are Plus subscribers. I saw one of them chatting with the instant model recently and I was like "No!! Never do that!!", and they did not understand they were getting the (I'm sorry to say) much less capable model.
I think it's confusing enough that it's a brand harm. I offer no solutions, unfortunately. I guess you could do a little post-hoc analysis for Plus subscribers and up to determine whether they'd benefit from defaulting to Thinking mode; that could be done relatively cheaply at low-utilization times. But maybe you need this to keep utilization where it is. Either way, I think it ends up meaning my kids prefer Claude. Which is fine; they wouldn't prefer Haiku if it were the default, but they don't get Haiku, they get Sonnet or Opus.
I agree -- we're on the ChatGPT Enterprise plan at work, and every time someone complains about it screwing up a task, it turns out they were using the instant model. There needs to be a way to disable it, at a bare minimum.
You could perhaps show the "instant" reply right away and provide a button labeled "Think longer and give me a better answer" that starts the thinking model and eventually replaces the answer.
For this to work well, the instant reply must be truly instant, and the button must always be visible at the same position on the screen (i.e. either at the top or the bottom of the answer, scrolled so that it's also at the top or bottom of the screen). Once the thinking answer is displayed, there should be a small icon button to bring back the previous instant answer - roughly as in the sketch below.
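Something like this rough sketch; `fetchInstant` and `fetchThinking` are stand-ins for whatever the backend exposes, not real ChatGPT APIs:

```typescript
// Hypothetical two-phase answer flow: instant reply first, with a
// persistent "Think longer" upgrade path.
type Render = (text: string, source: "instant" | "thinking") => void;

function answerFlow(
  prompt: string,
  render: Render,
  fetchInstant: (p: string) => Promise<string>,
  fetchThinking: (p: string) => Promise<string>,
) {
  let instantText = "";
  // 1. Show the instant reply right away; the button stays visible.
  fetchInstant(prompt).then((text) => {
    instantText = text;
    render(text, "instant");
  });

  return {
    // 2. Wire to the always-visible "Think longer" button. It re-runs
    //    the *original* prompt rather than feeding the instant answer
    //    back in, so a misleading instant reply can't pollute the
    //    thinking context.
    onThinkLonger: async () => render(await fetchThinking(prompt), "thinking"),
    // 3. For the small icon that flips back to the instant answer.
    getInstantAnswer: () => instantText,
  };
}
```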
That's assuming that the instant answer is even directionally correct. A misleading instant answer could pollute the context and lead the thinking model astray.
Can the context of the pre-revision Instant response simply be discarded -- or forked, or branched, or [insert appropriate nomenclature here] -- instead of being included as potential poison?
(It seems absurd to consider that there may be no undo button the machine can push.)
For those who are unaware, this is exactly what Grok does. The default is an auto mode; when you ask a question it starts researching (visibly to the user), and if it's using the Expert mode but you don't really need all that jazz, there's a "Quick Answer" button right above the prompt entry field. If it's using the "Quick Answer" mode, there's an "Expert" button in the same place, and you can toggle between them mid-answer and it will adjust the model (or the model parameters; I'm not sure how it works under the hood).
It's pretty good with the auto chooser, but I appreciate that the manual choice is so in-your-face, and especially that it doesn't restart the query completely but rather converts the output to either Quick or Expert.
This is on the web UI; I can't speak for other harnesses. I do find that it's quite good with citations and has a fairly generous free tier, even in Expert mode. (As for who sits at the top: I am indeed put off by Musk's clear interference in several cases involving Grok, and my personal values don't align with the majority of his, but today's Grok is definitely less MechaHitler and more reliable than it was before.)
Thanks for clarifying! I guess most users are going to stick with the router / auto switcher, which is fine, since most people won't change the default.
Just noting that I'm not against differentiation in products, but it gets very confusing for users when there are too many options (in the case of consumer ChatGPT, at least, this is still more limited than in pre-GPT-5 days). The issue is that there's differentiation at what I pay monthly (free vs. Plus vs. Pro) and also at the model layer, which essentially becomes a matrix of different options / limits per model (and we're not even getting into capabilities).
For someone who uses Codex as well, there are 5 models there when I use /model (on the Plus plan; Spark is only available to Pro plan users), with limits also tied to my same consumer ChatGPT plan.
I imagine the model differentiation is only going to get worse, since with more fine-tuned use cases there will be many different models (i.e. healthcare answers, etc.). Is it really on the user to figure out what to use? The only saving grace is that it's not as bad as Intel or AMD CPU naming schemes / cloud provider instance naming, but that's a very low bar.
Auto will never work, because for the exact same prompt you sometimes want a quick answer (it's not something very important to you), and sometimes you want the answer to be as accurate as possible, even if you have to wait 10 minutes.
In my case it would be more useful to have a slider for how long I'm willing to wait: for example instant, or think up to 1 minute, or think up to 15 minutes.
That's pretty close to what they have. They just named them Instant, Thinking (Standard), and Thinking (Extended), and they're discrete presets instead of a slider.
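On the API side there is already something close to that, exposed as discrete values rather than a slider. A rough sketch with the OpenAI Node SDK, assuming the Responses API's reasoning-effort parameter; the model name and the exact set of accepted effort values here are assumptions:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Discrete "how long am I willing to wait" presets, picked per request.
const response = await client.responses.create({
  model: "gpt-5.2",             // assumption: whichever reasoning model you're on
  reasoning: { effort: "low" }, // e.g. "low" | "medium" | "high"
  input: "Summarize this thread in two sentences.",
});

console.log(response.output_text);
```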
Yeah, I use that, but it's not really a solution that allows having only Auto. It doesn't help when it chooses Instant instead of Thinking, and it's also much slower than using Instant outright, because the Skip button doesn't show immediately and it's generally slow to restart.
I've long suspected as much, but I always found the correspondence between API model names, the ChatGPT UI selector, and the actual model used very confusing; I was never sure whether I was actually switching models or just some parameters of the harness/model invocation.
> One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
That's putting it mildly. In my experience, the "instant/chat" model is absolute slop tier, while the "thinking" one is genuinely useful and also has a much more palatable tone (even for things not really requiring a lot of thought).
Fortunately, the former clearly identifies itself with an absurd amount of emoji reminiscent of other early chatbots that shall not be named, so I know how to detect and avoid it.
Before GPT-5 launched, and after sama had said they would unify the ordinary and reasoning models, I think we all expected more than an (auto-)switcher. We expected some small innovation (smaller than the ordinary-to-reasoning one, but still a significant one) that would make both kinds of replies be, in a way, generated by a single model. I don't know exactly how; I expected OpenAI to surprise us with something that would feel obvious in retrospect.
The model doesn't even need to be exposed in the UI. Let the user specify "use model foobar-4" or "use a coding model" or "use a middle-tier attorney model".
Vim does this well: no UI, magic incantations to use features.
Forgive the tangent, but while you're here, can you look into why the Notion connector in chat can't write pages while the MCP (which I use via Codex) can? It looks entirely possible; it's mostly just a missing action in the connector.
It's because people like choice and control, and "5.2" vs "5.2 thinking" is confusing. Making them "5.2 instant" and "5.2 thinking" is less confusing to more people. Their competitors already do this (Gemini 3 Fast & Gemini 3 Thinking).
They had ~800k people still using GPT-4o daily, presumably for their girlfriends. They need to address them somehow. Plus, serving "thinking" models is much more expensive than serving "instant" models. So they want to keep the horny people hornying on their platform, but at a lower cost.
Will need to wait for real benchmarks, but based on OpenAI marketing, Instant is their latency-optimized offering. For a voice interface you don't actually need high tok/s, because speech is slow; time to first token matters much more.
Reminder that OpenAI serves a lot of customers for free; most of the people I know use the free tier. There's a strict limit on thinking queries on the free tier, so a decent non-thinking model is probably positive ROI for them.
This seems like it’s in response to the congressional testimony last week to clarify some things about their remote assistance systems.
It's interesting that they only have 70 people for this. I can understand the outside-the-US ones for nighttime assistance, and they'll need to be able to scale for other countries in the future too.
What I'm still wondering is what's limiting Waymo's scaling - just cars, or also the sensor systems? They've had their new test vehicles in SF for a while, but I think most customers still only get the Jaguars right now (and highway driving is still limited to specific customers in the Bay Area).
> What I’m still wondering is what is limiting the scaling for Waymo
I'm also very curious about this. Probably a mix of many things: training the driver to handle tricky conditions better (e.g. flooded roads), getting more Ohai vehicles imported and configured, working through the backlog of Jaguar I-Pace configuration and trucking them out to new markets, mapping roads and doing non-customer testing in new markets, getting regulatory approval/cooperation in other markets (e.g. DC), finding depot space, hiring maintenance teams, etc.
This GitHub readme was helpful in understanding their motivation, cheers for sharing it.
> Integrating agents into it prevents fragmentation of their service and allows them to keep ownership of their interface, branding and connection with their users
Looking at the contrived examples given, I just don't see how they're achieving this. In fact, it looks like creating MCP-specific tools will achieve exactly the opposite. There will immediately be two ways to accomplish a thing, and this will result in drift over time as developers need to account for two ways of interacting with a component on screen. There should be no difference, but there will be.
Having the LLM interpret and understand a page context would be much more in line with assistive technologies. It would require site owners to provide a more useful interface for people in need of assistance.
> Having the LLM interpret and understand a page context
The problem is fundamentally that it's difficult to create structured data that's easily presentable to both humans and machines. Consider: ARIA doesn't really help LLMs. What you're suggesting is much more in line with microformats and schema.org, both of which were essentially complete failures.
LLMs can already read web pages, just not efficiently. It's not an understanding problem, it's a usability problem. You can give a computer a schema and ask it to make valid API calls and it'll do a pretty decent job. You can't tell a blind person or their screen reader to do that. It's a different problem space entirely.
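To make the "two ways to accomplish a thing" concern upthread concrete, here's roughly what a site-specific tool looks like with the official MCP TypeScript SDK. The tool name, schema, and addToCart helper are made up; the point is that the handler becomes a second code path next to whatever the on-screen button already does:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Hypothetical site-internal function, ideally shared with the UI button.
async function addToCart(productId: string, quantity: number): Promise<void> {
  // ...whatever the site does today when the button is clicked
}

const server = new McpServer({ name: "shop", version: "1.0.0" });

// The schema gives the model something it can reliably fill in, unlike
// scraping the page - but it's now a parallel interface that can drift.
server.tool(
  "add-to-cart",
  { productId: z.string(), quantity: z.number().int().min(1) },
  async ({ productId, quantity }) => {
    await addToCart(productId, quantity);
    return { content: [{ type: "text", text: `Added ${quantity} x ${productId}` }] };
  },
);
// (Transport/connection setup omitted; this only registers the tool.)
```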
I'm currently a solo bootstrapped founder and have done short stints in the past: 1 year in 2022, then a year as cofounder of a funded startup. Now I'm doing it again.
The question is how you stay motivated to keep at it. It looks like it took about 4 years before you made something similar to your Google salary; did family pressure or other external pressure ever impact you? Or is it mainly just keeping your eyes on the longer-term goal?
I'm also quite lucky that I was aiming for lean-FIRE before I left Facebook, so I have the luxury of being able to keep at it, but sometimes it is demotivating seeing peers / others.
> The question is how you stay motivated to keep at it. It looks like it took about 4 years before you made something similar to your Google salary; did family pressure or other external pressure ever impact you? Or is it mainly just keeping your eyes on the longer-term goal?
I found it helpful to go in with low expectations.
I was listening to a lot of podcasts about bootstrapping while I was still at Google in 2017-2018, and even the big success stories usually had 5+ years of failing or succeeding only marginally. So, I went in with the expectation that I'd probably fail for the first 5 years, and so there wasn't that feeling of disappointment from not earning much the first few years.
I also had a lot of lucky conditions that made it easy to take the risk at the time, including having no family to support, lots of savings, and low expenses.
> I'm also quite lucky that I was aiming for lean-FIRE before I left Facebook, so I have the luxury of being able to keep at it, but sometimes it is demotivating seeing peers / others.
Yeah, honestly, I do sometimes think, "Wow, if I'd stayed at Google and kept getting that comp (which was about 50% equity, IIRC), that would be a lot of money." But I'm also very pleased with my life now, and I know I wouldn't have enjoyed my job nearly as much for the last 8 years had I stayed an employee. And that's a huge amount of my life to spend not doing what I'd like to do.
Already have my own JS engine and the basics of three.js and pixi.js 8 working, with a roadmap to v1.0.0 posted in GitHub issues. Aiming to show it to folks at GDC in March.
So in theory it should be possible, but it might require customizing the Dawn or wgpu-native builds if they don't support it (MystralNative provides the JS bindings / wrapper around those two implementations of wgpu.h). But I've already added a special C++ method to handle Draco compression natively, so adding some Mystral-native-only methods is not out of the question (however, I would want to ensure that usage of those via JS is always feature-flagged so that it doesn't break when run on the web).
Did you write your WebGPU chessboard using the raw JS APIs? Ideally it should work, but I just fixed up some missing APIs to get Three.js working in v0.1.0, so if there are issues, please open an issue on GitHub - I'll try to get it working so we close any gaps.
Here's a Dawn implementation with support for ray tracing that was implemented a number of years ago but never integrated into browsers. Perhaps it will help?
Yes, chessboard3d.app is written with raw JS APIs and raw WebGPU. It does use the Rapier physics library, which uses WASM - might that be an issue? It implements its own ray tracing, but it would probably run 10x faster with hardware ray tracing support.
I think you'd get a lot of attention if you had hardware ray tracing, since that's currently only available in DirectX 12 and Vulkan, requiring implementation on native desktop platforms. FWIW, if the path looks feasible, I would be interested in contributing.
WASM shouldn't be an issue, since the Draco decoder uses it - but it may only work with V8 (it wouldn't work for QuickJS builds, but the default builds use V8 + Dawn). Obviously, with an alpha runtime there may be bugs.
I think it would be super cool to have some sort of extension before WebGPU (on the web) has it. I was taking a look at the prior example, and it seems like there's a good ongoing discussion about it linked here: https://github.com/gpuweb/gpuweb/issues/535. Also, I believe Metal has hardware ray tracing support now too?
Re: implementation, a few options exist. A separate Dawn fork with RT is one path (though Dawn builds are slow: 1-2 hours on CI). Another approach would be exposing custom native bindings directly from MystralNative alongside the WebGPU APIs; that might make iteration much faster for testing feasibility. The JS API would need to be feature-flagged so the same code gracefully falls back when running on the web (I did this for a native Draco impl too, which avoids having to load WASM: https://mystralengine.github.io/mystralnative/docs/api/nativ...).
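For what it's worth, the graceful fallback could lean on WebGPU's standard feature-detection mechanism, so the same JS runs in browsers untouched. A sketch where "ray-tracing" is a hypothetical feature name (nothing like it is in the WebGPU spec today) and the renderer factories are placeholders:

```typescript
declare function createHardwareRTRenderer(d: GPUDevice): unknown; // native-only path
declare function createSoftwareRTRenderer(d: GPUDevice): unknown; // existing shader path

const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("WebGPU not available");

// Only request the extension where the runtime actually exposes it.
const hasRT = adapter.features.has("ray-tracing");
const device = await adapter.requestDevice({
  requiredFeatures: hasRT ? ["ray-tracing" as GPUFeatureName] : [],
});

const renderer = hasRT
  ? createHardwareRTRenderer(device)
  : createSoftwareRTRenderer(device);
```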
Follow-up comment about Apple disallowing JIT: I'll need to confirm whether JSC is allowed to JIT at all, or only inside a webview. I was able to get JSC + wgpu-native rendering in an iOS build, but would need to confirm that it can pass app review.
There are 2 other performance things you can do by controlling the runtime, though. One is adding special perf methods (which I did for Draco decoding; there is currently one non-standard __mystralNativeDecodeDracoAsync API), but the docs clearly lay out that you should feature-gate it if you're going to use it so you don't break web builds: https://mystralengine.github.io/mystralnative/docs/api/nativ...
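The gate itself is just runtime detection of the global before calling it. A sketch; only the name comes from the docs, while the signature and mesh shape here are assumptions:

```typescript
interface DecodedMesh { positions: Float32Array; indices: Uint32Array } // assumed shape

type NativeDracoDecode = (buf: ArrayBuffer) => Promise<DecodedMesh>;

async function decodeDraco(buf: ArrayBuffer): Promise<DecodedMesh> {
  // Present only in MystralNative builds; undefined on the web.
  const native = (globalThis as Record<string, unknown>)
    .__mystralNativeDecodeDracoAsync as NativeDracoDecode | undefined;
  if (typeof native === "function") {
    return native(buf); // native fast path, no WASM load
  }
  return decodeDracoWithWasm(buf); // the decoder web builds already use
}

// Placeholder for the existing WASM-based path (e.g. three.js's DRACOLoader).
declare function decodeDracoWithWasm(buf: ArrayBuffer): Promise<DecodedMesh>;
```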
The other thing is more experimental: writing an AOT compiler for a subset of TypeScript that converts it into C++ and then just compiles your code ("MystralScript"). This would be similar to Unity's C# AOT compiler and would kind of be its own separate project, but there is some prior work with Porffor, AssemblyScript, and Static Hermes here, so it's not completely just a research project.
Is AssemblyScript good for games, though? Last I checked it lacked too many features for game code coming directly from TS, but it might be better now? No idea how well Static Hermes behaves today (probably far better, given the RN heritage).
I've been down the TS->C++ road a few times myself, and the big issue often comes up with how "strict" you can keep your TS code for real-life games, as well as how slow/messy the official TS compiler has been (and real life taking time away from the effort).
It's better now, but I think one should probably directly target the Go port of the TS compiler (both for performance and because Go is a slightly stricter language, probably better suited for compilers).
I guess the point is that the TS->C++ compilation thing is potentially a rabbit hole. It's theoretically not too bad, but TS has moved quickly and been hard to keep up with without using the official compiler. And even then, a "game-oriented" TypeScript mode wants a slightly different semantic model from the official one, so you need either a mapping over the regular type-inference engine, a separate one, or a parallel one.
Mapping regular TS to "game variants", the biggest issue is how to handle numbers efficiently. Even if you go full-double, you need conversion-point checks everywhere doubles go into unions with any other type (meaning you need boxing or a "fatter" union struct). And that's not even accounting for any vector-type accelerations.
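A small example of where that bites. The TS is ordinary game code; the comments describe what a TS->C++ compiler would have to emit (the C++ representation details are illustrative):

```typescript
// Pure doubles: can compile straight to raw f64 arithmetic in C++.
function damage(base: number, crit: boolean): number {
  return crit ? base * 2 : base;
}

// The moment a number flows into a union, raw f64 no longer works:
type Loot = number | { name: string };

function describe(loot: Loot): string {
  // The compiler needs a conversion point at this boundary: either box
  // the number on the heap, or carry a "fat" tagged value (roughly
  // { tag, f64, ptr } in the emitted C++) everywhere a Loot travels.
  return typeof loot === "number" ? `${loot} gold` : loot.name;
}

describe(damage(10, true)); // number -> Loot: boxing/tagging happens here
```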
AssemblyScript was just mentioned as some prior work, I don't think that AssemblyScript would work as is for games.
I realize the major issues with TS->C++, though (or any language to C++; Facebook has prior work converting PHP to C++, https://en.wikipedia.org/wiki/HipHop_for_PHP, which was eventually deprecated in favor of HHVM). I think iteratively improving the JS engine (Mystral.js, the one that is not open source yet but is why MystralNative exists) to work with the compiler would be the first step, and ensuring that games and examples built on top use a subset of TS is the starting point here. I don't think the goal for MystralScript should be to support Three.js or any other engine to begin with, as that would end up going down the same compatibility pits that HipHop did.
Being able to update the entire stack here is actually very useful. In theory, parts of Mystral.js could just be embedded into MystralNative (behind separate build flags, probably not a standard build), avoiding any TS->C++ compilation for core engine work, and then games built on top would use the strict subset of TS that works well with the AOT compilation system. One option for numbers is actually using comment annotations (similar to how JSDoc types work for the TypeScript compiler), specifically annotations in comments so that web builds don't change - sketched below.
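The annotations might look roughly like this; the @mystral tag names are made up, and the point is only that plain comments are invisible to tsc and web builds while the AOT compiler can use them to narrow the C++ representation:

```typescript
/** @mystral.int32 */
let frameCount = 0;

/** @mystral.float32 */
let elapsed = 0;

function tick(/** @mystral.float64 */ dt: number): void {
  elapsed += dt;
  frameCount = (frameCount + 1) | 0; // |0 doubles as an int32 hint, asm.js-style
}
```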
Re: the TS compiler, I do have some basics started here, and I'm already seeing that tests are pretty slow. I don't think the tsgo compiler has a similar API for parsing & emitters right now, so as much as I would like to switch to it (I have for my web projects, and the speed is awesome), I don't think I can yet until the API work is clarified: https://github.com/microsoft/typescript-go/discussions/455
I remember reading about Ejecta a long time ago! I had completely forgotten about it, but it is similar! The funny thing is that to support UI elements, I had to also support Canvas2D through Skia (although not 100% yet), so maybe Impact could even work at some point (it would require extensive testing, obviously).
AWS has information about their UAE data centers, but I haven't seen any confirmation from Amazon itself that amazon.com is having issues.