I think spatial tokens could help, but they're not really necessary. Lots of physics/physical tasks can be solved with pencil and paper.
On the other hand, it's amazing that a 512x512 image can be represented by 85 tokens (as in OAI's API), or 263 tokens per second for video (with Gemini). It's as if the memory vs compute tradeoff has morphed into a memory vs embedding question.
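A quick back-of-envelope on that compression (my own arithmetic, not from either API's docs):

    pixels = 512 * 512      # 262,144 pixels in the image
    tokens = 85             # OAI's per-image token cost cited above
    print(pixels / tokens)  # ~3,084 pixels represented per token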
This dichotomy reminds me of the "apple rotators" question - can you rotate the apple in your head? The spatial embeddings will likely solve dynamics questions a lot more intuitively (i.e., without extended thinking).
We're also working in this space at FlyShirley - training pilots to fly, then training Shirley to fly - where we benefit from established simulation tools. Looking forward to trying Fei-Fei's models!
They're building on a false premise: that human-equivalent performance using cameras is acceptable.
That's the whole point of AI - when you can think really fast, the world is really slow. You simulate things. Even with lifetimes of data, the cars will still fail in visual scenarios where the error bars on ground truth shoot through the roof.
Elon seems to believe his cars will fail in ways similar to humans because they use cameras. False premise. As Waymo scales, human-level just isn't good enough, except for humans.
So, I agree with what you're saying, but that doesn't matter.
The legal standing doesn't care what tech is behind it - 1000 monkeys for all it matters. The point is that Level 3 is the most dangerous level, because neither the public nor the manufacturer properly operates in this space.
Definitely curious what circuits light up from a Neuralese perspective. We want reasoning traces that are both faithful to the thought process and interpretable. If the other-language segments are lighting up meanings much different than their translations, that would raise questions for me.
Yes absolutely this! We're working on these problems at FlyShirley for our pilot training tool. My go-to is: I'm facing 160 degrees and want to face north. What's the quickest way to turn and by how much?
For small models, and when attention is "taken up", these sorts of questions really throw a model for a loop. Agreed - especially noticeable with small reasoning models.
I just tried this with a smaller "thinking" model (a DeepSeek distill, running locally) and boy are you right. It keeps flipping between which direction it should turn, second-guessing its thought process, and then getting sidetracked with a different approach.
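For anyone curious, the arithmetic itself is tiny; here's a minimal sketch in plain Python (my own formulation, not from any model's trace) of the signed shortest-turn computation they keep tripping over:

    def shortest_turn(current_hdg, target_hdg):
        # Signed shortest turn in degrees: negative = turn left, positive = turn right.
        return (target_hdg - current_hdg + 540) % 360 - 180

    # Facing 160 and wanting north (000): quickest is 160 degrees left,
    # since turning right would take 200.
    print(shortest_turn(160, 0))  # -160

The wrap-around at the 360/0 boundary plus the left/right sign choice is exactly where the models flip-flop.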
Hello! On this topic: could you please look into making the path next/image uses for caching images user-definable? Currently I can't use Google App Engine Standard because the directory it uses is write-protected. The only real solution seems to be custom image loaders, which is a drag, so I'm on App Engine Flex spending way more than I probably should :-)
I did try this, but in the wrong way (applying SVD to the quantization error to recover quality, i.e. SVD(W - Q(W))). The lightbulb moment in this paper is to do SVD on W first and then quantize the residual.
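A minimal NumPy sketch of that idea (names and the toy quantizer are illustrative, not the paper's actual method): keep a rank-k factorization of W in full precision and quantize only the residual.

    import numpy as np

    def toy_quant(x, bits=4):
        # toy symmetric uniform quantizer with a single per-tensor scale
        levels = 2 ** (bits - 1) - 1
        scale = np.abs(x).max() / levels
        return np.round(x / scale).clip(-levels, levels) * scale

    def svd_then_quantize(W, k):
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        L = U[:, :k] * S[:k]   # (m, k) low-rank factor, kept in full precision
        R = Vt[:k, :]          # (k, n)
        return L, R, toy_quant(W - L @ R)  # quantize only the residual

    W = np.random.randn(256, 256)
    L, R, Q = svd_then_quantize(W, k=16)
    print(np.linalg.norm(W - (L @ R + Q)) / np.linalg.norm(W))

The intuition being that once the top singular directions are pulled out, the residual has a smaller dynamic range and so quantizes with less error than W itself.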
Could you reference any YouTube videos, blog posts, etc. of people you would personally consider to be _really good_ at prompting? Curious what this looks like.
While I can compare good journalists to extremely great and intuitive journalists, I don't really have any references for this in the prompting realm (except for when the DALL-E Cookbook was circulating around).
Sorry for the late response - but I can't. I don't really follow content creators at a level where I can recall names or even what they are working on. If you browse AI-dominated spaces you'll eventually find people who include AI as part of their workflows and have gotten quite proficient at prompting models to get the results they want very consistently. Most AI stuff enters my realm of knowledge via AI Twitter, /r/singularity, /r/stablediffusion, and GitHub's trending tab. I don't go particularly out of my way to find it otherwise.
/r/stablediffusion used to (less so now) have a lot of workflow posts where people would share how they prompt and adjust the knobs/dials of certain settings and models to make what they make. It's not so different from knowing which knobs/dials to adjust in Apophysis to create interesting fractals and renders. They know what the knobs/dials adjust for their AI tools and so are quite proficient at creating amazing things using them.
People who write "jailbreak" prompts are one kind of example. There is real effort put into preventing people from prompting models into bypassing their safeguards - and yet there are always people capable of doing exactly that. It can be surprisingly difficult to do yourself for recent models, and the jailbreak prompts themselves are becoming more complex each time.
For art in particular - knowing a wide range of artist names, names of various styles, how certain mediums will look, as well as mixing & matching with various weights for the tokens can get you very interesting results. A site like https://generrated.com/ can be good for that, as it gives you a quick baseline of how including certain names will change the style of what you generate. If you're trying to hit a certain aesthetic style it can really help. But even that is a tiny drop in a tiny bucket of what is possible. Sometimes it is less about writing an overly detailed prompt and more about knowing the exact keywords for the style you're aiming for. Being knowledgeable about art history and famous artists throughout the years will help tremendously over someone with little knowledge. If you can't tell a Picasso from a Monet, you're going to find generating paintings in a specific style much harder than an art buff would.
Still, my pocket is my iPhone pocket.
Until they release something the size of the X or smaller, I'm sticking with my iPhone 13 Mini or eventually going for a Razr-style Android.
Every year they release something, I go check it out, and my love for Apple dies a bit more.