OP here, I didn't write the post, but found it interesting and posted it here.
> So if I understand correctly, they spend more even though they could optimize and spend less
This is what I understand as well: we could utilise the hw better today and make things more efficient, but instead we are focusing on buying more. TBH I think both need to happen; money should be spent on better, more performant hw, and at the same time we should squeeze every bit of performance we can out of what we already have.
I believe the author is making the point that the companies spending all this money on hardware aren't concerned at all with how the hardware is actually used.
Optimization isn't even being considered, because it's the total amount spent on hardware that is the goal, not the output from the hardware.
I have a little trouble believing that Mr "Stop wasting tokens by saying please to LLMs" Altman isn't considering how his models can be optimized. I suppose the real question is how accurate the utilization numbers in the article are.
I stopped paying attention to any specific thing Sam Altman says a while ago. I've seen too many examples of interviews or off the cuff interactions that make me think very little of him personally.
For example, I could see him saying not to waste tokens on "please" simply because he thinks that's a stupid way to use the LLM, i.e. a judgement on anyone who would say please, not a concern over token use in his data centers.
But can that really be the case? It takes a long time to train and tune the models, so even a low-single-digit percentage of extra efficiency squeezed out would mean much faster iteration.