When I was growing up, the media in my country kept telling us how great geek culture in the US was, how deep down the stack the geeks were willing to go, and both the adults and us kids were left in awe. The entire nation, from what I could tell, routinely reflected on why we couldn't be like the US: educating and nurturing generations of geeks to be the best engineers and scientists in the world.
Well, it was quite a reverse culture shock after I moved to the US. I definitely didn't know that "teacher's pet" was a thing, or that my coworker, a brilliant engineer who went to a highly reputed public school, was chased off his school bus simply because he used some poetic words, or that geeks were not all that respected in schools, or that "a mile wide and an inch deep" with great leadership was what Americans revered. In the meantime, I guess other countries more or less picked up the baton of US culture and grew their own geeks.
Did you not see American movies? I'm also from a different country, but this part of the American culture is very much front and center in its exports, IMO.
> Did you not see American movies? I'm also from a different country, but this part of the American culture is very much front and center in its exports, IMO.
There exist quite a lot of US-American movies. I would claim that the ones less deeply centered on US-specific cultural traits are typically much more revered in other countries.
There are, of course, exceptions to this rule (for example "The Simpsons", which relentlessly satirizes US popular culture but is nevertheless loved in many other countries), but I do think the general rule of thumb holds.
In this sense: even if you watch US-American movies, it is rather easy to mostly ignore the ones that strongly display US-specific cultural traits and habits, particularly if those are considered annoying in the other country.
I did watch American movies, but somehow I still came away with the same conclusions as a child. I think I was influenced by documentaries that seemed to show such intelligence and hard work in the sciences and technology, and I assumed the whole culture supported it; films were supposed to be fiction, after all.
As an adult across the pond, I'm very disappointed by the US.
Not really. Our English education was so bad that I could barely understand spoken English and I knew fewer than 5000 words when graduating high school. Ironically, I did fabulously well in English exams, which said a lot about our English education back then.
It comes in waves. In the 90s there was a monopoly so rapacious that the Department of Justice had to very nearly break it up before it strangled the Web in its crib. Grift and nepotism and cronyism and distorted markets were the norm then too.
This is the first time it’s been so bad since, but I’m optimistic: we’re actually in a better position now because serious foreign tech dynamos don’t give a fuck about American mafia nepotism, we can’t just keep this in the family.
I don’t know if this DeepSeek App Store thing will be the match that lights this thing up, but the grass is very tall and very, very dry.
Large portions of the world are anti-intellectual. At the same time, intellectuals are often much worse than the average person and frequently do deserve scorn as a class of people in society.
China is not anti-intellectual in the same way. I remember watching Xi's new year speech a few years ago, and he was highlighting specific scientific achievements and talking about astronauts. It was very different from a US presidential speech.
Well, Biden can't really give a speech, being undeniably old to the point of incoherence (which is what led to him dropping his candidacy). Trump did Operation Warp Speed, created Space Force, and openly talked about both; he also openly supports Elon Musk running his own scientific enterprises. So I'm not sure what the argument is for America not talking about its own scientific achievements.
That's just it: American tech companies aren't staffed by Americans anymore. Liang Wenfeng, the founder of DeepSeek, made the point that he values developing domestic Chinese talent over importing foreign experts. It seems to have worked.
"It seems to have worked" — I'm sure he's a great manager, but what's really working for him is the crazy high number of graduates and PhDs in China who are unemployed or underemployed.
That would sound a lot like the languishing native STEM talent in the US, if it weren't for the fact that most of the DeepSeek team don't have PhDs.
I should clarify: the US STEM workforce has historically been very white and very male. That's the body of talent that has been languishing, and the data proves it.
> That's a complete non sequitur. Why would it make sense for new hires to be overwhelmingly disproportionately oversampled from minority groups according to current demographics?
Because that is not what this stat is measuring. They are computing (demographics of new employees − demographics of retiring employees), not just the demographics of new employees, which is why it is a misleading statistic. They word it very carefully so as not to say that 94% of new hires are minorities.
Source: the "The Analysis" section of your own source, or if you need it stated more explicitly [0]:
> Before judging whether that’s impressive or excessive or some other adjective, it’s helpful to know what the available pool of new workers looked like. Or, more precisely, what the pool of new workers minus the pool of departing workers looked like. Net change is what we’re able to see.
I'm sure this will have precisely zero impact on your worldview, though.
Since your reply died: you're absolutely wrong about massive oversampling. You can get 95% with basic assumptions: roughly 60% white, 40% minority among new hires (matching actual proportions in the population, and actually an underestimate when you consider that new hires are young) and a retiring Fortune 500 population that is 70-90% white.
Haha, what a shoddy headline. "Bypasses" and "industry-standard" have no place here.
CUDA is not an industry standard. Vulkan is an industry standard. They did not bypass CUDA; that's like saying that if I use Vulkan I'm bypassing OpenGL. PTX is an alternative low-level API provided by Nvidia because of how awful CUDA is for high-performance code.
What DeepSeek wrote could only have been written in either PTX or Vulkan.
Any other company could have done this, and the low-latency traders on Wall Street who use Nvidia write their stuff in PTX for obvious reasons.
OpenAI was, is, and always will be absolutely incompetent when it comes to using their hardware effectively, and they're no different from any other company. Reading is not a goddamned superpower! Just read the docs!
You can ignore it; the commenter clearly has no idea what they are talking about. PTX is literally the instruction set that CUDA, Vulkan, and OpenGL compile to on Nvidia cards in the end. It's assembly for GPUs, and it's infinitely harder to work with. Go to an average technical university and you'll probably find quite a few people who can write CUDA (or OpenGL or Vulkan, for that matter), but it would be very surprising if you could find even a single person who can comfortably write PTX.
"Compile to" isn't exactly the correct phrase either.
PTX is not the IL used by Nvidia's drivers, but it does compile directly to it with less slop involved. If you had said "PTX's instructions are analogous to writing assembly for CPUs or any other GPUs (a la Clang's AMDGPU target)", that would have been the better way to put it.
Arguably, PTX is closer to being the SPIR-V part of their stack (more than just an assembler, but similar in concept). None of Nvidia's tools ever really line up with good analogies to the outside world; such is the curse of Nvidia's NIH syndrome.
Generally, you're not going to be writing all of your code in PTX, but I find it wild that you think people at "an average technical university" would be unable to use it for the parts they need it for. That says more about you than it does about them.
All of Nvidia's docs for this are online, it isn't that hard. Have you tried?
>PTX's instructions are analogous to writing assembly for CPUs
How else would you have understood it? At this level it's literally just pedantry. In the same way, you can say C doesn't technically compile to assembly for CPUs. The point is that it's the lower abstraction level that is still (more or less) human-readable. And just like in CUDA, you may want to write parts of your code in it if you want to benefit from things that the higher-level language doesn't expose. The terminology might seem different, but in practice it is pretty analogous.
This is somewhat untrue as well. HFT firms, being similarly constrained, have to optimize at this level, akin to HFT crypto shops doing optimizations not in Solidity or Yul but on opcodes in Huff. That's the issue with these big tech companies: endless budget, so they throw bad code at larger distributed clusters to overcompensate.
I wonder if you could point me to concrete examples where people write PTX rather than CUDA? I'm asking because I just learned CUDA, since it's so much faster than Python!
For various micro-benchmarking reasons I wanted to use a global clock instead of an SM-local one, and I believe PTX was needed for that.
Also note that even CUDA has "lower level"-like operations, e.g. warp primitives. PTX itself is super easy to embed in it like asm.
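To give a concrete flavor of what this looks like, here's a minimal sketch (mine, not the parent commenter's actual code) of reading the device-wide `%globaltimer` special register through inline PTX — one way to get a global clock instead of the SM-local `clock64()`:

```cuda
// Sketch: a device-wide nanosecond timestamp via inline PTX.
// %globaltimer is a PTX special register shared across the device,
// unlike clock64(), which is an SM-local cycle counter.
__device__ unsigned long long global_clock() {
    unsigned long long t;
    // "=l" binds t to a 64-bit register operand.
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

__global__ void timed_kernel(unsigned long long *elapsed_ns) {
    unsigned long long start = global_clock();
    // ... work being measured ...
    *elapsed_ns = global_clock() - start;
}
```

The `asm volatile(...)` syntax is exactly how CUDA C++ embeds PTX, which is why mixing a few PTX lines into an otherwise ordinary kernel is far easier than writing whole kernels in PTX.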
There aren't a lot of easily accessible examples outside of the corporate world.
Open source authors typically shy away from Nvidia's closed-source APIs, and PTX is tied to how Nvidia hardware works, so you won't see it implemented for other hardware.
If you wanted to do what DeepSeek did but didn't want to waste your time and money on Nvidia, you'd use Vulkan. There's more Vulkan in the world than CUDA.
Not in HFT, but I'd guess maybe for running optimization solvers and forecast models very fast, etc.? Essentially compute models ultimately driving market decisions based on lots of input data.
We do a lot of forecasting and solvers where I am, though we just run them on CPUs... but maybe if you're competing on speed you would?
> Optimization solvers usually don't benefit from GPUs. I think it's because it's sparse matrices and a sequential series of pivots.
This depends a lot on the problem and the algorithm used. For example, interior point methods are clearly better suited to running on GPUs than the primal or dual simplex algorithms.
What it does show is that CUDA leaves serious performance optimization on the table despite its gigantic code base. Using compression to reduce memory bandwidth is a well known trick in quantization, and in other scenarios since forever. There has been little competitive pressure on Nvidia to go further since their software stack leaves the competition in the dust. This time, they may actually need to step up their efforts, due to customer pressure. Good times!
There are already open stacks out there that help. The problem is that Nvidia provides a full-stack option — chips, networking, and software — whereas AMD only provides the chips; you then need another company's networking, and then you need to plug together open-source software.
I'd also bet that if you buy enough of the Nvidia chips, they'll probably send in a bunch of engineers to get everything working with their full stack. AMD won't be able to do that in the same way because they're not vertically integrated.
Writing a few intrinsics where necessary is not really comparable to the work required to reimplement something like CUDA on AMD (or, equivalently, to deal with ROCm).
This is ridiculous. Since the actual training code for DeepSeek is _not_ public, this is based only on the technical report, which mentions PTX one (1) time, in §3.2.2, "Efficient Implementation of Cross-Node All-to-All Communication":
> Specifically, we employ customized PTX (Parallel Thread Execution) instructions and auto-tune the communication chunk size, which significantly reduces the use of the L2 cache and the interference to other SMs.
So they have some intrinsics in some part of their training framework. That's it.
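For context on what a "customized PTX instruction" touching the L2 cache might even look like, here's a hedged illustration — emphatically not DeepSeek's actual code, which is unpublished — using the PTX cache-eviction-hint qualifiers that plain CUDA C++ loads don't expose:

```cuda
// Illustrative only: a load with an L2 eviction hint, the kind of
// cache-control knob available in PTX but not in ordinary CUDA C++.
// Eviction-hint qualifiers require sm_80+ and a recent PTX ISA.
__device__ float load_evict_first(const float *p) {
    float v;
    // L2::evict_first marks the line as the first candidate for
    // eviction, so streamed communication data displaces less of
    // the L2 working set used by other concurrent kernels.
    asm volatile("ld.global.L2::evict_first.f32 %0, [%1];"
                 : "=f"(v) : "l"(p));
    return v;
}
```

Whether DeepSeek used this particular qualifier is pure speculation on my part; the report only says their customized instructions "significantly reduce the use of the L2 cache."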
It all feels like an attempt to prevent replication and to further tank the market. Not necessarily the technical details themselves, but the reporting thereof, which was spammed all over Reddit and HN.
This whole episode is weird. I can’t tell how much of the popular reporting is misinformed and how much has been disinformed. R1 (and sorta V3) are clearly progress, but are definitely not step-function improvements to prior SOTAs.
IIRC this is still relatively hardware-agnostic. Can you actually get very far by doing this? From a quick perusal, DeepSeek also uses Triton in the codebase.
tl;dr: they wrote low-level code instead of using a higher-level framework like their competitors have been doing, so they were able to hand-tune the performance.
This gives them a few months head start before meta and Google start doing the same thing.