
> I think the only way to start an AI chip company is to start with the software. The computing in ML is not general purpose computing. 95% of models in use today (including LLMs and image generation) have all their compute and memory accesses statically computable.

> Unfortunately, this advantage is thrown away the minute you have something like CUDA in your stack. Once you are calling in to Turing complete kernels, you can no longer reason about their behavior. You fall back to caching, warp scheduling, and branch prediction.

> tinygrad is a simple framework with a PyTorch like frontend that will take you all the way to the hardware, without allowing terrible Turing completeness to creep in.

I like his thinking here, constraining the software to something less than Turing complete so as to minimize complexity and maximize performance. I hope this approach succeeds as he anticipates.
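To make "statically computable" concrete, here is a minimal sketch (plain NumPy, not tinygrad; the function names are just for illustration) of a fixed-shape MLP forward pass. Given only the input shapes, every intermediate shape, and hence every memory access, is known before any data arrives:

```python
import numpy as np

# A tiny fixed-shape MLP forward pass. Once the shapes are fixed,
# the whole compute graph is a fixed sequence of matmuls and
# elementwise ops -- a scheduler could pre-plan every tile and
# buffer without ever inspecting the data values.
def mlp_forward(x, w1, w2):
    h = np.maximum(x @ w1, 0.0)  # ReLU; shape (batch, hidden) known statically
    return h @ w2                # shape (batch, out) known statically

rng = np.random.default_rng(0)
x  = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 2))
print(mlp_forward(x, w1, w2).shape)  # (4, 2)
```

This is the property the quoted comment is pointing at: no branch in the program depends on the tensor values, so the access pattern is the same on every run.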



This claim that 95% of models are statically computable really shows how much he's trivializing the problem. I'd be interested to see his SW stack compile MaskRCNN. His ISA is massively under-defined, and people will not change their model code to run on this accelerator unless his performance beats CUDA significantly; even then they still won't, because usability matters more than performance every time. In the end you need a compiler, and it needs to be compatible with an existing framework, which is not trivial at all, since those frameworks are written in Python.


I agree that writing something compatible with, say, PyTorch is a significant undertaking, but why is that necessary? I also agree that some models, like MaskRCNN, are not static, and that people will not change their model code, but I don't think it matters.

Let's say you want to run LLaMA. LLaMA is a tiny amount of code, say, 300 lines, and it is static. It doesn't matter that people will implement LLaMA in PyTorch rather than tinygrad; geohot can port LLaMA to tinygrad himself. In fact, he already did: it's in the tinygrad repository.

What I am saying is that while running all models ever invented is harder than running LLaMA and Stable Diffusion (a Stable Diffusion port is also in the tinygrad repository), that's not necessarily trivializing the problem. It's noticing that you don't need to solve the full problem; there is enough demand for solving just the tractable subset.

While developers will choose usability, users will choose the cheaper price. If they can run what they want on cheaper hardware, they will. I have already seen this happen: people don't buy NVIDIA to run Leela Chess Zero, they just run it on the hardware they have. It doesn't matter that everyone working on the LC0 model is using NVIDIA; that's irrelevant to users. The LC0 model is fixed and tiny, people already ported it to OpenCL, the OpenCL port is performant, and it runs well on AMD. The same will happen to text and image generation models.


Yeah, for inference this is true; there could be a viable subset of models. You're not going to build a viable business on inference, though. It's super cheap already, and plenty of hardware can do it out of the box with an existing framework, as you're saying. The $$ for selling chips is in training, and researchers trying new architectures are not going to wait for a port of their favorite model to a custom DSL, or learn a new language, to start prototyping now. You can port models forever, but that isn't an ecosystem or a CUDA competitor. OpenCL + AMD != a from-scratch company.


Can anyone elaborate on how or why Turing completeness requires these suboptimal patterns?

I recall reading about avoiding Turing completeness for a similar reason: to sidestep the halting problem.

> Other times these programmers apply the rule of least power—they deliberately use a computer language that is not quite fully Turing-complete. Frequently, these are languages that guarantee all subroutines finish, such as Coq.

https://en.m.wikipedia.org/wiki/Halting_problem
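A toy illustration of the "guarantee all subroutines finish" idea (this is not how Coq enforces totality, which uses structural recursion checks; the fuel idiom below is just a standard trick for making any computation provably terminating):

```python
# Give every computation a finite "fuel" budget. The only loop has a
# bound fixed up front, so bounded_run always halts -- either with a
# result or by running out of fuel.
def bounded_run(step, done, state, fuel=1000):
    for _ in range(fuel):      # trip count known before running
        if done(state):
            return state
        state = step(state)
    raise TimeoutError("out of fuel")

# Example: iterate the Collatz step from 6; halts within the budget.
step = lambda n: n // 2 if n % 2 == 0 else 3 * n + 1
done = lambda n: n == 1
print(bounded_run(step, done, 6))  # 1
```

The price of the guarantee is expressiveness: some computations that would eventually finish on their own get cut off. That's the same trade the "rule of least power" quote is describing.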


It isn't that there are suboptimal patterns; it's that the more expressive your language is at runtime, the less you can reason about it statically. An example is data-dependent control flow: if you can't determine statically (without runtime data) which branch your code will take, it's harder to generate fast code for it.
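A small sketch of the contrast (hypothetical function names, plain NumPy):

```python
import numpy as np

# Statically analyzable: the loop trip count depends only on the
# shape, which is known at compile/trace time. A compiler can fully
# unroll and schedule this without ever seeing the data.
def static_sum(x):
    total = 0.0
    for i in range(x.shape[0]):  # bound known from the shape alone
        total += x[i]
    return total

# Data-dependent: how long the loop runs depends on runtime values,
# so a compiler must emit generic looping code and fall back on
# dynamic machinery (branch prediction, etc.) for speed.
def dynamic_sum(x):
    total, i = 0.0, 0
    while i < x.shape[0] and x[i] >= 0:  # exit depends on the data
        total += x[i]
        i += 1
    return total

x = np.array([1.0, 2.0, -3.0, 4.0])
print(static_sum(x))   # 4.0
print(dynamic_sum(x))  # 3.0 (stops at the first negative value)
```

The first function's entire access pattern is fixed by the shape; the second's is only knowable at runtime, which is exactly the distinction the "statically computable" claim rests on.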



