
> I think the only way to start an AI chip company is to start with the software. The computing in ML is not general purpose computing. 95% of models in use today (including LLMs and image generation) have all their compute and memory accesses statically computable.

> Unfortunately, this advantage is thrown away the minute you have something like CUDA in your stack. Once you are calling in to Turing complete kernels, you can no longer reason about their behavior. You fall back to caching, warp scheduling, and branch prediction.

> tinygrad is a simple framework with a PyTorch like frontend that will take you all the way to the hardware, without allowing terrible Turing completeness to creep in.

I like his thinking here, constraining the software to something less than Turing complete so as to minimize complexity and maximize performance. I hope this approach succeeds as he anticipates.
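To make "statically computable" concrete, here is a minimal sketch (plain NumPy, not tinygrad; the function names are just for illustration) of a fixed-shape MLP forward pass. Given only the input shapes, every intermediate shape, and hence every memory access, is known before any data arrives:

```python
import numpy as np

# A tiny fixed-shape MLP forward pass. Once the shapes are fixed,
# the whole compute graph is a fixed sequence of matmuls and
# elementwise ops -- a scheduler could pre-plan every tile and
# buffer without ever inspecting the data values.
def mlp_forward(x, w1, w2):
    h = np.maximum(x @ w1, 0.0)  # ReLU; shape (batch, hidden) known statically
    return h @ w2                # shape (batch, out) known statically

rng = np.random.default_rng(0)
x  = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 2))
print(mlp_forward(x, w1, w2).shape)  # (4, 2)
```

This is the property the quoted comment is pointing at: no branch in the program depends on the tensor values, so the access pattern is the same on every run.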



This claim that 95% of models are statically computable really shows how much he's trivializing the problem. I'd be interested to see his SW stack compile MaskRCNN. His ISA is massively under-defined, and people will not change their model code to run on this accelerator unless his performance beats CUDA significantly; even then they still won't, because usability matters more than performance every time. In the end you need a compiler, and it needs to be compatible with an existing framework, which is not trivial at all, since those frameworks are written in Python.


I agree that writing something compatible with, say, PyTorch is a significant undertaking, but why is that necessary? I also agree that some models, like MaskRCNN, are not static, and that people will not change their model code, but I don't think it matters.

Let's say you want to run LLaMA. LLaMA is a tiny amount of code, say, 300 lines, and it is static. It doesn't matter that people will implement LLaMA in PyTorch rather than tinygrad; geohot can port LLaMA to tinygrad himself. In fact, he already did: it's in the tinygrad repository.

What I am saying is that while running all models ever invented is harder than running LLaMA and Stable Diffusion (a Stable Diffusion port is also in the tinygrad repository), that's not necessarily trivializing the problem. It's noticing that you don't need to solve the full problem; there is enough demand for solving just the tractable subset.

While developers will choose usability, users will choose the cheaper price. If they can run what they want on cheaper hardware, they will. I have already seen this happen: people don't buy NVIDIA to run Leela Chess Zero, they just run it on the hardware they have. It doesn't matter that everyone working on the LC0 model is using NVIDIA; that's irrelevant to users. The LC0 model is fixed and tiny, people already ported it to OpenCL, the OpenCL port is performant, and it runs well on AMD. The same will happen to text and image generation models.


Yeah, for inference this is true; there could be a viable subset of models. You're not going to build a viable business on inference, though. It's super cheap already, and plenty of hardware can do it out of the box with an existing framework, as you're saying. The $$ for selling chips is in training, and researchers trying new architectures are not going to wait for a port of their favorite model to a custom DSL, or learn a new language, to start prototyping now. You can port models forever, but that isn't an ecosystem or a CUDA competitor. OpenCL + AMD != a from-scratch company.


Can anyone elaborate on how or why Turing completeness requires these suboptimal patterns?

I recall reading about avoiding Turing completeness for a similar reason: to sidestep the halting problem.

> Other times these programmers apply the rule of least power—they deliberately use a computer language that is not quite fully Turing-complete. Frequently, these are languages that guarantee all subroutines finish, such as Coq.

https://en.m.wikipedia.org/wiki/Halting_problem
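A toy illustration of the "guarantee all subroutines finish" idea (this is not how Coq enforces totality, which uses structural recursion checks; the fuel idiom below is just a standard trick for making any computation provably terminating):

```python
# Give every computation a finite "fuel" budget. The only loop has a
# bound fixed up front, so bounded_run always halts -- either with a
# result or by running out of fuel.
def bounded_run(step, done, state, fuel=1000):
    for _ in range(fuel):      # trip count known before running
        if done(state):
            return state
        state = step(state)
    raise TimeoutError("out of fuel")

# Example: iterate the Collatz step from 6; halts within the budget.
step = lambda n: n // 2 if n % 2 == 0 else 3 * n + 1
done = lambda n: n == 1
print(bounded_run(step, done, 6))  # 1
```

The price of the guarantee is expressiveness: some computations that would eventually finish on their own get cut off. That's the same trade the "rule of least power" quote is describing.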


It isn't that there are suboptimal patterns; it's that the more expressive your language is at runtime, the less you can reason about it statically. An example is data-dependent control flow: if you can't determine statically (without runtime data) which branch your code will take, it's harder to generate fast code for it.
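A small sketch of the contrast (hypothetical function names, plain NumPy):

```python
import numpy as np

# Statically analyzable: the loop trip count depends only on the
# shape, which is known at compile/trace time. A compiler can fully
# unroll and schedule this without ever seeing the data.
def static_sum(x):
    total = 0.0
    for i in range(x.shape[0]):  # bound known from the shape alone
        total += x[i]
    return total

# Data-dependent: how long the loop runs depends on runtime values,
# so a compiler must emit generic looping code and fall back on
# dynamic machinery (branch prediction, etc.) for speed.
def dynamic_sum(x):
    total, i = 0.0, 0
    while i < x.shape[0] and x[i] >= 0:  # exit depends on the data
        total += x[i]
        i += 1
    return total

x = np.array([1.0, 2.0, -3.0, 4.0])
print(static_sum(x))   # 4.0
print(dynamic_sum(x))  # 3.0 (stops at the first negative value)
```

The first function's entire access pattern is fixed by the shape; the second's is only knowable at runtime, which is exactly the distinction the "statically computable" claim rests on.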



