
Have you seen this: https://chatjimmy.ai/

It's quite impressive what purpose-built inference can/will do once everyone stops trying to build the one best general-purpose model.




Wow, impressive. What's the story with this?

It's a tech demonstrator for a company that turns models into custom silicon for fast inference, in this case Llama 3.1 8B: https://taalas.com/products/

Is this an ASIC? Or FPGA? Or something even more exotic?

I’m guessing it’s some form of ASIC because I can’t imagine crafting the logic of Llama on silicon is a very quick or easy job. Not that doing it on an ASIC is a piece of cake either.


An ASIC is custom silicon, no?

Anyways, I found this article discussing it a bit more: https://www.eetimes.com/taalas-specializes-to-extremes-for-e...

"Taalas is borrowing some ideas from the structured ASICs of the early 2000s to make its hardwired model-specific chips. Structured ASICs used gate arrays and hardened IP blocks, changing only the interconnect layers to adapt the chip to a specific workload. At the time, this was seen as a more cost-effective alternative to a full-custom ASIC that was more performant than an FPGA."

"Taalas changes only two masks to customize a chip for a specific model, but the two masks can change both model weights and dataflow through the chip. On the HC1, the model and its weights are stored on the chip using a mask-ROM-based recall fabric paired with a (programmable) SRAM, which can be used to hold fine-tuned weights and/or the KV cache. Future generations of chips may split the SRAM onto a separate chip, meaning they could be denser than the HC1."


Taalas' hardware implementation of Llama 3.1 8B. They claim 16k tok/s vs. Cerebras at 2k. https://taalas.com/products/
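If those figures hold (both numbers are just the claims above, and the 1,000-token response is an arbitrary example), the gap is easy to put in perspective:

  taalas_tps, cerebras_tps = 16_000, 2_000
  tokens = 1_000                              # example response length
  print(1000 * tokens / taalas_tps)           # ~62.5 ms on Taalas
  print(1000 * tokens / cerebras_tps)         # ~500 ms on Cerebras
  print(taalas_tps / cerebras_tps)            # 8x claimed speedup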


