Tell HN: I run a Google Colab clone in my basement, and it's great for AI :)
29 points by fxtentacle on April 4, 2022 | 7 comments
Dear HN,

I recently tried to follow along with some Huggingface tutorials and - WOW! - was that painful. Google Colab quickly started complaining about the 300 GB my datasets were using. I was also constantly out of memory, and reviews said that even if I upgraded to the $50-per-month plan, my chance of randomly being assigned a GPU with enough memory would be slim.

I also looked at AWS EC2 and GCE and got sticker shock. I'm certainly not going to pay $20+ per hour the entire time I try to figure out how stuff works. Then I found OVH, and their $2-per-hour V100 instances seemed much more reasonable. So I built a Docker image, based on their example, that emulates what's pre-installed and pre-configured on Google Colab. [1] That worked OK, but having only 1 GBit/s between the AI training node and their object storage turned out to be the bottleneck.

So I took the next logical step, built a small proxy with authentication and Let's Encrypt SSL [2] to secure things, and then self-hosted the Docker image on the Ubuntu PC in my basement. That one only has a consumer-grade Samsung SSD and a 1080 Ti, but it has zero IO delays. In the end, training a Facebook Wav2Vec2 model on that 1080 Ti is comparable in speed to a V100. And since the machine is in my basement, there are no $$$-per-hour rental fees. It's effectively free, except for electricity.
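For reference, a quick way to sanity-check such a setup from outside is something like this sketch (the hostname, credentials, and the proxy's Basic-auth scheme are assumptions about your particular config):

    import requests

    # Hypothetical endpoint and credentials for the basement box,
    # reached through the auth proxy over HTTPS (Let's Encrypt cert).
    BASE = "https://colab.example.com"
    session = requests.Session()
    session.auth = ("me", "my-proxy-password")  # assuming HTTP Basic auth at the proxy

    # The proxy forwards to the Jupyter server inside the Docker container;
    # /api/status is Jupyter's standard status endpoint.
    r = session.get(f"{BASE}/api/status")
    r.raise_for_status()
    print(r.json())  # uptime, kernel and connection counts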

So now I have my own Google Colab clone, accessible from anywhere and running in my basement. Compared to public clouds it's cheaper and more responsive, it gives me more control, and it's even similarly fast.

I feel like I found a cheat code for "the cloud" :)

To set up your own copy, see https://github.com/fxtentacle/letsencrypt-auth-proxy#example-private-google-colab-clone

[1] https://hub.docker.com/r/fxtentacle/ovh-colab-sagemaker-compatibility-mode
[2] https://github.com/fxtentacle/letsencrypt-auth-proxy



As an autodidact, I use a similar approach. Curious who else does the same?

- My box "sm3llslik3s0ld3r" is a love-hate passion project, but surprisingly similar in hardware to yours!

I originally built "sm3lly" (my box) after screwing up and leaving a high-priced Azure GPU cloud instance running on an unrestricted pay-as-you-go billing scope that was filtered from my view (separate Azure AD). It ran for almost a week before I caught it, resulting in a $7k USD, type-3-fun "surprise" hit to my personal R&D budget.

I suspect it's a small niche group, though: those of us doing AI on our own desktop hardware. I assume most AI people either don't have the systems expertise to set this up and manage it, or they work for a company/school that grants them the resources and don't bring "work" home with them.

FWIW, I bought all of "sm3lly"'s hardware directly from Shenzhen via Taobao, so it's not identical to yours; I was picking up pre-loved Bitcoin-mining GPUs on the very cheap after China's crypto purge.


I'm currently waiting for Ethereum's switch to proof of stake to flood the market with used GPUs; then hopefully new ones will become more affordable, too.

And yes, I've also been burned multiple times by runaway AWS costs. Especially since you keep paying for the storage of instances that are shut down but not fully deleted. That always struck me as odd, and most other cloud providers handle it differently.

And yes, I would also assume that most AI people don't have the systems expertise, which is kinda sad. But then again, most AI tutorials are buggy as hell, presumably for the same reason. For example, https://huggingface.co/blog/wav2vec2-with-ngram (yes, from the people building the Huggingface AI toolkit) says:

"The 5-gram correctly includes a "Unknown" or <unk>, as well as a begin-of-sentence, <s> token, but no end-of-sentence, </s> token. This sadly has to be corrected currently after the build."

And then a cringe-worthy script follows to "fix" the missing </s> by rewriting a 100+ GB file in Python. But actually, they are just using KenLM the wrong way: once you put each sentence on its own line - like the manual says - it produces the </s> just fine, roughly as in the sketch below. And now I wonder whether I still want to trust their Python packages to do the right thing...
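Here's a minimal sketch of the correct usage (the corpus contents and the path to the lmplz binary from a local KenLM build are placeholders):

    import subprocess

    # KenLM expects one sentence per line; it then emits <s> and </s> itself.
    # Hypothetical corpus; use your real training text instead.
    with open("corpus.txt", "w") as f:
        for sentence in ["hello world", "the cat sat on the mat"]:
            f.write(sentence.strip() + "\n")  # one sentence per line

    # Build a 5-gram ARPA model; no post-hoc </s> patching needed.
    with open("corpus.txt") as stdin, open("5gram.arpa", "w") as stdout:
        subprocess.run(["kenlm/build/bin/lmplz", "-o", "5"],
                       stdin=stdin, stdout=stdout, check=True)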


Re unexpected AWS costs: This happened to me too. I hate AWS for it. The best solution I've found is to set up a "Budget" in the AWS console that alerts me by email when my account spends more than I expect. Just last week this caught a case early that would have likely resulted in an unnecessary 4-figure bill.
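If you'd rather script that than click through the console, a sketch with boto3 looks roughly like this (the account ID, limit, and e-mail address are placeholders):

    import boto3

    budgets = boto3.client("budgets")

    # Hypothetical values: replace the account ID, limit, and address.
    budgets.create_budget(
        AccountId="123456789012",
        Budget={
            "BudgetName": "monthly-cost-alarm",
            "BudgetLimit": {"Amount": "100", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # alert at 80% of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL",
                             "Address": "me@example.com"}],
        }],
    )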


I was going to suggest that you post the code, until I got to the bottom and saw you already had! Very cool project. I love cloud tooling, but in some cases a local machine is something like thousands of times cheaper.

Best to mix and match, and you get outrageous amounts of compute and efficiency for relatively little money.

This reminds me of a similar project that might interest you.

GPT-J is an open-source alternative to GPT-3 made by a former Googler. I've read it works surprisingly well; people claim it comes close to GPT-3.

To run it, people talk about using services like vast.ai. Don't get me wrong, it's cool that that exists, but I admit I've been tempted to build out a dedicated machine just to run this model in my basement. You need 24 GB of video memory to run it, though, so you're probably looking at around $4,000 to build it (depending on graphics card prices).

But! For that price you could have your own unmetered GPT-3 knock-off in your basement, thereafter also for just the price of electricity.
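For anyone curious, loading it with the Huggingface Transformers library is roughly this sketch (in half precision the weights alone are ~12 GB instead of ~24 GB, so a 24 GB card has comfortable headroom for generation):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-J-6B as published by EleutherAI on the Huggingface hub.
    # float16 halves the weight memory relative to fp32.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
    ).to("cuda")

    prompt = tokenizer("My basement datacenter", return_tensors="pt").to("cuda")
    output = model.generate(**prompt, max_new_tokens=40)
    print(tokenizer.decode(output[0]))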


Thanks for mentioning GPT-J :)

I didn't know about them before. That looks very usable and I'm truly astonished that someone was generous enough to pay for training such a large model and is now giving it away for free.

As for the memory requirements, I'm pretty sure one could split the model into different stages which then execute one after another => you'd only need 8 GB with 3 slices. I'm currently using something similar to squeeze a ~30 GB wav2vec2 into the 11 GB of a 1080 Ti; the idea is roughly the sketch below.
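A toy version of that slicing idea for inference might look like this (the layer stack here is a hypothetical stand-in; the real model's layer list and slice boundaries are up to you):

    import torch
    import torch.nn as nn

    def run_in_slices(layers, x, n_slices=3, device="cuda"):
        # Move one slice of layers to the GPU at a time, run it,
        # then evict the weights so only activations stay resident.
        chunk = (len(layers) + n_slices - 1) // n_slices
        x = x.to(device)
        for i in range(0, len(layers), chunk):
            stage = layers[i:i + chunk]
            for layer in stage:
                layer.to(device)
            with torch.no_grad():
                for layer in stage:
                    x = layer(x)
            for layer in stage:
                layer.to("cpu")
            torch.cuda.empty_cache()
        return x

    # Hypothetical stand-in for a big network's layer list.
    layers = [nn.Linear(4096, 4096) for _ in range(12)]
    out = run_in_slices(layers, torch.randn(1, 4096))

Peak VRAM is then roughly one slice's weights plus the activations, at the cost of PCIe transfers between stages.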


Oh wow, thanks for that tip, I hadn't considered it. I actually have a lower-end 3000-series card; I'll give that a try. Thanks!


Very cool, stealing this.

Have a dual RTX A5000 in a dual-Xeon box with 512 GB of RAM and 30 TB of SSD RAID storage, backed up by another couple of hundred TB of spinning rust in my closet. I run Windows and Ubuntu in VMs that share the GPUs, and can spin up new VMs with shared GPUs on a whim. Self-hosted is the way to go if you either have the hardware on hand or can get over the initial buy-in price. I can access the machine remotely via a variety of means, and can even stream directly to my laptop.

Plus I get to play games on it! :-)



