What's preventing GPU providers from sending wrong results instead of actually running the computation? For example, send the last computed result? Is this something that the renter has to handle by adding their own checks?
In addition to the problem of the renter crashing your machine or reading your password through DMA, of course.
The incentive is huge: if I spend 2 milliseconds sending you your previous results instead of 2 hours running your new computation, I can (pretend to) run far more computations on the same hardware and collect hundreds of times more money.
NO. That's the worst way to handle almost anything on the Internet, and should be considered a last line of defense, used only if nothing else can be done. Here, something else can be done. See my comment above.
That's my whole question: do they do random audits, or is it the customers' job to double-check their results for possible attacks or compute theft and report them?
It seems wrong to call it the "job of customers". It's as if you wrote a Bitcoin client that didn't verify the hashes of transactions and just "trusted" everything, or served a website with a login feature over HTTP only, no HTTPS. Verification is a very basic feature of whatever software connects to such a service.
I don't know how developed it is now; I'm not associated with the startup shown in any way. It's mainly a question for them. In terms of the wider industry, though, distributed high-performance GPU(-like) computing "for everyone" is in its infancy. 99% of what has been done up to this point targeted people who both buy and supply compute "in bulk", not "at retail". Perhaps with the small exception of several excellent projects like Folding@Home and the other @home's.
Run 1/10,000 to 1/100,000 of the computations locally, while also sending them out as ordinary remote tasks. If the comparison yields a difference, repeat both. After, say, 10 mismatches, blacklist the provider. Of course it would take a lot more nuance to implement properly, but that's the general idea. It's a no-brainer.
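The spot-check scheme above could be sketched roughly like this (a toy illustration: `AuditingClient`, the callables, and the thresholds are all made up, and a real version would need the retry step, tolerance for nondeterministic floating-point results, per-provider bookkeeping, etc.):

```python
import random

AUDIT_RATE = 1 / 10_000   # fraction of tasks re-run locally for comparison
MAX_STRIKES = 10          # mismatches tolerated before blacklisting

class AuditingClient:
    """Spot-check an untrusted compute provider by re-running a random
    sample of tasks locally and comparing results."""

    def __init__(self, run_remote, run_local, audit_rate=AUDIT_RATE):
        self.run_remote = run_remote   # callable: task -> result (untrusted)
        self.run_local = run_local     # callable: task -> result (trusted)
        self.audit_rate = audit_rate
        self.strikes = 0
        self.blacklisted = False

    def submit(self, task):
        if self.blacklisted:
            raise RuntimeError("provider is blacklisted")
        remote = self.run_remote(task)
        # With small probability, redo the work locally and compare.
        if random.random() < self.audit_rate:
            if remote != self.run_local(task):
                self.strikes += 1
                if self.strikes >= MAX_STRIKES:
                    self.blacklisted = True
        return remote
```

With an audit rate of 1/10,000 the local overhead is negligible, but a provider who cheats on a meaningful fraction of tasks still gets caught quickly over a large job stream.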
At the moment, we manually verify operators and are currently onboarding some tier-4 operators. Down the line, we'll have a two-tier system where you can choose whether you want a verified machine or not. From the operator's perspective, everything runs inside Docker, configured with security best practices.
I've always understood that containers are not proper sandboxes and shouldn't be used for containing untrusted code, no matter the best practices used. Has this changed in recent years? Do you have documentation for what sorts of best practices you're using and why they are sufficient for executing untrusted code?
You are correct, as far as I know. I would expect that running the container as a non-root user enforces some meaningful security, but I'd still run it in a VM if feasible.
Having done a little bit of work in the area[1], I think you should publicly document exactly what those best-practices are. Are the workloads running in a networkless container? Do you limit IO? Do you limit disk usage? Answering these in detail would help you gain customer trust on both sides.
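To make those questions concrete, this is the kind of lockdown a renter might hope to see documented. These are hypothetical flags, not the startup's actual configuration; `my-gpu-workload` is a placeholder image, and a real GPU job would additionally need `--gpus`/device plumbing, which punches holes in exactly this isolation:

```shell
# Networkless, read-only, non-root, capability-dropped container
# with bounded scratch space, processes, memory, and CPU.
docker run \
  --network none \
  --read-only \
  --tmpfs /tmp:size=256m \
  --user 65534:65534 \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --pids-limit 256 \
  --memory 8g \
  --cpus 4 \
  my-gpu-workload
```

Even with all of that, a container still shares the host kernel, which is why the grandparent's point about containers not being proper sandboxes stands.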
probably very basic... so don't run it on anything that has your own data on it (if you're an AI startup, definitely don't run it on your research cluster).
I think they mean don't lease out your research team's GPUs and allow random people to run untrusted code on your cluster, lest they figure out a way to break out of any sandboxing the software has in place and get loose in your network. The company's current answer to that concern is "everything runs inside Docker, configured with security best-practices", which is less than inspiring.