What's preventing GPU providers from sending wrong results instead of actually running the computation? For example, send the last computed result? Is this something that the renter has to handle by adding their own checks?
In addition to the problem of the renter crashing your machine or reading your password through DMA, of course.
The incentive is huge: if I spend 2 milliseconds sending you your previous results instead of 2 hours running your new computation, I can (pretend to) run far more computations on the same hardware and collect hundreds of times more money.
NO. That's the worst way to handle almost anything on the Internet, and should be considered a last line of defense, used only if nothing else can be done. Here, something else can be done. See my comment above.
That's my whole question: do they do random audits, or is it the customers' job to double-check their results for possible attacks or compute theft and report them?
It seems wrong to call it the "job of customers". It's as if you wrote a Bitcoin client that didn't verify the hashes of transactions and just "trusted" everything, or served a website with a login feature over HTTP only, no HTTPS. Verification is a very basic feature of whatever software connects to such a service.
I don't know how developed it is now; I'm not associated with the startup shown in any way. It's mainly a question for them. In terms of the wider industry, though, distributed high-performance GPU(-like) computing "for everyone" is in its infancy. 99% of what has been done up to this point targeted people who both buy and supply compute "in bulk", not "at retail". Perhaps with the small exception of several excellent projects like Folding@Home and the other @home's.
Run 1/10,000 to 1/100,000 of the computations locally, while also sending them out as ordinary remote tasks. If the comparison yields a difference, repeat both. After, say, 10 mismatches, blacklist the provider. Of course it would take a lot more nuance to implement properly, but that's the general idea. It's a no-brainer.
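The spot-check scheme above could be sketched roughly like this (a toy illustration: `AuditingClient`, the callables, and the thresholds are all made up, and a real version would need the retry step, tolerance for nondeterministic floating-point results, per-provider bookkeeping, etc.):

```python
import random

AUDIT_RATE = 1 / 10_000   # fraction of tasks re-run locally for comparison
MAX_STRIKES = 10          # mismatches tolerated before blacklisting

class AuditingClient:
    """Spot-check an untrusted compute provider by re-running a random
    sample of tasks locally and comparing results."""

    def __init__(self, run_remote, run_local, audit_rate=AUDIT_RATE):
        self.run_remote = run_remote   # callable: task -> result (untrusted)
        self.run_local = run_local     # callable: task -> result (trusted)
        self.audit_rate = audit_rate
        self.strikes = 0
        self.blacklisted = False

    def submit(self, task):
        if self.blacklisted:
            raise RuntimeError("provider is blacklisted")
        remote = self.run_remote(task)
        # With small probability, redo the work locally and compare.
        if random.random() < self.audit_rate:
            if remote != self.run_local(task):
                self.strikes += 1
                if self.strikes >= MAX_STRIKES:
                    self.blacklisted = True
        return remote
```

With an audit rate of 1/10,000 the local overhead is negligible, but a provider who cheats on a meaningful fraction of tasks still gets caught quickly over a large job stream.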
At the moment, we manually verify operators and are currently onboarding some tier-4 operators. Down the line, we'll have a two-tier system where you can choose whether you want a verified machine or not. From the operator's perspective, everything runs inside Docker, configured with security best practices.
I've always understood that containers are not proper sandboxes and shouldn't be used for containing untrusted code, no matter the best practices used. Has this changed in recent years? Do you have documentation for what sorts of best practices you're using and why they are sufficient for executing untrusted code?
You are correct, as far as I know. I would expect that running the container as a non-root user enforces some meaningful security, but I'd still run it in a VM if feasible.
Having done a little bit of work in the area[1], I think you should publicly document exactly what those best-practices are. Are the workloads running in a networkless container? Do you limit IO? Do you limit disk usage? Answering these in detail would help you gain customer trust on both sides.
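To make those questions concrete, this is the kind of lockdown a renter might hope to see documented. These are hypothetical flags, not the startup's actual configuration; `my-gpu-workload` is a placeholder image, and a real GPU job would additionally need `--gpus`/device plumbing, which punches holes in exactly this isolation:

```shell
# Networkless, read-only, non-root, capability-dropped container
# with bounded scratch space, processes, memory, and CPU.
docker run \
  --network none \
  --read-only \
  --tmpfs /tmp:size=256m \
  --user 65534:65534 \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --pids-limit 256 \
  --memory 8g \
  --cpus 4 \
  my-gpu-workload
```

Even with all of that, a container still shares the host kernel, which is why the grandparent's point about containers not being proper sandboxes stands.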
probably very basic... so don't run it on anything that has your own data on it (if you're an AI startup, definitely don't run it on your research cluster).
I think they mean don't lease out your research team's GPUs and allow random people to run untrusted code on your cluster, lest they figure out a way to break out of any sandboxing the software has in place and get loose in your network. The company's current answer to that concern is "everything runs inside Docker, configured with security best-practices", which is less than inspiring.