
Maybe you want to conduct experiments that the cloud API doesn't allow for.

Perhaps you'd like to plug it into a toolchain that runs faster than API calls can be passed over the network -- eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

Maybe you would like to prevent the monopolists from gaining sole control of what may be the most impactful technology of the century.

Or perhaps you don't want to share your data with Microsoft & Other Evils (formerly known as don't be evil).

You might just like to work offline. Whole towns go offline, sometimes for days, just because of bad weather. Never mind war and infrastructure crises.

Or possibly you don't like that The Cloud model has a fervent, unshakeable belief in the propaganda of its masters. Maybe that propaganda will change one day, and not in your favor. Maybe you'd like to avoid that.

There are many more reasons in the possibility space than my limited imagination allows for.



It is not like strong models are at a point where you can trust their output 100%. It is always necessary to review LLM-generated text before using it.

I'd rather have a weaker model which I can always rely on being available than a strong model which is hosted by a third party service that can be shut down at any time.


> I'd rather have a weaker model which I can always rely on being available than a strong model which is hosted by a third party service that can be shut down at any time.

Every LLM project I've worked with has an abstraction layer for calling hosted LLMs. It's trivial to implement another adapter to call a different LLM. It's often done as a fallback/failover strategy.

There are also services that will merge different providers into a unified API call if you don’t want to handle the complexity on the client.

It’s really not a problem.
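
A minimal sketch of that adapter-plus-failover pattern in Python; the provider classes and their behaviour here are hypothetical placeholders, not any particular library's API:

    # Minimal sketch of an LLM provider abstraction with failover.
    # The provider classes below are hypothetical placeholders; a real
    # project would wrap actual SDKs (hosted APIs, a local server, ...).
    from abc import ABC, abstractmethod

    class LLMProvider(ABC):
        @abstractmethod
        def complete(self, prompt: str) -> str:
            """Return a completion for the prompt, or raise on failure."""

    class HostedProvider(LLMProvider):
        def __init__(self, name: str):
            self.name = name

        def complete(self, prompt: str) -> str:
            # A real adapter would call the hosted API here and raise on
            # outages or rate limits; we simulate an outage.
            raise RuntimeError(f"{self.name} unavailable")

    class LocalProvider(LLMProvider):
        def complete(self, prompt: str) -> str:
            # A real adapter would call a locally running model instead.
            return "local completion for: " + prompt

    def complete_with_failover(prompt: str, providers: list[LLMProvider]) -> str:
        """Try each provider in order, falling back to the next on failure."""
        last_error = None
        for provider in providers:
            try:
                return provider.complete(prompt)
            except Exception as exc:
                last_error = exc
        raise RuntimeError("all providers failed") from last_error

    # Hosted first, local model as the always-available fallback.
    print(complete_with_failover("hello", [HostedProvider("hosted-api"), LocalProvider()]))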


Suppose you live outside of America and the supermajority of LLM companies are American. You want to ask a question about whisky distillation or abortion or anything else that's legal in your jurisdiction but not in the US, but the LLM won't answer.

You've got a plethora of cloud providers, all of them aligned to a foreign country's laws and customs.

If you can choose between Anthropic, OpenAI, Google, and some others... well, that's really not a choice at all. They're all in California. What good does that do an Austrian or an Australian?


Personally, I've found the biggest problem with local models is the lack of integrations: they can't search the web, they can't use Wolfram Alpha for math, etc.

LLMs are great as routers; only rarely are they good at doing something on their own.
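
A toy sketch of that router idea, assuming a hypothetical local_llm() stand-in and two fake tools; the model only chooses the tool, and ordinary code does the actual work:

    # Toy illustration of an LLM as a router: the model only picks a tool,
    # and plain code does the real work (search, math, ...).
    # local_llm() is a hypothetical stand-in for a locally hosted model.
    def local_llm(prompt: str) -> str:
        # A real call would go to the local model; here we fake the routing decision.
        return "math" if any(ch.isdigit() for ch in prompt) else "web_search"

    def web_search(query: str) -> str:
        # A real integration would call a search API.
        return f"(results for '{query}')"

    def calculator(expression: str) -> str:
        # A real integration might call Wolfram Alpha; eval() is only for the toy.
        return str(eval(expression, {"__builtins__": {}}, {}))

    TOOLS = {"web_search": web_search, "math": calculator}

    def answer(question: str) -> str:
        tool_name = local_llm(f"Pick a tool for: {question}")
        return TOOLS[tool_name](question)

    print(answer("2 + 2"))               # routed to the calculator
    print(answer("latest whisky news"))  # routed to web search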


> eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

This is interesting. Is that based on any upcoming technology improvement already in the works?


GP is likely referring to network latency here. There's a tradeoff between smaller GPUs etc. at home, which have no network latency to use, and beefier hardware in the cloud, which has a minimum latency per call.


Sure, but if the model takes multiple seconds to execute, then even 100 milliseconds of network latency seems more or less irrelevant.
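
A back-of-the-envelope illustration of that point, with made-up numbers:

    # Back-of-the-envelope: when does the network round trip matter?
    # All numbers here are illustrative assumptions, not benchmarks.
    def total_ms(rtt_ms: float, tokens: int, tokens_per_sec: float) -> float:
        return rtt_ms + tokens / tokens_per_sec * 1000

    # Long answer (500 tokens at 50 tok/s): generation dominates, the
    # 100 ms hop is about 1% of the total.
    print(total_ms(100, 500, 50))  # 10100.0 ms
    print(total_ms(0, 500, 50))    # 10000.0 ms

    # Tiny answer (10 tokens): the same 100 ms hop is a third of the total.
    print(total_ms(100, 10, 50))   # 300.0 ms
    print(total_ms(0, 10, 50))     # 200.0 ms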


Comms is also the greatest battery drain for a remote edge system. Local inference can allow for longer operation, or operation with no network infra.


Excellent points. Being able to use available hardware in unison is amazing, and I guess we are not far away from botnets utilising this kind of technology, like they did with mining coins.


Also, hosted models are often censored and refuse to talk about various topics.


Don't services like RunPod solve half of these concerns?



