
Maybe you want to conduct experiments that the cloud API doesn't allow for.

Perhaps you'd like to plug it into a toolchain that runs faster than API calls can be passed over the network -- eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

Maybe you would like to prevent the monopolists from gaining sole control of what may be the most impactful technology of the century.

Or perhaps you don't want to share your data with Microsoft & Other Evils (formerly known as don't be evil).

You might just like to work offline. Whole towns go offline, sometimes for days, just because of bad weather. Never mind war and infrastructure crises.

Or possibly you don't like that The Cloud model has a fervent, unshakeable belief in the propaganda of its masters. Maybe that propaganda will change one day, and not in your favor. Maybe you'd like to avoid that.

There are many more reasons in the possibility space than my limited imagination allows for.



It is not like strong models are at a point where you can trust their output 100%. It is always necessary to review LLM-generated text before using it.

I'd rather have a weaker model which I can always rely on being available than a strong model which is hosted by a third party service that can be shut down at any time.


> I'd rather have a weaker model which I can always rely on being available than a strong model which is hosted by a third party service that can be shut down at any time.

Every LLM project I've worked with has an abstraction layer for calling hosted LLMs. It's trivial to implement another adapter to call a different LLM. It's often done as a fallback/failover strategy.

There are also services that will merge different providers into a unified API call if you don’t want to handle the complexity on the client.

It’s really not a problem.
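
A minimal sketch of that adapter-plus-failover pattern in Python; the provider classes and their behaviour here are hypothetical placeholders, not any particular library's API:

    # Minimal sketch of an LLM provider abstraction with failover.
    # The provider classes below are hypothetical placeholders; a real
    # project would wrap actual SDKs (hosted APIs, a local server, ...).
    from abc import ABC, abstractmethod

    class LLMProvider(ABC):
        @abstractmethod
        def complete(self, prompt: str) -> str:
            """Return a completion for the prompt, or raise on failure."""

    class HostedProvider(LLMProvider):
        def __init__(self, name: str):
            self.name = name

        def complete(self, prompt: str) -> str:
            # A real adapter would call the hosted API here and raise on
            # outages or rate limits; we simulate an outage.
            raise RuntimeError(f"{self.name} unavailable")

    class LocalProvider(LLMProvider):
        def complete(self, prompt: str) -> str:
            # A real adapter would call a locally running model instead.
            return "local completion for: " + prompt

    def complete_with_failover(prompt: str, providers: list[LLMProvider]) -> str:
        """Try each provider in order, falling back to the next on failure."""
        last_error = None
        for provider in providers:
            try:
                return provider.complete(prompt)
            except Exception as exc:
                last_error = exc
        raise RuntimeError("all providers failed") from last_error

    # Hosted first, local model as the always-available fallback.
    print(complete_with_failover("hello", [HostedProvider("hosted-api"), LocalProvider()]))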


Suppose you live outside of America and the supermajority of LLM companies are American. You want to ask a question about whisky distillation or abortion or anything else that's legal in your jurisdiction but not in the US, but the LLM won't answer.

You've got a plethora of cloud providers, all of them aligned to a foreign country's laws and customs.

If you can choose between Anthropic, OpenAI, Google, and some others... well, that's really not a choice at all. They're all in California. What good does that do an Austrian or an Australian?


Personally, I've found the biggest problem with local models is the lack of integrations: they can't search the web, they can't use Wolfram Alpha for math, etc.

LLMs are great as routers; only rarely are they good at doing something on their own.
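
A toy sketch of that router idea, assuming a hypothetical local_llm() stand-in and two fake tools; the model only chooses the tool, and ordinary code does the actual work:

    # Toy illustration of an LLM as a router: the model only picks a tool,
    # and plain code does the real work (search, math, ...).
    # local_llm() is a hypothetical stand-in for a locally hosted model.
    def local_llm(prompt: str) -> str:
        # A real call would go to the local model; here we fake the routing decision.
        return "math" if any(ch.isdigit() for ch in prompt) else "web_search"

    def web_search(query: str) -> str:
        # A real integration would call a search API.
        return f"(results for '{query}')"

    def calculator(expression: str) -> str:
        # A real integration might call Wolfram Alpha; eval() is only for the toy.
        return str(eval(expression, {"__builtins__": {}}, {}))

    TOOLS = {"web_search": web_search, "math": calculator}

    def answer(question: str) -> str:
        tool_name = local_llm(f"Pick a tool for: {question}")
        return TOOLS[tool_name](question)

    print(answer("2 + 2"))               # routed to the calculator
    print(answer("latest whisky news"))  # routed to web search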


> eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

This is interesting. Is that based on any upcoming technology improvement already in the works?


GP is likely referring to network latency here. There's a tradeoff between smaller GPUs etc. at home, which have no network latency to use, and beefier hardware in the cloud, which has a minimum latency per call.


Sure, but if the model takes multiple seconds to execute, then even 100 milliseconds of network latency seems more or less irrelevant.
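
A back-of-the-envelope illustration of that point, with made-up numbers:

    # Back-of-the-envelope: when does the network round trip matter?
    # All numbers here are illustrative assumptions, not benchmarks.
    def total_ms(rtt_ms: float, tokens: int, tokens_per_sec: float) -> float:
        return rtt_ms + tokens / tokens_per_sec * 1000

    # Long answer (500 tokens at 50 tok/s): generation dominates, the
    # 100 ms hop is about 1% of the total.
    print(total_ms(100, 500, 50))  # 10100.0 ms
    print(total_ms(0, 500, 50))    # 10000.0 ms

    # Tiny answer (10 tokens): the same 100 ms hop is a third of the total.
    print(total_ms(100, 10, 50))   # 300.0 ms
    print(total_ms(0, 10, 50))     # 200.0 ms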


Comms is also the greatest battery drain for a remote edge system. Local inference can allow for longer operation, or operation with no network infra.


Excellent points. Being able to use available hardware in unison is amazing, and I guess we are not far away from botnets utilising this kind of technology, like they did with mining coins.


Also, hosted models are often censored and refuse to talk about various topics.


Don't services like RunPod solve half of these concerns?



