Certainly the undisputed winners will be the very few firms with enough engineer...

herval · on Oct 12, 2023

This was definitely a theory that made people burn tons of money over the past couple of years, but I don’t think it holds water. These models are becoming obsolete so fast, and there are so many open ones, that I doubt anyone’s privately trained model can stay relevant for long.

flir · on Oct 13, 2023

The data is the moat.

(If you can train your internally deployed LLM on data none of your competitors have, that's an advantage).

visarga · on Oct 13, 2023

It's not anymore. If the model is publicly accessible, its skills can be distilled by performing some API calls and recording input-output pairs. This scheme works so well it has become the main mode to prepare data for small models. Model skills leak.
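To make that concrete, here is a minimal sketch of the record-the-pairs step. Everything named here is hypothetical: `teacher` stands in for whatever call reaches the public model, `fake_teacher` is a placeholder so the snippet runs offline, and the `prompt`/`completion` field names are just one plausible layout for a fine-tuning file, not any provider's required format.

```python
import json
from typing import Callable, Iterable


def collect_distillation_pairs(
    teacher: Callable[[str], str], prompts: Iterable[str]
) -> list[dict]:
    """Query the teacher once per prompt and record each input-output pair."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize the pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


def fake_teacher(prompt: str) -> str:
    # Hypothetical stand-in for a real API call to a hosted model.
    return prompt.upper()


pairs = collect_distillation_pairs(fake_teacher, ["hello", "moat"])
dataset = to_jsonl(pairs)
```

The resulting lines would then be fed to whatever fine-tuning pipeline the smaller model uses; that part is outside this sketch.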

flir · on Oct 13, 2023

I agree, publicly deployed models seem to be easy to train from. I did say "internally deployed LLM" though. agentcoops said "...where the models in question increase the productivity of workers in their non-ai-related profit centers" above, that's the bit I was thinking about. I think private models, either trained from scratch or fine-tuned, are going to be a big deal though they won't make the PR splash that public models make.

nvm0n2 · on Oct 13, 2023

The conclusion there seems to be that it just yields a model that has the surface look and feel of GPT-3 or GPT-4 but without the depth, so the experience quickly becomes unsatisfactory once you go outside the fine-tuning dataset.

lmm · on Oct 13, 2023

You may not need to train a model to make use of your data though. Maybe a cheap fine tune would work just as well. Maybe just having the data well indexed and/or part of the prompt context is good enough.
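A rough sketch of the "indexed and part of the prompt context" option, assuming a toy keyword-overlap ranking in place of a real search index or embedding store. The function names (`retrieve`, `build_prompt`) and the prompt layout are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    """Count how many query tokens also appear in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents with the highest non-zero overlap score."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:k] if score(query, doc) > 0]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Place the retrieved private documents ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "Refund policy: refunds within 30 days.",
    "Office hours are 9 to 5.",
    "Shipping takes 3 days.",
]
prompt = build_prompt("What is the refund policy?", docs, k=1)
```

The private data never enters the model's weights here; it only rides along in the prompt, which is why this route is cheap compared with training.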

shon · on Oct 13, 2023

In that case, X.AI (powered by X/Twitter/Tesla data) and possibly Facebook, both closed and somewhat hard to crawl inside, have the largest moats.

edmundsauto · on Oct 13, 2023

I don’t think they necessarily will be allowed to train on their data unless they get explicit permission. They will try, but the way I see privacy revaluations is that users will have to authorize specific uses of their data and not be surprised by any application.

This could be one of the more interesting privacy fights of the next decade.

I’m sure there are easy cynical takes about how they will just shrink-wrap the EULA, and maybe they will. But in a good privacy environment, users should never be surprised and should have control over how their data is used. And I think we’ve made some progress there.

htrp · on Oct 13, 2023

> I don’t think they necessarily will be allowed to train on their data unless they get explicit permission. They will try, but the way I see privacy revaluations is that users will have to authorize specific uses of their data and not be surprised by any application.

If there's one company that I don't think cares about user permissions or the law, it'd be Twitter.

The EU officially warned Elon about DSA fines and the response was less than serious.

https://www.cnn.com/2023/10/10/tech/x-europe-israel-misinfor...

thegasman · on Oct 13, 2023

China probably has the most comprehensive data on its users from a surveillance perspective.

frozenport · on Oct 13, 2023

Idk. These were trained on pretty public things like Wikipedia.