> The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs.
I feel like everyone is missing this from the announcement. They are explicitly releasing this to help generate synthetic training data. Most big models and APIs have terms that ban their use to improve other models. Sure, it may be able to compete with other big commercial models on normal tasks, but this would be a huge opportunity for ML labs and startups to expand the training data of smaller models.
Nvidia must see a limit to the growth of new models (and of new demand for training on their GPUs) based on the availability of training data, so they're providing a tool to bypass that bottleneck; a sketch of how such a pipeline might work is below.
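To make that concrete, here is a rough sketch of what such a pipeline could look like: the instruct model drafts responses, the reward model scores them, and only high-scoring pairs are kept as training data for a smaller model. Everything here is hypothetical; the endpoint URL, model names, scoring format, and threshold are all placeholders, assuming the models are served behind an OpenAI-compatible API, and this is not Nvidia's actual tooling.

```python
# Hypothetical generate -> score -> filter loop for synthetic training data.
# Endpoint URL, model names, and scoring format are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

PROMPTS = [
    "Explain the difference between a mutex and a semaphore.",
    "Write a Python function that merges two sorted lists.",
]

def generate(prompt: str) -> str:
    # The instruct model produces a candidate response.
    resp = client.chat.completions.create(
        model="nemotron-4-340b-instruct",  # placeholder name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # some sampling diversity for synthetic data
    )
    return resp.choices[0].message.content

def score(prompt: str, response: str) -> float:
    # The reward model rates the (prompt, response) pair; here we pretend
    # it is exposed as a chat endpoint that returns a numeric score.
    resp = client.chat.completions.create(
        model="nemotron-4-340b-reward",  # placeholder name
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ],
    )
    return float(resp.choices[0].message.content)

# Keep only high-scoring pairs as training data for a smaller model.
dataset = []
for prompt in PROMPTS:
    response = generate(prompt)
    if score(prompt, response) >= 3.5:  # arbitrary quality threshold
        dataset.append({"prompt": prompt, "response": response})
```

The reward-model filter is the interesting part: raw generations tend to be noisy, and scoring plus thresholding is what turns unfiltered model output into usable training data.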
> Most big models and APIs have clauses that ban its use to improve other models.
I will never get over the gall of deeming anything and everything fair game to use as training data for a model, while forbidding anyone from using the output of a model to train their own model without permission, as if model output carried some kind of exclusive super-copyright.
It's likely unenforceable: there is no copyright in the output, and passing it to someone who isn't a party to the contract trivially bypasses the restriction. Still hypocritical nonsense, though.
> They explicitly are releasing this to help generate synthetic training data
Synthetic training data is basically free money for Nvidia: there's only a fixed amount of high-quality original data around, but there's potential for essentially infinite synthetic data, and more data means more training hours, which means more GPU demand.
All for the low price of 2x A100s...