They absolutely did not spend millions to train it. Credible estimates place the cost for an entity like Meta at roughly $30k-$100k, probably less since Meta likely already owns the 256 8xA100 nodes needed to train it.
Even as an individual, it wouldn't cost you anywhere near a million if you only trained 13B and took advantage of volume pricing.
I don’t think it’s fair to just ignore the capex part of the model training costs. If we take AWS pricing, the 21 days of training for 65B cited in the LLaMA paper would cost about $2.6M at reserved instance prices. While there’s a lot of AWS profit baked into that, it’s a reasonable first approximation of the TCO of that hardware. Even if the real TCO is a third of it, that’s still nearly a million to train 65B, never mind the staff costs.
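The napkin math behind that ~$2.6M figure, assuming the paper's reported 2048 A100s for ~21 days and an assumed p4d.24xlarge reserved rate (actual rates vary by region and commitment term):

```python
# LLaMA 65B reportedly trained on 2048 A100s for ~21 days.
# AWS p4d.24xlarge bundles 8 A100s; the hourly rate below
# (~$19.22, 1-yr reserved) is an assumption, not a quote.
gpus = 2048
gpus_per_instance = 8          # p4d.24xlarge
hours = 21 * 24                # 21 days of wall-clock training

instances = gpus // gpus_per_instance   # 256 instances
reserved_rate = 19.22                   # assumed $/instance-hour
cost = instances * hours * reserved_rate
print(f"~${cost / 1e6:.1f}M")           # roughly $2.5M
```

That lands in the same ballpark as the $2.6M cited, and it excludes storage, networking, and any wasted runs.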
Plus there are bound to be false starts, reverts, crashes, etc. that bump up the actual reproduction cost. Most training cost estimates take an extremely rosy best-case view, assuming everything goes smoothly on the first try and no GPU cycles are wasted.
Could I get a source for that? Not that I don't believe you, but my napkin math puts the cost of training the 65B parameter model alone a lot higher than $100k.