The training costs are amortized over inference. More lifetime queries means better efficiency.
Individual inferences are extremely low impact. Additionally it will be almost impossible to assess the net effect due to the complexity of the downstream interactions.
At 40M 700W GPU hours 160 million queries gets you 175Wh per query. That's less than the energy required to boil a pot of pasta. This is merely an upper bound - it's near certain that many times more queries will be run over the life of the model.
Individual inferences are extremely low impact. Additionally it will be almost impossible to assess the net effect due to the complexity of the downstream interactions.
At 40M 700W GPU hours 160 million queries gets you 175Wh per query. That's less than the energy required to boil a pot of pasta. This is merely an upper bound - it's near certain that many times more queries will be run over the life of the model.