> This (along with batching) is why large local models are a dumb and wasteful idea if you're not serving them at enterprise scale.
Local models are never a dumb idea. The only time it's dumb to use them in an enterprise is if the infra is Mac Studio with M3 Ultra because pp time is terrible.
Local models are never a dumb idea. The only time it's dumb to use them in an enterprise is if the infra is Mac Studio with M3 Ultra because pp time is terrible.