When your agent performs 20 tasks, saving seconds here and there becomes a very big deal. I cannot even begin to describe how much time we've spent optimising code paths to make overall execution fast.
Last week I was on a call with a customer. They were running OpenAI side-by-side with our solution. I was pleased that we managed to fulfil the request in under a minute, while OpenAI took 4.5 minutes.
In my opinion, the LLM is not the biggest contributor to latency.
Thanks! While I agree with you on the "saving seconds" and overall latency argument, in my understanding most agentic use cases are asynchronous, and VM boot-up time may be just a tiny fraction of overall task execution time (e.g., deep research and similar long-running tasks in the background).