It dawned on me that in web software, people talk about req/s from two entirely different perspectives and it's borderline fraud:
req/s from localhost to localhost, and req/s from the Internet to any user.
The latter is actually interesting. People saying you can get 10k req/s from Node.js is stupid. You're not actually getting that on say, a single low-end instance over the Internet, which is what most developers are actually going to do.
Instead, you'll get two orders of magnitude fewer requests per second.
What Amazon is talking about here is most likely non-synthetic, real-world 80k requests per second. Which is actually a decent job.
> People saying you can get 10k req/s from Node.js is stupid.
No, it's not, for exactly the reason you state:
> You're not actually getting that on say, a single low-end instance over the Internet
Some languages are, of course, more efficient, but it doesn't matter - you can get very good performance out of any language/runtime - it's all about your architecture and infrastructure.
Where are you saying the difference would exist? I haven't seen local network tests be worse than localhost (usually they're better, since the client uses a lot of CPU itself). Why would Internet latency matter? TCP ACKs should be done by the load-balancing appliance, so they'll be low-latency for the application. TLS handshakes should also be offloaded to the appliance.
From what I've measured, code I've written performs around the same in production as it did locally, given similar hardware. If you're deploying to a VM with 3000 IOPS and 1/2 a CPU core, obviously it's going to run like garbage. If you wouldn't run your business on a Raspberry Pi 3, you probably shouldn't be running it on an AWS xlarge instance either.
These aren't requests for TCP ACKs to establish a session, nor even requests for a simple static resource. They're requests for the live status of an inventory of physical goods spread across thousands of distribution centers on six continents that are themselves gaining and losing thousands of products per second. A system that can return a reasonably accurate view of that state 80k times per second is not the same thing as a system that can send 80k HTTP responses with "Hello ${NAME}" per second.
I'm not talking about just establishing a session. My question there was just why Internet vs. local would be different. On a local network, I've gotten 70k JSON CRUD requests per second out of a Netty-based service + PostgreSQL with 4 cores and a single SSD.
I imagine search is more complex and expensive than CRUD, but 80k isn't something you can only do with a "hello world" tier application.
It’s not the best metric. Just responding to 80k req/sec with static in-memory content is easy nowadays. If there are some complex database queries you have to finagle 80k times per second, then that’s the interesting part.
About a decade ago Opera Mini did 150k transcoded full pageloads/s (times about 30 inlines per pageload, which was the average back then, so about 4.5 million requested/loaded/processed/compressed HTTP resources/s).
(All of the public Google Search numbers I've seen have seemed one or two orders of magnitude too small. Or maybe most people don't use their search engine/browser as much as I do, so my perspective is skewed...)
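To make that multiplication explicit, here is a quick back-of-the-envelope sketch. The variable names are made up for illustration; the two input figures are the approximate ones quoted above, not measured values.

```python
# Approximate figures from the comment above (not measurements).
pageloads_per_s = 150_000      # transcoded full pageloads per second
inlines_per_pageload = 30      # rough average number of inlined resources per page

# Total HTTP resources fetched/processed per second.
resources_per_s = pageloads_per_s * inlines_per_pageload

print(f"{resources_per_s:,} resources/s")  # 4,500,000 resources/s
```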
From my experience with that scale of traffic (with Opera Mini at the time about 250M MAUs and 150k full pageloads/s):
There is surprisingly little seasonal variance. You have your weekly/daily traffic rhythms based on when your users are awake/active based on their geographical distribution and that's mostly it.
"World events" also have very little impact - they tend to barely make a dent in that massive background noise.
Before we had large volumes of traffic I thought we'd be seeing all sorts of unusual peaks. After a few years I realized growth at scale tends to become boring (but in a good way).
For example, it would be just as true for this title to have said "How Amazon uses ... to load 1.6 MM requests per second, from just the search page."
Each search page load is 1 request to the search backend, but a 20x request fanout to the product key-value store to render the images, titles, etc.
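A rough sketch of where the 1.6 MM figure comes from, assuming the 80k req/s number discussed in this thread is the search page load rate and that the fanout is a flat 20 lookups per page. The function name is hypothetical, just for illustration.

```python
def fanout_requests_per_s(page_loads_per_s: int, fanout: int) -> int:
    """Downstream requests generated per second by a given page load rate."""
    return page_loads_per_s * fanout

search_page_loads_per_s = 80_000   # assumed: the 80k req/s figure from the thread
kv_fanout = 20                     # key-value lookups per search page, per the comment

kv_requests_per_s = fanout_requests_per_s(search_page_loads_per_s, kv_fanout)
total_requests_per_s = search_page_loads_per_s + kv_requests_per_s

print(f"key-value store: {kv_requests_per_s:,} req/s")    # 1,600,000
print(f"including search: {total_requests_per_s:,} req/s")  # 1,680,000
```

Which number ends up in a headline depends entirely on which layer you count at.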