But would that be relevant? In other words, are CDN nodes CPU-limited? Or, does the bulk encryption absorb otherwise unused CPU time? I would have imagined that most Netflix traffic is to clients that Netflix itself controls, like Roku apps or other first-party apps and not web browsers, so Netflix could choose against encryption for most streams.
Yes, without TLS offload, we're CPU limited. For example, on our 400GbE (4x100GbE) servers, we are CPU (really memory bandwidth) limited at ~240Gb/s without NIC TLS offload. Using Mellanox CX6-DX NICs to move most of the crypto onto the NIC raises the effective limit to ~375Gb/s.
We encrypt everything we can with TLS to protect the privacy of our members.
Thanks. I've read all the Netflix posts about the difficulty of moving 400Gb/s through a NUMA system. I imagined that if it were load-store limited, then load-encrypt-store could potentially be "free".
The happy path is storage ==> RAM ==> NIC, without CPU access. So there is basically a single memory write and a single memory read per byte. That's the non-TLS (and NIC TLS offload) path.
Since TLS is per-connection and the page cache is per-file, any software crypto needs to encrypt from a common source in the page cache to a per-client buffer (e.g., it cannot happen in-place). So this introduces an extra memory read from the page cache and an extra write to the per-client socket buffer. We use the non-temporal version of ISA-L, so as to write full cache lines into the socket buffer, and not pay for an extra memory read as part of a read-modify-write update of a partial cache line. So software crypto basically doubles the memory bandwidth requirements over no TLS (or NIC TLS offload).
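To make the accounting above concrete, here is a rough back-of-the-envelope sketch in Python. It only counts memory reads and writes per payload byte as described in the comment; the path names, the step labels, and the helper functions are hypothetical, and the resulting figures are what this simple model implies at the quoted throughputs, not measurements.

```python
# Memory touches per payload byte, following the simple model in the comment.
# Path names and step labels are illustrative assumptions, not real APIs.
MEMORY_TOUCHES = {
    # storage ==> RAM ==> NIC, no CPU access to the data
    "no_tls_or_nic_offload": {
        "storage_dma_write_to_page_cache": 1,
        "nic_dma_read_from_page_cache": 1,
    },
    # software crypto with non-temporal stores (full cache lines written)
    "software_tls_non_temporal": {
        "storage_dma_write_to_page_cache": 1,
        "cpu_read_from_page_cache": 1,
        "cpu_write_to_socket_buffer": 1,
        "nic_dma_read_from_socket_buffer": 1,
    },
    # software crypto where partial cache-line writes force a read first
    "software_tls_read_modify_write": {
        "storage_dma_write_to_page_cache": 1,
        "cpu_read_from_page_cache": 1,
        "cpu_read_of_socket_buffer_line": 1,
        "cpu_write_to_socket_buffer": 1,
        "nic_dma_read_from_socket_buffer": 1,
    },
}


def touches_per_byte(path: str) -> int:
    """Total memory reads plus writes needed per byte served on this path."""
    return sum(MEMORY_TOUCHES[path].values())


def memory_traffic_gbytes_per_s(path: str, served_gbit_per_s: float) -> float:
    """Memory traffic (GB/s) implied by serving at the given rate (Gb/s)."""
    return touches_per_byte(path) * served_gbit_per_s / 8


if __name__ == "__main__":
    for path, rate in [
        ("software_tls_non_temporal", 240),
        ("no_tls_or_nic_offload", 375),
    ]:
        traffic = memory_traffic_gbytes_per_s(path, rate)
        print(f"{path} at {rate} Gb/s: {traffic} GB/s of memory traffic")
```

Under this model the software path costs 4 memory touches per byte versus 2 for the offload path, which is the "doubles the memory bandwidth" statement. The model ignores everything else that uses memory bandwidth (TLS record overhead, metadata, NUMA crossings), so it should be read as an illustration of the ratio rather than a prediction of throughput.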
So you're proposing that Netflix have two completely separate pipelines, one for browser traffic and the other for first-party traffic? With separate CDNs, separate delivery mechanisms, separate client-side algorithms, etc?