"Hardware Offload for All" reminds me of Sutherland's Wheel of Reincarnation[0]: just a few complete turns further along from 1960's Channel Programs[1][2].
Yeah, I was just thinking this is pretty much where modern IBM mainframes are. A lot of processing is already HW-accelerated (I/O, crypto, compression, analytics; with z16 they are adding a matrix multiplier for neural networks...).
The article seems to forget that calling a module "hardware" doesn't mean there's no CPU on that "hardware" with software running on top of it.
Same goes for DPUs, which are essentially just single-board computers.
The datacenter OS has been done, many times, and the only incarnation that has had long-term success was the mainframe, for a rather un-diverse set of tasks (record processing, as banks, tax offices and insurance companies do).
The problems so far have always been the same: a triangle of cost, performance and flexibility where you get to pick two if you are lucky, but with most of those 'mega systems' you really only get to pick one.
What has been a true improvement over the last two decades is the commoditisation of hardware, the scheduling of tasks and services (think in terms of container schedulers and even task schedulers) over that commodified hardware, and much more recently: the improvements in hardware-assisted parallel computing (which includes machine learning and the recent shuffling of FPGA IP).
The reason those have been an improvement is that they aren't vendor-locked, they don't have to be homogeneous, and they are flexible enough that once you meet the bar where you really need them, you can start out and then grow and shrink practically on-demand. (And I'm not talking about 'the cloud' here, but about the size of whatever scheduling and parallel-computing facilities you might need.)
More true to the title of the article: datacenter software has fallen because it's the old "buy a license, a support contract and install an instance" model, which drives silo'ed technical design. It used to be true that, to perform a task at a specified performance level, a specialised silo for that task was required to fulfil the business need at all (think network frame and packet processing). Up to a point, that has been genericised and commodified so much that before your baseline performance requirements force you into a highly specialised silo'ed system/device/appliance, you've already outgrown the 'do it yourself' scenario. (Scale: think about one corridor/double row of 48U racks.) At that stage you'll either be offloading to an elastic capacity provider or you're in such a niche scenario that common rules don't apply anyway.
“In a few years, datacenter networks will grow an order of magnitude from 40Gb to 400Gb. Systems researchers, including myself, have been preparing for this new world for almost a decade [1,2]. Demikernel is the first fully-featured nanosecond-scale OS: 100ns to get a packet from the NIC to the app in Demikernel’s DPDK TCP stack, processing 1 million packets per second per core, etc [3].”
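For a sense of what that looks like from the application side, here is a rough sketch (in C) of the kind of kernel-bypass receive loop that DPDK-based stacks like Demikernel's build on. The EAL and port/queue initialisation are omitted, and PORT_ID/BURST_SIZE are illustrative constants of mine, not values from the paper; the point is just that the NIC queue is busy-polled from user space, with no interrupts or syscalls on the data path:

    /* Sketch of a DPDK poll-mode receive loop; assumes rte_eal_init() and
     * port/queue setup have already been done elsewhere. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define PORT_ID    0
    #define BURST_SIZE 32

    static void rx_loop(void)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        for (;;) {
            /* Busy-poll the NIC queue directly from user space. */
            uint16_t n = rte_eth_rx_burst(PORT_ID, 0, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < n; i++) {
                /* Hand bufs[i] to the application / TCP stack here. */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }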
Meanwhile a typical server is doing a couple of Gbps of effective bandwidth, running hundreds of interconnected internal services: Redis, ZooKeeper, Ceph, a dozen Docker containers, Prometheus, some virtualization (KVM?), a dozen SQLite instances, Consul and many, many more small random things.
The “datacenter” I work with is not a “datacenter” doing 1 Mpps per core.
All progress happens at the limit. You may just be running a lot of crud that loafs along, but out there somewhere there is an operator of something that sends a million packets per second per core, and they'd rather get 2 million. You barely use your SSD, but there is some organization that finds 1 million IOPS to be a little lower than they'd like.
You do benefit from these advances. Your searches get faster, your video streaming remains affordable, etc.
How do you find these limit users? My startup makes low-cost block and file storage on AWS that does 2M IOPS per volume with 200 µs latency, and we’re surprised to find most customers are perfectly happy using gp3 with 3K IOPS and 1-2 ms latency.
My parents’ company is an example: they need that kind of performance in remote northern Canada, where connecting to AWS would mean a couple of bytes every couple of seconds, with quite a bit of latency.
The server goes travelling instead, along with the people who use it.
How can you drive-by mention this without mentioning your company name?!?! Our firm uses smallish sizes of AWS EFS storage and we're highly annoyed by the throughput limitations.
I’d be interested in the company name too (feel free to DM if it’s stealth or something). I’d add that gp3 and other cloud persistent disks implement erasure coding under the hood and as such have very high durability and zero maintenance overhead (since they’re managed). And yeah, you trade dogshit performance for that privilege.
If your application has to go to disk to fetch something it's already accepted that it'll take time. Things that need to be really fast go to RAM or cache.
For everything else you're already getting into niches.
This is really pretty cool. I hope it does happen this way or similarly.
I can see this hardware and software stack making tech support at financial exchanges much harder, as trades will now be analyzed down to nanoseconds instead of milliseconds and there’ll be a guy staring at some FIX protocol packet captures yelling into a phone to some guy at Goldman: “No sir, you lost because you weren’t first. I don’t care how much money your algotrading machine cost you, it wasn’t first. There’s a speed of light problem here. You want faster trades, move closer.”
Namingwise, Xerox had the "Star" personal office system, with machines such as the Star 8010 Information System. They didn't call it "Office". "Star" referred to the OS/app environment, one option for the 8010 hardware, and "system" means a single workstation (that operated in the network), not the whole network.
"Star Office" was Sun's competitor to Microsoft Office suite.
You can be assured that the Rust zealots will rewrite the Python or JS application in Rust. They won’t sleep properly until all the world is rewritten in Rust.
I always expected this rewrite, but as far as I can tell they, unlike GNU in the 80s, are not doing drop-in replacements, which pushes their adoption curve out by decades. It is such a strong truth that people prefer to do new stuff rather than reimplement existing tools.
Well, if you're deploying on an architecture with very specialized machinery, even up to JSON parsing and REST API handling, a scripting language isn't going to lose that much over a compiled one.
Maybe not python, but the future the article talks about is definitely one where "js everywhere" is a compelling approach.
In a way it harkens back to 80s 8-bit home micros. BASIC on these boxes is incredibly slow, but you can still easily do realtime sprite-based games because all it is doing is manipulating some magic memory locations that are mapped to dedicated video-drawing hardware. Instead of the software needing to blit every frame, all it needs to do is say "Sprite 3 moved left by 3 pixels" every frame. Instead of doing collision detection in software, you just read a magic memory location that tells you which sprites have collided since you last looked.
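A rough sketch of what that "magic memory location" style looks like if you write to the hardware registers directly (C as you might write it with cc65 rather than BASIC; the addresses are the C64's VIC-II sprite registers as I remember them, so treat them as illustrative rather than gospel):

    #include <stdint.h>

    /* Memory-mapped VIC-II registers: the video chip does the drawing and
     * collision detection, the CPU just pokes a few bytes per frame. */
    #define SPRITE3_X   (*(volatile uint8_t *)0xD006)  /* sprite 3 X position */
    #define SPRITE3_Y   (*(volatile uint8_t *)0xD007)  /* sprite 3 Y position */
    #define COLLISIONS  (*(volatile uint8_t *)0xD01E)  /* sprite-sprite hits  */

    void tick(void)
    {
        SPRITE3_X -= 3;                 /* "sprite 3 moved left by 3 pixels"  */
        if (COLLISIONS & (1u << 3)) {   /* which sprites collided since we    */
            /* handle the collision */  /* last read this register?           */
        }
    }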
If you're writing the kind of game that is possible to do in C64 Basic, you don't actually gain much by writing it in assembler instead. What you get from writing in assembler is an expansion of the universe of possible games.
I agree scripting can be done top-level but it’d be shocking for it to be used for device management. These OS ideas are coming back to topics like microkernels, where you want to give applications part of the OS responsibility and hope they don’t crash you. I’d imagine that the scripting language can offload a lot of functions, but it’d just be glue on top of complicated accelerator mini-kernels, which themselves have native code programming e.g., some JSON state machine.
"They cannot continue focusing on software running on general-purpose CPUs but must re-orient themselves towards specialized hardware."
In other words, mainframes.
This trend has already started in general-purpose computing - just look at what Apple is doing with their SoCs - dedicated cores for specific functionality as well as their own I/O engines for storage, all integrated into the SoC. Typing this on an M1 Macbook Pro, I can verify that this approach does indeed have a plethora of performance and efficiency benefits!
>"Existing hardware resource isolation is primitive and there are no hardware mechanisms for scheduling CPU cycles (interrupts are too clunky) or memory (likewise for page faults)."
Don't Linux cgroups and CPU shares handle exactly this? Or is that acknowledged, and this is meant as a critique of CFS and its use of throttling when there is contention?
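For reference, a minimal sketch of the kind of CPU scheduling control cgroups already give you (assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup and a pre-created group named "demo"; writing "50000 100000" to cpu.max grants 50ms of CPU time per 100ms period, i.e. roughly half a core):

    #include <stdio.h>

    int main(void)
    {
        /* Cap the "demo" cgroup at ~50% of one CPU via the cgroup v2 interface. */
        FILE *f = fopen("/sys/fs/cgroup/demo/cpu.max", "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "50000 100000\n");
        return fclose(f) ? 1 : 0;
    }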
[0] http://cva.stanford.edu/classes/cs99s/papers/myer-sutherland...
[1] https://en.wikipedia.org/wiki/Channel_I/O#History https://en.wikipedia.org/wiki/Channel_I/O#Channel_program_ex...
[2] if a supercomputer is a device that turns compute-bound problems into io-bound, is a mainframe a device that turns io-bound problems into compute-bound?