"Hardware Offload for All" reminds me of Sutherland's Wheel of Reincarnation[0]: just a few complete turns further along from 1960's Channel Programs[1][2].
Yeah, I was just thinking this is pretty much where modern IBM mainframes are. A lot of processing is already HW-accelerated (I/O, crypto, compression, analytics; with z16 they are adding a matrix multiplier for neural networks...).
The article seems to forget that calling a module "hardware" doesn't mean there's no CPU on that "hardware" with software running on top of it.
Same goes for DPUs, which are essentially just single-board computers.
The datacenter OS has been done, many times, and the only incarnation that has had long-term success was the mainframe, for a rather un-diverse set of tasks (record processing, as banks, tax offices and insurance companies do).
The problems so far have always been the same: a triangle of cost, performance and flexibility where you get to pick two if you are lucky, but with most of those 'mega systems' you really only get to pick one.
What has been a true improvement over the last two decades is the commoditisation of hardware, the scheduling of tasks and services (think in terms of container schedulers and even task schedulers) over that commodified hardware, and much more recently: the improvements in hardware-assisted parallel computing (which includes machine learning and the recent shuffling of FPGA IP).
The reason those have been an improvement is that they aren't vendor-locked, they don't have to be homogeneous, and they are flexible enough that once you meet the bar where you really need them, you can start out and then grow and shrink practically on-demand. (And I'm not talking about 'the cloud' here, but about the size of whatever scheduling and parallel-computing facilities you might need.)
More true to the title of the article: datacenter software has fallen because it's the old "buy a license, a support contract and install an instance" model, which drives silo'ed technical design. It used to be true that, to perform a task at a specified performance level, a specialised silo for that task was required to fulfil the business need at all (think network frame and packet processing). Up to a point, that has been genericised and commodified so much that before your baseline performance requirements force you into a highly specialised silo'ed system/device/appliance, you've already outgrown the 'do it yourself' scenario. (Scale: think about one corridor/double row of 48U racks.) At that stage you'll either be offloading to an elastic capacity provider or you're in such a niche scenario that common rules don't apply anyway.
“In a few years, datacenter networks will grow an order of magnitude from 40Gb to 400Gb. Systems researchers, including myself, have been preparing for this new world for almost a decade [1,2]. Demikernel is the first fully-featured nanosecond-scale OS: 100ns to get a packet from the NIC to the app in Demikernel’s DPDK TCP stack, processing 1 million packets per second per core, etc [3].”
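For a sense of what that looks like from the application side, here is a rough sketch (in C) of the kind of kernel-bypass receive loop that DPDK-based stacks like Demikernel's build on. The EAL and port/queue initialisation are omitted, and PORT_ID/BURST_SIZE are illustrative constants of mine, not values from the paper; the point is just that the NIC queue is busy-polled from user space, with no interrupts or syscalls on the data path:

    /* Sketch of a DPDK poll-mode receive loop; assumes rte_eal_init() and
     * port/queue setup have already been done elsewhere. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define PORT_ID    0
    #define BURST_SIZE 32

    static void rx_loop(void)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        for (;;) {
            /* Busy-poll the NIC queue directly from user space. */
            uint16_t n = rte_eth_rx_burst(PORT_ID, 0, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < n; i++) {
                /* Hand bufs[i] to the application / TCP stack here. */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }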
Meanwhile a typical server is doing a couple of Gbps of effective bandwidth, running hundreds of interconnected internal services: Redis, ZooKeeper, Ceph, a dozen Docker containers, Prometheus, some virtualization (KVM?), a dozen SQLite instances, Consul and many, many more small random things.
The “datacenter” I work with is not a “datacenter” doing 1 Mpps per core.
All progress happens at the limit. You may just be running a lot of crud that loafs along, but out there somewhere there is an operator of something that sends a million packets per second per core, and they'd rather get 2 million. You barely use your SSD, but there is some organization that finds 1 million IOPS to be a little lower than they'd like.
You do benefit from these advances. Your searches get faster, your video streaming remains affordable, etc.
How do you find these limit users? My startup makes low-cost block and file storage on AWS that does 2M IOPS per volume with 200 µs latency, and we’re surprised to find most customers are perfectly happy using gp3 with 3K IOPS and 1-2 ms latency.
My parents’ company is an example: they need that kind of performance in remote northern Canada, where connecting to AWS would mean a couple of bytes every couple of seconds, with quite a bit of latency.
The server goes travelling instead, along with the people who use it.
How can you drive-by mention this without mentioning your company name?!?! Our firm uses smallish sizes of AWS EFS storage and we're highly annoyed by the throughput limitations.
I’d be interested in the company name too (feel free to DM if it’s stealth or something). I’d add that gp3 and other cloud persistent disks implement erasure coding under the hood and as such have very high durability and zero maintenance overhead (since they’re managed). And yeah, you trade dogshit performance for that privilege.
If your application has to go to disk to fetch something it's already accepted that it'll take time. Things that need to be really fast go to RAM or cache.
For everything else you're already getting into niches.
This is really pretty cool. I hope it does happen this way or similarly.
I can see this hardware and software stack making tech support at financial exchanges much harder, as trades will now be analyzed down to nanoseconds instead of milliseconds and there’ll be a guy staring at some FIX protocol packet captures yelling into a phone to some guy at Goldman: “No sir, you lost because you weren’t first. I don’t care how much money your algotrading machine cost you, it wasn’t first. There’s a speed of light problem here. You want faster trades, move closer.”
Namingwise, Xerox had the "Star" personal office system, with machines such as the Star 8010 Information System. They didn't call it "Office". "Star" referred to the OS/app environment, one option for the 8010 hardware, and "system" means a single workstation (that operated in the network), not the whole network.
"Star Office" was Sun's competitor to Microsoft Office suite.
You can be assured that the Rust zealots will rewrite the Python or JS application in Rust. They won’t sleep properly until all the world is rewritten in Rust.
I always expected this rewrite, but as far as I can tell they, unlike GNU in the 80s, are not doing drop-in replacements, which pushes their adoption curve out by decades. It is such a strong truth that people prefer to do new stuff rather than reimplement existing tools.
Well, if you're deploying on an architecture with very specialized machinery, even up to JSON parsing and REST API handling, a scripting language isn't going to lose that much over a compiled one.
Maybe not python, but the future the article talks about is definitely one where "js everywhere" is a compelling approach.
In a way it harkens back to 80s 8-bit home micros. BASIC on these boxes is incredibly slow, but you can still easily do realtime sprite-based games because all it is doing is manipulating some magic memory locations that are mapped to dedicated video-drawing hardware. Instead of the software needing to blit every frame, all it needs to do is say "Sprite 3 moved left by 3 pixels" every frame. Instead of doing collision detection in software, you just read a magic memory location that tells you which sprites have collided since you last looked.
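A rough sketch of what that "magic memory location" style looks like if you write to the hardware registers directly (C as you might write it with cc65 rather than BASIC; the addresses are the C64's VIC-II sprite registers as I remember them, so treat them as illustrative rather than gospel):

    #include <stdint.h>

    /* Memory-mapped VIC-II registers: the video chip does the drawing and
     * collision detection, the CPU just pokes a few bytes per frame. */
    #define SPRITE3_X   (*(volatile uint8_t *)0xD006)  /* sprite 3 X position */
    #define SPRITE3_Y   (*(volatile uint8_t *)0xD007)  /* sprite 3 Y position */
    #define COLLISIONS  (*(volatile uint8_t *)0xD01E)  /* sprite-sprite hits  */

    void tick(void)
    {
        SPRITE3_X -= 3;                 /* "sprite 3 moved left by 3 pixels"  */
        if (COLLISIONS & (1u << 3)) {   /* which sprites collided since we    */
            /* handle the collision */  /* last read this register?           */
        }
    }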
If you're writing the kind of game that is possible to do in C64 Basic, you don't actually gain much by writing it in assembler instead. What you get from writing in assembler is an expansion of the universe of possible games.
I agree scripting can be done top-level but it’d be shocking for it to be used for device management. These OS ideas are coming back to topics like microkernels, where you want to give applications part of the OS responsibility and hope they don’t crash you. I’d imagine that the scripting language can offload a lot of functions, but it’d just be glue on top of complicated accelerator mini-kernels, which themselves have native code programming e.g., some JSON state machine.
"They cannot continue focusing on software running on general-purpose CPUs but must re-orient themselves towards specialized hardware."
In other words, mainframes.
This trend has already started in general-purpose computing - just look at what Apple is doing with their SoCs - dedicated cores for specific functionality as well as their own I/O engines for storage, all integrated into the SoC. Typing this on an M1 Macbook Pro, I can verify that this approach does indeed have a plethora of performance and efficiency benefits!
>"Existing hardware resource isolation is primitive and there are no hardware mechanisms for scheduling CPU cycles (interrupts are too clunky) or memory (likewise for page faults)."
Don't Linux cgroups and CPU shares handle exactly this? Or is that acknowledged, and this is meant as a critique of CFS and its use of throttling when there is contention?
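For reference, a minimal sketch of the kind of CPU scheduling control cgroups already give you (assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup and a pre-created group named "demo"; writing "50000 100000" to cpu.max grants 50ms of CPU time per 100ms period, i.e. roughly half a core):

    #include <stdio.h>

    int main(void)
    {
        /* Cap the "demo" cgroup at ~50% of one CPU via the cgroup v2 interface. */
        FILE *f = fopen("/sys/fs/cgroup/demo/cpu.max", "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "50000 100000\n");
        return fclose(f) ? 1 : 0;
    }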
[0] http://cva.stanford.edu/classes/cs99s/papers/myer-sutherland...
[1] https://en.wikipedia.org/wiki/Channel_I/O#History https://en.wikipedia.org/wiki/Channel_I/O#Channel_program_ex...
[2] if a supercomputer is a device that turns compute-bound problems into io-bound, is a mainframe a device that turns io-bound problems into compute-bound?