prydt · 2025-06-28T03:28:04 1751081284

My interest in Rust comes from getting frustrated with C's type system. Rust has such a nice type system, and I really enjoy the ownership semantics around concurrency. I think that C++ written "correctly" looks a lot like Rust, and libkj [1] encourages this style, but the language does not enforce it.

[1] https://github.com/capnproto/capnproto/blob/v2/kjdoc/tour.md

prydt · 2025-06-02T07:08:20 1748848100

Have you seen libkj [1]? I've used it and really enjoy working with it. It has a Rust-like owned pointer, and the whole library uses these smart pointers.

It has proper container classes based on B-trees, and it's also got an async runtime.

[1] https://github.com/capnproto/capnproto/blob/v2/kjdoc/index.m...

prydt · 2025-01-25T22:52:44 1737845564

One of my favorite papers! This reminds me of Martin Kleppmann's work on Apache Samza and the idea of "turning the database inside out" by hosting the write-ahead log on something like Kafka and then having many different materialized views consume that log.

Seems like a very powerful architecture that is both simple and decouples many concerns.
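
A minimal sketch of that idea, assuming a plain Python list stands in for the Kafka log. The event fields and the two view classes are made up for illustration and are not Samza or Kafka APIs:

```python
from collections import defaultdict

# Hypothetical append-only log standing in for a Kafka topic.
log = []

def append(event):
    """Append an event and return its offset in the log."""
    log.append(event)
    return len(log) - 1

class CountByUser:
    """View 1: number of events seen per user."""
    def __init__(self):
        self.offset = 0              # next log position to consume
        self.counts = defaultdict(int)

    def catch_up(self, log):
        for event in log[self.offset:]:
            self.counts[event["user"]] += 1
        self.offset = len(log)

class LatestValue:
    """View 2: last value written for each key."""
    def __init__(self):
        self.offset = 0
        self.latest = {}

    def catch_up(self, log):
        for event in log[self.offset:]:
            self.latest[event["key"]] = event["value"]
        self.offset = len(log)

append({"user": "a", "key": "x", "value": 1})
append({"user": "b", "key": "x", "value": 2})
append({"user": "a", "key": "y", "value": 3})

# Each view consumes the same log independently, at its own pace.
counts, latest = CountByUser(), LatestValue()
counts.catch_up(log)
latest.catch_up(log)
```

Each view only tracks its own offset, so a new view can be added later and rebuilt by replaying the log from offset 0 without touching the writers.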

aebtebeten · 2025-01-26T13:13:01 1737897181

In their 1992 Transaction Processing book*, Gray and Reuter extrapolate h/w and s/w trends forward and predict that the DBMS of their far future would look like a tape robot for backing store with materialised views in main memory.

Substitute streams for tape i/o, and this description of Samza sounds like it could be very similar to that vision.

* as far as I know, their exposition of the WAL and tradeoffs in its implementation has aged well. Any counter opinions?

gsf_emergency_2 · 2025-01-27T22:53:09 1738018389

Thanks!

prydt · 2024-12-25T21:31:12 1735162272

Merry Christmas!

prydt · on Sept 27, 2024

I think the CALM theorem and this whole line of research are so interesting, and the work is still carried on by the CRDT people. But I would love to see more of this.

I feel like it doesn't get as much attention as it deserves.

prydt · on Sept 27, 2024

I've run a reading group for distributed systems for the last 2 years now and I do think that Raft is a better introduction to Consensus than any Paxos paper I have seen (I mean the Paxos Made Simple paper literally has bugs in it). But when I learned consensus in school, we used Paxos and Multi-Paxos and I do believe that there was a lot to be gained by learning both approaches.

Heidi Howard has several amazing papers arguing that the differences between Raft and Multi-Paxos are very surface-level, and that Raft's key contribution is its presentation: it is more "complete", whereas Multi-Paxos is spread across many fragmented presentations.

As a bonus, one of my favorite papers I have read recently is Compartmentalized Paxos, which is just a brilliant piece on how to scale Multi-Paxos: https://vldb.org/pvldb/vol14/p2203-whittaker.pdf

senderista · on Sept 27, 2024

There are several Multi-Paxos papers (some of them dating before Raft) that are intended as guidance for implementers:

https://paper-notes.zhjwpku.com/assets/pdfs/paxos_for_system...

https://www.cs.cornell.edu/home/rvr/Paxos/paxos.pdf

https://www.scs.stanford.edu/~dm/home/papers/paxos.pdf

prydt · on Sept 27, 2024

Ah thank you. That is a good list although I personally dislike the "Paxos Made Moderately Complex" paper... I think it adds too many different roles for very little benefit. When implementing multi-Paxos for class, I used that paper and felt it was more trouble than it needed to be.

I'll check out the other two papers though! Also, just looking around, I found this paper, which looks useful as well: https://arxiv.org/pdf/1103.2408 [PDF]

nano_o · on Sept 28, 2024

The bug in the Paxos Made Simple paper is that Lamport forgot to mention that, upon accepting a proposal, an acceptor also implicitly promises not to accept any proposal in lower ballots. It's discussed at length here: https://stackoverflow.com/questions/29880949/contradiction-i...
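
A rough sketch of a single-decree acceptor with that implicit promise made explicit. The class, method, and field names are my own for illustration and are not taken from the paper:

```python
class Acceptor:
    """Single-decree Paxos acceptor (illustrative sketch)."""

    def __init__(self):
        self.promised = -1           # highest ballot promised or accepted in
        self.accepted_ballot = -1    # ballot of the last accepted proposal
        self.accepted_value = None

    def on_prepare(self, ballot):
        """Phase 1b: promise to ignore lower ballots, report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted_ballot, self.accepted_value)
        return ("nack", self.promised, None)

    def on_accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            # The easy-to-miss step: accepting in `ballot` also raises the
            # promise, so proposals in lower ballots are rejected from now on.
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return ("accepted", ballot)
        return ("nack", self.promised)
```

Without the `self.promised = ballot` line in `on_accept`, an acceptor that accepted ballot 5 with no earlier prepare could still go on to accept a proposal in ballot 3.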

shepherdjerred · on Sept 28, 2024

What’s your reading group?

I took a DS class and (poorly) implemented Paxos a few years ago. I’m curious about how others continue learning about DS.

prydt · on March 29, 2024

I'm curious now. What is ifunc? (Had difficulty finding it through a search)

jeffbee · on March 29, 2024

ifunc is a GNU method of interposing function calls with platform-optimized versions of the function. It is used to detect CPU features at runtime and insert, for example, AVX2-optimized versions of memcmp. It is seen in crypto a lot, because CPUs have many crypto-specific instructions.

However, I don't like it much and I think software should be compiled for the target machine in the first place. My one hardened system that is reachable from the public network is based on musl, built mostly with llvm, and with ifunc disabled.
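
A loose Python analogy of the resolve-once pattern, for anyone who wants the shape of it. Real ifunc is an ELF/GNU toolchain feature resolved by the dynamic loader; everything named here (the two compare functions, the resolver, the feature set) is hypothetical and not a glibc API:

```python
def compare_generic(a: bytes, b: bytes) -> int:
    """Portable fallback: byte-by-byte comparison."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # Common prefix is equal, so the shorter buffer sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

def compare_wide(a: bytes, b: bytes) -> int:
    """Stand-in for an optimized variant; uses Python's built-in bytes comparison."""
    return (a > b) - (a < b)

def resolve_compare(cpu_features):
    """Resolver: runs once and picks an implementation from the feature set."""
    return compare_wide if "avx2" in cpu_features else compare_generic

# Bind the name once, roughly as the loader binds an ifunc symbol;
# later calls go straight to whichever function was chosen.
compare = resolve_compare({"sse2", "avx2"})
```

The feature set is passed in by hand here; an actual resolver would query the CPU itself.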

cesarb · on March 29, 2024

> However, I don't like it much and I think software should be compiled for the target machine in the first place.

That means you either have to compile software locally on each machine, or you have a combinatorial explosion of possible features.

Compiling locally has several drawbacks. It needs the full compilation environment installed on every machine, which uses a lot of disk space, and some security people dislike it (because then attackers can also compile software locally on that machine); compiling needs a lot of memory and disk space, and uses a lot of processor time and electric power. It also means that signature schemes which only allow signed code cannot be used (or you need to have the signing key available on the target machine, making it somewhat pointless).

The combinatorial explosion of features has been somewhat tamed lately, by bundling sets of features into feature levels (x86_64-v1, etc.), but that still quadruples the amount of compiled code to be distributed, and newer features still have to be selected at runtime.

myself248 · on March 29, 2024

Compiled _on_ and compiled _for_ are not the same. There must be a way to go to the target machine, get some complete dump of CPU features, copy that to the compile-box, do the build, and copy the resulting binaries back.

derefr · on March 29, 2024

> That means you either have to compile software locally on each machine, or you have a combinatorial explosion of possible features.

Or you just have to buy a lot of the exact same hardware. Secure installations tend to do that.

jeffbee · on March 29, 2024

I don't think you can really say it is "combinatorial" because there's not a mainstream machine with AES-NI but not, say, SSSE3. In any case if there were such a machine you don't need to support it. The one guy with that box can do scratch builds.

afh1 · on March 30, 2024

I have no issues compiling everything on my Gentoo box.

eklitzke · on March 29, 2024

Obviously compiling for the target architecture is best, but for most software (things like crypto libraries excluded) 95% of the benefit of AVX2 is going to come from things like vectorized memcpy/memcmp. Building glibc using ifuncs to provide optimized implementations of these routines gives most users most of the benefit of AVX2 (or whatever other ISA extension) while still distributing binaries that work on older CPU microarchitectures.

jeffbee · on March 29, 2024

ifunc memcpy also makes short copies suck ass on those platforms, since the dispatch cost dominates regardless of the vectorization. It's an open question whether ifunc helps or harms the performance of general use cases.

By "open question" I meant that there is compelling research indicating that GNU memcpy/memcmp is counterproductive, but the general Linux-using public did not get the memo.

https://storage.googleapis.com/gweb-research2023-media/pubto...

"AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers" Section 4.4 "Memcmp and the perils of micro-optimization"

wiml · on March 29, 2024

On the other hand, it also means that your distro can supply a microarchitecture-specific libc and every program automatically gets the memcpy improvements. (Well, except for the golang/rust people.)

Wasn't this the point of Gentoo, back in the day? It was more about instruction scheduling and register allocation differences, but your system would be built with everything optimized for your uarch.

aaronmdjones · on March 29, 2024

https://sourceware.org/glibc/wiki/GNU_IFUNC

prydt · on March 24, 2024

Love the list and the eBPF tools look super helpful.

prydt · on Dec 24, 2023

LSM trees are a good example of a data structure optimized for storage hardware (both HDDs and SSDs).

prydt · on Dec 14, 2023

High Scalability has always been a very fun read, and I've enjoyed it a lot over the years. I hope the site still stays up.

Does anyone know any other resources that are about similar topics?

mike503 · on Dec 14, 2023

Same! I didn't realize it was gone (due to all the other noise in the world) until seeing the sale post. I'd love for it to stay online but any new content might lack the same spirit since it'd be a whole different author :)