Some additional nuggets from a ScyllaDB co-founder:
- Discord couldn't complete a repair with Cassandra. That's not the case with Scylla.
- Scylla has a lot in common with Cassandra, and for good reason: the LSM tree, compaction, etc. However, Scylla has unique CPU and I/O schedulers that let us prioritize queries over compaction and defer compaction to the half-millisecond windows where we have enough idle bandwidth. We have plenty of articles about it.
- Scylla has a new (about 1.5 years old) tombstone_gc=repair option, a much safer mode.
- Scylla's new architecture built on Raft and tablets was recently launched and is the next big thing for our users. Watch the cool YouTube video of tablet load balancing.
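The scheduler point above can be illustrated with a toy sketch. This is not Seastar's actual API (the real CPU scheduler uses weighted scheduling groups in C++); it only shows the observable effect: queued queries always run before queued compaction, so compaction consumes only idle cycles.

```python
from collections import deque

class TwoClassScheduler:
    """Toy two-class cooperative scheduler.

    Seastar's real scheduler uses weighted scheduling groups per core;
    this sketch only demonstrates the effect described above: queries
    run ahead of compaction, so compaction fills the idle gaps.
    """

    def __init__(self):
        self.queries = deque()      # latency-sensitive foreground work
        self.compactions = deque()  # deferrable background work

    def submit_query(self, task):
        self.queries.append(task)

    def submit_compaction(self, task):
        self.compactions.append(task)

    def run_once(self):
        """Run one task: foreground first, background only when idle."""
        if self.queries:
            return self.queries.popleft()()
        if self.compactions:
            return self.compactions.popleft()()
        return None

sched = TwoClassScheduler()
sched.submit_compaction(lambda: "compacted an sstable chunk")
sched.submit_query(lambda: "answered a read")
# Even though compaction was queued first, the query runs first:
print(sched.run_once())  # → answered a read
print(sched.run_once())  # → compacted an sstable chunk
```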
My understanding is that the new Rust coalescing will bring the situation on par with the new Go driver. However, the second part of the blog has a no-coalescing test where Go is still faster and allocates less memory. I'm sure the Rust driver can get there too.
GSI and LSI are fully supported already.
Backup is fully supported too (though not identical to DynamoDB's).
TTL is in the final stage of development.
So yes, everything is planned to be fully supported.
Shard per core has its challenges. Scylla solves it partially by making the client shard-aware (topology-aware), so the client submits requests directly to the right CPU core using a dedicated port per shard. This way no data gets blocked by hot keys or imbalance on other shards. You do need a good key distribution. We're implementing a new mechanism to split key ranges dynamically; it will be a fast and parallel mechanism.
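For illustration, here is a simplified version of the token-to-shard arithmetic that shard-aware drivers use. Treat it as a sketch, not driver code: real drivers learn the shard count, the `ignore_msb` value, and the per-shard ports from the server during the handshake.

```python
def shard_for_token(token: int, nr_shards: int, ignore_msb: int = 12) -> int:
    """Simplified token-to-shard mapping for a shard-aware client.

    The signed 64-bit murmur3 token is biased into an unsigned range,
    the partitioner's reserved high bits are shifted out, and the
    result is scaled into [0, nr_shards).
    """
    biased = (token + 2**63) % 2**64          # signed -> unsigned
    biased = (biased << ignore_msb) % 2**64   # drop reserved high bits
    return (biased * nr_shards) >> 64

# With the shard known, the client sends the request on the connection
# bound to that shard's port, landing directly on the owning core.
print(shard_for_token(2**50, 8))  # → 2
```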
If the number of connections is not small, Scylla will outperform any other implementation built on traditional locking.
At Scylla we initially designed a new filesystem that adheres to the shard-per-core model, so a single physical hyperthread has sole access to its data and there is no contention.
While it would be better than current XFS, we've made aio improvements to the latter over the years, and today it's good enough for ScyllaDB.
Practically, even though Scylla has its own userspace TCP/IP stack on top of DPDK, we learned over the years that it's OK to use the less efficient kernel TCP stack. Most of the overhead and the optimizations can still be handled within the DB itself, as long as it controls the memory, the cache, and the networking queues.
Seastar (seastar.io) is another one: an async engine that utilizes all the cores in a modern system and is the heart of Scylla's DB. However, it's complex to use correctly.
The biggest advantage of in-memory is also its biggest disadvantage: cost. NVMe can serve almost as fast as RAM and is ~100x cheaper. Scylla offers single-digit-millisecond (usually 1 ms) p99 latency and is far more cost-effective.
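A quick back-of-envelope on the cost gap. The $/GB figures below are illustrative assumptions, not vendor quotes; only the ~100x ratio follows the claim above.

```python
# Back-of-envelope RAM vs NVMe cost for a 10 TB dataset.
# The $/GB figures are illustrative assumptions, not vendor quotes.
ram_cost_per_gb = 3.00    # assumed server DRAM price
nvme_cost_per_gb = 0.03   # assumed datacenter NVMe price (~100x cheaper)
dataset_gb = 10 * 1024

ram_total = dataset_gb * ram_cost_per_gb
nvme_total = dataset_gb * nvme_cost_per_gb
print(f"RAM:  ${ram_total:,.0f}")   # → RAM:  $30,720
print(f"NVMe: ${nvme_total:,.0f}")  # → NVMe: $307
print(f"ratio: {ram_total / nvme_total:.0f}x")
```

At fleet scale that two-orders-of-magnitude gap is what makes NVMe-backed storage with ~1 ms p99 the more cost-effective design point.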
We at Scylla automatically tune the kernel, drives, stride, and clock source, and benchmark the disk to figure out its maximum parallelism so it won't queue (we can saturate 8 NVMe drives). We then do the queuing within the DB's userspace, so we can prioritize real-time queries over background tasks:
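The disk-probing step can be sketched like this: raise the queue depth until extra parallelism stops buying throughput, and treat that depth as the drive's useful concurrency (similar in spirit to what Scylla's iotune pass does, but this is a toy version). The `fake_disk` model and the 5% threshold are made up for illustration; a real probe would issue actual reads and writes against the drive.

```python
def pick_max_concurrency(measure, depths=(1, 2, 4, 8, 16, 32, 64, 128),
                         gain_threshold=1.05):
    """Pick the smallest queue depth beyond which IOPS stop improving.

    `measure(depth)` is assumed to return sustained IOPS at that queue
    depth. Once extra parallelism buys less than ~5% more throughput,
    deeper hardware queues only add latency, so any further queuing
    should happen in the DB's userspace, where it can be prioritized.
    """
    best_depth, best_iops = depths[0], measure(depths[0])
    for d in depths[1:]:
        iops = measure(d)
        if iops < best_iops * gain_threshold:
            break  # diminishing returns: the drive is saturated
        best_depth, best_iops = d, iops
    return best_depth

# Toy disk model: throughput scales up to 32-way parallelism, then flattens.
def fake_disk(depth):
    return 50_000 * min(depth, 32)

print(pick_max_concurrency(fake_disk))  # → 32
```

Everything past that depth is then queued inside the database, where real-time queries can jump ahead of compaction and repair traffic.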