There are rumblings that the MySQL project is rudderless after Oracle fired the team working on the open-source project in September 2025. Oracle is putting all its energy into its closed-source MySQL Heatwave product. There is a new company looking to take over leadership of open-source MySQL, but I can't talk about them yet.
MariaDB Corporation's financial problems have also spooked companies, so more of them are looking to switch to Postgres.
> There are rumblings that the MySQL project is rudderless after Oracle fired the team working on the open-source project in September 2025.
Not just the open-source project; 80%+ (depending a bit on when you start counting) of the MySQL team as a whole was let go, and the SVP in charge of MySQL was, eh, “moving to another part of the org to spend more time with his family”. There was never really a separate “MySQL Community Edition team” that you could fire, although of course there were teams that worked mostly or entirely on projects that were not open-sourced.
Wouldn't Oracle need those 80%+ devs if they wanted to shift their efforts into Heatwave? That percentage sounds too huge to me, and if it's true, I believe they won't be making any larger investments into Heatwave either. There are several core teams in MySQL, and if you let those people go ... I don't know what to make of it other than that Oracle is completely moving away from MySQL as a strategic component of its business.
So, AI ate the cake ... I always thought that the investment Oracle needs to make in MySQL is peanuts compared to Oracle's total revenue and the revenue MySQL is generating. Perhaps the latter is not so true anymore.
> so... you take 10%-30% performance hit _right away_, and you perpetually give up any opportunities to improve the decoder in the future.
The WASM is meant as a backup. If you have the native decoder installed (e.g., as a crate), then a system will prefer to use that; otherwise, it falls back to WASM. A 10-30% performance hit is worth it compared to not being able to read a file at all.
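The fallback policy described here — prefer a native decoder when one is registered, use the file's embedded WASM decoder otherwise — can be sketched as below. All names (`register_native_decoder`, `decode_column`, the encoding id) are hypothetical illustrations, not the actual F3 API; the lambda stands in for invoking the embedded WASM blob.

```python
# Hypothetical sketch of native-first decoder resolution, not F3's real API.
NATIVE_DECODERS = {}  # encoding id -> callable, registered at startup

def register_native_decoder(encoding_id, decoder):
    """Register a native (fast-path) decoder for an encoding."""
    NATIVE_DECODERS[encoding_id] = decoder

def decode_column(encoding_id, raw_bytes, embedded_wasm_decoder):
    """Prefer a native decoder; otherwise run the WASM decoder shipped in the file."""
    native = NATIVE_DECODERS.get(encoding_id)
    if native is not None:
        return native(raw_bytes)             # fast path, no WASM overhead
    return embedded_wasm_decoder(raw_bytes)  # portable fallback, ~10-30% slower

# Usage: no native decoder for this (made-up) encoding, so the WASM path runs.
wasm_fallback = lambda b: b.decode("utf-8")  # stand-in for executing the WASM blob
print(decode_column("rle-v2", b"hello", wasm_fallback))
```

The point of the design is that the slow path is always available: a reader that knows nothing about the encoding still gets correct results, just not peak speed.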
"Embedding the decoders in each file requires minimal storage (kilobytes) and ensures compatibility on any platform in case native decoders are unavailable."
The idea that software I write today can decode a data file written ten years from now using new encodings is quite appealing.
And likewise, the idea that new software written to make use of the new encodings doesn't have to carry the burden of implementing the whole history of encoders for backwards compatibility.
"In case users prefer native decoding speed over Wasm, F3 plans to offer an option to associate a URL with each Wasm binary, pointing to source code or a precompiled library."
They are not suggesting that the code at the URL would be automatically downloaded. It would be up to you to get the code and build it into your application like any other library.
Is this relevant in practice? Say I go to a website to download some data, but a malicious actor has injected an evil decoder (that does what exactly?). They could just have injected the wasm into the website I am visiting to get the data!
In fact, wasm was explicitly designed for me to run unverified wasm blobs from random sources safely on my computer.
The backstory is complicated. The plan was to establish a consortium between CMU, Tsinghua, Meta, CWI, VoltronData, Nvidia, and SpiralDB to unify behind a single file format. But that fell through after CMU's lawyers freaked out over Meta's NDA stuff to get access to a preview of Velox Nimble. IANAL, but Meta's NDA seemed reasonable to me. So the plan fell through after about a year, and then everyone released their own format:
On the research side, we (CMU + Tsinghua) weren't interested in developing new encoders and instead wanted to focus on the WASM embedding part. The original idea came as a suggestion from Hannes@DuckDB to Wes McKinney (a co-author with us). We just used Vortex's implementations since they were in Rust and with some tweaks we could get most of them to compile to WASM. Vortex is orthogonal to the F3 project and has the engineering energy necessary to support it. F3 is an academic prototype right now.
I note that the Germans also released their own file format this year that also uses WASM. But they WASM-ify the entire file rather than individual column groups:
Andrew, it’s always great to read the background from the author on how (and even why!) this all played out. This comment is incredibly helpful for understanding the context of why all these multiple formats were born.
If I could ask you to speculate for a second, how do you think we will go from here to a clear successor to Parquet?
Will one of the new formats absorb the others' features? Will there be a format war a la iceberg vs delta lake vs hudi? Will there be a new consortium now that everyone's formats are out in the wild?
... Are you saying that there's 5 competing "universal" file format projects? Each with different non-compatible approaches? Is this a laughing/crying thing, or a "lots of interesting paths to explore" thing?
Also, back on topic - is your file format encryptable via that WASM embedding?
I would love to bring these benefits to the multidimensional array world, via integration with the Zarr/Icechunk formats somehow (which I work on). But this fragmentation of formats makes it very hard to know where to start.
Yep, another developer enthusiastically proposing mmap as an "easy win" for database design, when in reality it often causes hard-to-debug correctness and performance problems.
To be fair, I use it to share financial time series between multiple processes, and as long as there is a single writer it works well. It's been in production for several years.
Creating a shared memory buffer by mapping it as a file is not the same as mapping files on disk. The latter has weird and subtle problems, whereas the former just works.
To be clear, I am indeed doing mmap to the same file on disk. Not using shmap. But there is only one thread in one process writing to it and the readers are tolerant to millisecond delays.
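The single-writer, multi-reader pattern described in this sub-thread can be sketched in a few lines: one process maps a preallocated file on disk with write access and updates records in place, while readers open read-only mappings of the same file and tolerate slightly stale data. This is a minimal illustration (file name and record layout are made up); real systems typically add a sequence counter so readers can detect torn reads.

```python
# Minimal single-writer mmap sketch: writer and reader map the same on-disk file.
import mmap, os, struct, tempfile

path = os.path.join(tempfile.mkdtemp(), "ticks.bin")  # hypothetical data file
SLOT = struct.calcsize("<qd")  # one (timestamp: int64, price: float64) record

with open(path, "wb") as f:
    f.truncate(SLOT * 1024)  # preallocate so the mapping size is fixed up front

# Writer: the only mapping with write access; updates a record in place.
wf = open(path, "r+b")
wmap = mmap.mmap(wf.fileno(), 0)
wmap[0:SLOT] = struct.pack("<qd", 1700000000, 101.25)

# Reader: a read-only mapping of the same file. In production this would be a
# separate process; readers see the writer's updates with some small delay.
rf = open(path, "rb")
rmap = mmap.mmap(rf.fileno(), 0, access=mmap.ACCESS_READ)
ts, price = struct.unpack_from("<qd", rmap, 0)
print(ts, price)  # 1700000000 101.25
```

With exactly one writer there is no write-write race to coordinate, which is why this restricted use of mmap avoids most of the correctness pitfalls mentioned upthread.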
No issue if you know what you are doing. Not sure about the author, but I've known very high-perf mmap systems to run for decades without corruption or issues (in HFT/finance/payments).
https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-re...