The Cloud Storage Triad: Latency, Cost, Durability

camuel · on April 22, 2024

Perhaps we just need an S3 implementation built from the ground up for data-intensive workloads? Such an implementation might use conventional S3 underneath for its cost and reliability advantages. It could also deliver a lot of additional functionality that would be uniform across cloud vendors: enforcing Iceberg conventions on the server side rather than on the client side, uniformly implemented preconditions, or even full-blown transactions, all without breaking S3 semantics. Metering and billing could also be made friendlier to those data-intensive workloads. The question is whether the overhead of such an indirection level, right on the data path, can be made reasonable. This is what we are trying to figure out at Embucket.com.
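
To make the "preconditions enforced on the server side" idea concrete, here is a minimal sketch of a conditional write (compare-and-swap on an object's ETag). Everything in it is an assumption for illustration: `ConditionalStore`, `PreconditionFailed`, and the `if_match` / `if_none_match` parameters are hypothetical names, the store is an in-memory dict standing in for a layer in front of S3, and it is not Embucket's design or any vendor's API.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when a write's precondition does not hold (hypothetical name)."""


class ConditionalStore:
    """Hypothetical in-memory stand-in for a layer sitting in front of S3."""

    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    def get(self, key):
        # Returns (etag, body), or None if the key does not exist.
        return self._objects.get(key)

    def put(self, key, body, if_match=None, if_none_match=False):
        current = self._objects.get(key)
        # Create-only write: refuse if the object already exists.
        if if_none_match and current is not None:
            raise PreconditionFailed(f"{key} already exists")
        # Compare-and-swap: refuse if the stored ETag differs from the expected one.
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(f"{key} changed since it was read")
        etag = hashlib.sha256(body).hexdigest()
        self._objects[key] = (etag, body)
        return etag


if __name__ == "__main__":
    store = ConditionalStore()
    # First commit creates the table's metadata pointer.
    v1 = store.put("table/metadata.json", b"snapshot-1", if_none_match=True)
    # Writer A commits on top of v1 and succeeds.
    store.put("table/metadata.json", b"snapshot-2", if_match=v1)
    # Writer B also read v1; its commit is rejected by the store, not the client.
    try:
        store.put("table/metadata.json", b"snapshot-3", if_match=v1)
    except PreconditionFailed as exc:
        print("rejected:", exc)
```

The point of the sketch is only that the check lives in the store: a client that skips it cannot overwrite a newer commit, which is the property an Iceberg-style metadata swap relies on.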