Good question -- it depends. For certain workloads, it might look exactly the same! For others, I found that the memory and VM constraints were creating large inefficiencies. Also, many teams simply don't want to manage that level of data infra: EMR, instance-type selection, Spark tuning (now with GPU configs!), custom images, upgrades, etc.
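To give a flavor of the GPU-config side, here's a minimal PySpark sketch of the kind of tuning involved, assuming the NVIDIA RAPIDS Accelerator (an assumption on my part -- treat the specific plugin and values as illustrative; the rapids-4-spark jar also has to be on the classpath):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("gpu-etl")
        # Route supported SQL operators through the GPU.
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
        # One GPU per executor, shared by up to four concurrent tasks.
        .config("spark.executor.resource.gpu.amount", "1")
        .config("spark.task.resource.gpu.amount", "0.25")
        # Pinned host memory speeds up CPU<->GPU transfers.
        .config("spark.rapids.memory.pinnedPool.size", "2g")
        .getOrCreate()
    )

And that's before instance selection, image baking, and upgrades -- it adds up fast.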

We take care of all of that and make it as easy as pie... or so we hope! On top of that, we also deploy an external shuffle service and handle other plugins, connectors, etc.
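For context on why the shuffle service matters: it keeps shuffle blocks fetchable after an executor goes away, which is what makes aggressive scale-down safe. A rough sketch of the client-side settings (the service itself -- e.g. the YARN aux service -- is assumed to be deployed separately; that's the part we manage):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().setAll([
        # Executors register shuffle output with the external service,
        # so blocks survive the executor being reclaimed.
        ("spark.shuffle.service.enabled", "true"),
        # ...which is what makes dynamic allocation safe to enable.
        ("spark.dynamicAllocation.enabled", "true"),
        ("spark.dynamicAllocation.minExecutors", "2"),
        ("spark.dynamicAllocation.maxExecutors", "50"),
    ])
    spark = SparkSession.builder.config(conf=conf).getOrCreate()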

I suppose it's similar to using Databricks Serverless SQL!

Another thing: one of our first real workloads hit an incompatible (i.e. non-accelerated) operation, so we worked with the customer on a small query rewrite that sped that workload up even more.
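Purely as a hypothetical illustration of how you'd spot something like that (again assuming the RAPIDS plugin, plus a `spark` session and a DataFrame `df` with a string column `url`): the plugin can log every operator it could not place on the GPU, and the fix is usually a semantically equivalent rewrite.

    # Ask the plugin to log plan nodes that fell back to the CPU.
    spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")

    # Hypothetical fallback: suppose this regex predicate isn't
    # supported on the GPU and forces the filter back onto the CPU...
    slow = df.filter(df.url.rlike("^https://"))

    # ...an equivalent prefix check keeps the whole plan accelerated.
    fast = df.filter(df.url.startswith("https://"))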


