You can write some SQL in Spark, but 1) Why would you want to maintain your own ...

Foobar8568 · on March 8, 2022

A lot of the articles I read about Snowflake involve Data Vault, which is a massive turn-off. And when their tech lead (Kent Graziano) is a prominent figure in the DV bullshit...

fritkot · on March 8, 2022

Snowflake and DV have no interdependency whatsoever. Snowflake is just a database. Whether you use DV to model the data inside of it or dimensional modelling or "big wide tables" is completely up to you, there's nothing about it that requires or benefits DV in particular.

secondcoming · on March 8, 2022

What is DV?

belter · on March 8, 2022

Count not knowing what it is as a blessing. Run if you can.

https://en.wikipedia.org/wiki/Data_vault_modeling

Edit: As the Wikipedia article has no Criticism section I will add some references:

http://kejser.org/the-data-vault-vs-kimball-round-2/

https://timi.eu/blog/data-vaulting-from-a-bad-idea-to-ineffi...

fritkot · on March 8, 2022

It's a data modelling method for data warehouses. It can be used in Snowflake or on any other data management platform.

mohanmcgeek · on March 11, 2022

> Snowflake is probably 10-50x as performant as Spark for data manipulation

Wow, is this a fact? I haven't used either in a while, but I saw the blog post from Databricks, and Spark was more performant than Snowflake.

I assumed that's what I'd also get when I run Spark on kube.

rxin · on March 8, 2022

You should try Databricks, especially the new Photon engine powering Spark. In general more performant than Snowflake in SQL and a lot more flexible. (There are some cases in which Databricks would be slower but the perf is improving rapidly.)

belter · on March 8, 2022

Probably an oversight on your part, but I would argue it would be elegant to disclose that you are one of the co-founders.

Fiahil · on March 8, 2022

Databricks has an extremely bad API. So, sure, your Spark jobs might be a little bit faster sometimes, but why would you use it if you can't even read logs of running jobs?

Lucasoato · on March 8, 2022

Databricks is amazing, and the Delta Live Tables technology is incredible. Problems like data lineage and data quality are very hard to approach, but that platform handles them the right way.

My only concern is that they offer just a managed cloud product. That's cool for startups, but large enterprises sometimes need more governance and ownership than that.

soulbadguy · on March 8, 2022

Very surprised by this. Do you have a reference?

fnord123 · on March 8, 2022

FYI, rxin is a co-founder of Databricks.

soulbadguy · on March 8, 2022

That explains it

StreamBright · on March 8, 2022

The biggest selling point of Snowflake for most customers is that they do not need to maintain the infrastructure.

fritkot · on March 8, 2022

Of course you would say that it's more performant and flexible... TPC-DS was just a PR ploy.