I didn't see this in the blog post, but did you train this from scratch or fine-tune an existing base model?
If from scratch, it's quite impressive that the model can understand natural-language prompts (English, presumably) from such a small, targeted training set.
I work with large text datasets, and I typically have to go through hundreds of samples to evaluate a dataset's quality and determine if any cleaning or processing needs to be done.
A tool that lets me sample and explore a dataset living in cloud storage, and then share it with others, would be incredibly valuable, but I haven't seen any tools that support long-form non-tabular text data well.
This is also an area that I'm starting to explore with LLMs. I love the idea that you could take a bunch of messy data, tell Datasette Cloud "I want this imported into a table with this schema"... and it does that.
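To make that concrete, here's a rough sketch of the kind of pipeline I'm imagining. sqlite-utils works as shown, but map_to_schema is hypothetical - in practice it would be a prompt to whatever LLM you're using plus some JSON parsing - and the table and column names are invented for the example:

    import sqlite_utils

    db = sqlite_utils.Database("imports.db")
    # Declare the target schema up front so the LLM has something concrete to map into.
    db["events"].create({"id": int, "title": str, "date": str, "venue": str}, pk="id")

    def map_to_schema(raw_text):
        # Hypothetical: prompt an LLM with the raw text plus the target column
        # names/types, and parse its JSON output into row dicts like this one.
        return [{"id": 1, "title": "Example event", "date": "2024-01-01", "venue": "TBD"}]

    with open("messy_scrape.txt") as f:
        rows = map_to_schema(f.read())

    db["events"].insert_all(rows, pk="id")

Datasette could then sit on top of imports.db for the sampling and sharing part.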
Thank you for open sourcing this. More competition in the budding metrics ecosystem is good for end users.
It seems like you think MetricFlow should be the data mart layer and not just the metrics layer. If that's true...why? Why would I join my fact and dimension tables in metricflow instead of in dbt? One of the value adds of dbt is that it centralizes business logic in a single place. Joins are business logic. The industry seems to be moving towards creating very wide data mart tables in dbt and surfacing them to the semantic layer 1:1, or building the metrics layer on top of them.
I'd say we think MetricFlow should be able to provide consistent, correct answers to any reasonable query end users of the metric model might ask. To do this across the various data warehouse layouts our users are likely to encounter, we have to support dimensional joins. That doesn't mean MetricFlow should displace data mart services - on the contrary, I contend MetricFlow works best when layered on top of a warehouse built on centralized logic for managing its data layout. As an example, we generally push our customers to rely on the sql_table data source definition and push any sql_query constructs down to whatever warehouse management layers they have in place.
That said, you need to support joins, at least in some limited scope, in the semantic metric layer for it to be broadly useful. Consider this scenario - you have your dbt models producing wide tables for reasonable measure/dimension queries, and you have MetricFlow configs for the metric and dimension sets available in your data mart. Now imagine you've also got your finance team hooked up to a Google Sheets connector, and they're looking at revenue and unique customers by sales region. Cool, your wide table has that built in, no joins needed.
But what if they want something new? Let's say they want to know how they're doing against the target addressable market in each country. Should they have to submit a ticket to the data engineering team to add customer.country.market_size to your revenue table? Or should they be able to do "select revenue by customer__country__market_size" and get the report they need?
Our position is that we want to facilitate the latter - people getting what they need and knowing that, as long as it's been defined properly in the model, it's going to produce reasonable results. If your particular organization wants all of those joins run through a data mart ticket queue and surfaced as fully denormalized underlying tables, that's fine by us, but most likely that's not what you want. You'd rather have some visibility into the kinds of joins people are requesting, then build out your data mart to serve the common requests more efficiently, while still letting people ask new questions of the data without a long development feedback loop.
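To make the example concrete, the scenario above might be configured with data sources along these lines - a simplified sketch with invented table and column names, details elided, and each data_source normally living in its own YAML file:

    data_source:
      name: transactions
      sql_table: analytics.fct_transactions
      measures:
        - name: revenue
          agg: sum
          expr: amount_usd
      identifiers:
        - name: customer            # join path out to the customers data source
          type: foreign
          expr: customer_id
    ---
    data_source:
      name: customers
      sql_table: analytics.dim_customers
      identifiers:
        - name: customer
          type: primary
          expr: customer_id
        - name: country             # and from customers out to countries
          type: foreign
          expr: country_code
      dimensions:
        - name: sales_region
          type: categorical
    ---
    data_source:
      name: countries
      sql_table: analytics.dim_countries
      identifiers:
        - name: country
          type: primary
          expr: country_code
      dimensions:
        - name: market_size
          type: categorical
    ---
    metric:
      name: revenue
      type: measure_proxy
      type_params:
        measure: revenue

With the identifiers declared, a request like "mf query --metrics revenue --dimensions customer__country__market_size" can resolve the joins itself, while the tables behind sql_table stay under whatever centralized warehouse logic you already have.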
For any dbt users, their reliability package has the best and most comprehensive way to upload artifacts directly to the warehouse after a dbt invocation.
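For anyone who'd rather see the core pattern before pulling in a package: after the invocation, parse target/run_results.json and load one row per node into a warehouse table. A minimal sketch, with sqlite3 standing in for a warehouse client and an invented table name:

    import json
    import sqlite3

    with open("target/run_results.json") as f:
        artifact = json.load(f)

    # One row per executed node: id, status, and runtime in seconds.
    rows = [
        (r["unique_id"], r["status"], r.get("execution_time"))
        for r in artifact["results"]
    ]

    con = sqlite3.connect("warehouse.db")  # swap in your warehouse connection
    con.execute(
        "create table if not exists dbt_run_results"
        " (unique_id text, status text, execution_time real)"
    )
    con.executemany("insert into dbt_run_results values (?, ?, ?)", rows)
    con.commit()

A proper package layers a lot on top of that (manifest.json, sources, schema handling), which is where the comprehensive part comes in.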
Thank you! We believe that this upload is super valuable and could unlock a lot of additional use cases. We are already working on some of these and will release them in the next few weeks.
> And I wonder if a person free of cognitive distortions would even be referred to as human, as the quote goes:
There's a difference between emotional intuition and emotional reasoning (the cognitive distortion in OP's example).
Emotions are extremely valuable for decision-making (e.g. this house ticks all my boxes, but do I love it?) and for making judgements (e.g. this situation does not feel right to me).
Emotional reasoning is when people distort reality to fit their (often self-destructive) emotional impulses, discarding physical evidence in favor of their emotions.
Thanks so much for your support from the very beginning! We've only been in market publicly for a little over a year now, actually. We figured better late than never (and it still feels early for us!) :)