> YAML, pivoting being done in the frontend, no symmetric aggregates
(one of the maintainers of Lightdash) You touched on some of our most interesting problems here! Would be especially interested to hear about what you liked / didn't like about symmetric aggregates in Looker and how you find dev with YAML. If you have an idea of how you'd like these to look in Lightdash, the team would be really open to making that a reality.
We're really excited to release the first public version of Lightdash!
Lightdash is an open source alternative to Looker that lets analysts define data transformations and metrics in one place. Lightdash gives analysts a BI platform built on the open-source tools they already love (dbt).
We believe that the future of the modern data stack lies in having a single source of truth for all your metrics.
Tools like dbt have made it possible for analysts to manage their transformations using SQL. But existing BI tools still hide away lots of extra business logic, meaning that metrics get scattered across the company (you know those 5 different calculations of revenue XD) and data context gets lost between tools.
With Lightdash, your BI tool is fully integrated with your dbt project. This means:
- You define your metrics right beside the rest of your data transformations, in dbt.
- Developing metrics becomes lightning fast: change some SQL or a metric and immediately see your data viz update.
- All your dbt metadata (column and table descriptions, lineage, freshness, test results) is kept in sync with Lightdash so you don't have to try to maintain it in multiple places.
Lightdash is still in the early days and we've got lots of work to do. Today, Lightdash supports most popular databases and warehouses but is only tested with PostgreSQL and BigQuery - so, if you try it with another database, it'd be great to hear about your experience using it!
We'd love any feedback or to hear about how you're solving BI at your company today :)
Well done! I was looking for an open source LookML a while back and found Rakam[0]. It seems they added the dbt layer after the fact while you started with that concept.
Thanks! With Hubble, we realised that while some companies wanted a separate tool to monitor data quality (e.g. data governance teams in financial institutions), most modern data teams want to test inside their existing code base (e.g. dbt). Also, the majority of the company tends to interact with the data through their BI tool, and we think that's where adding data quality makes the most sense.
For example: flagging a dashboard as out of date, or showing that a report depends on data with failing tests.
There's rich metadata in the transform layer that just isn't getting pulled through to existing reporting/BI/viz tools.
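To make that concrete, here's a rough sketch of how those flags could be derived from lineage and test results. It's only an illustration: the function names, the shape of the inputs, and the example models are all made up for this comment and are not how Lightdash or dbt actually expose this metadata.

```python
# Hypothetical sketch: inputs are plain dicts/sets, not a real dbt or Lightdash API.

def upstream(model, parents):
    """Return the model plus everything it depends on, directly or indirectly."""
    seen = set()
    stack = [model]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def dashboard_warnings(dashboards, parents, failing, stale):
    """Map each affected dashboard name to a list of human-readable warnings.

    dashboards: dashboard name -> set of models it queries
    parents:    model name -> set of models it is built from
    failing:    models with at least one failing test
    stale:      models whose data is past its freshness threshold
    """
    warnings = {}
    for name, models in dashboards.items():
        lineage = set()
        for model in models:
            lineage |= upstream(model, parents)
        notes = []
        bad = sorted(lineage & failing)
        if bad:
            notes.append("depends on data with failing tests: " + ", ".join(bad))
        old = sorted(lineage & stale)
        if old:
            notes.append("out of date: " + ", ".join(old))
        if notes:
            warnings[name] = notes
    return warnings


# Made-up example project.
parents = {"revenue": {"orders", "payments"}, "orders": {"raw_orders"}}
dashboards = {"Finance": {"revenue"}, "Marketing": {"sessions"}}
print(dashboard_warnings(dashboards, parents, failing={"raw_orders"}, stale={"payments"}))
```

The point is that the BI layer only needs the lineage graph and the latest test/freshness status to do this, and both already exist in the transform layer.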
We still have a lot of love for Hubble and data quality monitoring. By connecting dbt and Lightdash, we finally get some of those data monitoring features we always wanted.
Thanks for sharing Rakam, they always stood out for their choice of using dbt as their transform layer, it's really cool.
We're building support for dbt Cloud users right now. Using dbt locally has allowed us to piggy-back on dbt's query runner, but we're having to build that ourselves to support dbt Cloud.
Wrote a simple CLI tool that converts dbt models into Looker view files. Once you've built your dbt project, run dbt2looker and copy the files over to Looker.
Features:
- Auto-generates a Looker view per dbt model
- Supports dbt model and column-level descriptions
- Automatically maps raw column types to Looker types
- Creates dimension groups for datetime/timestamp/date types
- Currently supports: BigQuery, Snowflake, Redshift (Postgres to come)
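For a feel of what the type mapping and dimension groups in the list above involve, here's a minimal sketch. To be clear, this is not dbt2looker's actual code: the `Column` class, the function names, and the (BigQuery-flavoured, deliberately incomplete) type tables are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical, incomplete type tables; a real tool needs one per warehouse.
SCALAR_TYPES = {
    "INT64": "number",
    "FLOAT64": "number",
    "NUMERIC": "number",
    "BOOL": "yesno",
    "STRING": "string",
}
TIME_DATATYPES = {"DATE": "date", "DATETIME": "datetime", "TIMESTAMP": "timestamp"}


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""


def render_field(col):
    """Render one column as a LookML dimension or dimension_group."""
    raw = col.data_type.upper()
    if raw in TIME_DATATYPES:
        # Date-like columns become a dimension group with several timeframes.
        lines = [
            f"  dimension_group: {col.name} {{",
            "    type: time",
            f"    datatype: {TIME_DATATYPES[raw]}",
            "    timeframes: [raw, date, week, month, quarter, year]",
        ]
    else:
        # Unknown types fall back to string in this sketch.
        lines = [
            f"  dimension: {col.name} {{",
            f"    type: {SCALAR_TYPES.get(raw, 'string')}",
        ]
    if col.description:
        # No escaping of quotes here; a real tool would need it.
        lines.append(f'    description: "{col.description}"')
    lines.append(f"    sql: ${{TABLE}}.{col.name} ;;")
    lines.append("  }")
    return "\n".join(lines)


def render_view(model_name, columns):
    """Render a whole dbt model as a single LookML view."""
    body = "\n".join(render_field(c) for c in columns)
    return f"view: {model_name} {{\n{body}\n}}"


print(render_view("orders", [
    Column("order_id", "INT64", "Primary key"),
    Column("created_at", "TIMESTAMP"),
]))
```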
Same experience but reverted to Mac because the hardware is unbeatable.
Had MacBooks since the white clamshells and eventually wanted to go full Linux. So bought the X1 Carbon and really regretted it. Screen and trackpad were by far the worst by comparison.
So I’ve returned. Picked up a Mid-2015 15” MBP a few months back and haven't looked back.
> Most of the UK's biggest insurance companies produce policies that explain everything fully in plain English.
Is this an argument in favour of plain English? Insurance policy documents are incredibly hard to understand and full of bloat. They are a near-perfect example of how not to write an accessible, informative, and useful document for the intended audience.
> Sadly, thanks to the bureaucrats of public service industries, local councils, banks, building societies, _insurance companies_ and government departments, we have learnt to accept an official style of writing that is inefficient and often unfriendly.
> But in the last few years, many of these offenders have started to put things right, either rewriting their documents clearly or training their staff in the art of plain English, or both.
So, it’s a work in progress. I agree that the order is confusing though.
I saw that but couldn't square it with the earlier sentence.
In my previous role we interviewed tens of small business owners and nobody knew what was in their docs. I hope insurance companies will be bold enough to go beyond simplifying the language and also simplify the terms (e.g. Lemonade's https://www.lemonade.com/policy-two)
In the UK, the Plain English Campaign awards certifications to numerous insurance companies, among other businesses and organisations, for writing their policies in Plain English.
The point they are making is that plain English is present even in domains as hefty and complex as insurance.
Why is "doing software engineering" not "doing science"?
Anybody who has conducted experimental research will say they spent 80% of the time using a hammer or a spanner. Repairing faulty lasers or power supplies. This process of reliable and repeatable experimentation is the basis of science itself.
Computational experiments must be held to the same standards as physical experiments. They must be reproducible and they should be publicly available (if publicly funded).
Yes we can run the whole stack on-prem. We realised very early that on-prem would be needed for many users. So we've made it easy to spin up Hubble in a k8s cluster in your cloud or on bare metal.
Yes, we store the historical value of each test so you can always scroll back through time and see the state of the data warehouse at any given point.
For example, if you have a test that counts the number of rows "COUNT(*)" - that value will be recorded. So you can look back an hour/day/week and see how many rows the table had without executing any SQL. These values are stored in a time series db, so querying history is fast.
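Conceptually the lookup works something like the sketch below. This is an in-memory toy, not our actual storage layer: the `ResultHistory` class and its method names are invented for this example, and a real time series db handles persistence and scale.

```python
import bisect
from datetime import datetime, timedelta


class ResultHistory:
    """Toy store of recorded test values, kept sorted by timestamp per test."""

    def __init__(self):
        self._times = {}
        self._values = {}

    def record(self, test_name, at, value):
        """Store the value a test returned at time `at`."""
        times = self._times.setdefault(test_name, [])
        values = self._values.setdefault(test_name, [])
        i = bisect.bisect_right(times, at)
        times.insert(i, at)
        values.insert(i, value)

    def value_at(self, test_name, at):
        """Return the most recent recorded value at or before `at`, else None."""
        times = self._times.get(test_name, [])
        i = bisect.bisect_right(times, at)
        if i == 0:
            return None
        return self._values[test_name][i - 1]


history = ResultHistory()
start = datetime(2020, 1, 1, 9, 0)
history.record("orders_row_count", start, 1000)
history.record("orders_row_count", start + timedelta(hours=1), 1040)
history.record("orders_row_count", start + timedelta(hours=2), 1100)

# Row count as of 10:30, answered from history without touching the warehouse.
print(history.value_at("orders_row_count", start + timedelta(minutes=90)))  # 1040
```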
Our tech stack: monolith backend in Python + Postgres + React. The tests themselves are all SQL queries and run in the data warehouse.
Yeah we called this project hubble long before we were worried about SEO.
Actually, the name does relate back to Edwin Hubble. We previously worked together on an internal data tool called Telescope (it was used for annotating medical images for computer vision). The Telescope project slowly evolved into the product we have today. So we changed the name to our favourite telescope. I have a fondness for the Hubble telescope: there was a huge poster of it on the way into the computational physics dept. and it takes me back to the grad school days!
The main thing is to be mindful of keywords you target. Don't do as another commenter suggested and target hubble data[0] unless you apply what you make to actual Hubble data. Like AWS did with its Open Data thing that comes up for that keyword.
The telescope is older than the web and is what every single person on the planet with some access to space-related media thinks of when they think of Hubble. Think long tail, not one or two keywords. Hubble data is out unless you go with a telescope-related project, but you already rank indirectly for hubble data warehouse.
As the person you may be referring to, I'd like to clarify that I was not in any way suggesting they target "hubble data." It was just an example of how a user might modify their search if they were looking for this company but found telescope content instead.
There's no sense in doing SEO for your company name, unless you're at the point where competitors are trying to outrank you for your own company name. (Which is a pretty good tactic, actually: https://www.gkogan.co/blog/alternative-pages/.) So don't target "hubble," don't target "hubble data," don't target "hubble the YC company I saw on HN a while back," don't worry about it. Try and catch the people searching for use cases or solutions instead.
For pivoting in the backend, this is coming! Issue here: https://github.com/lightdash/lightdash/issues/2907