Hacker News

SQLite is really great at crunching data! I definitely prefer it over pandas in most cases, as SQL is a natural fit for joins, aggregates, etc. SQLite also works natively with JSON, which is a huge time saver.
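As a rough illustration of the JSON point, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite was compiled with the JSON functions (true of most recent builds); the events table and its payloads are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [
        ('{"customer": "ann", "amount": 10}',),
        ('{"customer": "bob", "amount": 5}',),
        ('{"customer": "ann", "amount": 7}',),
    ],
)

# Pull fields out of the JSON text and aggregate them in a single query.
totals = conn.execute(
    """
    SELECT json_extract(payload, '$.customer') AS customer,
           SUM(json_extract(payload, '$.amount')) AS total
    FROM events
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()

print(totals)
```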


I prefer it over pandas for joins etc. too. My workflow is (1) do the simple stuff in Python using pandas, and (2) for some of the complex stuff, just start creating SQLite tables. If you have Datasette installed, you can also view the tables pretty easily in your browser (I choose to write intermediate ones for greater debuggability).
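A sketch of what step (2) might look like, using only the standard library. The table names and rows are hypothetical, and the plain tuples stand in for whatever the pandas step produced. The join result is written out as its own table so it can be inspected later.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a file path such as "work.db"
# (hypothetical name) would let the tables be browsed afterwards.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5)],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "north"), (2, "south")],
)

# Intermediate table: persisting the join makes it easy to debug.
conn.execute(
    """
    CREATE TABLE orders_with_region AS
    SELECT o.order_id, o.amount, c.region
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    """
)

# Final aggregate, built on top of the intermediate table.
region_totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders_with_region GROUP BY region ORDER BY region"
).fetchall()

print(region_totals)
```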


Datasette is a wonderful piece of software! Simon Willison has created great tooling around SQLite. Can't imagine how much time and energy he has invested in these projects.


Honest question because I haven't messed with Datasette much beyond skimming the home page - how does it improve on a general SQL client like DataGrip or Squirrel or DBeaver?


Obviously I'm biased, so I'd love to hear answers to this from other people (plus I've not really used any of those alternatives much).

Datasette is very "webby". Queries you execute end up in your URL bar as ?sql= parameters, which means you can navigate to them in your history, bookmark them, share links with other people (if your Datasette is shared) and open them in new tabs.
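To make the ?sql= idea concrete, here is a small sketch that builds such a link by hand. The host, database name and table are hypothetical, and the /database?sql=... path layout is an assumption that may differ between Datasette versions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_url(base_url: str, database: str, sql: str) -> str:
    """Build a shareable link that carries the SQL in the query string."""
    return f"{base_url.rstrip('/')}/{database}?{urlencode({'sql': sql})}"


# Hypothetical local instance, database and table names.
sql = "select owner, count(*) from plants group by owner"
url = query_url("http://localhost:8001", "mydata", sql)
print(url)

# The query survives a round trip through the URL, which is what makes
# such links bookmarkable and shareable.
recovered = parse_qs(urlsplit(url).query)["sql"][0]
print(recovered == sql)
```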

It also does web-style tricks like turning foreign key references into hyperlinks through to the associated records.

Datasette's table browsing feature has faceting, which is enormously powerful. I don't know if those alternatives have this feature or not, but I use this constantly. Demo here (the owner, country_long and primary_fuel columns): https://global-power-plants.datasettes.com/global-power-plan...
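For readers who have not seen faceting: conceptually, a facet on a column is close to a "count rows per distinct value" query. The sketch below shows that idea in plain SQL via sqlite3. It is not Datasette's actual implementation; the facet_counts helper and the sample rows are hypothetical, and only the primary_fuel column name comes from the demo above.

```python
import sqlite3


def facet_counts(conn, table, column, limit=30):
    """Return (value, row_count) pairs for one column, most common first."""
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # Only use this with trusted table and column names.
    sql = (
        f'SELECT "{column}", COUNT(*) AS n FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY n DESC, "{column}" LIMIT ?'
    )
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (name TEXT, primary_fuel TEXT)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?)",
    [("a", "Solar"), ("b", "Solar"), ("c", "Hydro"), ("d", "Coal")],
)

fuel_facet = facet_counts(conn, "plants", "primary_fuel")
print(fuel_facet)
```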

Datasette's plugin system is pretty unique too. You can install plugins like https://datasette.io/plugins/datasette-cluster-map and https://datasette.io/plugins/datasette-vega to add visualizations, which again are bookmarkable and hence easy to share with other people.

All of that said, I don't really see Datasette as competing with existing SQL clients. It's intended more as a tool for exploratory data analysis - I've put very little work into running UPDATE/INSERT statements for example, it's much more about turning a set of relational data into something people can interactively explore.


I work in a research organization where I am responsible for crunching data and producing reports highlighting the most "notable" results. Not that the other data is uninteresting, but the volume is such that Excel cannot handle it, and even distributing it can be challenging for non-computer-technical folks without dedicated solutions.

Instead, I can dump all of the processed results into a table, create some views highlighting analysis X vs Y, and share links that give others the ability to ask questions I had not even considered. Now the user is empowered to ask anything and they do not need to engage me for "simple questions". Everybody wins. I believe there is also an extension that allows you to generate and save new queries through the web interface.
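A minimal sketch of the "table plus views" approach, with made-up names and numbers (results, x_vs_y, and the scores are all hypothetical). The view pairs up the two analyses per sample so that anyone can sort or filter on the difference without writing the join themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (sample TEXT, analysis TEXT, score REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("s1", "X", 1.0), ("s1", "Y", 3.0), ("s2", "X", 2.0), ("s2", "Y", 2.5)],
)

# A view comparing analysis X against analysis Y for each sample.
conn.execute(
    """
    CREATE VIEW x_vs_y AS
    SELECT x.sample,
           x.score AS score_x,
           y.score AS score_y,
           y.score - x.score AS diff
    FROM results AS x
    JOIN results AS y ON y.sample = x.sample
    WHERE x.analysis = 'X' AND y.analysis = 'Y'
    """
)

# Consumers of the data can now query the view like any other table.
biggest_changes = conn.execute(
    "SELECT sample, diff FROM x_vs_y ORDER BY diff DESC"
).fetchall()

print(biggest_changes)
```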

It is not a tool for a professional analyst, but a means to collaborate with others. There are heavier/more feature rich alternatives, but Datasette is my favorite tool for getting results out the door without hassle (can run it off of a laptop after a pip install).


It's so great to hear people using it like this!

https://datasette.io/plugins/datasette-saved-queries is the plugin for storing queries - it's pretty basic, there's lots of scope for improving the story around that.


I have had such enthusiastic feedback from granting people access to the ~full dataset. They have been conditioned to expect whatever subset can fit inside an email or a powerpoint slide. I feel a little embarrassed when people fawn over the utility because it is so easy to get running.

Have not yet had a chance to try the idea, but I am toying with using render-images to bake in pre-built plots + markdown for reporting the output. Queryable report in a file. Dynamic Vega plotting (RShiny-ish) is also in the back of my mind, but that feels too close to magic.

It is an incredibly useful tool, and I appreciate the workflows you have enabled.


I use Datasette, DataGrip and Excel to process sales data.

- Datasette to surface data via REST to Excel (Power Query)

- DataGrip to get the data how I want it; the json1 extension is so, so much easier to work with than Power Query and, for my use cases, extremely fast.

This gives you (arguably) the best data grid in the world (Excel) but without the horrible experience of building a pipeline in Power Query that will eventually become too slow and/or randomly crash and hang.

I would really, really like an in-the-box regex extension so I can create SQLite views without the crazy lengths I have to go to with SQL to, for example, split a comma-delimited list in a field.
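For context, one workaround for the comma-splitting case (without a regex extension) is to rewrite the list as a JSON array and expand it with json_each. This is a sketch under assumptions: the products table and its tags are hypothetical, SQLite's JSON functions must be available, and the trick breaks if a value contains double quotes or backslashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("widget", "red,small"), ("gadget", "blue")],
)

# Turn 'red,small' into '["red","small"]' and let json_each emit one row
# per element. Because it is a view, no Python is needed at query time.
conn.execute(
    """
    CREATE VIEW product_tags AS
    SELECT p.name, j.value AS tag
    FROM products AS p,
         json_each('["' || replace(p.tags, ',', '","') || '"]') AS j
    """
)

tag_rows = conn.execute(
    "SELECT name, tag FROM product_tags ORDER BY name, tag"
).fetchall()

print(tag_rows)
```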


I get the feeling we’ve been trained to expect less by poor, incredibly slow legacy products. But really, for many use cases, all the open-source relational databases give instant results.



