mchav's comments

mchav · 2026-04-22T13:54:31 1776866071

I think the original author picked this example to broadly illustrate how easy it is to make ad hoc changes to your query without worrying about lot about implementation details. Polars, for example, converges on a similar API and gives you the flexibility. You can iterate then refactor easily later to what you consider good practice.

rgavuliak · 2026-04-22T17:42:21 1776879741

For me the whole piping felt like making everything less readable and harder to debug compared to a string of commands.

mchav · 2026-03-24T05:10:45 1774329045

Had always hoped for something like this since the days of Spark and Frameless. Better late than never.

Now hoping to build a bunch of Neuro symbolic AI on top of this.

mchav · 2026-03-23T22:53:01 1774306381

No but something is in the works! We are building reactive notebooks that we will eventually give export capabilties.

You can try it from https://www.datahaskell.org/ under "try out our current stack"

mchav · 2026-03-23T14:24:04 1774275844

Author here: Would have loved to but this is round about my wedding anniversary. Will ask some Haskell friends to submit though.

brightball · 2026-03-24T15:46:41 1774367201

Also, congratulations.

brightball · 2026-03-23T18:00:19 1774288819

Thanks!

mchav · 2026-03-23T14:20:22 1774275622

Author here. At the time I worked in fraud detection and we needed to automate file generation for our BRMS. Initially created this to experiment with “models as dataframe expressions” and Haskell is great for DSL-like stuff. That work is still on going: https://github.com/DataHaskell/symbolic-regression and dataframe has a native sparse oblique tree implementation.

As it’s grown it’s been pretty cool to have transparent schema transformations instead of every function mapping a statement a dataframe you can have function signatures like:

``` extract :: TypedDataFrame [Column "price" (Maybe Double), Column "quantity" Int, Column "comments" T.Text] -> TypedDataFrame [Column "price" (Maybe Double), Column "quantity" Int] -- body of extract

transform :: TypedDataFrame [Column "price" (Maybe Double), Column "quantity" Int] -> TypedDataFrame [Column "price" Double, Column "quantity" Int] -- body of transform

clean :: TypedDataFrame [Column "price" (Maybe Double), Column "quantity" Int, Column "comments" T.Text] -> TypedDataFrame [Column "price" Double, Column "quantity" Int] clean = transform . extract ```

But you can also do the simple thing too and only worry about type safety if you prefer:

``` df |> D.filterWhere (country_code .==. "JPN") |> D.select [F.name name] |> D.take 5 ```

Being able to work across that whole spectrum of type safety is pretty great.

mchav · 2025-11-29T19:46:38 1764445598

RE Jupyter not having advanced features.

Yeah it's a bummer. It seems that notebooks that support these sort of "reactive" workflows are custom built around that model. Marimo, Pluto.jl, and observable are mostly language specific. Creating one would be non trivial.

Do you have your approach documented (tutorial style) anywhere?

mchav · 2025-11-29T19:43:45 1764445425

The rule of thumb is somewhere between 5 and 10x difference. Which is large if you're going to do anything heavy but for most practical purposes it's fine. Roughly the difference between C and Python.