I'll admit it's mostly this way because I thought ExistentialQuantification sounded cool and wanted to give a try with classes - this could definitely be tidied up
The rules were different in 2014 when I made my account! It's actually quite annoying because lots of 3rd party GitHub integrations puke immediately saying I have an invalid username.
You're right! Data.ByteString.Lazy is Word8 under the covers, so wide characters are truncated. tr takes a similar short cut. Swapping to Data.Text would fix that.
Where simplicity conflicted with compatibility, I've chosen the former so far. Targeting the BSD options and behavior is another example of that. The primary goal is to feel out the data flow for each utility, rather than dig into all the edges.
Definitely. Depending on how long you've spent staring at the contents of /bin/ and /usr/bin/ you'll notice there are definitely some array or matrix oriented utils (or options) missing like column.
cut comes to mind as a difficult one. In C, you can just hop around the char buffer[] and drop nulls in place for fields, etc before printing. You could go that way a Data.Array Char, but that's hard to justify as functional.
Shouldn't cut but the easiest one to do in functional style?
You basicaly map one line of a stream to another with some filtering and joining. Do I miss the part where it's terribly slow and/or not doable in Haskell or something?
Hi! A few years ago I found myself wanting an equivalent of `column` that didn’t strip color codes. After I implemented it in Haskell, I found it was useful to use Nix to force statically linking against libraries like gmp to reduce startup time. Perhaps what I ended up doing might be helpful for you too: https://github.com/bts/columnate/blob/master/default.nix
Thank you for the suggestion, I'll give this a whirl! I've fussed around with `--ghc-options '-optl-static -fPIC'` and the like in years past without success.
I imagine for streaming tools like these it's pretty convenient. You don't have to manage buffers etc, just write code against a massive string and haskell takes care of streaming it for you and pulling in more data when needed.
There are libraries that handle it, but they probably have weird types, you can just use functions in the prelude to write a lot of these basic utilities.
For one thing, a "string" in Haskell by default is a linked list of unicode characters, so right out of the gate you've got big performance problems if you want to use strings. The exact way laziness is done also has serious performance consequences as well; when dealing with things as small as individual characters all the overhead looms large as a percentage basis. One of the major purposes of any of the several variants of ByteString is to bundle the bytes together, but that means you're back to dealing with chunks. Haskell does end up with a nice API that can abstract over the chunks but it still means you sometimes have to deal with chunks as chunks; if you turn them back into a normal Haskell "string" you lose all the performance advantages.
It can still come out fairly nice, but if you want performance it is definitely not just a matter of opening a file and pretending you've just got one big lazy string and you can just ignore all the details; some of the details still poke out.
It definitely has some sharp edges. One advantage is skipping computations (and the IO they'd need) that don't end up getting used, which let's you do some clever looking things/ ignore some details. That's hard to take advantage of in practice, I think.
The other advantage is just deferring IO. For instance in split or tee, you could decide that you need 500 output files and open all the handles together in order to pass them to another function that will consume them. I'd squint at someone who wrote `void process_fds(int fds[500]);`, but here it doesn't matter.
I think the scope of lazy constructs should usually be far less than that of strict constructs, so it's only in the cases where the librarified lazy abstractions don't fit that you need a rewrite. Lazy to strict isn't hard, but I don't want the performance and cognitive overhead of lazy-by-default.
Performance (execution, memory) is generally in the same ballpark as the BSD versions, with some caveats specific to utils that do lots of in place data manipulation.
cut comes to mind as an example, slicing and dicing lines into fields quickly without a ton of copies isn't easy. Using Streaming.ByteString generally makes a huge difference, but it's extremely difficult to use unless you get can your mind to meld with the types it wants. Picking it up again months later takes some serious effort.