anardil · on Nov 4, 2024

I'll admit it's mostly this way because I thought ExistentialQuantification sounded cool and wanted to give it a try with classes - this could definitely be tidied up
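
For readers unfamiliar with the extension, here is a minimal sketch of the general pattern of pairing ExistentialQuantification with a type class. The names `Util`, `run`, `Utility`, `table`, and `dispatch` are hypothetical and chosen for illustration; this is not the repository's actual code.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Char (toUpper)

-- Hypothetical class: a utility maps its arguments and stdin text to stdout text.
class Util a where
  run :: a -> [String] -> String -> String

-- Two toy utilities, each with its own type.
data Cat = Cat
data Upper = Upper

instance Util Cat where
  run _ _ input = input

instance Util Upper where
  run _ _ = map toUpper

-- The existential wrapper hides the concrete type, so differently typed
-- utilities can live in one list.
data Utility = forall a. Util a => Utility a

table :: [(String, Utility)]
table = [("cat", Utility Cat), ("upper", Utility Upper)]

-- Look a utility up by name and run it, if it exists.
dispatch :: String -> [String] -> String -> Maybe String
dispatch name args input =
  case lookup name table of
    Just (Utility u) -> Just (run u args input)
    Nothing          -> Nothing
```

A plain record of functions would give the same dispatch table without the extension, which is one way such code could be tidied up.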

anardil · on Nov 3, 2024

The rules were different in 2014 when I made my account! It's actually quite annoying because lots of 3rd party GitHub integrations puke immediately saying I have an invalid username.

anardil · on Nov 3, 2024

You're right! Data.ByteString.Lazy is Word8 under the covers, so wide characters are truncated. tr takes a similar shortcut. Swapping to Data.Text would fix that.
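
A small sketch of the truncation described above, assuming the characters pass through the `Char8` packing interface of `bytestring` (whether the repository does exactly this is an assumption). The helper names `packedBytes` and `textChars` are made up for illustration.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Text as T
import Data.Word (Word8)

-- Char8.pack keeps only the low 8 bits of each Char, so any code point
-- above U+00FF loses information.
packedBytes :: String -> [Word8]
packedBytes = BL.unpack . BLC.pack

-- Data.Text stores full Unicode code points, so the round trip is lossless
-- for ordinary characters.
textChars :: String -> String
textChars = T.unpack . T.pack
```

For example, `packedBytes "\955"` (a Greek lambda, U+03BB) yields the single byte `187` (0xBB), while `textChars "\955"` returns the original character.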

Where simplicity conflicted with compatibility, I've chosen the former so far. Targeting the BSD options and behavior is another example of that. The primary goal is to feel out the data flow for each utility, rather than dig into all the edges.

anardil · on Nov 3, 2024

Definitely. Depending on how long you've spent staring at the contents of /bin/ and /usr/bin/ you'll notice there are definitely some array or matrix oriented utils (or options) missing like column.

cut comes to mind as a difficult one. In C, you can just hop around the char buffer[] and drop nulls in place for fields, etc. before printing. You could go that way with a Data.Array Char, but that's hard to justify as functional.

Muromec · on Nov 3, 2024

Shouldn't cut be the easiest one to do in functional style?

You basically map one line of a stream to another with some filtering and joining. Am I missing the part where it's terribly slow and/or not doable in Haskell or something?
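
The map, filter, and join idea can be sketched with plain `String` functions. This is an illustration only, not the repository's implementation: it handles just a single-character delimiter and a list of 1-based field numbers, and unlike real cut it prints an empty line for input lines that lack the requested fields. All function names are hypothetical.

```haskell
import Data.List (intercalate)

-- Split a line on a single delimiter character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

-- Keep only the fields whose 1-based position was requested.
selectFields :: [Int] -> [String] -> [String]
selectFields wanted fields = [f | (i, f) <- zip [1 ..] fields, i `elem` wanted]

-- Transform one line: split, filter, join.
cutLine :: Char -> [Int] -> String -> String
cutLine d wanted = intercalate [d] . selectFields wanted . splitOn d

-- Transform a whole stream line by line.
cutLines :: Char -> [Int] -> String -> String
cutLines d wanted = unlines . map (cutLine d wanted) . lines
```

Wired up as `main = interact (cutLines ':' [1, 3])`, this behaves roughly like `cut -d: -f1,3`. It says nothing about speed; `String` is a linked list of characters, so a performance-minded version would likely use a packed representation.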

anardil · on Nov 3, 2024

I'll have to give this a try, thank you for the suggestion!

anardil · on Nov 3, 2024

Wow, hello! This is my repository. I'm happy to answer any questions.

faragon · on Nov 3, 2024

Very beautiful implementation of the awk interpreter in less than 600 lines!

https://github.com/Gandalf-/coreutils/blob/master/Coreutils/...

anardil · on Nov 3, 2024

Thank you! This is one of my favorites. User declared variables are next on the todo list, when I get back to it.

OskarS · on Nov 3, 2024

It is really gorgeously written Haskell. I’ve only dabbled in Haskell, but you’re really whetting my appetite for digging in deeper.

bts · on Nov 3, 2024

Hi! A few years ago I found myself wanting an equivalent of `column` that didn’t strip color codes. After I implemented it in Haskell, I found it was useful to use Nix to force statically linking against libraries like gmp to reduce startup time. Perhaps what I ended up doing might be helpful for you too: https://github.com/bts/columnate/blob/master/default.nix

anardil · on Nov 3, 2024

Thank you for the suggestion, I'll give this a whirl! I've fussed around with `--ghc-options '-optl-static -fPIC'` and the like in years past without success.

cosmic_quanta · on Nov 3, 2024

Could you speak to the advantages of Haskell's lazy IO? I only hear about its disadvantages usually

habitue · on Nov 3, 2024

I imagine for streaming tools like these it's pretty convenient. You don't have to manage buffers etc, just write code against a massive string and haskell takes care of streaming it for you and pulling in more data when needed.
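
As a rough illustration of writing code against one big lazy string, here is a `head`-like function built only from Prelude functions. The name `headLines` is made up for this sketch.

```haskell
-- Keep the first n lines of the input. Because String is lazy, only as much
-- input as is needed to produce those lines is ever demanded.
headLines :: Int -> String -> String
headLines n = unlines . take n . lines
```

With `main = interact (headLines 10)` this reads stdin lazily and stops demanding input after ten lines. The same function also terminates on an infinite input such as `unlines (map show [1 ..])`, which is the streaming behaviour being described.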

There are libraries that handle it, but they probably have weird types, you can just use functions in the prelude to write a lot of these basic utilities.

jerf · on Nov 4, 2024

Unfortunately, while that may be the dream, it doesn't work out that way if you want good performance. If you look at the source you'll see that it uses things like https://hackage.haskell.org/package/streaming-bytestring-0.3... a lot.

For one thing, a "string" in Haskell by default is a linked list of unicode characters, so right out of the gate you've got big performance problems if you want to use strings. The exact way laziness is done also has serious performance consequences as well; when dealing with things as small as individual characters all the overhead looms large as a percentage basis. One of the major purposes of any of the several variants of ByteString is to bundle the bytes together, but that means you're back to dealing with chunks. Haskell does end up with a nice API that can abstract over the chunks but it still means you sometimes have to deal with chunks as chunks; if you turn them back into a normal Haskell "string" you lose all the performance advantages.
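
To make "dealing with chunks as chunks" concrete, here is a small sketch using the lazy `ByteString` from the `bytestring` package rather than the streaming library linked above. The function names are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- A lazy ByteString is a lazy list of strict chunks; this exposes their sizes.
chunkSizes :: BL.ByteString -> [Int]
chunkSizes = map BS.length . BL.toChunks

-- Count newline bytes (10) one packed chunk at a time, without ever
-- converting back to a String.
countNewlines :: BL.ByteString -> Int
countNewlines = BL.foldlChunks (\acc chunk -> acc + BS.count 10 chunk) 0
```

The per-chunk work runs over packed bytes, which is where the performance comes from; converting to `String` first would give that back up.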

It can still come out fairly nice, but if you want performance it is definitely not just a matter of opening a file and pretending you've just got one big lazy string and you can just ignore all the details; some of the details still poke out.

habitue · on Nov 4, 2024

I mean, I'm aware of the downsides, the OP asked why someone might use it. Ease of use seems like a reasonable upside

anardil · on Nov 3, 2024

It definitely has some sharp edges. One advantage is skipping computations (and the IO they'd need) that don't end up getting used, which lets you do some clever-looking things and ignore some details. That's hard to take advantage of in practice, I think.

The other advantage is just deferring IO. For instance in split or tee, you could decide that you need 500 output files and open all the handles together in order to pass them to another function that will consume them. I'd squint at someone who wrote `void process_fds(int fds[500]);`, but here it doesn't matter.
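
A pure sketch of the same flavour of laziness for a split-like tool: the supply of output names is infinite, and only as many names as there are pieces are ever computed. This is an illustration under assumptions, not the repository's code; the `part-N` naming scheme and all function names are hypothetical, and no files are opened here.

```haskell
-- Break a list into pieces of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 || null xs = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- An infinite, lazily generated supply of output names (hypothetical scheme).
outputNames :: [FilePath]
outputNames = ["part-" ++ show i | i <- [0 :: Int ..]]

-- Pair each piece of n lines with the next unused name. zip stops at the
-- shorter list, so unused names are never evaluated.
planSplit :: Int -> String -> [(FilePath, String)]
planSplit n input = zip outputNames (map unlines (chunksOf n (lines input)))
```

A caller could then run `mapM_ (uncurry writeFile) (planSplit 1000 input)` to do the actual writing.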

mrkeen · on Nov 3, 2024

If your language doesn't give you laziness, you're reinventing it yourself with strict primitives each time.

vacuity · on Nov 3, 2024

On the other hand, when you don't want laziness you really won't like if it's present anyways.

weebull · on Nov 3, 2024

Lazy to strict is reasonably easy to do though. The problem is normally that once one bit goes strict, most other things implicitly do too.

Strict to lazy is normally a rewrite.
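
One common small-scale instance of going from lazy to strict is swapping `foldl` for `foldl'` from `Data.List`. The function names below are made up for illustration.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator can build up a chain of unevaluated
-- additions before anything is computed.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0
```

Both return the same result; the difference is in when the additions happen and how much memory is held in the meantime. The change is a single identifier, which matches the point that this direction is usually the easy one.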

vacuity · on Nov 3, 2024

I think the scope of lazy constructs should usually be far less than that of strict constructs, so it's only in the cases where the librarified lazy abstractions don't fit that you need a rewrite. Lazy to strict isn't hard, but I don't want the performance and cognitive overhead of lazy-by-default.

aeonik · on Nov 3, 2024

You specify "fast", can you elaborate on the performance of the collection? How does it compare to the standard core utils?

Great work, looks amazing.

anardil · on Nov 3, 2024

Performance (execution, memory) is generally in the same ballpark as the BSD versions, with some caveats specific to utils that do lots of in place data manipulation.

cut comes to mind as an example: slicing and dicing lines into fields quickly without a ton of copies isn't easy. Using Streaming.ByteString generally makes a huge difference, but it's extremely difficult to use unless you can get your mind to meld with the types it wants. Picking it up again months later takes some serious effort.

Vosporos · on Nov 3, 2024

Fantastic work, thank you so much!

anacrolix · on Nov 4, 2024

LOTR fan detected

anardil · on July 4, 2023

https://anardil.net

D&D, tech, and scuba diving!

anardil · on Jan 14, 2023

It's an interesting/challenging exercise to work through implementing some basic functions in BF, ie thinking only in loops.

I implemented an optimizing interpreter in Bash[1] on a plane ride, and it's still one of my favorite pet projects.

[1]: https://github.com/Gandalf-/BrainBash

teaearlgraycold · on Jan 15, 2023

In bash? You're a madman.

Here's my entry - https://github.com/danthedaniel/BF-JIT

anardil · on April 6, 2022

https://goto.anardil.net/ My root dashboard, links to all other sites!

Some interesting sub-sites

https://www.anardil.net/ My blog on programming and CS projects

https://diving.anardil.net/ Scuba diving pictures + taxonomy + game

https://timelapse.anardil.net/ Raspberry Pi timelapse videos since 2019

https://sensors.anardil.net/ Raspberry Pi temperature sensor plotting

stsourlidakis · on April 6, 2022

The diving taxonomy looks cool!

anardil · on Dec 21, 2021

I can second this; I've been with them for 8 years without complaints and really appreciate the record detail masking for my .net domain