
Ha! This was probably the first serious problem I ever tackled with an open source contribution!

The year was 2002, the 2.4 Linux kernel had just been released and I was making money on the side building monitoring software for a few thousand (mostly Solaris) hosts owned by a large German car manufacturer. Everything was built in parallel ksh code, “deployed” to Solaris 8 on Sun E10Ks, and mostly kicked off by cron. Keeping total script runtime down to avoid process buildup and delay was critical. The biggest offender: long timeouts for host/port combinations that would sporadically not be available.

Eventually, I grabbed W. Richard Stevens’ UNIX Network Programming book and created tcping [0]. FreeBSD, NetBSD, and a series of Linux distros picked it up at the time, and it was a steady decline from there… good times!

[0]: https://github.com/mkirchner/tcping
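
For the curious, the core of the approach is roughly this (a minimal sketch, not the actual tcping source): put the socket in non-blocking mode, start the connect(), and use select() with a short timeout, so an unreachable host/port fails in seconds instead of hanging for minutes.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    static int tcp_probe(const char *ip, int port, int timeout_sec)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        /* don't block inside connect() */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            close(fd);
            return 0;                          /* connected immediately */
        }
        if (errno != EINPROGRESS) {
            close(fd);
            return -1;
        }

        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0) {
            close(fd);
            return -1;                         /* timeout or select error */
        }

        int err = 0;
        socklen_t len = sizeof(err);
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
        close(fd);
        return err == 0 ? 0 : -1;              /* 0 means the port is open */
    }

    int main(int argc, char **argv)
    {
        if (argc < 3)
            return 2;
        return tcp_probe(argv[1], atoi(argv[2]), 3) == 0 ? 0 : 1;
    }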

edit: grammar


I worked for Rich when he was writing UNP. Seeing comments like this 30+ years later reminds me how fortunate I was to spend time with him early in my career.


The syntax is different and I don't remember who compiles the binaries, but as a primarily Windows sysadmin, tcping is literally the first thing I put in my %PATH% on a new jumpbox; even ahead of dig, believe it or not.

(After that it's Russinovich's Sysinternals suite.)


Awesome, I think I had the same book in the 1990s. Things were so simple back then, when the default approach to anything was to start writing some C code with socket calls.


RIP WRS. Such a fantastic book.


When I was reading his books I used the utility he wrote, sock.

https://github.com/keyou/sock


How is this different from just using netcat?

Netcat even has port scanning capabilities.


Beautiful, thanks!


#1 doing it in many 20-30min batches with little kids

#2 see #1

Jokes aside, IMHO there was not a single challenge that stood out in terms of implementation; getting the pieces to work together and finding residual bugs was hard. LLDB was my friend, but I was still missing valgrind on the Mac. Writing (and re-writing) the docs really helped with mental clarity, and not having to stress about a deadline didn't hurt either. I often just closed the lid and made it my future self's problem, with good success ;-)

Oh, and one thing I am proud of: how well the recursive search generalized to path copying. That came together very nicely.
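
If that sounds abstract: the same recursion that walks down to a key also tells you exactly which nodes to copy for a persistent update. A heavily simplified, hypothetical sketch of that idea (fixed 32-slot nodes instead of bitmap-compressed ones, hash collisions ignored; the real hamt code differs):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct node {
        struct node *child[32];
        void *key, *value;                 /* only used at the leaf level */
    };

    static struct node *clone(const struct node *n)
    {
        struct node *c = malloc(sizeof(*c));
        memcpy(c, n, sizeof(*c));
        return c;
    }

    /* Persistent insert: returns a new root. Only the nodes on the search
       path are copied; everything else is shared with the old trie. */
    static struct node *set(const struct node *root, uint32_t hash,
                            int depth, void *key, void *value)
    {
        struct node *n = root ? clone(root) : calloc(1, sizeof(struct node));
        if (depth == 6) {                  /* 6 levels x 5 bits: hash (nearly) exhausted */
            n->key = key;
            n->value = value;
            return n;
        }
        uint32_t idx = (hash >> (5 * depth)) & 0x1f;
        n->child[idx] = set(root ? root->child[idx] : NULL,
                            hash, depth + 1, key, value);
        return n;
    }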


Agree. I was trying to mostly mimic the libavl API, hence the design decision.


Thanks! Yes, basically log_64(n) vs. log_32(n), and it would also require switching to a 64-bit hash function and adjusting the hash exhaustion and bit fiddling arithmetic accordingly.

Re the impact of the recursion, that's actually zero for the search code since clang does a tail call optimization; it's a fair point for the removal code.
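
Concretely, the indexing change boils down to roughly this (a hypothetical sketch; the real code also needs the hash exhaustion and bit fiddling adjustments mentioned above):

    #include <stdint.h>

    /* 32-way trie: 5 bits per level from a 32-bit hash */
    static inline uint32_t index32(uint32_t hash, int depth)
    {
        return (hash >> (5 * depth)) & 0x1f;            /* 0..31 */
    }

    /* hypothetical 64-way trie: 6 bits per level from a 64-bit hash */
    static inline uint32_t index64(uint64_t hash, int depth)
    {
        return (uint32_t)((hash >> (6 * depth)) & 0x3f); /* 0..63 */
    }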


Thank you, happy to share. These are excellent pointers.

Regarding the batch construction, it's not immediately clear to me how to implement sorting since the order is implicit in the hash function (it seems one would need to construct a trie to build a trie?), but I might be wrong...


Hash the keys before/while sorting. The repeated hashing with different seeds is an interesting approach to collisions but would add some annoyance here.

Roughly do just enough work to put the key/values in the same order that you would see them in when iterating through the corresponding trie.

Another way to go would be to implement merge, then do the batch construction by partitioning the initial array, building tries out of the pieces, and merging them. That could also be the fallback for when the first hash collides. If the partitioning were by the first five bits of the hash, you'd get a reasonable approximation of doing the sort.
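
To make the ordering concrete, a hypothetical sketch of the comparison (the chunk order has to match however the trie actually consumes the hash):

    #include <stdint.h>
    #include <stdlib.h>

    struct entry {
        uint32_t hash;                     /* hash of the key, computed up front */
        void *key, *value;
    };

    /* compare by 5-bit hash chunks, root chunk first, so that qsort()
       puts the entries into trie iteration order */
    static int cmp_trie_order(const void *a, const void *b)
    {
        uint32_t ha = ((const struct entry *)a)->hash;
        uint32_t hb = ((const struct entry *)b)->hash;
        for (int depth = 0; depth < 7; ++depth) {   /* 7 x 5 bits cover 32 bits */
            uint32_t ca = (ha >> (5 * depth)) & 0x1f;
            uint32_t cb = (hb >> (5 * depth)) & 0x1f;
            if (ca != cb)
                return ca < cb ? -1 : 1;
        }
        return 0;
    }

    /* qsort(entries, n, sizeof(struct entry), cmp_trie_order); then build
       the trie level by level from contiguous runs sharing a prefix */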


This. Roughly a year ago I got interested in efficient immutability for my write-from-scratch-in-C Lisp [0] and started to write a HAMT implementation in C [1], along with a (somewhat hacky, you have been warned) benchmarking suite [2].

The docs are only 70% done (in particular, the "putting it all together" part is missing), but it has been a really interesting and enlightening journey so far, and I can only recommend embarking on this path to everyone.

[0]: https://github.com/mkirchner/stutter
[1]: https://github.com/mkirchner/hamt
[2]: https://github.com/mkirchner/hamt-bench


See also Matt Bierner's HAMT impl in JavaScript: https://blog.mattbierner.com/hash-array-mapped-tries-in-java...


Author here.

To add context that seems lost on some: this is a cleaned-up version of my own notes from when I tried to understand the technical details behind what Linus called "good taste" in his TED talk.

The main contributions of the write-up (if any) are the two conceptual insights: that using an indirect pointer yields a homogeneous data structure, and that it provides a convenient handle to both the list item and its predecessor.
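
For readers who haven't clicked through, the technique in a few lines (a minimal sketch along the lines of the article; as noted below, no NULL checks, and the target is assumed to be in the list):

    struct node {
        struct node *next;
        int value;
    };

    void remove_node(struct node **head, struct node *target)
    {
        struct node **indirect = head;         /* handle to "the link pointing at us" */
        while (*indirect != target)
            indirect = &(*indirect)->next;     /* walk the links, not the nodes */
        *indirect = target->next;              /* splice out; head needs no special case */
    }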

The article is not intended to show an example of clean code: there is no checking for NULLs, and there are implicit expectations (the target needs to exist). That's just not the point.

I also strongly agree with the sentiment in the discussion that simple is often better than elegant. If it takes an entire article to figure out what's happening, that says something about how careful you should be with putting it in production code.

Anyways, thanks everyone on HN for a great discussion and for all the insights, comments and suggestions!


FWIW, I think you get way more correct than some of the comments here give you credit for.


Author here. Thanks for fixing the title, I should have seen that!

