> For parallel processing I'd reach for the nearest standard library at hand on the language of choice.
That is a good example of what I mean: The nearest standard library is likely to either buffer all output in memory or not buffer at all (in which case the start of one job's line can end up in the middle of another job's line). Buffering in memory means you cannot deal with output bigger than physical RAM. And your test set will often be so small that this problem will not show up.
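As a contrived sketch of the no-buffering failure mode (plain shell; the `sleep`s and the `job1`/`job2` labels are mine, just to force and illustrate the interleaving):

```shell
# Two background jobs writing to the same stdout with no per-job
# buffering. The sleeps force job2's line to land in the middle of
# job1's line, so one line starts inside another.
( printf 'job1-start '; sleep 0.6; printf 'job1-end\n' ) &
( sleep 0.3; printf 'job2\n' ) &
wait
```

GNU Parallel avoids this by buffering each job's output and only passing it on when it is safe to do so.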
GNU Parallel buffers on disk. It checks whether the disk runs full during a run and exits with a failure if that happens. It also removes the temporary files immediately, so if GNU Parallel is killed, you do not have to clean up any mess left behind.
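The "no mess left behind" part is the classic Unix trick of unlinking a temporary file while keeping it open: the data stays reachable through the open file descriptors, and the kernel frees the disk space when they close. A minimal sketch of the technique in plain shell (POSIX semantics assumed; this is not GNU Parallel's actual code):

```shell
# Create a temp file, keep a write fd (3) and a read fd (4) on it,
# then delete the name. The inode survives until the fds are closed,
# so being killed at any point leaves no file to clean up.
tmp=$(mktemp)
exec 3>"$tmp" 4<"$tmp"
rm -f "$tmp"                  # name is gone; data is not
echo 'buffered job output' >&3
cat <&4                       # still readable via fd 4
exec 3>&- 4<&-                # closing the fds frees the disk space
```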
You _could_ do all that yourself, but then we are not talking 50 lines of code. Parallelizing is hard to get right for all the corner cases - even with a standard library.
And while you would not have to look up how to use the command-line parameters on S/O, you _would_ be doing exactly the same for the standard libraries.
Better performance is also not a given: GNU Sort has built-in parallel sorting, so you clearly would not want to use a standard library's non-parallelized sort.
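For reference, these are the relevant GNU sort knobs (GNU coreutils; `bigfile` and `/scratch` are placeholder names):

```shell
# GNU sort already parallelizes internally; you can tune it:
#   --parallel=N  number of sorts to run concurrently
#   -S SIZE       main-memory buffer before spilling to disk
#   -T DIR        directory for the on-disk merge's temp files
sort --parallel=8 -S 2G -T /scratch bigfile > bigfile.sorted
```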
Basically I see you have two choices: build it yourself from libraries, or build it as a shell script from commands.
You would have to spend time understanding how to use the libraries and the commands in both cases, and you are limited by whatever the library or the command can do in both cases.
I agree that if you need tighter control than a shell script will give you, then you need to switch to another language.
I agree with everything you said, as always, everything is a trade off. Good point about trickiness of memory management w/parallel processing! Would have to be extra careful to avoid hoarding RAM.