
https://scipipe.org - A pipeline tool for shell commands with a declarative flow-based API in Go

Github link: https://github.com/scipipe/scipipe

There are many pipeline tools for shell commands, but most have one or more limitations in their API that make certain complex pipelines impossible or really hard to write.

We were pushing the limits of all the tools we tried, so we developed our own and implemented it in Go, with a declarative API for defining the data flow dependencies instead of inventing yet another DSL. This has given us great flexibility in developing complex pipelines as well, e.g. combining parameter sweeps nested with cross-validation, implemented as workflow constructs.
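To illustrate why a plain Go API helps with nested sweeps, here is a minimal sketch using only the standard library. It is not SciPipe's actual API: the `Workflow` and `Proc` types, the `BuildSweep` function, and the `train` and `summarize` commands are all hypothetical. The point is only that ordinary nested loops can declare one process per (parameter, fold) pair and wire them to a per-parameter summary step.

```go
package main

import "fmt"

// Proc is a hypothetical stand-in for a workflow process: a named shell
// command plus the names of the processes whose outputs it reads.
type Proc struct {
	Name     string
	Command  string
	Upstream []string
}

// Workflow is a hypothetical container that only records the declared graph.
type Workflow struct {
	Procs []*Proc
}

// NewProc declares a process and registers it with the workflow.
func (wf *Workflow) NewProc(name, cmd string, upstream ...string) *Proc {
	p := &Proc{Name: name, Command: cmd, Upstream: upstream}
	wf.Procs = append(wf.Procs, p)
	return p
}

// BuildSweep declares one train process per (cost, fold) pair, and one
// summarize process per cost value that depends on all folds for that cost.
func BuildSweep(costs []string, folds int) *Workflow {
	wf := &Workflow{}
	for _, cost := range costs {
		var trained []string
		for fold := 1; fold <= folds; fold++ {
			name := fmt.Sprintf("train_c%s_fold%d", cost, fold)
			// "train" is a made-up command used for illustration only.
			cmd := fmt.Sprintf("train --cost %s --fold %d", cost, fold)
			wf.NewProc(name, cmd)
			trained = append(trained, name)
		}
		// "summarize" is likewise a made-up command.
		wf.NewProc("summarize_c"+cost, "summarize", trained...)
	}
	return wf
}

func main() {
	wf := BuildSweep([]string{"0.1", "1", "10"}, 3)
	for _, p := range wf.Procs {
		fmt.Printf("%-20s <- %v\n", p.Name, p.Upstream)
	}
}
```

Because the graph is built by regular Go code, adding another level of nesting is just another loop rather than a new DSL feature.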

SciPipe is also unique in providing an audit report for every single output of the workflow, in a structured JSON format. A helper tool allows converting these reports to either an HTML report, a PDF, or a Bash script that will generate the one accompanying output file from scratch.

An extra cool thing is that, because the audit reports live alongside the output files, if you run a scipipe workflow that uses files generated by another scipipe workflow, it will also pick up all the history for the input files generated by that earlier workflow, meaning that you get a 100% complete audit report, even if your analysis spans multiple workflows!
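The idea of history carrying over between workflows can be sketched as a nested record. This is a simplified illustration, not SciPipe's real audit schema: the `AuditInfo` type, its field names, and `CommandHistory` are assumptions made for the example. Each record stores the command that produced a file plus the records of its input files, so walking the nesting recovers the full chain of commands.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// AuditInfo is a hypothetical, simplified audit record for one output file.
// Upstream maps each input file path to the record that was stored with it.
type AuditInfo struct {
	Command  string                `json:"command"`
	Upstream map[string]*AuditInfo `json:"upstream,omitempty"`
}

// CommandHistory returns every command needed to produce a file,
// with upstream commands listed before the commands that depend on them.
func CommandHistory(a *AuditInfo) []string {
	paths := make([]string, 0, len(a.Upstream))
	for path := range a.Upstream {
		paths = append(paths, path)
	}
	sort.Strings(paths) // fixed order, since map iteration order is random

	var cmds []string
	for _, path := range paths {
		cmds = append(cmds, CommandHistory(a.Upstream[path])...)
	}
	return append(cmds, a.Command)
}

func main() {
	// Record assumed to have been written by an earlier workflow.
	earlier := &AuditInfo{Command: "sort -o sorted.txt input.txt"}

	// Record written by a later workflow that reads sorted.txt and
	// embeds the earlier record as its upstream history.
	later := &AuditInfo{
		Command:  "uniq -c sorted.txt counts.txt",
		Upstream: map[string]*AuditInfo{"sorted.txt": earlier},
	}

	js, err := json.MarshalIndent(later, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(js))

	for _, cmd := range CommandHistory(later) {
		fmt.Println(cmd)
	}
}
```

A flattened command list like the one `CommandHistory` returns is roughly what a "regenerate from scratch" script would need.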

(More on the audit/provenance report in this post: https://rillabs.com/posts/provenance-reports-in-scientific-w... )



Very interesting.

This is like snakemake but with go.

The audit trail is very interesting, especially for certified, version-controlled pipelines.

Is it being continually developed?


Thanks for the kind words! Yes, although conceptually it is even more similar to Nextflow. That is, it is push-based and dataflow-based, whereas Snakemake is pull-based, quite similar to Luigi (well, hey, we also developed the SciLuigi plugin to make Luigi a tad more dataflow-like, although only on the surface).
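For readers unfamiliar with the push/pull distinction, a rough Go sketch of both styles follows. It does not reproduce how SciPipe, Nextflow, Snakemake, or Luigi are implemented; `source`, `upper`, `RunPush`, `Rule`, and `Resolve` are hypothetical names. In the push style each stage runs concurrently and sends results downstream as soon as they exist; in the pull style you start from a requested target and walk backwards through its dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// source pushes each item downstream as soon as it is available.
func source(items []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, it := range items {
			out <- it
		}
	}()
	return out
}

// upper is a downstream stage that processes items as they arrive.
func upper(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			out <- strings.ToUpper(s)
		}
	}()
	return out
}

// RunPush wires the stages together and collects everything that flows out.
func RunPush(items []string) []string {
	var got []string
	for s := range upper(source(items)) {
		got = append(got, s)
	}
	return got
}

// Rule lists the targets that must exist before this target can be built.
type Rule struct {
	Deps []string
}

// Resolve is the pull style: begin at the requested target, recurse into
// its dependencies first, then record the target itself.
func Resolve(target string, rules map[string]Rule, done map[string]bool, order *[]string) {
	if done[target] {
		return
	}
	for _, dep := range rules[target].Deps {
		Resolve(dep, rules, done, order)
	}
	done[target] = true
	*order = append(*order, target)
}

func main() {
	fmt.Println(RunPush([]string{"a", "b", "c"}))

	rules := map[string]Rule{
		"report": {Deps: []string{"stats"}},
		"stats":  {Deps: []string{"raw"}},
	}
	var order []string
	Resolve("report", rules, map[string]bool{}, &order)
	fmt.Println(order)
}
```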

I'm maintaining and continuing to develop SciPipe, although pretty slowly at the moment, due to a day job that doesn't allow much time to spend on it.

A former colleague at pharmb.io who is still using it and I have some plans for further features to streamline the authoring experience, though, and I'm actually looking to move to a job (e.g. in academia) that will allow me to work on it a bit more on the side.


Thanks for the reply. I'm going to look deeper at scipipe :). It's good to hear you're going to continue developing it.

What advantages does scipipe have over nextflow?


Thanks for your interest!

I should start by saying that Nextflow is a fantastic tool, with a very thriving community and lots of support and buy-in, especially in the biomedical community.

But since you are asking, below are the points I'm aware of that motivate me to keep using and updating SciPipe:

Before going into the list though, I want to mention that at the time our paper [1] was published, SciPipe was unique compared with Nextflow in allowing the creation of reusable modules, which was our primary motivation for creating our own tool. Nextflow later introduced reusable modules in its DSL2.

Still, there are a few more factors weighing in, keeping me using and developing SciPipe:

1. SciPipe has a small and maintainable code base, ~2k LOC last I checked. Nextflow's codebase is much larger AFAIK.

2. SciPipe has zero external dependencies, outside Go and Bash. Nextflow depends on the Java runtime, a Groovy interpreter, and the GPars library for concurrency.

3. SciPipe can compile pipelines to statically linked binaries, for extremely simple deployment. Nextflow will always require the JVM + Nextflow to be installed.

4. SciPipe has per-file audit reports. Nextflow still has per-workflow reports only, AFAIK.

5. SciPipe does not require learning a new language, or installing new tooling (editors, syntax highlighting etc), apart from Go, a Go editor and related tooling.

6. SciPipe can handle more concurrent tasks. There are reports that Nextflow cannot handle more than 512 concurrent tasks [2]. I have tested SciPipe with 4999 concurrent tasks (it then hit a limit on the maximum number of subprocesses in Go, which I think might even have been removed recently; I will have to test whether I can go even further).

7. Debugging SciPipe code can be done using Go debugging tools (Delve or just CGDB). Last I checked, using JVM-based debugging tools for Nextflow code was rather confusing to say the least, as you would end up deep in a stack of Groovy-parsing, and you would be pretty much left to do print-statement based debugging. I don't know if the situation has changed since then.

[1] https://doi.org/10.1093/gigascience/giz044

[2] https://www.nature.com/articles/s41598-021-99288-8



