Hacker News

I mean task resumption after interruption, etc., as in Airflow-style tools. Not quite the Unix job-suspend options; this is about data pipelines. For Hadoop-style MapReduce, you can split the task into jobs that can be resumed, discarded, and so on. Shell scripting is not an elegant way to deal with this; a proper orchestrator tool is better.
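The core idea behind resumable pipelines is to persist a record of completed steps so a rerun skips work that already finished. A minimal sketch in Python, assuming a hypothetical checkpoint file `pipeline_state.json` (orchestrators like Airflow track this state in a database instead):

```python
import json
import os

CHECKPOINT = "pipeline_state.json"  # hypothetical checkpoint file for illustration

def load_state():
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"done": []}

def save_state(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def run_pipeline(steps):
    """steps: list of (name, callable) pairs, run in order."""
    state = load_state()
    for name, func in steps:
        if name in state["done"]:
            continue  # finished in a previous run; skip on resume
        func()
        state["done"].append(name)
        save_state(state)  # persist after every step, so a crash loses at most one step

if __name__ == "__main__":
    run_pipeline([
        ("extract", lambda: print("extract")),
        ("transform", lambda: print("transform")),
        ("load", lambda: print("load")),
    ])
```

If the process dies mid-run, rerunning the script replays only the steps not yet recorded as done; real orchestrators add the same idea per-task, plus retries, dependencies, and scheduling.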


You could try the tool my group builds, Arvados (https://arvados.org). It uses the Common Workflow Language (CWL) as its workflow language. Arvados works well for very large computations and data management at petabyte scale, and it really shines in production environments where data provenance is key. Intelligent handling of failures (which are inevitable at scale) is a core part of the Arvados design.




