We took a big hit to our productivity when we were using Airflow, as there is significant overhead in running pipelines.
We think Kedro is easier than Airflow and requires less setup:
- You don't need a scheduler, a database, or any initial setup. Instead, Kedro provides the `kedro new` command, which creates a project for you that runs out of the box (optionally with a small example pipeline).
- You can run your pipelines as plain Python applications, making it easy to iterate in IDEs or terminals.
- Tasks are simple Python functions instead of operators.
- Datasets are first-class citizens. You don't need to explicitly define dependencies between tasks: they are resolved from what each task produces and consumes, as the sketch below shows.
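To make that concrete, here's a minimal sketch (the function and dataset names are invented for illustration): two plain functions become nodes, and Kedro orders them by matching the dataset names they produce and consume.

```python
# Minimal Kedro pipeline sketch -- names are illustrative, not from the thread.
from kedro.pipeline import Pipeline, node


def clean(raw_data):
    # e.g. drop incomplete rows from a pandas DataFrame
    return raw_data.dropna()


def summarise(cleaned_data):
    return cleaned_data.describe()


pipeline = Pipeline(
    [
        # "cleaned_data" is produced by the first node and consumed by the
        # second, so Kedro runs clean() before summarise() -- no explicit
        # task-to-task dependency is declared anywhere.
        node(clean, inputs="raw_data", outputs="cleaned_data"),
        node(summarise, inputs="cleaned_data", outputs="data_summary"),
    ]
)
```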
We also think a big differentiating factor is the `DataCatalog`. Being able to define in YAML files where your data lives and how it is loaded/saved means that the same code runs in any environment, given the appropriate configuration files.
This makes testing & moving from development to production much easier.
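For a sense of what that looks like, here is a hypothetical pair of catalog entries (the filepaths, bucket name, and exact dataset type name are invented, and type names vary across Kedro versions): the same `raw_data` dataset points at a local CSV in development and at S3 in production, while the pipeline code stays untouched.

```yaml
# conf/base/catalog.yml -- local development environment
raw_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/raw_data.csv

# conf/prod/catalog.yml -- production environment (same dataset name,
# different storage), shown commented out since this is a single file:
# raw_data:
#   type: pandas.CSVDataSet
#   filepath: s3://my-bucket/raw_data.csv
```

Because nodes refer to datasets only by name, switching configuration environments is all it takes to point the same pipeline at production storage.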
(Disclaimer: I am one of the lead developers of Kedro.)
We hope you'll give it a try and send us your feedback :)
I personally don't think it's that black and white. Not everyone has the same training in software engineering best practices, and this tool looks like it places some constraints on the anarchy that can result, without requiring a huge amount of up-front work.
I personally find it simpler than Airflow, since there is less boilerplate required to construct DAGs, and in my opinion there is less of a learning curve.