
All the issues described in this post led me to create Kestra [0]. Airflow was a true revolution when it was open-sourced, and we should be thankful for its innovation. But I totally agree that a large static DAG is not appropriate in today's data world of data mesh and domain responsibility.

[0] https://github.com/kestra-io/kestra


Temporal.io is a really cool framework for building business processes like microservice workflows (for example a payment workflow: the user pays, we call the shipping microservice, then the billing microservice, ...) and it is a good fit for handling lots of individual events.

Kestra (and likewise Airflow) is more of a workflow manager for data pipelines: moving large datasets (batch) between different sources and destinations, doing transformations inside the database (ELT), or, with Kestra, transforming the data (ETL) before saving it to external systems.

This leads Kestra (and Airflow) to have a lot of connectors to different systems (SQL, NoSQL, columnar databases, cloud storage, ...) that are ready to use out of the box.

Temporal.io, since it was first designed to handle microservices (proprietary and internal services), doesn't have these connectors out of the box, and you will need to code all these interactions yourself.

So my opinion:

Building data pipelines that interact with many standard systems will be easier and quicker with Kestra (or Airflow).

Handling internal business processes of microservices will be easier with Temporal.io.


You can easily scale down for small projects by running a single node with a simple Docker Compose setup: https://kestra.io/docs/getting-started/

It works well on a standard laptop.
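
A minimal sketch of such a setup (not the official compose file; the one in the getting-started guide also wires up configuration, volumes, and a backing database/queue):

    version: "3"
    services:
      kestra:
        image: kestra/kestra:latest
        # single-node mode: all server components in one process
        command: server standalone
        ports:
          - "8080:8080"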


The title was changed by moderators and I can't edit it anymore :'(


Airflow has design issues and performance issues. If you want details, you can find some of the reasons in this article: https://kestra.io/blogs/2022-02-22-leroy-merlin-usage-kestra....

Compared with other workflow engines (Dagster, Prefect, ...), we decided to use a completely different approach to building a pipeline. Where others chose Python code, we went with a descriptive language (like Terraform, for example). This has a lot of advantages for the developer experience: with Kestra, you can use the web UI directly to edit, create, and run your flows; there is no need to install anything on the user's desktop and no complex deployment pipeline in order to test on the final instance. Another advantage is that it allows you to use Terraform to deploy your flows. A typical development workflow is: in the development environment, use the UI; in production, deploy your resources with Terraform, both the flows and all the other cloud resources.
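
To give an idea of the descriptive style, a minimal flow sketch (the exact task type here is from memory and may differ between Kestra versions):

    id: hello-world
    namespace: io.kestra.demo
    tasks:
      - id: log
        type: io.kestra.core.tasks.log.Log   # assumed core log task
        message: "Hello from a declarative flow"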

That said, it would be really nice to have some independent performance benchmarks. I really think Kestra is fast since it is based on a queue system (Kafka) and not a database. Since workflows are only events (status changes, new tasks, ...) that need to be consumed by different services, a database doesn't seem to be a good choice, and my benchmarks show that Kestra is able to handle a lot of concurrent tasks without using a lot of CPU.


FYI some of the Airflow issues are out of date / can be resolved with config changes.

Airflow 2 is designed to support larger XCom messages, so the guidance to only use it for small data no longer applies.

Your DAG construction overhead issue is likely due to dagbag refreshing. Airflow checks for DAG changes on a fixed interval, causing a reimport. The default period for that is fairly small, so for large deployments you will want to use a larger period (e.g. at least 5 minutes). I do not know why the default is so short (or was last I checked, anyway). Python files shouldn't do much of note on import regardless IMO.
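
For reference, the knob I mean looks roughly like this in airflow.cfg (verify the exact option name for your Airflow version):

    [scheduler]
    # minimum seconds between re-parses of the same DAG file;
    # raise it (e.g. to 300 = 5 minutes) for large deployments
    min_file_process_interval = 300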

I am not otherwise familiar with the improvements in Airflow 2, so I cannot say for sure if your other complaints still remain.


I know that some issues are fixed in Airflow 2; they made large improvements with that release. But not all issues are resolved by it.

The performance issue is still here: just launch Airflow, submit a thousand DAG runs with a simple Python sleep(1), and you will hit the CPU bound very quickly with a very long total runtime. Airflow is not designed for a lot of short-duration tasks. With event-driven data flows, it's really complicated to manage.
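
A minimal sketch of that kind of stress test (assuming Airflow 2's PythonOperator; trigger many runs of this DAG and watch scheduler CPU):

    # sleep_bench.py -- illustrative stress test, not a rigorous benchmark
    from datetime import datetime
    from time import sleep

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG("sleep_bench", start_date=datetime(2022, 1, 1),
             schedule_interval=None, catchup=False) as dag:
        # ten trivial one-second tasks; submit hundreds of DAG runs
        # to see the scheduling overhead dominate the actual work
        for i in range(10):
            PythonOperator(task_id=f"sleep_{i}", python_callable=lambda: sleep(1))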

Imagine a flow that is triggered for each store, for example (thousands of stores, with 10+ tasks for each one): Airflow will not be able to manage this kind of workload quickly (and that's not its goal). Airflow was clearly designed to handle a small number of long-running tasks (hundreds of tasks).

For the XCom part, Airflow stores it in the database, so you can't put real data into it; you will need to keep it small (a database is not there to store big files). In Kestra, we provide a storage layer that allows storing large data (GBs, TBs, ...) between tasks natively, without the pain on multi-node clusters.
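
As a sketch of the idea (the task types and output reference below are illustrative assumptions, not exact plugin names):

    id: large-file-passing
    namespace: example
    tasks:
      - id: extract
        # assumption: a download task that writes its result to Kestra's
        # internal storage and exposes it as an output URI
        type: io.kestra.plugin.fs.http.Download
        uri: https://example.com/big-dataset.csv
      - id: load
        # the next task consumes the file by reference, so the bytes never
        # transit through a database the way XCom values do
        type: io.kestra.plugin.jdbc.postgresql.CopyIn
        from: "{{ outputs.extract.uri }}"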


Airflow 2 was released in 2020. You're saying you knew that these issues were fixed, and yet an article was published on your webpage in 2022 knowingly comparing against the technical properties of a major version release 2 years behind? That is not a good look.


First of all, the published article is a retrospective; we are talking about a decision from 2019. Can't we talk about the past that led us to a choice?

Second, not all issues: some of them are fixed, but there are still major ones. Just search Google for issues scaling Airflow in production; even with Airflow 2, it's still complicated. Airflow still uses a lot of CPU doing nothing other than waiting for some API call. Just try to run 5,000 tasks that sleep (simulating an API call) in Airflow and you will see the challenge of scaling it.

Third, Airflow still has design issues that will not let you deal with some kinds of pipelines.

Last, I'm not here to fight against Airflow; some people love it, some people hate it. We have made a completely different choice about designing and scaling data pipelines, and I let people use what they like. For me, Airflow (and other workflow managers) doesn't fit.


The project started as a side project (yet another side project I do at night and on weekends) but was quickly promoted and used in a big French retail company.

They trusted the project and decided to go to production with Kestra. So they decided to inject some resources in order to develop some features they needed that were missing.

But basically, not that many people for now. We are trying to start a community around the product and only started communicating about it a few weeks ago. I hope the community will follow us! And I hope to succeed like with my other open source project: https://github.com/tchiotludo/akhq


Do you have a screenshot, please? I didn't notice where. Thanks


Agreed that both are expensive to scale across multiple nodes. But keep in mind, you can use it with a single node (like others do with a database such as MySQL).

Just don't go multi-node if the project doesn't need it. But when you do need to, with Kestra you can go multi-node and scale.


Yes, of course!

You have 3 solutions for that (a sketch of the first one follows the list):

- you can use a script task with the runner: DOCKER property and choose the image: https://kestra.io/plugins/core/tasks/scripts/io.kestra.core....

- you can also use PodCreate to launch a pod on a Kubernetes cluster: https://kestra.io/plugins/plugin-kubernetes/tasks/io.kestra....

- you also have CustomJob from Vertex AI on GCP to launch a container on an ephemeral cluster (with any CPU / GPU): https://kestra.io/plugins/plugin-gcp/tasks/vertexai/io.kestr...
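
For the first option, a minimal sketch (the task type and dockerOptions shape are from memory; check the plugin page above for the exact schema):

    id: docker-runner-demo
    namespace: example
    tasks:
      - id: py
        type: io.kestra.core.tasks.scripts.Python   # assumed script task type
        runner: DOCKER
        dockerOptions:
          image: python:3.10       # any image with the deps you need
        inputFiles:
          main.py: |
            print("running inside the chosen container")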


Great! That makes Kestra more useful than Dagster for me.


Thanks a lot wpietri for having understood the hidden meaning behind it :+1:

