Conflict of interest: I'm a happy Pulsar user, I come from an extensive Kafka background, and personally I'd like to see Pulsar win this entire space.
I see some differences. Instead of Pulsar Functions, Redpanda has gone the extra step of using WASM; I suspect the Pulsar community will end up going in this direction as the wider community pushes toward WASM.
Got rid of ZooKeeper: I've never truly understood the hatred toward ZooKeeper, aside from the view that it is one more external dependency the project requires.
Compatible Kafka API: this is a smart business choice to capture any business that uses Kafka, is unhappy with Kafka's operational costs, and wants to move off. Pulsar has a Kafka connector that lets a business leave its existing work entirely untouched and stream into the new system, taking the strangler-pattern approach. The problem with a compatible API is that you still need to touch the running system to point it from Kafka to Redpanda, which opens a can of worms: how do you abort the Redpanda rollout and switch back to Kafka without losing data? The business now needs to modify all of its existing code that produces to Kafka so that it also writes to Redpanda.
The other option I see is the same approach: Redpanda has a connector to Kafka and only streams off the existing Kafka cluster, which IMO makes API compatibility kind of pointless aside from a marketing and sales standpoint with customers.
Kafka also has a KIP to get rid of ZooKeeper. A lot of the issues related to that KIP look resolved, and it seems like it should land this year.
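The dual-write rollout described above can be sketched roughly like this. It is a minimal illustration, not a real client: `Sink`, `InMemorySink`, and `DualWriteProducer` are hypothetical names, and the in-memory sinks stand in for Kafka and Redpanda brokers. The assumption is that Kafka stays the primary, so aborting the rollout only means switching the shadow write off.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Sink(Protocol):
    """Hypothetical minimal producer interface; real Kafka/Redpanda clients differ."""

    def send(self, topic: str, value: bytes) -> None: ...


@dataclass
class InMemorySink:
    """Stand-in for a broker so the sketch runs without a cluster."""

    name: str
    fail: bool = False
    log: List[Tuple[str, bytes]] = field(default_factory=list)

    def send(self, topic: str, value: bytes) -> None:
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        self.log.append((topic, value))


@dataclass
class DualWriteProducer:
    """Writes every record to the primary (Kafka) and, best effort, to the shadow (Redpanda).

    The primary stays the source of truth: no record is ever written only to
    the shadow, so rolling back means setting shadow_enabled to False.
    """

    primary: Sink
    shadow: Sink
    shadow_enabled: bool = True
    shadow_failures: List[Tuple[str, bytes]] = field(default_factory=list)

    def send(self, topic: str, value: bytes) -> None:
        # A primary failure propagates to the caller, exactly as before the migration.
        self.primary.send(topic, value)
        if not self.shadow_enabled:
            return
        try:
            self.shadow.send(topic, value)
        except ConnectionError:
            # Keep the record for later replay instead of failing the producer.
            self.shadow_failures.append((topic, value))


# Usage: the second record reaches Kafka even while the shadow is down.
kafka = InMemorySink("kafka")
redpanda = InMemorySink("redpanda")
producer = DualWriteProducer(kafka, redpanda)
producer.send("orders", b"o-1")
redpanda.fail = True
producer.send("orders", b"o-2")
```

Keeping Kafka authoritative is what makes the rollback cheap; the price is that every producer has to be touched, which is the cost the comment is pointing at.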
Doesn't this just internalize the dependency within the project itself? Isn't Redpanda taking on all the work ZooKeeper has been doing for years: all the edge cases, all the additional support, and now the coupling of it within the project itself?
That is our essential complexity. If you are trying to replicate data to machines, you need to replicate data to machines. We chose Raft as the only way. In essence, we are much simpler than upstream with respect to data-replication protocols.
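For context on what "replicating data with Raft" commits a system to: in Raft, an entry is committed (and a leader is elected) only once a majority of voters agree. The sketch below is generic majority-quorum arithmetic, not Redpanda's code; the function names are illustrative.

```python
def quorum_size(voters: int) -> int:
    """Votes needed for a Raft majority (commit or leader election)."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Voters that can be lost while a majority can still be formed."""
    return voters - quorum_size(voters)


for n in (1, 3, 4, 5):
    print(f"{n} voters: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 voters tolerate no more failures than 3, which is why replica groups are usually sized at odd numbers.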
With Pulsar you have to run (as I understand it) not only ZooKeeper, but also Apache BookKeeper. Operationally, Pulsar sounds even more complex than Kafka.
I've never managed any of these, but I know that both ZK and Kafka have a reputation for being operationally complex. I've read comments by other people on HN about Pulsar being complex, too.
I'm optimistic about Pulsar becoming a widely deployed tool once they can get rid of the ZK dependency, particularly since Pulsar seems quite friendly to non-Java languages, while BK requires Java on the client and does not, and never will, support other languages.
>Running five node zookeeper cluster was pure overhead.
I have not experienced this issue. I run a 5-pod ZK in K8s, each pod with memory: 256Mi and cpu: 0.1, handling a couple hundred thousand messages a second with Pulsar.
I don't consider roughly 1.25 Gi and half a core for handling quorum and metadata locking for stream storage to be overhead. Deleting ZK tomorrow doesn't make those resources disappear; Redpanda just takes them on itself.
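Summing the per-pod requests quoted in the comment (5 pods, 256Mi memory and 0.1 CPU each) gives the ensemble's total footprint. This is just the arithmetic, assuming Kubernetes' binary units (1 Gi = 1024 Mi) and that requests equal actual usage.

```python
MI_PER_GI = 1024  # Kubernetes "Mi"/"Gi" are binary units

pods = 5
memory_mi_per_pod = 256
cpu_per_pod = 0.1

total_memory_gi = pods * memory_mi_per_pod / MI_PER_GI  # 1280 Mi
total_cpu = pods * cpu_per_pod
print(f"{total_memory_gi} Gi, {total_cpu:.1f} cores")  # prints "1.25 Gi, 0.5 cores"
```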