Conflict of interest: I'm happily using Pulsar, I come from an extensive Kafka s...

voctor · on Nov 16, 2020

> aside from the viewpoint of it being one more external dependency

This is exactly the point. One more dependency, one more executable. Redpanda gets rid of that.

ckdarby · on Nov 16, 2020

Kafka also has a KIP to get rid of ZooKeeper. I see a bunch of the issues related to the KIP resolved, and it looks like it should be happening this year.

Doesn't this just internalize the dependency within the project itself? Isn't Redpanda taking on all the effort that Zookeeper has been doing for years, all the edge cases, all the additional support and now the coupling of it within the very project itself?

agallego · on Nov 17, 2020

We have exactly one replication protocol: Raft.

We spend a ton of time ensuring its correctness:

1. https://vectorized.io/validating-consistency/
2. https://vectorized.io/kafka-redpanda-availability/

That is our essential complexity. If you are trying to replicate data to machines, you need to replicate data to machines. We chose Raft as the only way. In essence, we are much simpler than upstream w.r.t. protocols for data replication.

atombender · on Nov 16, 2020

With Pulsar you have to run (as I understand it) not only ZooKeeper, but also Apache BookKeeper. Operationally, Pulsar sounds even more complex than Kafka.

I've never managed any of these, but I know that both ZK and Kafka have a reputation for being operationally complex. I've read comments by other people on HN about Pulsar being complex, too.

I'm optimistic about Pulsar becoming a widely deployed tool once they can get rid of the ZK dependency. In particular since Pulsar seems quite friendly to non-Java languages, while BK requires Java on the client and does not, and will not ever, support other languages.

blackoil · on Nov 16, 2020

Running a five-node ZooKeeper cluster was pure overhead.

ckdarby · on Nov 16, 2020

> Running a five-node ZooKeeper cluster was pure overhead.

I have not experienced this issue. I run a 5-pod ZK in K8s, with each pod at memory: 256Mi & cpu: 0.1, for a couple hundred thousand messages a second with Pulsar.

I think 1.5 Gi and half a core for handling quorum & metadata locking for a stream storage isn't exactly what I would consider overhead. Deleting ZK tomorrow doesn't make that cost disappear: Redpanda has to take on those additional resources itself.
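
For anyone who wants to check the totals, here is a minimal sketch of the arithmetic. It uses only the per-pod figures quoted above (5 pods, 256Mi and 0.1 CPU each); the variable names are illustrative and not taken from any real manifest. The memory comes out a little under the rounded 1.5 Gi figure, and the CPU is half a core.

```python
# Figures taken from the comment above: 5 ZooKeeper pods, each requesting
# 256Mi of memory and 0.1 CPU. Variable names are illustrative assumptions.
PODS = 5
MEMORY_PER_POD_MI = 256
CPU_PER_POD = 0.1

# Sum the per-pod requests across the whole ensemble.
total_memory_mi = PODS * MEMORY_PER_POD_MI
total_memory_gi = total_memory_mi / 1024
total_cpu = round(PODS * CPU_PER_POD, 2)

print(f"memory: {total_memory_mi} Mi ({total_memory_gi:.2f} Gi)")  # memory: 1280 Mi (1.25 Gi)
print(f"cpu: {total_cpu} cores")                                   # cpu: 0.5 cores
```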