Reining in the thundering herd: Getting to 80% CPU utilization with Django

stingraycharles · on Aug 15, 2021

Tangent, but I always had a different understanding of the “thundering herd” problem; that is, if a service is down for whatever reason, and it’s brought back online, it immediately grinds to a halt again because there are a bazillion requests waiting to be handled.

And the solution to this problem is to slowly, rate-limited, bring the service back online, rather than letting the whole thundering herd go through the door immediately.

toast0 · on Aug 16, 2021

That's really not the traditional meaning of thundering herd, which is about waking up all the processes when a connection comes in, then they all try to accept it and it's a lot of work for nothing. You get much better results if only a single process is woken up for each event.

Your problem is a real problem though. Where I worked, we would call that backlog, and we would manage it with 'floodgates' ... When the system is broken, close the gates, and you need to open them slowly.

In an ideal world, your system would self-regulate from dead to live, shedding load as necessary, but always making headway. But sometimes a little help is needed to avoid the feedback loop of timed out client requests that still get processed on the server keeping the server in overload.

Ozzie_osman · on Aug 15, 2021

Yea you are right. It could be a service being down and requests piling up, or a cache key expiring and many processes trying to regenerate the value at the same time, etc.

I think the article just used this phrase to describe something else. (Great article otherwise).

fanf2 · on Aug 15, 2021

There is an explanation of this kind of thundering herd about 3/4 down this article https://httpd.apache.org/docs/trunk/misc/perf-scaling.html

The short version is that when you have multiple processes waiting on listening sockets and a connection arrives, they all get woken up and scheduled to run, but only one will pick up the connection, and the rest have to go back to sleep. These futile wakeups can be a huge waste of CPU, so on systems without accept() scalability fixes, or with more tricky server configurations, the web server puts a lock around accept() to ensure only one process is woken up at a time.

The term (and the fix) dates back to the performance improvement work on Apache 1.3 in the mid-1990s.

taylorhughes · on Aug 15, 2021

Phrase borrowed from excellent uWSGI docs https://uwsgi-docs.readthedocs.io/en/latest/articles/Seriali...

ambicapter · on Aug 15, 2021

Funny reading this comment after reading the article

> So many options meant plenty of levers to twist around, but the lack of clear documentation meant that we were frequently left guessing the true intention of a given flag.

And then reading your link, they complain >inside the docs< that the docs aren't complete. I have no idea what to believe anymore :D

remram · on Aug 16, 2021

The uWSGI docs also say, in the section called "uWSGI developers are fu*!ing cowards": "why --thunder-lock is not the default when multiprocess + multithread is requested? This is a good question with a simple answer: we are cowards who only care about money."

Strange read.

lookACamel · on Aug 16, 2021

That's not the thundering herd. If someone rings the door (request), only one person (agent, process) needs to answer the door. But what might happen is that everyone in the house rushes to answer the door. The people "thundering" to the door (and making a mess as they do so) are the "herd". This can quickly become a problem if there are a lot of people in the house and the doorbell keeps ringing.

thaumasiotes · on Aug 16, 2021

> but I always had a different understanding of the “thundering herd” problem; that is, if a service is down for whatever reason, and it’s brought back online, it immediately grinds to a halt again because there are a bazillion requests waiting to be handled.

That... doesn't have much to do with the thundering herd problem. It also doesn't make much sense as a concept on its own merits -- say you come in to work and your inbox is full enough for three inboxes. Does that fact, in itself, mean that you decide you're done for the day? No, it just means you have a much longer queue to work through than usual.

The thundering herd problem refers to what happens when (1) a bunch of agents come to you for something while you're busy; (2) you tell them all "I'm busy, go away and come back later"; and (3) the come-back-later time you give to each of them is identical, so they all come back simultaneously.

And that's exactly what's happening here, except that instead of giving each worker thread a come-back-later time when it asks for work, you're receiving work, sending out individual messages to every worker saying "hey, I'm not busy anymore, come back RIGHT NOW and get some more work", and then rejecting all but one of the thundering herd that shows up. The reason the Gunicorn docs and the uWSGI docs both refer to this as a "thundering herd" problem is that it's a near-perfect match for the problem prototype. The only difference is that, instead of giving out identical come-back-later times to worker threads as they ask you for work, you tell them to wait for a notification that includes a come-back-later time, and then when you get one piece of work you fire off that notification separately to every sleeping thread, including identical come-back-later times in each one.

toast0 · on Aug 16, 2021

> That... doesn't have much to do with the thundering herd problem. It also doesn't make much sense as a concept on its own merits -- say you come in to work and your inbox is full enough for three inboxes. Does that fact, in itself, mean that you decide you're done for the day? No, it just means you have a much longer queue to work through than usual.

If my SLA is 24 hour response time, and the inbox is FIFO, and I can't drop old messages, I'm most likely not hitting the SLA. If they all came in overnight, I'll hit the SLA for day 1, but I will be busy all of day 2 and 3 and never respond on time. If after day 1, I get a days worth of messages every day, I'll never catch up.

thaumasiotes · on Aug 16, 2021

OK. But that's not a problem of a thundering herd. It's a problem that you have more incoming work than you are theoretically able to handle even if you stay in continuous operation. Your problem is solved by adding the capacity to do more work. The thundering herd is solved by purposefully desynchronizing incoming work requests.

toast0 · on Aug 16, 2021

Oh, I agree it's not thundering herd, but it is a real problem. Especially if you start getting retries after the first requests timed out. Some sort of backoff with jitter to avoid synchronized retries helps, but what really helps is dropping or not accepting requests when the processing will not be timely. That's simple to say, but not always simple to do.

Adding capacity is also simple to say, but not always simple to do. And there can be a large difference between the capacity needed to handle a cold start at peak vs the capacity needed for peak under regular operations.

c_o_n_v_e_x · on Aug 16, 2021

This reminds me of inrush current when starting large motors... You get a huge current spike when you initially turn on the motor, so large that it can trip the breaker.

One solution is to use a soft starter which slow brings the motor up to speed.

luhn · on Aug 15, 2021

Unfortunately HAProxy doesn't buffer requests*, which is necessary for a production deployment of gunicorn. And for anybody using AWS, ALB doesn't buffer requests either. Because of this I'm actually running both HAProxy and nginx in front of my gunicorn instances—nginx in front for request buffering and HAProxy behind that for queuing.

If anybody is interested, I've packaged both as Docker containers:

HAProxy queuing/load shedding: https://hub.docker.com/r/luhn/spillway

nginx request buffering: https://hub.docker.com/r/luhn/gunicorn-proxy

* It does have an http_buffer_request option, but this only buffers the first 8kB (?) of the request.

Twirrim · on Aug 16, 2021

Couldn't Apache httpd just do all of that for you? mod_buffer provides request buffering, and mod_proxy_balancer provides load balancing capabilities.

luhn · on Aug 16, 2021

Can Apache do request queuing?

Twirrim · on Aug 16, 2021

Yes, under MPM, you have a listen backlog. If there's a spare worker available to process a request, they'll be picked off the backlog.

https://httpd.apache.org/docs/2.4/mod/mpm_common.html#listen...

luhn · on Aug 20, 2021

I don't think that's what I'm looking for, that's queuing at the front of the pipe but I need it queuing at the end of the pipe. Apache should be buffering and queuing lots of connections (with a timeout) and sending them single-file in gunicorn.

This lays out what I'm trying to achieve: https://aws.amazon.com/builders-library/using-load-shedding-...

jhgg · on Aug 16, 2021

This is somewhat suspect. At my place of work, we operate a rather large Python API deployment (over an order of magnitude more QPS than the OP's post). However, our setup is... pretty simple. We only run nginx + gunicorn (gevent reactor), 1 master process + 1 worker per vCPU. In-front of that we have an envoy load-balancing tier that does p2c backend selection to each node. I actually think the nginx is pointless now that we're using envoy, so that'll probably go away soon.

Works amazingly well! We run our python API tier at 80% target CPU utilization.

lddemi · on Aug 16, 2021

glad you are seeing such awesome performance with gevent+envoy! which part of our experience do you think is suspect?

jhgg · on Aug 16, 2021

So, in guincorn default mode (sync), the mode I'm assuming you're using. This means you really have 1 process handling 1 request at a time. The "thundering herd" problem really only applies to connection acceptance. Which is to say, that in the process of accepting a connection, it is possible to wake all idle processes that are waiting for a connection comes in (they will wake and hit EAGAIN and then go back to waiting.) Busy processes that are servicing requests (not waiting on the accept call) will not be woken, since they aren't waiting on a new request to come in. The "thundering herd" problem as I understand it, can indeed waste CPU cycles, but only on processes that aren't doing much anyways. I do however believe that `accept()` calls have been synchronized between processes on Linux for a while now to prevent spurious wakeups. You should verify you're actually doing spurious wakeups by using `strace` and seeing if you are seeing a bunch of `accept()` calls returning EAGAIN.

In gunicorn, `sync` mode does exhibit a rather pathological connection churn, because it does not support keep-alive. Generally, most load balancing layers already will do connection pooling to the upstream, meaning, your gunicorn processes won't really be accepting much connections after they've "warmed up". This doesn't apply in sync mode unfortunately :(. Connection churn can waste CPU.

Another thing to also note is that if you have 150 worker processes, but your load balancer only allows 50 connections per upstream, chances are 100 of your processes will be sitting there idle.

Something just doesn't feel quite right here.

EDIT: I do see mention of `gthread` worker - so you might be already able to support http-keepalives. If this is the case, then you should really have no big thundering herd problem after the LB establishes connections to all the workers.

mac-chaffee · on Aug 16, 2021

Could the discrepancy be explained by the type of responses?

Sounds like an app like clubhouse might have lots of small, fast responses (like direct messaging), where very little of the response time is spent in application code. Does your API happen to do a lot of CPU-intensive stuff in application code?

jhgg · on Aug 16, 2021

Our app is also a messaging app. So lots of small & fast responses.

kvalekseev · on Aug 15, 2021

HAProxy is a beautiful tool but it doesn't buffer requests that is why NGINX is recommended in front of gunicorn otherwise it's suspectible to slowloris attack. So either cloubhouse can be easily DDOS'd right now or they have some tricky setup that prevents slow post reqests reaching gunicorn. In the blog post they don't mention that problem while recommend others to try and replace NGINX with HAPRoxy.

lddemi · on Aug 15, 2021

1. HAProxy does support request buffering https://cbonte.github.io/haproxy-dconv/2.2/configuration.htm...

2. our load balancer buffers requests as well

kvalekseev · on Aug 15, 2021

From HAProxy mailing list about http_buffer_request option https://www.mail-archive.com/haproxy@formilux.org/msg23074.h...

> In fact, with some app-servers (e.g. most Ruby/Rack servers, most Python servers, ...) the recommended setup is to put a fully buffering webserver in front. Due to it's design, HAProxy can not fill this role in all cases with arbitrarily large requests.

A year ago I was evaluating recent version of HAProxy as buffering web server and successfully run slowloris attack against it. Thus switching from NGINX is not a straightforward operation and your blog post should mention http-buffer-request option and slow client problem.

TekMol · on Aug 15, 2021

Performance is the only thing that is holding me back to consider Python for bigger web applications.

Of the 3 main languages for web dev these days - Python, PHP and Javascript - I like Python the most. But it is scary how slow the default runtime, CPython, is. Compared to PHP and Javascript, it crawls like a snake.

Pypy could be a solution as it seems to be about 6x faster on average.

Is anybody here using Pypy for Django?

Did Clubhouse document somewhere if they are using CPython or Pypy?

zinodaur · on Aug 16, 2021

I feel like people let go of perf too easily.

When using something like Golang, I have apps doing normal CRUD-ish queries at 10k QPS, on 32c/64g machines. For most web apps, 10k QPS is much more than they will ever see, and the fact that it is all done in a single process means you could do really cool things with in-memory datastructures.

Instead, every single web app is written as a distributed system, when almost none of them need to be, if they were written on a platform that didn't eat all of their resources.

midrus · on Aug 16, 2021

I could rephrase you comment as why would anyone use Go when I could just use assembler or C and keep all into a single node.

People don't use python because they want performance. People use python because of productivity, frameworks, libraries, documentation, resources and ecosystems. Most projects don't even need 10k qps, but instead most projects do need an ORM, a migrations system, authentication, sessions, etc. Python has bottle tested tools and frameworks for this.

jusssi · on Aug 16, 2021

People have been taught to be irrationally afraid of in-process concurrency (including async). Not too long ago the standard approach for concurrency was "it's hard, don't do it".

I've been told off in code review for using Python's concurrent.futures.ThreadPoolExecutor to run some http requests (making the code finish N times faster, in a context where latency mattered) "because it's hard to reason about".

mixmastamyk · on Aug 16, 2021

Backend controller performance is rarely a bottleneck, and if raw-compute still is there are a number of ways to speed it up, such as cython and/or work queues.

domino · on Aug 15, 2021

Clubhouse is using CPython

TekMol · on Aug 15, 2021

Interesting. Is there a reason for this?

mst · on Aug 15, 2021

It's the standard and best supported approach and the level of speedup you get from PyPy is significantly workload dependent.

truffdog · on Aug 15, 2021

I mean... it's the default?

fnord77 · on Aug 15, 2021

The top 5 web app programming languages by market share are PHP, Java, JS, Lua and Ruby.

fnord77 · on Aug 17, 2021

lol at the downvotes. sorry your favorite language isn't on the list.

I specifically said "market share", not "best" or "favorite".

https://www.wappalyzer.com/technologies/programming-language...

TekMol · on Aug 15, 2021

...said a stranger on the internet without any sources to back up this claim.

sdze · on Aug 15, 2021

only one data point though, but... https://redmonk.com/rstephens/2021/08/05/top-20-june-2021/

TekMol · on Aug 15, 2021

That one lists the top 5 as:

    Javascript
    Python
    Java
    PHP
    C#

But is it about web?

fnord77 · on Aug 17, 2021

source: https://www.wappalyzer.com/technologies/programming-language...

IshKebab · on Aug 15, 2021

Typescript is a nicer language than Python in many ways and it doesn't suffer from Python's crippling performance issues or dubious static typing situation. Plus you can run it in a browser so there's only one language to learn.

franga2000 · on Aug 16, 2021

Typescript would be nice if it weren't essentially just a bunch of macros for JavaScript. As it is now, as soon as you want to run it, you lose all the benefits of it (including many performance optimizations that could be made in a statically typed runtime) and of course, all the usual footguns of vanilla JS still apply. It's a great development tool though, I'll give you that.

IshKebab · on Aug 16, 2021

> just a bunch of macros for JavaScript

It isn't any macros. Not sure what you're talking about.

> all the usual footguns of vanilla JS still apply

Yeah that does suck but fortunately ESLint and Typescript have options to prevent most of them. If you use Deno they're enabled by default.

> including many performance optimizations that could be made in a statically typed runtime

Also true, but we're comparing it to PHP and Python.

TekMol · on Aug 15, 2021

You cannot run TS in a browser.

You can compile it to JS or to Webassembly. But you can do that with every language.

IshKebab · on Aug 16, 2021

You can't compile TS to WASM, and there's a big difference between just stripping types from TS and transpiling Java or even something like Dart.

Way to miss the point.

petargyurov · on Aug 16, 2021

> Which exacerbated another problem: uWSGI is so confusing. It’s amazing software, no doubt, but it ships with dozens and dozens of options you can tweak.

I am glad I am not the only one. I've had so many issues with setting up sockets, both with gevent and uWSGI, only to be left even more confused after reading the documentation.

j4mie · on Aug 15, 2021

If you’re delegating your load balancing to something else further up the stack and would prefer a simpler WSGI server than Gunicorn, Waitress is worth a look: https://github.com/pylons/waitress

powerbook5300CS · on Aug 15, 2021

Aside: AWS only allows registering 1000 targets in a target group… i wonder if thats the limit they hit. If so, its documented.

tarasglek · on Aug 16, 2021

Have to wonder how well haproxy works vs balancing by making gunicorn listen via SO_REUSEPORT and letting the kernel balance instead (ala https://talawah.io/blog/extreme-http-performance-tuning-one-...)

JanMa · on Aug 15, 2021

Interesting to read that they are using Unix sockets to send traffic to their backend processes. I know that it's easily done when using HaProxy but I have never read about people using it. I guess the fact that they are not using docker or another container runtime makes sockets rather simple to use.

kvalekseev · on Aug 15, 2021

It's standard way to connect things in UNIX and provides better performance. For example postgresql tcp+ssl is 175% slower than socket https://momjian.us/main/blogs/pgblog/2012.html#June_6_2012

lttlrck · on Aug 16, 2021

But domain sockets only work between processes on the same machine, why would SSL be used in that case?

mst · on Aug 15, 2021

I do that every chance I can get.

At a guess, it's probably most loved by people picking old school simple architectures that aren't the sort of thing that goes viral.

ram_rar · on Aug 16, 2021

> Python's model of running N separate processes for your app is not as unreasonable as people might have you believe! You can achieve reasonable results this way, with a little digging.

I have been through this journey, we eventually migrated to Golang and it saved a ton of money and firefighting time. Unfortunately, python community hasnt been able to remove GIL, it has its benefits (especially for single threaded programs), but I believe the cost (lack of concurrent abstractions. async/await doesn't cut it) far outweigh it.

Apart from what the article mentions, other low hanging fruits worth exploring are

[1] Moving under PyPy (this should give some perf for free)

[2] Bifurcate metadata and streaming if not already. All the django CRUD stuff could be one service, but the actual streaming should be separated to another service altogether.

jstrong · on Aug 16, 2021

I read the article and could not believe that was their takeaway. sometimes people are determined to vindicate their technology choices, no matter what.

stu2010 · on Aug 15, 2021

Interesting to see this. It sounds like they're not on AWS, given that they mentioned that having 1000 instances for their production environment made them one of the bigger deployments on their hosting provider.

If not for the troubles they experienced with their hosting provider and managing deployments / cutting over traffic, it possibly could have been the cheaper option to just keep horizontally scaling vs putting in the time to investigate these issues. I'd also love to see some actual latency graphs, what's the P90 like at 25% CPU usage with a simple Gunicorn / gevent setup?

ksec · on Aug 15, 2021

I was wondering that too, but there aren't that many common cloud provider that has 96 vCPU offering.

I am also wondering on 144 Workers, on 96 vCPU which is not 96 CPU Core but 96 CPU thread. So effectively 144 Workers on 48 CPU Core possibly running at sub 3Ghz Clock Speed. But it seems they got it to work out in the end. ( May be at the expense of latency )

mst · on Aug 15, 2021

Assuming you're running a system where normal request/response handling blocks on database queries it's often optimal to have more workers than available cpu threads and 1.5x is a common rule of thumb to try first.

dilyevsky · on Aug 15, 2021

Kinda funny they decided paying a ton of money to aws was ok but paying for nginx plus was not

Spivak · on Aug 15, 2021

I kinda get that honestly. It’s why I’ll spend $20 without even thinking for take out but not spend $2 for an app. It’s because the cost off the software is way way more than the money. It’s a commitment to actually use it and integrate it, deal with their sales team, talk to purchasing, handle licensing, and introducing friction to replacing it or using tools that don’t integrate well because “well we already pay for it.” Licensing also complicates deployments substantially when you’re doing lots of autoscaling.

And on top of that Nginx Plus is also expensive as hell.

rowanG077 · on Aug 15, 2021

The buy in into AWS is much, much larger then using a piece of software though.

dilyevsky · on Aug 15, 2021

Don’t you have to integrate cloud? This whole post is about having to put a bunch of workaround bc the cloud can’t scale apparently

ClumsyPilot · on Aug 15, 2021

"It’s why I’ll spend $20 without even thinking for take out but not spend $2 for an app."

I pay for apps, its not a healthy attotude

spullara · on Aug 15, 2021

The difference people see, as far as I can tell, is that AWS is charging you cost+ and pure software companies need to charge for value or die.

dilyevsky · on Aug 16, 2021

Maybe for barebones compute they’re cost+ but I don’t think that’s really true for other services. For example traffic should cost effectively zero to them but they charge a huge premium. Some other managed services also appear to use value based pricing

andrenotgiant · on Aug 15, 2021

ClubHouse runs on AWS?

dilyevsky · on Aug 15, 2021

Hm actually might be google based on what their traffic is going to (i only looked just now). Ok now it makes more sense why support wasn’t able to figure this out =)

vvatsa · on Aug 15, 2021

ya, I pretty much agree with 3 suggestions at the end:

* use uWSGI (read the docs, so many options...)

* use HAProxy, so very very good

* scale python apps by using processes.

latchkey · on Aug 15, 2021

If it is just a backend, why not port it over to one of the myriad of cloud autoscaling solutions that are out there?

The opportunity cost of spending time figuring out why only 29 workers are receiving requests over adding new features that generate more revenue, seems like a quick decision.

Personally, I just start off with that now in the first place, the development load isn't any greater and the solutions that are out there are quite good.

lddemi · on Aug 15, 2021

Author here. We do and did use autoscaling heavily but at a certain scale we just ran out of headroom on the smaller instance types we were using. Jumping to a much larger instance types meant that we will likely never run into those headroom issues again, plus solves other problems like faster spin up, better sidecar connection pooling and allows for a much higher hit rate on per instance caching.

TekMol · on Aug 15, 2021

Did you consider switching form CPython to Pypy?

latchkey · on Aug 15, 2021

You were autoscaling a single threaded process. You had 1000 connections coming in and scaling 1000 workers for those connections. Everything was filtered through gunicorn and nginx, which just adds additional latencies and complexity, for no real benefit.

What I'm talking about is just pointing at something like AppEngine, Cloud Functions, etc... (or whatever solution AWS has that is similar) and being done with it. I'm talking about not running your own infrastructure, at all. Let AWS and Google be your devops so that you can focus on building features.

aeyes · on Aug 16, 2021

According to the article they have a monolithic Django application so this will have at least a couple of seconds start-up time. That is not a good match for Cloud Functions.

Django also has in-memory caches, for example for templates which can be extremely slow (seconds) and CPU intensive to render. So you really don't want to have AWS or Google restart your application on AppEngine whenever they feel like it.

scrollaway · on Aug 16, 2021

There's a few reasons why this scenario wouldn't be a good fit for cloud functions, but that "couple of seconds start-up time" can be almost entirely removed from the equation by keeping the Django instance alive (all cloud function type offerings will have a concept of cold and warm starts, and some way to control persistence across calls on the same "instance").

I've run Django on AWS Lambda in a a scenario that scaled between 25-250 calls per second depending on time of day (for a runtime of 5-30 sec). Moving Django's bootstrapping so it would stay warm across calls was very easy.

stuaxo · on Aug 16, 2021

Silly question, but under nginx or apache do django instances persist or are they recreated for every new request?

aeyes · on Aug 16, 2021

The standard gunicorn configuration (and the one shown in the blog post) never restarts worker processes.

gunicorn has an option --max-requests to restart every X requests but unless you have unfixable memory leaks there is no reason to do this.

Nginx can't directly run WSGI applications, you can do it with Nginx Unit which also never restarts processes.

zbentley · on Aug 17, 2021

> unless you have unfixable memory leaks there is no reason to do this.

It's also useful to set this threshold to prevent long-lived connections to services/datastores not used by every request from accumulating and consuming resources on those services.

ddorian43 · on Aug 15, 2021

Now you just 5x their costs.

latchkey · on Aug 15, 2021

Not if you do it right.

a) you get to fire the devops person, which saves $150k+ a year.

b) you add appropriate caching layers in front of everything.

c) you spend time adding features, which generate revenue.

I've done all of this before at scale. This whole case study was written about work I did [1]. Two devs, 3 months to release, first year was $80m gross revenue on $500/month cloud bills. Infinite scalability, zero devops.

[1] https://cloud.google.com/customers/gearlaunch

Nextgrid · on Aug 15, 2021

> you get to fire the devops person, which saves $150k+ a year.

You are deluded or extremely short-sighted if you believe you can actually fire the devops guy. From my experience, the more you stray away from the conventional "dedicated server" paradigm the more you need a devops guy and you are in a very precarious position if you do fire him and something goes wrong.

latchkey · on Aug 16, 2021

You don't hire the devops person until you've scaled to the point that you need one.

Additionally, your thought of having my company held hostage by a single devops person is terrifying. Now you need two of them, which is even more expensive.

It is a great way to bootstrap a company by saving on a salary (or two) that can honestly be engineered out for a lot of SASS businesses. It worked super well for us... and calling someone who did $80m in the first year deluded seems well, rude.

But, if you start off designing systems that scale on their own, you are much better prepared for when you do get some fast growth than dealing with hiring a good devops person (which is extremely hard, as they say.. all the good ones are taken).

At the end of the day, the actual elephant in the room is that django was the wrong choice. You end up having to go through a lot of contortions to make things work, as evidenced by the blog post. The architecture doesn't make things easy to spin up quickly... which creates a lot of bottlenecks. There are better cloud-based solutions.

quickthrower2 · on Aug 16, 2021

If you don’t have a devops person, then you end up with developers pitching in to fill that void. That’s OK and may be desirable but it is still a cost.

_tom_ · on Aug 15, 2021

They are on a back-end that does auto-scaling. They stated that they had problems when scaling up past 1000 nodes.

Now, maybe they could have fixed that issue instead, but going from 29 to 58 workers is easy, it's not the same going to 29,000 to 58,000. And 1000 hosts vs 500 is a non-trivial cost.

PaywallBuster · on Aug 15, 2021

containers would've solved it

one process per container, easy peasy

motoboi · on Aug 15, 2021

you now containers are just processes, right?

This is what they did, but because they didn't need to schedule other jobs on the same machine, kubernetes or even docker would be overkill.

In this case, simple VM orchestration seems like a fine solution.

PaywallBuster · on Aug 16, 2021

Indeed,

but you wouldn't be thinking about instance sizes,

how many processes per instance and

wondering if you're hitting kernel limits with all the issues coming up

zbentley · on Aug 17, 2021

You'd probably be worrying more about instance sizes if you ran a single executor per container; the memory overhead of your app would become a problem very quickly unless it's startup footprint was quite small.

PaywallBuster · on Aug 18, 2021

That's what they're doing now.

One app pool with one worker x number of cores

zbentley · on Aug 19, 2021

I assumed they're managing all those workers under one parent process which compiled their codebase on start. Perhaps that assumption was in error.

motoboi · on Aug 16, 2021

Why not?

Spivak · on Aug 15, 2021

This doesn’t work so easily with architectures with process pools for workers. So now your app server needs to speak docker (or whatever control plane) to spawn new workers and deal with more complicated IPC. Also the startup time is brutal.

One process per container and multiprocessing is a huge lift most of the time. I’ve done it but it can be a mess because you don’t really have as much a handle on containers than subprocesses because you can only poke them at a distance through the control plane.

zbentley · on Aug 17, 2021

> One process per container and multiprocessing

Do you mean multiprocessing inside the containers? Or are you managing multiprocessing child procs by forking into a container somehow? If the latter, I'd be really interested to learn how to do that; I didn't think it was possible, and it would be super useful for some of what I work on.

PaywallBuster · on Aug 18, 2021

That's what they're doing now.

One app pool with one worker x number of cores

Wrapping it around a container makes no difference

trinovantes · on Aug 16, 2021

I've always used nginx for my servers. Is HAProxy that much better to consider learning/switching?

lmilcin · on Aug 15, 2021

1M requests per minute on 1000 web instances is not an achievement, it is a disaster.

It is ridiculous people brag about it.

Guys, if you have budget maybe I can help you up this by couple orders of magnitude.

yuliyp · on Aug 15, 2021

Knowing nothing else, it's hard to know if this is good or not. It's 16 requests per second. Are those requests something like "Render a support article" or are they "Give the user a ranked feed of what they should see on their home screen"? Is most of the logic run by the web server or some combination of app servers / backend services behind it? What kind of hardware does the web server have?

All of those would affect the answer, and would preclude being able to guarantee "up this by couple orders of magnitude"

lmilcin · on Aug 15, 2021

Well, to give you an idea, I am working on an service that implements rather complex business process involving fetching data from multiple sources, parsing binary blobs with market data in proprietary format, saving results to a database and so on. And it does around 10k requests per second on a single relatively normal node (8 cores, 64GB ram, etc.)

And no, it does not require any special tricks. It is regular Java / WebFlux / REST / MongoDB backend service.

CPUs can do really a lot and if your node processes 16 requests per second on a multi-core machine then you are using billions of clock cycles and gigabytes of possible transfer to memory for a single request. Something is not quite right...

polynox · on Aug 16, 2021

As a somewhat imprecise example, if a single "request" requires sorting 2.4 billion integers, then a 2.4 GHz CPU with 16 cores will be able to process at most 16 RPS no matter how much you switch from JavaScript to Java or if you write assembly.

At the end of the day efficiency is ultimately a business problem and not a technical problem and is rarely the thing that tips a project (Clubhouse in the article) from being profitable to being unprofitable. It's usually an investing question - I have X engineer-months to spend. I can cut costs by Y by optimizing stuff or get Z more profit by building a feature. I will choose to optimize stuff if and only if Y>Z as it returns more.

Clubhouse's major costs are probably bandwidth and engineer time rather than servers. That is to say, even if efficiency was infinity for compute (i.e. server costs magically went to zero) it would probably not change Clubhouse's business proposition that much.

More to the point, I think you are uncharitable at best when you say elsewhere that other frameworks and languages won't require more development work. These frameworks (and the choice of language being implicit in that) are specifically designed to reduce development work. Let's examine for example garbage collection. Garbage collection is undeniably more wasteful than other solutions to memory management, absolutely. But would you really argue that garbage collection does nothing to reduce development time? I find that extremely hard to believe, empirically and subjectively having written programs in many environments including bare metal, reference counted or otherwise semi-managed and garbage collected languages. And so it goes with all of the choices these frameworks like Django and Rails take. And it's getting better with time as things like JRuby are developed, inefficiencies in Rails or Django are removed, etc.

luckycharms810 · on Aug 15, 2021

This is a comically yet incredibly common engineering bad take. When you run a company there is only one question to answer, one north star - does it make money ?

smashed · on Aug 15, 2021

To be honest the article does realize this, first blaming it on the poor hindsight from original developer (co-founder) and in the conclusion about maybe rewriting the whole thing.

It seemed to be all about how to extract the most performance from the lemon they had to deal with.

I found the linked reference really informative too: https://rachelbythebay.com/w/2020/03/07/costly/

lmilcin · on Aug 15, 2021

I don't know Python or how complex their domain is but the number of workers suggests to me it is not that complex and their application spends most of its time switching contexts and in inefficient frameworks.

Per my experience most applications that mostly serve documents from databases should be able to take on at least 10k requests per second on a single node. this is 600k requests per minute on one node, compared to their 1M per 1000 nodes.

This is what I am typically getting from a simple setup with Java, WebFlux and MongoDB with a little bit of experience on what stupid things not to do but without spending much time fine tuning anything.

I think bragging about performance improvements when your design and architecture is already completely broken is at the very least embarrassing.

> poor hindsight from original developer (co-founder)

Well, you have a choice of technologies to write your application in, why chose one that sucks so much when there are so many others that suck less?

It is not poor choice, it is lack of competency.

You are co-founder and want your product to succeed? Don't do stupid shit like choosing stack that already makes reaching your goal very hard.

taylorhughes · on Aug 15, 2021

(CH employee here)

The job of the cofounder is to create a thing that people want, which has nothing to do with performance. The first goal is capturing lightning in a bottle with social products. Performance doesn’t matter until the lightning is there, and 99%+ of the time you never have to worry about performance, because you don’t get the lightning. So, probably the correct choice is leveraging the tech stack that gives you the best shot at capturing the lightning. Django seemed to help!

jimsimmons · on Aug 15, 2021

Don’t sweat it buddy. People here just want to stand on your toes and feel taller. Classic HN.

Velocity of development is priority #1 and having something that needs to be scaled is a monumental achievement.

mst · on Aug 15, 2021

Plus, if he could've predicted the pandemic that far in advance there would probably have been plenty of not clubhouse ways to monetise that prescience ;)

lmilcin · on Aug 15, 2021

This is just silly excuse.

The job of the cofounder is also to anticipate possible risks.

And building your company on an astronomically inefficient technology sounds like a huge risk to me.

Those 1000s of servers are probably a very significant cost with such small technical staff. Just by choosing the right technology for the problem, most of that cost could have been avoided.

Django has nothing special in it that would allow building applications faster than in a lot other frameworks that are also much more efficient.

So it is just a matter of simple choice.

Nobody expects people to write webapps in C++ or Rust. Just don't choose technology that is famous for being inefficient.

jimsimmons · on Aug 15, 2021

Python is not astronomically inefficient. Instagram serves like a billion users with it. Job of a cofounder is to build what people want. You can always scale in Silicon Valley by hiring people like you. You can’t build another viral app like clubhouse by hiring from the same crowd.

This may hurt you but the truth is scaling and software engineering is highly commoditised. That’s the whole point of being in the valley. You can hire people for such things and forget about it.

Clubhouse is not a tech company. They don’t have to care about being the best at infra

lmilcin · on Aug 15, 2021

> Python is not astronomically inefficient.

Well, it is. It is a fact.

https://rachelbythebay.com/w/2020/03/07/costly/

> Clubhouse is not a tech company.

When you spin 1000s of nodes you need some tech competency.

Or in other words, if it blew one day and there would be a link to writeup on HN, people would be asking "They had 1000s of servers and nobody competent to maintain it?"

vilified · on Aug 15, 2021

You sound just like the average sports fan commenting after a match about what x player should have done, shouldn't have done, blame it on decisions, style of the trainer, owner etc.. But you're just that.. a fan yapping about how they could do better.

chimen · on Aug 15, 2021

So Django is an "astronomically inefficient technology"? I would just stop if I were you.

nomdep · on Aug 15, 2021

So do you think using Django is stupid? I guess you think the same about every product that uses Ruby on Rails?

lmilcin · on Aug 15, 2021

No, Django is not stupid.

It is the decision to choose it to run load that will require 1000s of servers when it could be handled with 5-10 servers in another technology without more development effort.

mst · on Aug 15, 2021

I doubt they expected that level of request load that early on - I imagine the technology choice was made significantly before the whole pandemic thing started.

themarkers · on Aug 16, 2021

You are right if it’s a technical driving thing. But most are not that case.

CPU is much cheaper for scaling a business.

sdze · on Aug 15, 2021

use PHP ;)

nsizx · on Aug 15, 2021

So much this. Practically any other option is better than Python for web development if you're looking for performance.

_6pvr · on Aug 15, 2021

By this logic, why not Java, C++, Rust, Go, C#?

They’re all web-capable and blow the doors off PHP, Python, etc.

IshKebab · on Aug 15, 2021

Yes all of those would be way better options than Python and probably PHP. Well maybe not C++. You'd have to be pretty crazy to have web developers writing security sensitive code in C++.

The "blame our co-founder for the choice" bit is exactly what that graph about the cost of defects vs how early they are fixed is talking about.

If they had just picked Go or Java right at the start they wouldn't have had to expend all this engineering effort to get to a still-not-very-good solution.

_6pvr · on Aug 15, 2021

This thread arose from a person that said "Use PHP", as an argument to using Python. It's a silly argument.

sdze · on Aug 15, 2021

It was just a silly remark about the snail-like performance of Python.

Another silly thing:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

_6pvr · on Aug 16, 2021

I'm not really seeing what's silly about what you're linking. I'm seeing Java blowing the doors off of PHP, which matches my previous assertion.

tored · on Aug 16, 2021

Much better comparison for typical web load, what this discussion is about.

https://www.techempower.com/benchmarks/#section=data-r20&hw=...

Java wins as expected, but a typical setup with Spring versus the typical top PHP frameworks isn't blowing the doors off. Typical Python + Django is far behind, as someone pointed out.

However what we can see in the diagrams is that ORM layers, regardless of language, are more expensive than what most people realize, even for a compiled language like Java.

Why PHP wins is because it is fast enough, compared to other dynamic languages, but is a better fit for web development than Java or other compiled languages.

_6pvr · on Aug 16, 2021

In the link you provided PHP is 53% as fast as Spring. I would argue that’s an insane performance reduction.

If Python weren’t fast “enough”, would it be used to power so many successful web backends?

tored · on Aug 16, 2021

That is for spring-webflux, is that a typical spring set up for web today?

I haven’t coded spring for a few years now, but I was thinking about the traditional spring setup that most use and that is comparable.

You can of course use Python successfully, my argument is that it is easier with PHP, not that it is not possible with Python.

It is a similar argument compared with Java, it is easier with PHP than Java in a web context. Java has other benefit thats fits better for web services IMHO, general higher performance is one.

sdze · on Aug 15, 2021

C#, Java, C++ need application servers, no?

"Serverless" scales infinitely due to its simpler request/response lifecycle.

_6pvr · on Aug 15, 2021

Serverless is too overloaded a term to have any meaning. I'm not really seeing how Python or PHP "scales infinitely" in any way that C#, Java, C++ couldn't.

tored · on Aug 16, 2021

PHP is usually easier to scale because it just a matter of how many webservers. e.g. apache or nginx, you choose to deploy. This also possible with other platforms, but can be a bit trickier to get right.

For large PHP setups it is usually the number of database connections that is the limiting factor, however that is why historically the replicated MySQL databases was such a good fit for PHP, thus only creating a limit for writes on the master.

_6pvr · on Aug 16, 2021

> For large PHP setups it is usually the number of database connections that is the limiting factor,

For pretty much every modern programming language, IO is the bottleneck over everything else.

To save you some time, there are practically no metrics in which I think PHP beats another programming language other than maturity, and even then, not really.

swiley · on Aug 16, 2021

Php needs an application server as well.

waprin · on Aug 15, 2021

Yet YouTube, Instagram, Pinterest, Reddit, Robinhood, DoorDash, and Lyft backend were originally primarily written in Python. What’s funny is that nobody can really deny Python is slow yet somehow the biggest websites in the world were written in it. More proof that Worse Is Better?

huffmsa · on Aug 15, 2021

In the early stages:

Speed of development is far more important than optimizing CPU usage.

You can fake your way to fast responses with good caching, but there's not really many ways to fake having the best features.

sdze · on Aug 15, 2021

It blows my mind how quickly PHP7.4 processes even shitty code.

catillac · on Aug 15, 2021

Famous last words, but I get the sense that the need to handle this sort of load on Clubhouse is plateauing and will decline from here. The app seems to have shed all the people that drew other people initially and lost its small, intimate feel and has turned into either crowded rooms where no one can say anything, or hyper specific rooms where no one has anything to say.

Good article though! I’ve dealt with these exact issues and they can be very frustrating.

hnarayanan · on Aug 16, 2021

polote · on Aug 15, 2021

I wouldn't be very proud of writing an article like that.

Usually engineering blogs exists to show that there are fun stuff to do in a company. But here it just seems they have no idea, what they are doing. Which is fine, I'm classifying myself in the same category.

Reading the article I don't feel like they have solved their issue, they just created more future problems