Multi-core HTTP Server with Node.js (yahoo.net)
67 points by sh1mmer on July 19, 2010 | 26 comments


>> "While single-process performance is quite good, eventually one CPU is not going to be enough;"

Every article on this sort of thing seems to just gloss over this part. Why isn't one CPU enough? What is using it? Serving static files certainly won't max it out. Doing simple things won't...

Does anyone have any use cases / experience for when this was the case? :/

edit: Fine, downmodding fanboys. I get it. Use whatever you like. Meh


Since Node.js is still a fairly new technology, people are starting out with 'hello world' examples such as static file servers. Obviously, specialised servers like Traffic Server or nginx handle these cases faster.

That said, Node is a programming environment, so the question is: on a multi-core machine (which all datacenter machines are), how can we scale to use all the cores so we can do much harder stuff?

What about a node system to deal with 100k concurrent long-poll connections? When some of those are active they could be really active, requiring all the cores, etc. There are lots of scenarios in which more compute power is useful.


I agree there are cases where more CPU power is useful, but I'm just not sure it's a good idea to, firstly, assume you need it before it's an issue, and secondly, split the whole thing (network I/O) across multiple cores rather than just shelling out the CPU-heavy stuff to other cores.

Network I/O isn't CPU-heavy. There's no reason to increase complexity and slow throughput in the hope that more CPUs will help...


Part of node.js's appeal comes from writing all the server code in Javascript, even when it'd be more efficient to break pieces out into separate programs. In that case, worrying about CPU usage for the server itself makes some sense.

Not saying I agree with the design choices (I'm more of a multiple language / "hard and soft layers" person, and I don't care for Javascript), but I think that's the reason.


If you're just serving static files why would you be using Node?


So what is the common use case for Node, and what in that use case eats CPU?


Application logic is not free.

The article mentions that using NodeJS as a simple HTTP proxy with no application logic can sustain only 2100 reqs/s before a 2.5GHz Xeon is maxed out. NodeJS uses CPU more efficiently than other HTTP stacks, but its I/O engine is not infinitely scalable.
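
For reference, the kind of proxy being benchmarked is itself only a handful of lines of Node. A rough sketch (the upstream host and port are made up, and this uses http.request and stream pipe(), not necessarily the article's exact harness):

  var http = require('http');

  // Forward every incoming request to a single upstream and stream the
  // response back. Purely illustrative: no error handling, no keep-alive tuning.
  http.createServer(function (req, res) {
    var upstream = http.request({
      host: 'localhost',      // made-up upstream
      port: 8080,
      method: req.method,
      path: req.url,
      headers: req.headers
    }, function (upstreamRes) {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    });
    req.pipe(upstream);
  }).listen(8000);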


>> "The article mentions that using NodeJS as a simple HTTP proxy with no application logic can sustain only 2100 reqs/s before a 2.5GHz Xeon is maxed out."

That sounds fairly lame to me. Proxying network traffic isn't a CPU-heavy operation. Worst case, you have to move a few bits of memory around.


That's specious; you really have to know what you're proxying, and Squid and Varnish supposedly get much less throughput. (Google around http://deserialized.com/reverse-proxy-performance-varnish-vs... ) "Moving bits" in memory is not a measure of anything.


Operating-system overhead, and more likely a massive number of packets per second, will easily peg a single core. I did some tests with nginx (comparable to node.js) and it easily pegged a dual-CPU quad-core Xeon with 8GB of RAM (all 8 CPUs at 90+%) at a paltry 8055.77 rps over 2 x 10Gbit Ethernet, but then this is more likely an OS / fine-tuning limitation.


"However, rather than accepting connections using this socket, it is passed off to some number of child processes using net.Stream.write() (under the covers this uses sendmsg(2) and FDs are delivered using recvmsg(2)). Each of these processes in turn inserts the received file descriptor into its event loop and accepts incoming connections as they become available. The OS kernel itself is responsible for load balancing connections across processes."

Racing (i.e. thread-safe) accept() is a really good way to improve server throughput. epoll is also awesome for being thread-safe.
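
For illustration, the pattern quoted above looks roughly like this. Note this sketch uses the child_process fork/send handle-passing API (which rides on the same sendmsg(2)/recvmsg(2) mechanism) rather than the article's raw net.Stream fd write, and the file names and worker count are made up:

  // master.js: create one listening socket and hand it to N workers.
  var net = require('net');
  var fork = require('child_process').fork;

  var server = net.createServer();
  server.listen(8000, function () {
    for (var i = 0; i < 4; i++) {
      fork('./worker.js').send('server', server);
    }
    // The master's copy of the server can still accept connections
    // too unless it is closed; that detail is omitted for brevity.
  });

  // worker.js: accept connections on the shared listening socket;
  // the kernel balances accepts across the processes.
  process.on('message', function (msg, server) {
    if (msg !== 'server') return;
    server.on('connection', function (socket) {
      socket.end('handled by pid ' + process.pid + '\n');
    });
  });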


From the day node.js was released, you could run multiple instances on different ports and stick a load balancer in front of them. Even now, I think that is a healthier option than baking the number of processes into the script itself.
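
For illustration, the load-balancer side of that is a few lines of nginx config (ports and instance count are made up):

  # Round-robin across four node instances, each started
  # separately on its own port.
  upstream node_backend {
      server 127.0.0.1:8001;
      server 127.0.0.1:8002;
      server 127.0.0.1:8003;
      server 127.0.0.1:8004;
  }

  server {
      listen 80;
      location / {
          proxy_pass http://node_backend;
      }
  }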


Doing it using Node allows you to use application logic to balance rather than just raw traffic.

This entirely depends on the use cases you have.


Why are they using multiple processes rather than multiple worker threads? IPC is much costlier than using shared memory, even if it's just passing the initial state.


Multiprocess is going to be more robust against failures, for one. While it is a bit of an apples-to-oranges comparison, I thought Chrome had shown pretty conclusively the benefits of adopting multi-process over multi-threaded.


I don't think there is a browser that uses a full thread-per-tab model, so there's really nothing to compare it against. The problem with other browsers is that slow JS or, in some cases, Flash in one tab will slow down other tabs.

Also, the tabs in a browser running many sites are much closer to traditional use of processes. Web application servers running the same codebase on each request seem like a better fit for threads. For one thing, the security model for the two uses of JS are very different.

As for robustness, if doing X will cause a crash and your code does X, then you will just have a bunch of crashing processes rather than just one. How is that better? Wouldn't the real solution be to either stop doing X or fix X so that it doesn't cause a crash?


>> "As for robustness, if doing X will cause a crash and your code does X, then you will just have a bunch of crashing processes rather than just one. How is that better? Wouldn't the real solution be to either stop doing X or fix X so that it doesn't cause a crash?"

Sure, but bug-free code doesn't exist, and not all crashes happen 100% of the time. If you have a difficult-to-track-down crash bug that happens for some mysterious reason once every 100,000 requests, would you rather have it blow up the entire server or just that one request's session?


I'm not sure, but I believe V8 is (or was?) not thread safe. Even web workers in Chrome get their own processes.


Isn't this tremendously inefficient compared to just running Apache, nginx or whatever floats your boat? Both of them thread perfectly across SMP systems. While it's an interesting implementation, I completely fail to see the point of even using it. Does anyone have any sane usage scenarios they could share?


Node.js lets you write server applications in a loosely typed language like Javascript, which lets you code faster, inside a server container that can handle tens of thousands of concurrent connections. It uses the same event-driven design as nginx, which is why it can handle so many connections without a huge amount of memory or CPU usage.

If you were to do this on nginx, you'd have to write the module in C.

You can't do it on Apache because of Apache's multi-process/thread model.

The fact that you can write a web server in a few lines of easy to understand and maintain Javascript that can handle over 10,000 concurrent connections without breaking a sweat is a breakthrough.

Node.js may do for server applications what Perl did for the Web in the 90's.
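
For reference, the sort of few-line server meant here is roughly the canonical example from the Node docs of the time:

  var http = require('http');

  // Every request is answered from a single event loop: no threads,
  // no per-connection processes.
  http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }).listen(8124);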


  > If you were to do this on nginx you'd have to write the
  > module in C.
You can write a module to glue nginx and V8. Many people have done it. It takes less than 400 lines of code, and a lot of it is typical nginx module boilerplate. (The problem is more the lack of nginx documentation online, perhaps.)

  > The fact that you can write a web server in a few lines of easy to
  > understand and maintain Javascript that can handle over 10,000
  > concurrent connections without breaking a sweat is a
  > breakthrough.
Yes. But the big performance issue is still hitting the database and disks. There's no point in having a super-fast web server if the DB is dog slow, like the vast majority of databases out there, including the NoSQL bunch. They are not fixing the issue of latency vs. scalability vs. reliability. For that they need to address many uncomfortable problems of current hardware architectures. This is the elephant in the room.


Well, at least it's an elephant we're all talking about a lot. It's not as if anyone's ignoring that.

If your DB is dog slow, you can still handle wildly high concurrency with Node. It's just that the user will feel the slowness, and your DB will struggle. But at least your server won't be unable to serve requests while it's waiting for I/O.

Of course it's still on the developer to architect their system for success. Node just takes one unnecessary bottleneck out of the equation.
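
A small sketch of what that means in practice; the setTimeout stands in for a slow database call and is made up purely for illustration:

  var http = require('http');

  http.createServer(function (req, res) {
    if (req.url === '/slow') {
      // Stand-in for a slow DB query: this response arrives 2s later,
      // but the event loop keeps serving other requests in the meantime.
      setTimeout(function () {
        res.end('slow result\n');
      }, 2000);
    } else {
      res.end('fast result\n');
    }
  }).listen(8000);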


"The fact that you can write a web server in a few lines of easy to understand and maintain Javascript that can handle over 10,000 concurrent connections without breaking a sweat is a breakthrough."

I think it's a big mistake to judge based on "number of lines taken to write 'hello world'". It's what happens when you have 20k LOC to maintain and a heap of complexity that matters.


Agreed. It's too bad it's hard to discuss complexity issues at that level in blog posts - it tends to skew things towards overly trivial examples.

I'm curious how suitable node's callback-centric model is for larger codebases - it's workable for smaller stuff, but can turn into spaghetti code quickly, like CPS code or giving someone elaborate instructions in passive voice. (Of course, relatively autonomous chunks of the system can be moved into their own processes.)
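
To make the spaghetti concern concrete, here's the shape even a simple read-config-then-query-then-respond handler takes with nested callbacks (lookupUser is a made-up helper):

  var fs = require('fs');
  var http = require('http');

  http.createServer(function (req, res) {
    fs.readFile('config.json', function (err, data) {              // nesting level 1
      if (err) { res.writeHead(500); return res.end(); }
      lookupUser(JSON.parse(data), req.url, function (err, user) { // level 2, hypothetical helper
        if (err) { res.writeHead(500); return res.end(); }
        res.writeHead(200, {'Content-Type': 'text/plain'});
        res.end('hello ' + user.name + '\n');
      });
    });
  }).listen(8000);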

Off and on, I've been working on a similar event loop/async server framework in Lua, and I think Lua's coroutines make the resulting code easier to manage. (No time frame on that yet, btw.)


The complexity around callback-heavy code is why I think the Clojure Aleph library's hybrid approach presents an avenue worth exploring - if your language has good syntactic support for dealing with concurrency.


Why not nginx? ^_^

It is so inadequate to use things like JVM or V8 to serve static content.

Btw, is there any cool web-server project for Flash, yet another artificial blob (or, to be more correct, tumour) in an OS? ^_^

And of course there should be some dynamic web server written in PHP! (Yes, you can run it standalone.)



