
This is not about “taming lag” as suggested by the title, which implies some form of failure on Node’s part.

They accidentally wrote synchronous O(n^2) code that hogged the CPU, blocking the event loop, then fixed it. But that doesn’t sound as adventurous…

Otherwise a solid example of using observability tools to debug a live issue.
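
For anyone wondering what "synchronous O(n^2) code" tends to look like in practice, here is a minimal sketch. It is not the article's code (the article doesn't show it); the function names and the duplicate-check scenario are made up purely for illustration. The quadratic version runs to completion on the main thread, so on a large array nothing else on the event loop gets a turn until it returns.

```javascript
// Hypothetical example, not taken from the article.

// Quadratic: for every element, rescan the rest of the array.
// With n items this does roughly n * n / 2 comparisons, all synchronously.
function hasDuplicatesQuadratic(ids) {
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (ids[i] === ids[j]) {
        return true;
      }
    }
  }
  return false;
}

// Linear: a single pass that remembers what it has already seen.
function hasDuplicatesLinear(ids) {
  const seen = new Set();
  for (const id of ids) {
    if (seen.has(id)) {
      return true;
    }
    seen.add(id);
  }
  return false;
}
```

Both return the same answer; only the amount of uninterrupted CPU time differs, which is what shows up as event loop lag.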



While I don’t think the article is very advanced, it’s really not about the root cause. The O(n^2) code isn’t the subject (they don’t even show the fix, as it’s not really interesting).

It’s about how to systematically detect and debug the problem. In Node that’s not a trivial thing to do. That has value.


>In Node that’s not a trivial thing to do.

Depends on your coding practices. I've found it way easier than on other platforms I've used (C++, Python), even without explicit interrupts and such.


I might be missing something: what coding practices do you have in mind that help with CPU profiling?


Things that help:

* No anonymous functions/lambdas (unless they're extremely trivial, but then you probably don't need that in a function at all)

* Avoid recursion

* Functions do one simple thing and return

* Only one return at the end of the function body

* No uncaught exceptions

* Functions that "await" and functions that "compute" are separate ones

* Avoid 3rd party libraries unless it is absolutely necessary

* Write code for a single thread model, scale horizontally with cluster, things like pm2 or just running several node processes

Never had to write them down, tbh; there are definitely many more. This could apply to every language, though, not just JS. You could borrow some practices from realtime computing as well.

If you do most of these things, debugging with console.trace is straightforward. You can also use one of the flamegraph tools out there for the profiling part.

The only thing you can't control (but you can examine to some extent) is the GC, but in practice that's never been an issue for me since the GC that comes with V8 is really good.
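
A rough sketch of how a few of the practices listed above might look together: named functions, one return at the end, and "compute" kept separate from "await". The names (`sumOrderTotals`, `loadAndSumOrders`, `fetchOrders`) and the order shape are hypothetical, just for illustration.

```javascript
// Hypothetical example; names and data shape are assumptions.

// "Compute" function: synchronous, named, single return at the end.
// A named function shows up under its own name in stack traces and flamegraphs.
function sumOrderTotals(orders) {
  let total = 0;
  for (const order of orders) {
    total += order.amount;
  }
  return total;
}

// "Await" function: does the I/O and hands the CPU work to the function above.
// `fetchOrders` is any async function that resolves to an array of orders.
async function loadAndSumOrders(fetchOrders) {
  const orders = await fetchOrders();
  const total = sumOrderTotals(orders);
  return total;
}
```

With this split, a slow frame in a profile points either at `sumOrderTotals` (CPU-bound) or at whatever `fetchOrders` is waiting on, rather than at an anonymous callback that mixes both.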


Very well said!

I'm a pretty sophisticated Node dev and I haven't heard of the technique of logging spans whenever the event loop blocks for 100 ms (similar to how you get warnings by default in frameworks/Chrome for going over 100 ms on event handlers).

Obviously simple to run a setInterval and compare wall clock, but I would have no idea how to detect the actual issue.
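
For reference, the "setInterval and compare wall clock" part could be sketched roughly like this. It only tells you that the loop was blocked and for how long, not which code blocked it. The helper names and the 50 ms / 100 ms defaults are assumptions for illustration, and it uses `Date.now()` for simplicity even though a monotonic clock would be more robust.

```javascript
// Hypothetical helpers; names and default values are assumptions.

// How late did a tick arrive, given when the previous one ran?
function computeLagMs(previousTickMs, currentTickMs, intervalMs) {
  return Math.max(0, currentTickMs - previousTickMs - intervalMs);
}

// Schedules a tick every `intervalMs` and calls `onLag(lagMs)` whenever
// a tick arrives at least `thresholdMs` later than expected.
function startLagMonitor(onLag, intervalMs = 50, thresholdMs = 100) {
  let previousTickMs = Date.now();
  const timer = setInterval(function checkLag() {
    const currentTickMs = Date.now();
    const lagMs = computeLagMs(previousTickMs, currentTickMs, intervalMs);
    previousTickMs = currentTickMs;
    if (lagMs >= thresholdMs) {
      onLag(lagMs);
    }
  }, intervalMs);
  // Do not keep the process alive just because the monitor is running.
  timer.unref();
  return function stopLagMonitor() {
    clearInterval(timer);
  };
}
```

Usage would be something like `const stop = startLagMonitor((lagMs) => console.warn("event loop blocked for ~" + lagMs + " ms"))`. Node also ships `monitorEventLoopDelay()` in `perf_hooks`, which samples the same kind of delay into a histogram.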


You misunderstand. CPU hogging causes event loop delay (lag); they are "taming" that by fixing the CPU-hogging code. There is no implication that Node.js itself is the cause of the lag.



