Here's a trick to building confidence in a BEAM system: if you get good at hot loading, you significantly reduce the cost of deployment, and you don't need as much pre-push confidence. You can do things like "I think this works, and if it crashes, I'll revert or fix forward right away" that just aren't a good fit for the more common deployment pattern where you build the software, then build a container, then start new instances, then move traffic, and so on.
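A minimal sketch of what "fix forward right away" looks like from a shell on the live node, assuming the patched source sits at /tmp/my_module.erl (a hypothetical path); c/1 compiles and hot-loads in one step:

    1> c("/tmp/my_module.erl").
    {ok, my_module}
    2> code:soft_purge(my_module).  %% drop the old version only if nothing still runs it
    true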
Of course, there are some changes that you need confidence in before you push, but for lots of things, a slightly crashy intermediate state is acceptable.
As for understanding the OTP stuff, I think you have to be willing to look at their code. Most of it fits the 'as simple as possible' mold, although there are some places where the use case is complex and it shows in the code, or where performance needs trumped simplicity.
There's also a lot of implicitness in the interaction between processes. That takes some getting used to, but I try to mentally model each process in isolation: what does it do when it receives a message, does that make sense, does it need to change; and not worry about the sender at that point. Typically, when every process is individually correct, the whole system is correct; of course, if that always worked, distributed systems would be very boring, and they're not.
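To make that mental model concrete, here's a toy process boiled down to just its message contract; the protocol is made up for illustration, and the sender never appears:

    %% Hypothetical counter process: reason about it one message at a time.
    loop(Count) ->
        receive
            {incr, N} when is_integer(N) ->
                loop(Count + N);
            {get, From} ->
                From ! {count, Count},
                loop(Count);
            stop ->
                ok
        end.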
Erlang's hot reload is a double-edged sword. (Yes, yes, everything is a tradeoff, but this is on another level.)
Because hot code reloading is possible, and because you can attach a REPL session to a running BEAM node, running 24/7 production Erlang systems can, rather counterintuitively, encourage somewhat questionable practices. It's too easy to hot-patch a live system during firefighting and then forget to retrofit the fix to the source repo. I _know_ that one of the outages at my previous job was caused by a missing retrofit patch after a deployment.
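For example, this is all it takes to hot-patch a live node during firefighting (node name, cookie, and path are all hypothetical), and nothing in it pushes the change back to the repo:

    $ erl -sname fixer -remsh prod@appserver1 -setcookie prodcookie
    (prod@appserver1)1> c("/tmp/hotfix/my_module.erl").
    {ok, my_module}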
The running joke is that there have been Ericsson switches that could not be power cycled, because their only correct state was the one actually running in the network: dozens of live hot patches had accumulated over the years without ever being committed back to the repository.
You certainly can forget to push fixes to the source repo. But if you do that enough times, it's not hard to build tools to help you detect it. You can get enough information out of loaded modules to figure out if they match what's supposed to be there.
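A minimal sketch of such a check, assuming the beam files on disk are the source of truth (loaded_mismatches/0 is a made-up name); beam_lib:md5/1 yields the same checksum as Mod:module_info(md5) when disk and memory agree:

    -module(loaded_check).
    -export([loaded_mismatches/0]).

    %% Modules whose loaded code differs from their beam file on disk.
    loaded_mismatches() ->
        [Mod || {Mod, _} <- code:all_loaded(),
                is_list(code:which(Mod)),  %% skip preloaded/cover-compiled
                on_disk_md5(Mod) =/= Mod:module_info(md5)].

    on_disk_md5(Mod) ->
        case beam_lib:md5(code:which(Mod)) of
            {ok, {Mod, MD5}} -> MD5;
            {error, _, _} -> undefined
        end.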
I had thought there was a way to get the currently loaded object code for a module, but code:get_object_code/1 looks like it pulls from the filesystem. I would think that in a situation where you a) don't know what's running and b) have the OTP team on staff, you could most likely write a new module to at least dump the loaded object code (or something similar), and then spend some time turning that back into source code. But it makes a nice story.
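That said, a loaded module does carry some breadcrumbs you can read without touching disk, which is where I'd start the forensics (the output shown is illustrative):

    1> proplists:get_value(source, my_module:module_info(compile)).
    "/build/src/my_module.erl"
    2> my_module:module_info(md5).
    <<91,12,...>>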
That's part of it, yeah. But, at least in my experience, that tells me you pushed code (to disk) and didn't load it. You could probably just notify at 4 am every day if code:modified_modules() /= [], assuming you don't typically do operations overnight. No big deal if you're doing emergency fixes at 4 am; you'll get an extra notification, but you're probably knee-deep in notifications already, so what's one more per node?
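Something like this, where notify/1 is a stand-in for whatever alerting you already have:

    %% Run on each node from a timer at 4 am.
    check_loaded_vs_disk() ->
        case code:modified_modules() of
            [] -> ok;
            Mods -> notify({loaded_code_differs_from_disk, node(), Mods})
        end.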
But that's not enough to tell you whether the code on disk matches what it's supposed to be; you'd need some infrastructure that keeps track of that too. If you package your code, though, your package system probably has a verification check, which you can also run at 4 am.
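As a sketch, assuming an RPM-based system and a made-up package name; rpm -V prints nothing when the files on disk match the package database:

    check_disk_vs_package() ->
        case os:cmd("rpm -V my-erlang-release") of
            "" -> ok;
            Diff -> notify({disk_differs_from_package, node(), Diff})
        end.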