More

jake_morrison · 2025-04-04T15:08:25 1743779305

I thought about doing this years ago, but without AI, the only way was to have people do the analysis. I thought this was bad karma, as I would end up paying people in Bangladesh to look at the stuff that Americans eat.

0_____0 · 2025-04-04T15:18:47 1743779927

A lot of what Americans eat is kinda grim? I couldn't eat the sausage at the hotel breakfast in Little Rock this morning because it tasted like new vinyl smells.

Wrt Bangladesh... I imagine the job might be a little harder during Ramadan.

jake_morrison · 2025-02-24T03:28:35 1740367715

Embedded systems often have crappy compilers. And you sometimes have to pay crazy money to be abused, as well.

Years ago, we were building an embedded vehicle tracker for commercial vehicles. The hardware used an ARM7 CPU, GPS, and GPRS modem, running uClinux.

We ran into a tricky bug in the initial application startup process. The program that read from the GPS and sent location updates to the network was failing. When it did, the console stopped working, so we could not see what was happening. Writing to a log file gave the same results.

For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.

This board had no Ethernet and only two serial ports, one for the console and one hard-wired for the GPS. The ROM was almost full (it had a whopping 2 MB of flash, 1 MB for the Linux kernel, 750 KB for apps, and 250 KB for storage). The lack of MMU meant no shared libraries, so every binary was statically linked and huge. We couldn't install much else to help us.

A colleague came up with the idea of running gdb (the text mode debugger) over the cellular network. It took multiple tries due to packet loss and high latency, but suddenly, we got a stack backtrace. It turned out `printf()` was failing when it tried to print the latitude and longitude from the GPS, a floating point number.

A few hours of debugging and scouring five-year-old mailing list posts turned up a patch to GCC (never applied), which fixed a bug on the ARM7 that affected uclibc.

This made me think of how the folks who make the space probes debug their problems. If you can't be an astronaut, at least you can be a programmer, right? :-)

toast0 · 2025-02-24T06:37:29 1740379049

At least the debugger worked. The processor I used in embedded systems in college, the 68HC11, would stop doing conditional branches when the supply voltage was too low.

We had a battery powered board, with no brownout detection, and I was using rechargable NiMH batteries to save money/waste. When the students with alkaline batteries had low batteries, the motor load would bring vcc down far enough that the CPU would reset by itself. With NiMH, the batteries could still drive the motors and keep the CPU alive...

You could single step in the debugger, and see the flag register was set as expected, but the branch didn't happen. Just ran straight through. I can't remember if unconditional jump or call worked. After about the third time this happened, I got good at figuring it out.

apple1417 · 2025-02-24T11:04:32 1740395072

> For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.

Of course where it becomes even more fun is when it's a customer's unit in Peru and you can't replicate it locally :). But oh how I love it. I have definitely spent many a day staring at code piecing things together with what limited info we have.

But to get back on topic, I can definitely confer on the quality of most embedded compilers. It's a great day when I can just use normal old gcc. I've never run into anything explicitly wrong, but I see so many bits of weird codegen or missed optimisations that I keep the disassembly view open permanently, as a sanity check. The assembly never lies to you - until you find a silicon bug at least.

anitil · 2025-02-24T05:11:09 1740373869

> For embedded developers, that's just a typical Tuesday

I was trying to explain to my colleague the other day that I've spent an unhealthy amount of time rebooting devices while staring at an LED wondering why it won't turn on.

eschneider · 2025-02-24T11:06:46 1740395206

Tuesday, indeed. :)

In the embedded world, correctly working hardware isn't a given, either. Part of the board bringup/hardware verification process is just determining that everything on the board actually works. Always fun when you have to figure out if a problem is in your code or in the hardware. (HINT: It's often both.)

It's rare that you need to break out the oscilloscope or logic analyzer, but when you absolutely have to know if that line went high or not, there's no substitute. :)

taneq · 2025-02-25T04:58:49 1740459529

> (HINT: It's often both.)

Or worse, it’s neither! By which I mean both. Neither part of the design is technically wrong but the fault is in the way the two interact. Those are some of the fun ones… I had one where I had to make sure the chip select line was off before turning power off to a chip, because CS would keep it half powered.

eschneider · 2025-03-02T18:36:44 1740940604

At a sufficiently high resolution, all digital electronics is actually analog. :/

sitkack · 2025-02-24T05:21:10 1740374470

It is nuts to have a dev board that is constrained as the final device. You should have had an additional serial port and 8x as much flash, it would have solved your problem immediately.

It is even better to do the bulk of the dev inside of an emulator if you can swing it. The GPS and GPRS could be tethered into the emulator instead of trying to get a debug link into the system board.

ShroudedNight · 2025-02-24T04:01:42 1740369702

Were these commodity boards? Having to resort to using the cellular connection, instead of attaching a hardware debugging probe (J-Link?) seems like a recipe for a painful squandering of intellect.

exmadscientist · 2025-02-24T06:12:47 1740377567

One of the lovely "features" of embedded work is that after a while of doing this sort of thing, sometimes you get good enough at the crazy hacks that it becomes faster and easier to do something like this than to track down who has the J-Link (okay, they've usually got more than one) and can they spare it/where did they put it/why does that person have a J-Link at all/is the J-Link still alive....

jamesfinlayson · 2025-02-24T23:43:26 1740440606

Oof, I remember doing lots of embedded stuff at university and this rings true.

The compiler we used was built off gcc so it was reasonably good but I remember we had some weird crash one day that I couldn't figure out. Eventually I added some inline assembly to do an absolute jump to the next place that it needed to go and it started working again. I was too inexperienced to know how to dig deeper but presumably the code generator had inserted something weird that was causing a crash.

lisper · 2025-02-24T17:15:29 1740417329

Yeah, I have a war story...

I was working on mobile robot research at JPL back in the 1990s. We had a robot with an arm attached. It worked fine except that every now and then the whole system would crash hard with a totally corrupted heap and stack, just random data everywhere. So no chance of a backtrace. The really weird thing was that this only happened when the arm was moving. We also had the exact same system running under a different operating system and we never had any problems there, so we were 100% sure it was not a compiler error.

It was a compiler error.

It took us a year to figure out what was going on. It turned out that the compiler had a bug where it would emit code that would pop the stack pointer and then pull a value out of the now unprotected stack frame. On the non-embedded system this did not cause any problems, but on the embedded system (running vxWorks) hardware interrupts used the same stack as the process that was running when the interrupt hit. So if we happened to get an interrupt just after the stack pointer was popped but before the unprotected value was grabbed, that value would get stomped on by the interrupt handler. Then when the interrupt handler would return, the process would resume, grab the now-random value, and chaos ensued.

ShroudedNight · 2025-02-24T23:20:05 1740439205

How many novel depressions were created as a result of high velocity impacts after making that discovery? I think I'd be seeing red...

lisper · 2025-02-24T23:28:29 1740439709

Actually, I remember being thrilled to have finally figured it out. We had been beating our heads against the wall (metaphorically) for a year, and I remember looking at the screen at the disassembly sequence and thinking, Oh my God, I think I've found it! It felt like making a major scientific discovery. (To be fair, I was only able to do this after others laid the groundwork for me by finding ways to reliably reproduce the problem. But I'm the one who spent hours single-stepping through assembly code before finally realizing what was happening.)

I also remember reporting the problem to one of the authors of the compiler (I think it was David Kranz) so he could fix it in the next version and him telling me that there wasn't going to be a next version because the funding for the project had been cut. There was no github in those days so the whole thing just faded into the mists of time, which is a real shame because the system really kicked ass.

The whole history of the project can be found here:

https://paulgraham.com/thist.html

motorest · 2025-02-24T06:34:34 1740378874

> For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.

It seems to me that if you can still update and reboot said machine, you can do a bisect on your commits to pinpoint the regression. Once you spot the regression commit you can split it to check what introduced the regression.

smcl · 2025-02-24T07:33:28 1740382408

It took them multiple tries just to use gdb, I don’t think this is a scenario where you can easily reflash the image on the board

stuaxo · 2025-02-24T08:00:33 1740384033

Did the GCC patch get applied after that?

actionfromafar · 2025-02-24T11:13:52 1740395632

"Never" implies no, I guess. :-)

jake_morrison · on Sept 3, 2024

I am optimistic about Ubuntu Chilselled, which automates this process: https://canonical.com/blog/chiselled-ubuntu-ga

Here is a full example: https://github.com/cogini/phoenix_container_example/blob/mai...

jake_morrison · on Feb 28, 2024

Apple making cars was an interesting play when they had a bunch of cash overseas and good relationships with Chinese manufacturers. Tax law changes made bringing money back to the US less punishing, and manufacturing in China is becoming problematic.

jake_morrison · on Feb 28, 2024

You can do something similar with docker compose, driving the system from the outside. Create dockerized versions of dependencies like the database, build and run tests, and then run tests against the production app container. It's particularly useful for testing a set of microservices.

See https://github.com/cogini/phoenix_container_example for a full example. This blog post describes it in detail: https://www.cogini.com/blog/breaking-up-the-monolith-buildin...

m00x · on Feb 28, 2024

We use docker environments like this for tests, but it does have its issues.

You often need to add custom behavior like waiting for the app to load and start serving, healthchecks, etc. Having it all in code is pretty useful, and it's self-contained within the code itself vs having to set up the environment in different places (CI, Github actions, local dev, etc).

The negative is that code isn't portable to prod, it doesn't test your environment as well (important for staging), and you're missing out on sharing some environment settings.

I feel like it definitely has its place in the stack and in certain companies.

jake_morrison · on Feb 22, 2024

A few years ago I did an embedded Linux project on an ARM7 CPU without an MMU, a vehicle tracker for trucks with GPS, cellular modem, ODB2 interface, and other I/O.

Lack of MMU caused some unexpected issues:

* Because there is no MMU, there are no shared libraries. All the binaries have to be statically linked. We had a total of 2MB of ROM, of which about 1M was the Linux kernel. So we ran out of space for the programs we needed for the application about 3/4 of the way through the project. The solution was to use Lua, which was a 200k binary, and it could run scripts to do some things.

* Lack of MMU means no memory protection between programs. One program can step on another program's memory. Just like the good old days of DOS. Ouch!

* We started with the ucLinux distribution, which is pretty lame. A lot of programs in the tree didn't build. We ended up rebuilding with Yocto, and making patches to things where necessary.

* We found other entertaining bugs: https://www.cogini.com/blog/debugging-your-space-probe/

kragen · on Feb 23, 2024

the post you're commenting on describes some ways to make shared libraries work on arm without an mmu, even though its main focus is on risc-v

possibly https://github.com/mickael-guene/fdpic_doc postdates the time you're talking about, but since you mention yocto, i suspect not

lua is pretty great

jake_morrison · on Feb 23, 2024

Yeah, I actually used OpenEmbedded, the predecessor to Yocto. The project was in 2011.

kragen · on Feb 23, 2024

aha, yeah, i think you would've had to invent fdpic then, and add support for the new abi to your compiler

jake_morrison · on Jan 31, 2024

It's a bad market, probably the worst since 2000. Some of this is overhiring and zero interest rate behavior coming to an end. Companies are shifting from prioritizing growth to profitability. There is also a lot of asshole behavior. They are laying people off to please Wall Street, not because they need to for financial reasons. And they are doing layoffs in extremely callous ways, e.g., right before Christmas.

My daughter's data engineering team in finance was reduced from five to two, and they are not giving raises that match inflation. She is almost singlehandedly responsible for servicing million-dollar accounts. It's not like there isn't money.

There is also a lot of hiring of remote people outside the US. The companies figured out how to do remote work during Covid, and they are using it to reduce costs.

Things will probably improve in the next year, but it will be rough for a while.

jake_morrison · on Jan 16, 2024

In some companies, staff positions can only be effectively done by someone who has been there for 10 years. They know the specific tech stack that the company uses, know the application, know where the bodies are buried, and everyone in the company trusts them. It's basically impossible for someone to parachute in and do the job.

jake_morrison · on Jan 16, 2024

Erlang is based on virtual threads (confusingly called processes). The Erlang virtual machine schedules them on OS threads. Erlang processes communicate using message passing, preventing deadlocks. You can use millions of Erlang processes without problems, e.g., to handle millions of Elixir LiveView sessions.

toast0 · on Jan 16, 2024

Erlang has the advantage that it was built around processes and is effectively preemptive. Processes can be descheduled anytime they make a function call or use receive to get or wait for messages and being a functional language, you have a finite amount of instructions before calling a function.

Other languages adding virtual threads later in life don't have the same ability to feel preemptive. Although I think someone said Java has a nice trick or two?

Anyway, if all the virtual threads seem preemptive, you won't have the case that your limited number of actual threads are waiting on locks and not yielding --- all Erlang processes yield eventually; usually in a fairly short time frame.

Jtsummers · on Jan 16, 2024

You can have deadlock in Erlang, it's just a bit harder. It happens when two processes are both waiting on the other to send them a message which is analogous to two threads each waiting for a mutex the other holds. The same thing can happen in Go with its channels, another message passing based concurrency control mechanism.

orthoxerox · on Jan 16, 2024

Two deadlocked processes won't exhaust the thread pool in Erlang, they will simply never wake up.

jake_morrison · on Jan 16, 2024

Sure, you can make deadlocks in any language, but it's uncommon in Erlang. Shared state is the exception, and message passing means that things that manage state, such as gen_servers, only process one message at a time from their inbox.

Contrast this with languages like Java where every object is a potential concurrency problem. Or the 10+ years of trying to make Python async (see Twisted).

Alifatisk · on Jan 16, 2024

This feels like something I could read on Wikipedia about Erlang, how does this add anything to the topic?

kaba0 · on Jan 16, 2024

It all has the cost of a significantly lower throughput in case of Erlang.

Also, this is more of a user error, than a fundamental issue.

jake_morrison · on Jan 8, 2024

Like ogres