Nitpick: not all bugs have to be reproducible to be taken seriously. Defensive programming and extra logging can serve as mitigations to avoid future problems, or at least help fix them later.
Imagine you're writing trading software. An algo goes haywire and machine-guns the whole order book, and then you refuse to put a "max order size" check outside the algo to stop it from happening again, because you can't figure out why it happened in the first place.
Try telling a regulator or your boss that was your reasoning.
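To make the "max order size outside of the algo" idea concrete, here is a minimal sketch in Python. Everything in it (`Order`, `OrderGate`, `OrderRejected`, the limits) is made up for illustration and is not any real trading API. The point is only that the guard sits between the algo and the exchange and needs no knowledge of why the algo misbehaved.

```python
from dataclasses import dataclass


class OrderRejected(Exception):
    """Raised when an order fails a pre-trade limit check (hypothetical)."""


@dataclass(frozen=True)
class Order:
    symbol: str
    quantity: int
    price: float


class OrderGate:
    """Hypothetical guard between the algo and the exchange.

    It knows nothing about the algo's logic; it only enforces hard limits.
    """

    def __init__(self, max_order_size: int, max_orders_per_window: int):
        self.max_order_size = max_order_size
        self.max_orders_per_window = max_orders_per_window
        self.sent_in_window = 0

    def check(self, order: Order) -> Order:
        # Limit 1: a single order may never exceed the size cap.
        if order.quantity <= 0 or order.quantity > self.max_order_size:
            raise OrderRejected(
                f"quantity {order.quantity} outside 1..{self.max_order_size}"
            )
        # Limit 2: cap the number of orders per window, so a runaway loop
        # cannot flood the book with many small orders either.
        if self.sent_in_window >= self.max_orders_per_window:
            raise OrderRejected("order count limit reached for this window")
        self.sent_in_window += 1
        return order

    def reset_window(self) -> None:
        # A real system would call this from a timer; here it is manual.
        self.sent_in_window = 0
```

The algo would hand every order to `gate.check(order)` before sending it. A rejected order raises instead of reaching the market, which also gives you a clean place to log what the algo was trying to do.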
How many one-off band aids do you think should be applied for rare, never reproduced problems before you slap a “100% safe” label on it and ship it with the confidence of a bloated, cruft-ridden job well done?
Are you arguing in bad faith, or do you just not have any practical experience dealing with complex systems?
Even if the bug can't be reproduced, on the basis of multiple user reports the first step absolutely should be to add some assertions and logging around email deletion.
The point is not to give it a "100% safe" label, the point is to start narrowing down possible root causes. If the problem recurs again, you'll have assertions ruling out certain possible culprit code paths as well as logs displaying the values of relevant variables.
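As a rough illustration of "assertions and logging around email deletion", here is a small Python sketch. The folder model (a dict of message id to subject) and the names `delete_messages` and `InvariantViolation` are assumptions invented for the example; this is not how Thunderbird or any real mail client is structured.

```python
import logging

log = logging.getLogger("mail.delete")


class InvariantViolation(Exception):
    """Raised when a deletion would break an expected invariant (hypothetical)."""


def delete_messages(folder: dict[str, str], selected_ids: list[str]) -> dict[str, str]:
    """Return a copy of `folder` without `selected_ids`, checking invariants.

    `folder` maps message id -> subject. Both the model and the checks are
    illustrative assumptions, not a real mail client's internals.
    """
    selected = set(selected_ids)
    log.info("delete requested: selected=%r folder_size=%d",
             sorted(selected), len(folder))

    # Check 1: every id we were asked to delete must exist in this folder.
    unknown = sorted(selected - folder.keys())
    if unknown:
        log.error("selection contains ids not in folder: %r", unknown)
        raise InvariantViolation(f"unknown message ids: {unknown}")

    remaining = {mid: subj for mid, subj in folder.items() if mid not in selected}

    # Check 2: exactly the selected messages are gone, nothing more.
    expected = len(folder) - len(selected)
    if len(remaining) != expected:
        log.error("deleted count mismatch: expected %d remaining, got %d",
                  expected, len(remaining))
        raise InvariantViolation("deletion removed an unexpected number of messages")

    log.info("delete done: remaining=%d", len(remaining))
    return remaining
```

The checks raise an explicit exception rather than using `assert`, since Python strips `assert` statements when run with `-O`. If the bug recurs, the log shows what was selected and how big the folder was, and any check that did not fire rules out that code path.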
Kinda off topic, but I've been searching for a good introduction to defensive programming and its best practices, but never really found much. Any recommendations?
I don’t know of any real posts on it. It just ends up being kind of an “assume it’ll go wrong” mindset: figure out how you'd know something has gone wrong, then track it down. Your starting point is, after an issue is reported, to add a load of logs around places that seem like candidates for the flow. Over time, you get a sense of where things can break and you add that telemetry ahead of time.
I feel like this is sort of like reading a book to get better at self defense. Yeah, you'll probably pick up a few interesting things that may be of questionable use. But when you train in martial arts, you often get to go through the motions and put the moves into practice. Even then "real" fights will feel quite different and a lot of the stuff you've learned will likely fly out the window. If you've been in real fights a lot, you've begun to internalize your training and your moves become more like instinct. It's quite difficult to go from book knowledge to instinct without getting beat up a lot in between I think. The real valuable lessons come from building something that breaks and getting to fix it yourself.
This issue comes up in my role a lot, where I am often dealing with various environmental conditions and human factors, plus multiple integration points between various software and hardware systems.
The answer is that you keep working at it iteratively using a combination of logging, reporting, and defensive programming to systematically narrow down the possible causes. Sometimes you never arrive at a true root cause, but you get close enough that you can mitigate the problem and finally close the ticket out. At the end of the day, the customer/user doesn't care as long as it works.
However, what will really piss them off is telling them your hands are tied until they can reliably reproduce the issue for you. It's important they understand that you are working on it, and typically they will go out of their way to help solve the problem when they feel taken care of.
Why do you think it needs reproducible steps? It is obvious that the bug is still active, so in a way it is reproducible, just not in a systematic way.
This happens fairly often, for example when many services work together asynchronously and, in some very rare situation, unwanted behavior occurs. To fix that, it is often easier to reason through the entire process and identify weak spots. It might even be a good idea to switch to a different paradigm to avoid certain bugs altogether.
For this particular bug, I would start by reading a lot, and ensuring that the bug is indeed not easily reproducible (by trying to reproduce it, of course). If that fails, I would continue to think about root causes for the bug, and possible workarounds that would work in theory. Then I would try to estimate the amount of work required and the risk of breaking other things, and report that to whoever decides on further actions.
And of course, as I know very little about the inner workings of Thunderbird, I would simply ask ChatGPT o3 or similar for advice. It comes up with a plan that seems reasonable.