
How do you fix a bug you can't reproduce?

It's a genuine question because I'm puzzled here.

A very small number of users hit this bug (and tbf, it's a really bad bug), but they can't reproduce it consistently, and it seems none of the developers have managed to either (the seemingly random way it occurs doesn't help). How is it supposed to be fixed?



You add more and more diagnostics (e.g. logging) in that area until you manage to track down the bug. Over several years this should be possible. At that point you can either fix the bug directly or do it properly by first reproducing the bug in a test and then fixing it.
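
A minimal sketch of what "more diagnostics in that area" could look like, assuming a Python codebase. The names `FlightRecorder`, `check_invariant`, and the logger name `suspect_area` are hypothetical, not from any project discussed here. The idea is to keep a small in-memory history of recent events and dump it only when a supposedly impossible state shows up, so the next report arrives with context attached.

```python
import collections
import logging

logger = logging.getLogger("suspect_area")  # hypothetical logger name


class FlightRecorder:
    """Keeps the last `capacity` diagnostic events in memory."""

    def __init__(self, capacity=200):
        # A bounded deque silently drops the oldest event when full.
        self._events = collections.deque(maxlen=capacity)

    def record(self, message, **context):
        self._events.append((message, context))

    def dump(self):
        """Write the buffered events to the log and return them as a list."""
        events = list(self._events)
        for message, context in events:
            logger.error("trace: %s %r", message, context)
        return events


def check_invariant(recorder, condition, description):
    """Dump the recent history when an 'impossible' state is observed."""
    if condition:
        return True
    recorder.record("invariant violated", description=description)
    recorder.dump()
    return False
```

Usage would be something like calling `recorder.record("renamed file", path=p)` at each step of the suspect code path and `check_invariant(recorder, count >= 0, "count went negative")` wherever you'd otherwise just assume things are fine.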


How do you close a bug you cannot reproduce?

Said another way: if they can't reproduce it, they can't close it.

They may well have fixed it already, but without a way to reproduce it the only prudent behavior is to leave it open and wait for the next diagnostic file to be uploaded.


That's not the only prudent behaviour. As the OP said, the prudent behaviour is to add more diagnostics and guards against the conditions that lead up to the bug.


Okay, let's assume more diagnostics and guards were added.

Now re-answer the above questions with these assumptions.

  - How do you fix a bug you can't reproduce?
  - How do you *close* a bug report when you can't reproduce?

Being generous here, we're assuming 17 years' worth of diagnostics and safety guards have been added, yet in all that time the bug still isn't reproducible. Let's try to answer the questions under these assumptions.


If you've added guards and diagnostics, then you close it. If someone files a follow-up, it can be re-opened. There's no sense keeping it open unless there are ongoing reports of the issue.


  > There's no sense keeping it open unless there are ongoing reports of the issue.

I think you've misunderstood. There are other options.

Let's consider this from a failure analysis standpoint. Here are the two ways to get it wrong:

  - You have incorrectly marked the issue as solved
  - You have incorrectly left the issue marked as unsolved

*Which error case would you rather have?*

The classic example of this design choice is with a safe. Let's imagine you are building a safe. If the safe fails, would you prefer that it fails into a state that is unlocked or into a state that is locked? The answer isn't so obvious, as it actually depends on how it fails, right?

Another very common example is skyscraper design. When a skyscraper fails, there is a strong preference that it falls in on itself (think 9/11). Why? Because if it falls to the side, it takes out other buildings and can create a chain reaction (a related famous example being housing in Industrial Revolution London and fire...)

Your action is a valid option, but it is not the option that I would choose. I think what they did was perfectly fine. They left it open (to avoid tricking anyone into thinking it is solved when the status is actually unknown) and marked it with additional information about the lack of verification/reproducibility. Essentially, it is marked as stale.

So we're back to the earlier question:

  - How do you *close* a bug report when you can't reproduce?

Or we can frame it differently: "How do you close a bug report if you have no indication either that the bug was resolved or that it still exists?"


There are users in the comments here reporting the issue affects them.

Would some noble purpose be served by closing the existing issue in the hope that they'll complain via more official channels?


Hopefully those people will report the issue in the bug report and try to help the devs reproduce the issue. Especially since it is linked.


I don't think one more user report is going to be the difference that pushes them over the finish line after two decades. Let's not pretend the developers have been taking this bug seriously.


They still have it marked as unreproducible. What do you expect them to do if they can't reproduce?

So yeah, I do think more user reports can help. At worst, more reports will make them take it more seriously.

You're also falling for observation bias. You can see, both from the links in the issue and by searching, that there are similar issues that were resolved and marked solved. So I don't think they were just doing nothing as everyone seems to be assuming.


Have there been any potential fixes made since those reports?


Given there are people in the comments here indicating the issue still exists, there haven’t been any actual fixes made.


The way I've dealt with that in the past is to put it into Review (or whatever the equivalent is) with a note ("cannot repro, but attempted potential fix in version XXXX, moving to review, please reopen if anyone reports this again"), and then, if nobody reports it still happening for x amount of time (e.g. 12 months), close it. It can always be reopened if it gets reported again after that.
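
Written out as a rule, that policy might look roughly like the sketch below. The function name, the three status strings, and the 365-day quiet period are assumptions for illustration (the period just stands in for the "e.g. 12 months" above); no real tracker's API is implied.

```python
from datetime import date, timedelta

# Assumption: one year stands in for "x amount of time (e.g. 12 months)".
QUIET_PERIOD = timedelta(days=365)


def next_status(fix_released, last_report, today):
    """Return 'open', 'review', or 'closed' for a cannot-reproduce ticket.

    fix_released: date the attempted fix shipped
    last_report:  date of the most recent user report, or None
    today:        date the ticket is being triaged
    """
    if last_report is not None and last_report > fix_released:
        return "open"    # reported again after the attempted fix
    if today - fix_released >= QUIET_PERIOD:
        return "closed"  # nobody reported it for the whole quiet period
    return "review"      # attempted fix shipped, still waiting


# Example: fix shipped in January, nothing reported since, triaged in June.
print(next_status(date(2024, 1, 1), None, date(2024, 6, 1)))
```

The point of the intermediate state is that "closed" only ever means "quiet for a full period after an attempted fix", never "we gave up".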


For starters, put a lot more effort into reproducing it.


- You can try harder to reproduce it.

- You can extend logging to gather additional information to reproduce it.

- You can try to reason about the code and figure out possible causes.

- You can attempt to formally verify the correctness of the code.

- You can put guards into the code against unexpected states and actions.

- You can verify the correct result of previous actions before any destructive actions.

- If all else fails, you can scrap the piece of code in question, since it seems to be beyond your ability to maintain.
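
To make the "guards" and "verify before any destructive action" items concrete, here is a minimal sketch, assuming the destructive step is deleting a source file after copying it. `guarded_move` and `_sha256` are hypothetical helpers, not code from the project under discussion.

```python
import hashlib
import os
import shutil


def _sha256(path):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def guarded_move(source, destination):
    """Copy, verify the copy, and only then delete the source."""
    # Guards against unexpected states before doing anything.
    if not os.path.isfile(source):
        raise ValueError(f"source is not a regular file: {source!r}")
    if os.path.exists(destination):
        raise FileExistsError(f"refusing to overwrite {destination!r}")

    shutil.copy2(source, destination)

    # Verify the result of the previous action before the destructive one.
    if _sha256(source) != _sha256(destination):
        os.remove(destination)
        raise RuntimeError("copy verification failed; source left untouched")

    os.remove(source)
```

It's slower than a plain rename, but if the unreproducible bug is in this path, the worst case becomes a loud exception with the original data intact instead of silent loss.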


> How do you fix a bug you can't reproduce?

You strangle it from the edges.



