Exactly this. Even for internal use. Our corp approved a small project where an NN analyzes the results of our nightly test runs (our test suite takes a very long time to run). For now it classifies results into several existing broad categories. Technically, product-type failures are usually the most important ones, and this should let us focus our efforts on them. But even a 1% false rate (in real life it's actually in the double digits) means that we QAs still have to verify all the results anyway, because we can't tell which ones are wrong. So no time is saved, and this NN software is, eh... useless.
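To put rough numbers on why even 1% doesn't help, here's a quick sketch. The run size of 300 results and the assumption that errors are independent are both made up for illustration; they're not from our actual setup.

```python
def expected_misclassified(n_results: int, error_rate: float) -> float:
    """Expected number of wrongly classified results in one run."""
    return n_results * error_rate


def p_at_least_one_error(n_results: int, error_rate: float) -> float:
    """Probability that at least one result in the run is misclassified,
    assuming (hypothetically) that errors are independent across results."""
    return 1.0 - (1.0 - error_rate) ** n_results


# Hypothetical nightly run: 300 results classified at a 1% error rate.
n, p = 300, 0.01
print(f"expected wrong labels: {expected_misclassified(n, p):.1f}")
print(f"P(at least one wrong): {p_at_least_one_error(n, p):.3f}")
```

With those made-up numbers you expect about 3 wrong labels and roughly a 95% chance that at least one result is wrong. Since the classifier doesn't flag which ones, every result still needs a human look.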

There are other ideas for how to make it more useful, but my point is that a tool with a non-zero failure rate and unpredictable answers is not applicable to many domains.