TFA's whole point is that there is no easy way to tell whether LLM output is correct. Driving mistakes provide instant feedback on whether the output of whatever AI is driving is correct. Bad comparison.
Many of the things that LLMs output can be validated in a feedback loop, e.g., code. It's easy to validate generated code with a compiler, unit tests, etc. LLMs will excel in processes that can provide a validating feedback loop.
I love how everyone thinks software is easy to validate now. Like seriously, do you have any awareness at all of how much is invested in testing software by the likes of Microsoft, the game studios, and any other serious producers of software? It's a lot, and they still release buggy code.