This is an example of a transformation that probably costs more in compile time than it wins back in runtime performance. GCC has explicitly rejected this kind of optimization in the past for precisely that reason.
In general, compilers face an iron triangle of optimization: do you want a fast compiler, a small binary, or fast output? Optimizations that help all three are rare (especially ones that aren't already implemented in -O1), and even two of the three are often difficult to obtain.
> This is an example of a transformation that is probably more expensive in compile-time
Why? Reasoning that the value must be 1 after the loop is not even a loop optimization; it follows directly from the exit condition of the branch. That leaves the loop itself, but nothing it computes is used anymore, and since C++ guarantees that it exits, you can simply remove it.
The transformation as a whole might not fire often, but having a constant for the induction variable after the loop and being able to delete useless loops will each produce runtime wins.
> In general, compilers face an iron triangle of optimization: do you want a fast compiler, a small binary, or fast output?
I often wish compilers provided more options to just brute-force a bit more performance out of release builds. You can manually tune many parameters, but that requires knowing a lot more about how the passes work than simply saying --compile-for=5h (or something like that; it doesn't need to be a hard time limit).
Note that GCC does do this, just at -O2 and above, whereas Clang does it at -O1 and above. GCC also only does it on C++ code, whereas Clang does it on both C and C++.
I'm not saying it isn't magical, and it certainly requires some symbolic solving, but it's doable.
Of course the real goal is to eventually get it to optimize