If the denominator is a constant, wouldn't it be faster to use the divmod identity (n % d == n - (n / d) * d) to turn the modulo into divide, multiply, subtract, and then apply the usual constant-divide-becomes-multiply-and-shift optimization to the divide?
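To make that concrete, here's a rough sketch in C. The divisor of 10, the 16-bit numerator, and the function names are all my own assumptions for illustration, not anything taken from the article. The magic constant 0xCCCD is ceil(2^19 / 10), which gives the exact quotient for every 16-bit input, and the product still fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical example: divisor fixed at 10, numerator 16 bits wide. */
enum { DIVISOR = 10 };

/* Constant divide as multiply-and-shift.
 * 0xCCCD == ceil(2^19 / 10); the rounding error is small enough that
 * the result is exact for all n in [0, 65535]. The widest product,
 * 65535 * 0xCCCD, is below 2^32, so a 32-bit multiply suffices. */
static inline uint16_t div10_u16(uint16_t n) {
    return (uint16_t)(((uint32_t)n * 0xCCCDu) >> 19);
}

/* Divmod identity: n % d == n - (n / d) * d.
 * The divide is replaced by the multiply-and-shift above, leaving
 * one more multiply and a subtract. */
static inline uint16_t mod10_u16(uint16_t n) {
    return (uint16_t)(n - div10_u16(n) * DIVISOR);
}
```

An optimizing compiler will typically do this rewrite on its own when the divisor is a compile-time constant, so writing it out by hand mostly matters on targets or toolchains where that doesn't happen.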
The article isn't very clear, but assuming a 16-bit numerator and an 8-bit denominator, MSN's answer to [0] lays it out (although for larger bit widths). If the denominator were 16-bit, the top-rated answer (by caf) to the same SO question looks like another approach, but that wouldn't be a one-line change.
Does anybody know what the exact transformation is here? I searched around and found this answer, but it doesn't work:
https://stackoverflow.com/a/10441333