Modulus (division) is slow and the length should be pow2 and the operation "& (len-1)". If you do %len, you have far greater issues. I have pretty extensive experience writing hashmap/cyclic buffers and the like. If you have auto-grow structs (and you almost always want that), you want pow2 length arrays. e.g.
This is entirely orthogonal to whether you do "prev+1 & len-1" or "(prev & len-1) + 1". (In fact, the latter gives a more natural construction if you fill downward.)
Adding/advancing is indeed easier. What about going backwards (insert before head or remove from tail), i.e when 0, then length. One way to do it is something like that, assume 2 compliment and there is negative shift left available. Not straightforward:
int h = ( -(--head) >>> 31) ^ 1;//if head was equal to 1 (and only 1), h = 1, otherwise zero
head += h * len; //or shift left, if there is log2 len available; still, mul is a fast operation, unlike div
Of course, it can implemented via branching but that would be a major downside for 1-based idx.
Circular buffers?