Was thinking of a shorter avl producing partial results merged into another reg.
Something like a += b; a[0] += c[0]. Without avl we'd just have a write-after-write, but with it, we now have an additional input, and whether this happens depends on global state (VL).
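To make that concrete, here is a toy Python model (not real RVV code) of the tail-undisturbed case. `VLMAX`, `vadd` and the element values are made up for illustration; the only point is that with `vl < VLMAX` the old destination becomes a real input to the second write.

```python
VLMAX = 4  # assumed elements per vector register, for illustration only

def vadd(old_dest, src1, src2, vl, tail_undisturbed=True):
    """Toy model of a vector add writing src1 + src2 into a destination register.

    With tail_undisturbed, elements at index >= vl keep old_dest's value,
    so old_dest becomes a real input whenever vl < VLMAX.
    """
    tail = old_dest if tail_undisturbed else [None] * VLMAX  # None = unspecified
    return [src1[i] + src2[i] if i < vl else tail[i] for i in range(VLMAX)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [100, 200, 300, 400]

a = vadd(a, a, b, vl=VLMAX)  # a += b
a = vadd(a, a, c, vl=1)      # a[0] += c[0]; a[1:] comes from the previous write
print(a)
```

With `vl=VLMAX` the `old_dest` argument is never read, which is the plain write-after-write case.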
> Agree agnostic would help, but the machine also has to handle SW asking for mask/tail unchanged, right?
Yes, but it should rarely do so.
The problem is that because of the vl=0 case you always have a dependency on avl. I think the motivation for the vl=0 case was that any serious OoO implementation will need to predict vl/vtype anyway, so there might as well be this nice-to-have feature.
IMO they should've only supported ta,mu. I think the only use case for ma is when you need to avoid exceptions. And while tu is useful, e.g. for summing an array, it could be handled differently. E.g. once vl<vlmax you write the sum to a different vector and do two reductions (or rather two different vectors, given the avl-to-vl rules).
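A rough Python sketch of one way to read that array-sum idea, using tail-agnostic operations only. Everything here is an assumption for illustration: `VLMAX`, the `POISON` value standing in for agnostic tail elements, the helper names, and the simplified `vl = min(avl, VLMAX)` rule (the real avl-to-vl rules allow more than one short iteration, which is why the partial results are kept in a list).

```python
VLMAX = 4        # assumed elements per vector register
POISON = 10**9   # stands in for unspecified tail-agnostic elements

def vadd_ta(src1, src2, vl):
    """Tail-agnostic add: elements at index >= vl are unspecified (poisoned here)."""
    return [src1[i] + src2[i] if i < vl else POISON for i in range(VLMAX)]

def vredsum(vec, vl):
    """Sum reduction that reads only the first vl elements."""
    return sum(vec[:vl])

def array_sum(data):
    acc = [0] * VLMAX  # only ever updated with vl == VLMAX
    partials = []      # (vector, vl) pairs from iterations with vl < VLMAX
    i = 0
    while i < len(data):
        vl = min(len(data) - i, VLMAX)                     # simplified vsetvli
        chunk = data[i:i + vl] + [POISON] * (VLMAX - vl)   # tail-agnostic load
        if vl == VLMAX:
            acc = vadd_ta(acc, chunk, vl)
        else:
            # Short iteration: write to a different vector instead of merging into acc.
            partials.append((vadd_ta([0] * VLMAX, chunk, vl), vl))
        i += vl
    # One full-length reduction, plus one reduction per short vector.
    return vredsum(acc, VLMAX) + sum(vredsum(v, n) for v, n in partials)

print(array_sum(list(range(1, 11))))
```

The poisoned tail elements are never read, so the result does not depend on what the hardware leaves in them.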
What's the "nice to have feature" of vl=0 not modifying registers? I can't see any benefit from it. If anything, it's worse, due to the problems on reduce and vmv.s.x.
"nice to hace" because it removes the need for a branch for the n=0 case, for regular loops you probably still want it, but there are siturations were not needing to worry about vl=0 corrupting your data is somewhat nice.
Huh, in what situation would vl=0 clobbering registers be undesirable while on vl≥1 it's fine?
If hardware will be predicting vl, I'd imagine that would break down anyway, potentially catastrophically so if hardware always predicts that vl=0 doesn't happen.
> Agree agnostic would help, but the machine also has to handle SW asking for mask/tail unchanged, right?
The agnosticness flags can be forwarded at decode-time (at the cost of the non-immediate-vtype vsetvl being very slow), so for most purposes it could be as fast as if it were a bit inside the vector instruction itself. Doesn't help vl=0 though.
Espasa discusses this around 6:45 of https://www.youtube.com/watch?v=WzID6kk8RNs.