#simd
there's this trick i randomly found a few years ago and i've been wondering if there's a name for it or if other people have done this before
```
for enforcing floating point determinism with realigned buffers
if we have
x x x 0 1 2 3 4 5 6 7 x x x
where x is the identity for the operation, and the operation is commutative (not necessarily associative)
then adding x padding doesn't affect the result as long as we do a tree reduction at the end
e.g.
accumulate in register (4 lanes): v = [0+4, 1+5, 2+6, 3+7]
tree reduction step 0: [(0+4)+(2+6), (1+5)+(3+7)]
tree reduction step 1: ((0+4)+(2+6)) + ((1+5)+(3+7))
if we add padding (e.g., by realigning the buffer and using a masked load), the loads become [x x x 0] [1 2 3 4] [5 6 7 x]
accumulate in register: v = [x+1+5, x+2+6, x+3+7, 0+4+x]
tree reduction step 0: [(1+5)+(3+7), (2+6)+(0+4)]
tree reduction step 1: ((1+5)+(3+7)) + ((2+6)+(0+4))
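the padding only rotates the lanes, and each tree step pairs lane i with the lane half a register away, so the same partial sums still meet and only the operand order changes. commuting the elements shows us that this is the exact same result as the previous one, so the bit pattern of the final result is unaffected (modulo signed zero, nan, etc)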
```
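here's a rough scalar model of the idea in python, just to make the claim checkable. it's a sketch, not real simd: `W`, `IDENTITY`, `tree_reduce`, `padded_sum` and `bits` are names made up for this example, the "register" is a plain list of 4 doubles, and the operation is assumed to be ieee addition in the default rounding mode. using `-0.0` instead of `+0.0` as the padding value is my own assumption; it seems to take care of the signed zero caveat for addition, since `-0.0 + a` gives back `a` even when `a` is `-0.0`.

```python
import struct

W = 4            # lanes in the modelled register (assumption for this sketch)
IDENTITY = -0.0  # padding value x; -0.0 + a == a bit-for-bit for non-NaN a


def tree_reduce(lanes):
    """Pairwise reduction: lane i is combined with lane i + half at each step."""
    lanes = list(lanes)
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
    return lanes[0]


def padded_sum(data, shift):
    """Sum `data` as if it started `shift` lanes into an aligned block.

    The front and back are filled with IDENTITY, which stands in for what a
    masked load would produce for the lanes outside the real data.
    """
    padded = [IDENTITY] * shift + list(data)
    padded += [IDENTITY] * (-len(padded) % W)  # round up to a whole register
    acc = [IDENTITY] * W
    for base in range(0, len(padded), W):
        # one "vector add": each lane accumulates its own column
        acc = [acc[i] + padded[base + i] for i in range(W)]
    return tree_reduce(acc)


def bits(x):
    """Raw bytes of a double, so comparisons are on the bit pattern."""
    return struct.pack("<d", x)
```

comparing `bits(padded_sum(data, s))` for every `s` in `range(W)` against `bits(padded_sum(data, 0))` is the check that matters: the values are chosen by the caller, and mixing magnitudes like `1e16` and `0.1` makes it obvious that plain reassociation would change the answer. swapping `tree_reduce` for a left-to-right sum of the lanes is an easy way to see why the tree shape is needed.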