And nowadays conditional jumps (if predictable enough) are faster than CMOVs on x86. Makes you wonder why Intel introduced CMOV in the first place... Then again, I just found a use for the BSR instruction, and nothing beats that idea of theirs to remove the hardware shifters from the P4.
Quite frankly, I never much liked how AO handles ADPCM decoding (a bit too complicated for my taste), but at least it seemed to work. Are you sure this change will not break the long stream type on loop jumps? It gets a bit tricky if you're doing interpolation and only keep the last computed sample value...