I've just finished a few more DC ARM tests. You already know this, but to refresh your memory: the ARM7 is a pipelined CPU, and that makes cycle counting somewhat complicated. It's especially difficult once R15/PC comes into play.

Usually an instruction is fetch+decode, then execute. If execute doesn't involve any memory transfers, the ARM prefetches the next opcode, and the last cycle of instruction N folds with the first cycle of N+1. So basically the execution time of instruction N+1 depends on what N was... And of course if you modify PC things get even more interesting, because the prefetch queue has to be flushed.
This is also why some instructions see PC as +8 and some as +12 - it all depends on when exactly PC is being read, in the 2nd or 3rd cycle of the instruction in question.

I still have no idea why I'm getting .5 differences (maybe the internal clock is 2x faster than memory access), but it all fits, more or less. Loads are longer than stores due to an additional cycle for a possible abort (in which case Rd has to be rolled back). Also, in light of what I just said, measuring execution times with long sequences of a single opcode is probably not the best way to do it...

Anyway, B/BL are a whopping 3 cycles. SUB PC,4 is only .5 cycle longer than SUB Rx,4 (where x != 15), though. Don't ask me why.
I'm also proud to announce I got it right: PC used in barrel shifter operations will be +8 if the shift amount is specified in the opcode, or +12 if it's in Rs. What's more, PC can be the base of the shift, and the same rules apply. This is for data processing; the manual says single load/store may not use PC in that way (but I wouldn't bet on it, maybe I'll even try to test that case too).
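
For anyone who wants the +8/+12 rule in code form, here is a minimal sketch of how an interpreter might apply it. It is not taken from my core: the Arm7 struct and the dp_shift_by_register and dp_read_reg names are made up for illustration, and I'm assuming r[15] holds the address of the instruction currently executing and that the opcode has already been decoded as data processing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state. Assumption: r[15] holds the address of the
 * instruction currently being executed, with no pipeline offset applied. */
typedef struct {
    uint32_t r[16];
} Arm7;

/* For an opcode already known to be data processing: the shift amount
 * comes from Rs when the I bit (25) is clear and bit 4 is set. */
static bool dp_shift_by_register(uint32_t opcode)
{
    return (opcode & 0x02000000u) == 0 && (opcode & 0x00000010u) != 0;
}

/* Read a source register (Rn or Rm) for a data processing instruction.
 * PC reads as +8 normally, or +12 when the shift amount is in Rs. */
static uint32_t dp_read_reg(const Arm7 *cpu, unsigned reg, bool shift_by_register)
{
    if (reg != 15)
        return cpu->r[reg];
    return cpu->r[15] + (shift_by_register ? 12u : 8u);
}
```

The same helper is used for both Rn and the shifted register Rm, which matches the observation that PC as the base of the shift follows the same rule.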

As for ADPCM - "ChuChu Rocket!" was suffering from the same bug as "Love Hina" and now plays music so much better. It also has lots of type 2 samples with LSA != 0, and it seems resetting the decoder to 0/127 works in those cases as well. The problem is that most of the time a wrong reset doesn't produce noise; rather, it makes the sound too quiet or click a bit. Hard to tell - seems right to me though.
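
Roughly, the reset amounts to something like the sketch below. The AdpcmState struct and the function names are hypothetical, and I'm reading "0/127" as previous sample = 0 and step size = 127; whether the hardware really does this at the loop start is exactly the part I can't fully confirm.

```c
#include <stdint.h>

/* Hypothetical per-channel ADPCM decoder state. */
typedef struct {
    int32_t signal; /* last decoded sample */
    int32_t step;   /* current quantizer step size */
} AdpcmState;

/* Put the decoder back to its initial state: sample 0, step 127. */
static void adpcm_reset(AdpcmState *s)
{
    s->signal = 0;
    s->step = 127;
}

/* Hypothetical hook called when the play position changes.
 * Assumption: the reset happens when playback wraps back to the
 * loop start address (LSA), even when LSA != 0. */
static void adpcm_on_position(AdpcmState *s, uint32_t pos, uint32_t lsa, int looped)
{
    if (looped && pos == lsa)
        adpcm_reset(s);
}
```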