ARM7vC: Register R15 holds the Program Counter (PC). When R15 is read, bits [1:0] are zero and bits [31:2] contain the PC.
ARM7TDMI (rev r4p1): Register r15 holds the PC. In ARM state, bits [1:0] of r15 are undefined and must be ignored. Bits [31:2] contain the PC. In Thumb state, bit [0] is undefined and must be ignored. Bits [31:1] contain the PC.
Now I remember why my code was clearing the lower 2 bits of the PC on write: that was an (unfinished) attempt to move the masking from reads to writes.
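For illustration, here is a minimal sketch of the two strategies side by side. It assumes a hypothetical `cpu_t` with a raw `r15` field and a `thumb` flag, and it only models the bit masking described in the quoted docs (bits [1:0] in ARM state, bit [0] in Thumb state), not the pipeline offset a real core also adds when r15 is read as an operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal CPU state; field names are illustrative only. */
typedef struct {
    uint32_t r15;   /* PC value as stored by the core */
    bool thumb;     /* CPSR T bit */
} cpu_t;

/* Mask for the current state: ARM drops bits [1:0], Thumb drops bit [0]. */
static uint32_t pc_mask(const cpu_t *c) {
    return c->thumb ? ~UINT32_C(1) : ~UINT32_C(3);
}

/* Strategy A: store whatever was written, mask every time r15 is read. */
static uint32_t read_r15_masked(const cpu_t *c) {
    return c->r15 & pc_mask(c);
}

/* Strategy B: mask once when r15 is written, so reads return it directly.
 * If the state later changes (e.g. Thumb to ARM), the stored value would
 * need to be re-masked for this to stay equivalent to strategy A. */
static void write_r15_masked(cpu_t *c, uint32_t value) {
    c->r15 = value & pc_mask(c);
}

static uint32_t read_r15_raw(const cpu_t *c) {
    return c->r15;
}
```

Strategy B moves the cost from every read to the (much rarer) writes, which is presumably why it is attractive for an interpreter core.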
Usually the docs clearly state that any unaligned doubleword address will still drive the A[1:0] lines and that those "might be interpreted" by the external memory controller. STR is an exception, and all the docs say is: "A word store (STR) should generate a word aligned address. The word presented to the data bus is not affected if the address is not word aligned. That is, bit 31 of the register being stored always appears on data bus output 31." Nothing about the lowest two address bits... SWP, by the way, is said to operate as an LDR followed by an STR, and I now assume that means an unaligned read will be rotated and the store will be masked to an aligned address (on the DC at least).
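A minimal sketch of that assumed DC behaviour, using a hypothetical 64-byte little-endian RAM: LDR fetches the aligned word and rotates it right by 8 × addr[1:0], STR simply drops address bits [1:0] without rotating the data, and SWP is modelled literally as an LDR followed by an STR to the same address. The function names are made up for illustration, and the store masking is the assumption from the paragraph above, not documented behaviour.

```c
#include <stdint.h>

/* Hypothetical 64-byte little-endian RAM; addresses wrap for brevity. */
#define RAM_SIZE 64u
static uint8_t ram[RAM_SIZE];

static uint32_t aligned_index(uint32_t addr) {
    return addr & (RAM_SIZE - 1u) & ~UINT32_C(3);
}

static uint32_t read_aligned(uint32_t addr) {
    uint32_t a = aligned_index(addr);
    return (uint32_t)ram[a] | ((uint32_t)ram[a + 1] << 8) |
           ((uint32_t)ram[a + 2] << 16) | ((uint32_t)ram[a + 3] << 24);
}

static void write_aligned(uint32_t addr, uint32_t v) {
    uint32_t a = aligned_index(addr);
    ram[a] = (uint8_t)v;
    ram[a + 1] = (uint8_t)(v >> 8);
    ram[a + 2] = (uint8_t)(v >> 16);
    ram[a + 3] = (uint8_t)(v >> 24);
}

/* LDR: fetch the aligned word, then rotate right by 8 * addr[1:0]. */
static uint32_t arm_ldr(uint32_t addr) {
    uint32_t word = read_aligned(addr);
    unsigned rot = (addr & 3u) * 8u;
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* STR (assumed): address bits [1:0] are masked, the data is not rotated. */
static void arm_str(uint32_t addr, uint32_t value) {
    write_aligned(addr, value);
}

/* SWP modelled as LDR followed by STR to the same address. */
static uint32_t arm_swp(uint32_t addr, uint32_t value) {
    uint32_t old = arm_ldr(addr);
    arm_str(addr, value);
    return old;
}
```

With this model an unaligned SWP returns the rotated old word but writes the new value unrotated to the aligned location, which is exactly the asymmetry described above.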
In short, all transfers except LDR are implementation-specific (an unaligned access can either be masked or performed as is). I'm aiming for my core to be DC-compatible, but that in turn might break other systems. It's up to the MAME devs what to do with their ARMs; I'd split the core code by CPU generation and the memory access handlers by system.
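One way to do that split, sketched with entirely hypothetical names: the core never decides how a word-store address reaches the bus, it calls through a per-system policy, so a DC driver can force alignment while another system passes A[1:0] through to its memory controller.

```c
#include <stdint.h>

/* Hypothetical per-system hook: how a word-store address reaches the bus. */
typedef uint32_t (*store_addr_fn)(uint32_t addr);

/* Assumed DC-style behaviour: word stores are forced to alignment. */
static uint32_t store_addr_masked(uint32_t addr) {
    return addr & ~UINT32_C(3);
}

/* Alternative: pass A[1:0] through and let the memory controller decide. */
static uint32_t store_addr_raw(uint32_t addr) {
    return addr;
}

typedef struct {
    store_addr_fn word_store_addr;
} mem_policy_t;

static const mem_policy_t policy_dc = { store_addr_masked };
static const mem_policy_t policy_raw = { store_addr_raw };

/* The core calls through the policy instead of hard-coding one behaviour. */
static uint32_t bus_address_for_str(const mem_policy_t *p, uint32_t addr) {
    return p->word_store_addr(addr);
}
```

The CPU-generation differences (such as the R15/R14 masking discussed here) would stay in the core, while only the bus-facing behaviour varies per system.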
Here's another interesting difference, concerning R14 when the branch link bit is set:
ARM7vC: Note that the CPSR is not saved with the PC.
ARM7TDMI (rev r4p1): Note that the CPSR is not saved with the PC and R14[1:0] are always cleared.
Why is R14 being sanitized if the code is executing in 32-bit (ARM, non-Thumb) state? The PC should be aligned then. Or does it mean the lower PC bits are not undefined but actually valid? In that case, sooner or later, someone is going to write a piece of code relying on that 'feature'.
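To make the difference concrete, a sketch of the link value for an ARM-state BL, using a hypothetical helper and a flag standing in for the CPU generation. The link address is the instruction after the BL (`bl_addr + 4`); the two quoted texts only disagree if the PC that reaches the BL has non-zero low bits, e.g. in a core that does not mask r15 on write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value written to R14 by an ARM-state BL at bl_addr.
 * clear_low_bits selects the ARM7TDMI r4p1 wording ("R14[1:0] are always
 * cleared"); when false, any stray low PC bits simply pass through. */
static uint32_t bl_link_value(uint32_t bl_addr, bool clear_low_bits) {
    uint32_t link = bl_addr + 4u;   /* address of the following instruction */
    return clear_low_bits ? (link & ~UINT32_C(3)) : link;
}
```

With an aligned PC both variants give the same R14, so code could only come to depend on the difference if the low PC bits are observable somewhere.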