What they actually do is run big-endian but twiddle the low three address bits for memory accesses smaller than 64 bits. You then wire the bytes on the 64-bit data bus the opposite way around (but keep the bits within each byte the normal way around), and it appears to magically work as long as you don't do unaligned accesses.
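A rough sketch of the trick in Python, in case it helps to see it. This is only a model under my own assumptions: the names `twiddle`, `le_read`, `le_write` and `wire_reversed` are made up for illustration, memory is a flat `bytearray` whose length is a multiple of 8, and the XOR constant is taken to be 8 minus the access size (7 for bytes, 6 for 16-bit, 4 for 32-bit, 0 for 64-bit), which is the usual form of this kind of address munging.

```python
import struct

BUS_BYTES = 8  # 64-bit data bus

# The core only ever does big-endian accesses internally.
_NATIVE_BE = {1: ">B", 2: ">H", 4: ">I", 8: ">Q"}


def twiddle(addr, size):
    """XOR the low three address bits for accesses narrower than the bus."""
    if size not in _NATIVE_BE:
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    if addr % size:
        raise ValueError("unaligned access: the trick breaks down here")
    return addr ^ (BUS_BYTES - size)


def le_read(mem, addr, size):
    """Read as the little-endian program sees it, using a big-endian access."""
    return struct.unpack_from(_NATIVE_BE[size], mem, twiddle(addr, size))[0]


def le_write(mem, addr, size, value):
    """Write as the little-endian program sees it, using a big-endian access."""
    struct.pack_into(_NATIVE_BE[size], mem, twiddle(addr, size), value)


def wire_reversed(mem):
    """Model the byte lanes of each 64-bit bus word wired the opposite way."""
    out = bytearray()
    for base in range(0, len(mem), BUS_BYTES):
        out += mem[base:base + BUS_BYTES][::-1]
    return bytes(out)


mem = bytearray(16)
le_write(mem, 0, 8, 0x0102030405060708)
print(hex(le_read(mem, 0, 1)))   # lowest-addressed byte of the little-endian view
print(wire_reversed(mem)[:8].hex())
```

Without the lane reversal the internal layout is still big-endian; the program just can't tell, because every aligned access lands on the bytes it expects. With the lanes reversed, what sits on the far side of the bus is an ordinary little-endian image. An access that straddles an aligned boundary has no single XOR that fixes it up, which is why the sketch simply rejects unaligned addresses.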
You've just described exactly how we do mismatched endian in MAME.