With MIPS, I wonder how much you actually save. Decoding the opcode is pretty easy, but with this you still need to decode the arguments.
At least you make the next opcode pointer predictable for the CPU (which means a pre-fetchable CALL) instead of a pipeline-stalling, memory-dependent switch().
How much does it save on an Athlon/Pentium M compared to a P4? Something for benchmarks.
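To make the idea concrete, here is a minimal sketch of a caching interpreter in C, assuming a tiny straight-line MIPS subset (ADDIU and ADDU only, no branches or delay slots). The names `Decoded`, `decode` and `run_block` are made up for illustration and are not taken from MAME or ZiNc. Each instruction is decoded once into a handler pointer plus its already-extracted arguments, so the run loop is just an indirect call per instruction with no switch() and no field extraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Cpu { uint32_t r[32]; } Cpu;

typedef struct Decoded Decoded;
typedef void (*Handler)(Cpu *cpu, const Decoded *d);

/* One pre-decoded instruction: handler plus arguments (hypothetical layout). */
struct Decoded {
    Handler fn;
    uint8_t rs, rt, rd;
    int32_t imm;            /* sign-extended 16-bit immediate */
};

static void op_addiu(Cpu *cpu, const Decoded *d) {
    if (d->rt) cpu->r[d->rt] = cpu->r[d->rs] + (uint32_t)d->imm;  /* $0 stays 0 */
}

static void op_addu(Cpu *cpu, const Decoded *d) {
    if (d->rd) cpu->r[d->rd] = cpu->r[d->rs] + cpu->r[d->rt];
}

static void op_unhandled(Cpu *cpu, const Decoded *d) {
    (void)cpu; (void)d;     /* this sketch simply ignores everything else */
}

/* Decode once: the opcode picks the handler, the fields are split out here
   so the handlers never touch the raw instruction word again. */
static Decoded decode(uint32_t insn) {
    Decoded d;
    d.rs  = (uint8_t)((insn >> 21) & 31);
    d.rt  = (uint8_t)((insn >> 16) & 31);
    d.rd  = (uint8_t)((insn >> 11) & 31);
    d.imm = (int16_t)(insn & 0xFFFF);
    uint32_t op = insn >> 26;
    if (op == 0x09)                              d.fn = op_addiu;
    else if (op == 0x00 && (insn & 0x3F) == 0x21) d.fn = op_addu;
    else                                         d.fn = op_unhandled;
    return d;
}

/* Run a cached block: one indirect call per instruction, target read
   straight from the pre-decoded stream. */
static void run_block(Cpu *cpu, const Decoded *code, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(code[i].fn != NULL);
        code[i].fn(cpu, &code[i]);
    }
}
```

Whether this actually beats a plain switch() on a given host is exactly the benchmark question; the sketch only shows where the decode cost moves to.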
Still, the BIGGEST saving for MAME or ZiNc on the CPU core would be CORRECT memory timing. That means you do not have to execute 33 million instructions, because the real hardware wastes tons of those cycles on fetch/load/store.
And with MAME I would really suggest patching the fetch/load/store with correct timing first, and only then bothering with a caching interpreter/dynamic recompiler if you still feel like it. It's all about accurate emulation vs. playing the games.
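A rough sketch of the memory timing point, in C. Everything here is an assumption for illustration: the function name `run_budget`, the wait-state numbers, and the idea that every Nth instruction is a load/store. Real wait states depend on the hardware being emulated. The point is only that once each fetch and load/store is charged its stall cycles, a fixed cycle budget runs out after far fewer emulated instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many instructions fit in a cycle budget when memory accesses
   are charged wait states. All parameters are hypothetical:
     fetch_wait - extra cycles per instruction fetch
     data_wait  - extra cycles per load/store
     mem_every  - every mem_every-th instruction is a load/store (0 = none) */
static uint64_t run_budget(int64_t budget, int fetch_wait, int data_wait,
                           int mem_every) {
    assert(mem_every >= 0);
    int64_t cycles_left = budget;
    uint64_t executed = 0;
    while (cycles_left > 0) {
        int is_mem = mem_every > 0 && (executed % (uint64_t)mem_every) == 0;
        /* 1 cycle to execute, plus the stalls the real CPU would suffer */
        cycles_left -= 1 + fetch_wait + (is_mem ? data_wait : 0);
        executed++;   /* a real core would run the instruction here */
    }
    return executed;
}
```

With made-up numbers, `run_budget(1000, 0, 0, 4)` runs 1000 instructions, while `run_budget(1000, 3, 4, 4)` runs only 200 for the same budget. Scale the budget up to a 33 MHz clock and that is the kind of saving meant above, before any caching interpreter or recompiler enters the picture.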