I now sanitize memory writes in the ARM core - I figure this is where it belongs. This allows for a few simple optimizations; for example, block transfers do the masking only once per run. Well, with this out of my way, I guess I'll try doing the LFO now.
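To illustrate the "mask once per run" idea, here is a rough sketch in C. It is not the actual core code: `RAM_SIZE`, `ram`, `write32` and `store_block` are made-up names, the RAM size is only an illustrative value, and words are stored in host byte order for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: a power-of-two RAM size chosen only for illustration. */
#define RAM_SIZE 0x10000u
#define RAM_MASK (RAM_SIZE - 1u)

static uint8_t ram[RAM_SIZE];

/* Single-word write: the address is masked (sanitized) on every call. */
static void write32(uint32_t addr, uint32_t value)
{
    addr &= RAM_MASK & ~3u;          /* wrap into RAM and word-align */
    memcpy(&ram[addr], &value, 4);   /* host byte order, for brevity */
}

/* Block transfer (STM-style): mask the base address once, then walk.
 * Falls back to per-word masking only when the run would cross the
 * end of RAM and has to wrap around. */
static void store_block(uint32_t base, const uint32_t *regs, unsigned count)
{
    uint32_t addr = base & RAM_MASK & ~3u;   /* masked once per run */
    unsigned i;

    if (addr + 4u * (uint32_t)count <= RAM_SIZE) {
        for (i = 0; i < count; i++)
            memcpy(&ram[addr + 4u * i], &regs[i], 4);
    } else {
        for (i = 0; i < count; i++)
            write32(addr + 4u * i, regs[i]);
    }
}
```

The fast path is safe because the whole run is checked against the end of RAM up front, so no individual word needs its own mask.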
By the way, I've noticed that you do your own byte rotation on unaligned reads, so there is no need for the RBOD macro anymore (otherwise the value will get rotated twice). Also, that search & replace on my memory access functions introduced some oddities into the data processing macros, and some opcodes still count cycles wrong. I'm not sure you're going to be using my core for much longer if MAME is getting an ARM rewrite, but if you do, that source needs some serious cleaning now.
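For anyone wondering why the double rotation matters, a small sketch in C follows. It assumes RBOD does the usual ARM thing for an unaligned word load (read the aligned word, then rotate right by 8 bits per byte of misalignment); the macro itself is not shown here, and `ror32` and `mem_read32_rotating` are hypothetical names standing in for the memory handler that already rotates.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Hypothetical memory handler that already applies the rotation:
 * fetch the aligned little-endian word, then rotate it right by
 * 8 * (addr & 3) bits. */
static uint32_t mem_read32_rotating(const uint8_t *mem, uint32_t addr)
{
    uint32_t a = addr & ~3u;
    uint32_t w = (uint32_t)mem[a]
               | ((uint32_t)mem[a + 1] << 8)
               | ((uint32_t)mem[a + 2] << 16)
               | ((uint32_t)mem[a + 3] << 24);
    return ror32(w, (addr & 3u) * 8u);
}
```

If the core then applies its own rotation to that result, the value ends up rotated by twice the intended amount, which is why the macro has to go once the handler does the job.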