I'm not AWJ, but I'll add a few thoughts anyway.
If this really is a problem with inexact timing, then that exemplifies a common problem with MAME's MCU cores, many of which just execute each instruction all at once and then go back and run the timers and timer-based peripherals for the same number of cycles. This works fine for purely internal interrupt timers but often fails for more sensitive cases such as serial ports.
In the past, Vas Crabb has suggested "separate icount and bcount" as the one proper solution to this problem, but much like the modern floppy system (which at least does have some readable technical documentation), only Olivier Galibert seems to have fully mastered its implementation details. While I only half-understand how it works, I'm not convinced of its effectiveness as a solution, especially after finding that Olivier Galibert's MCS-96 core, despite having this thing called "bcount" (not defined in MAME's execution core), triggers high-speed outputs at the wrong times.