You don't need to do the TI thing (which penalizes performance even for uncontended accesses); you can just make the Z80 back up that cycle and halt if HALT gets pulled by a read or write handler.
but then a write that shouldn't have happened yet has already happened.
or a read that shouldn't have happened yet has already happened.
you can rewind time to before they happened, but they still happened. That's why the check needs to be speculative rather than the actual operation: the actual operation can't be allowed to happen while the Z80 is frozen.
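To make the ordering problem concrete, here's a minimal C++ sketch. It assumes a hypothetical `Bus` with a `contended` predicate standing in for whatever decides the CPU must freeze. None of these names are MAME APIs. The first function does the access and then backs up the cycle. The second checks first and commits only if the access is allowed:

```cpp
#include <array>
#include <cstdint>
#include <functional>

// Hypothetical, heavily simplified model: `contended` stands in for whatever
// logic (e.g. the ULA) decides the CPU must be frozen for this access.
struct Bus {
    std::array<std::uint8_t, 0x10000> ram{};
    std::function<bool(std::uint16_t)> contended = [](std::uint16_t) { return false; };
};

enum class Result { Done, Stalled };

// "Back up the cycle" approach: the store is performed first, and only then
// does the CPU notice it should have been halted. Rewinding the cycle count
// cannot undo the store that already landed in RAM.
Result write_then_rewind(Bus& bus, std::uint16_t addr, std::uint8_t data) {
    bus.ram[addr] = data;            // side effect is already visible
    if (bus.contended(addr))
        return Result::Stalled;      // CPU will retry, but RAM has changed
    return Result::Done;
}

// Speculative approach: ask first, commit only if the access is allowed.
Result check_then_write(Bus& bus, std::uint16_t addr, std::uint8_t data) {
    if (bus.contended(addr))
        return Result::Stalled;      // RAM has not been touched
    bus.ram[addr] = data;
    return Result::Done;
}
```

With the first version, the store is already in RAM when the CPU reports the stall. With the second, RAM stays untouched until a retry succeeds.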
it might 'work' in most cases, but it's still a hack and could still have side effects. I suspect those hacks exist precisely because MAME can't handle this properly for the cores where they're used.
since you're working with screens and raster timings here, those things would directly affect how demos run. If you allow a write to happen and then pause the CPU, the data used for a screen effect lands in VRAM before or during the ULA's fetch, not when the ULA next releases the bus. If you're doing accurate ULA timings, the ULA might then read the wrong data, and the effect will appear a few pixels too early because RAM has already changed.
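Here's a toy model of that timeline. It uses made-up cycle numbers and a single screen byte, and the `Scenario` struct and its fields are hypothetical rather than real ULA timings. It reports which value the ULA fetches depending on whether the CPU's write commits immediately or only once the bus is released:

```cpp
#include <cstdint>

// Hypothetical single-byte model. The ULA fetches the screen byte at
// `ula_fetch_cycle`. The CPU wants to write at `cpu_write_cycle`, but the bus
// is held by the ULA until `contention_end`.
struct Scenario {
    int cpu_write_cycle;
    int ula_fetch_cycle;
    int contention_end;      // first cycle the CPU may drive the bus again
    std::uint8_t old_value;
    std::uint8_t new_value;
};

// Cycle at which the store becomes visible in RAM.
int commit_cycle(const Scenario& s, bool speculative) {
    bool contended = s.cpu_write_cycle < s.contention_end;
    if (!contended || !speculative)
        return s.cpu_write_cycle;    // write-then-rewind: store lands at once
    return s.contention_end;         // speculative: store waits for the bus
}

// Value the ULA sees. Assumes a store committed on the same cycle as the
// fetch is not yet visible to it.
std::uint8_t ula_sees(const Scenario& s, bool speculative) {
    return commit_cycle(s, speculative) < s.ula_fetch_cycle ? s.new_value
                                                            : s.old_value;
}
```

Take a write requested at cycle 8, a ULA fetch at cycle 10 and the bus released at cycle 12. The commit-then-rewind version lets the ULA pick up the new byte, which is the "few pixels too early" effect. The speculative version makes it see the old byte, which is the ordering the contention is supposed to enforce.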
the fundamental problem, at least the way I see it, is that asserting the address lines and performing the actual read/write are two separate actions.
it's the same principle those fast ARM cartridges for old systems rely on. They can see the address lines being asserted as a separate step before the data is read or written, and the ARM is fast enough to execute a whole block of code to produce the data before it has to be driven onto the bus, without slowing down the host CPU.
MAME doesn't differentiate between the two stages of the operation. For things like this to work, without hacks, MAME needs to.
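Here's roughly what separating the two stages could look like. This is a sketch with a hypothetical `TwoPhaseDevice`, which is not an existing MAME interface. The device sees the address first and can either prepare the data, ARM-cartridge style, or refuse the cycle before any read or write happens:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical two-phase bus interface. The address phase and the data phase
// are separate calls, so a device can react to the address (prepare data, or
// refuse the cycle) before anything is actually read or written.
struct TwoPhaseDevice {
    // Address phase: return false to hold the CPU off (no data transferred).
    std::function<bool(std::uint16_t addr, bool is_write)> on_address;
    // Data phase: only reached if the address phase accepted the cycle.
    // Both must be set before the bus functions below are used.
    std::function<std::uint8_t(std::uint16_t addr)> on_read;
    std::function<void(std::uint16_t addr, std::uint8_t data)> on_write;
};

// One CPU read cycle. Returns true and fills `out` if the read completed.
bool bus_read(TwoPhaseDevice& dev, std::uint16_t addr, std::uint8_t& out) {
    if (dev.on_address && !dev.on_address(addr, false))
        return false;                // refused before any read side effects
    out = dev.on_read(addr);
    return true;
}

// One CPU write cycle. Returns true if the write completed.
bool bus_write(TwoPhaseDevice& dev, std::uint16_t addr, std::uint8_t data) {
    if (dev.on_address && !dev.on_address(addr, true))
        return false;                // refused: memory is left untouched
    dev.on_write(addr, data);
    return true;
}
```

A device that only cares about the data phase can leave `on_address` empty. In that case the address phase is skipped and the sketch behaves like a single-stage handler.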
So yeah, something like the SETOFFSET_MEMBER mechanism, called before every access (readmem/writemem/readport/writeport/opcode fetch) as a separate stage, performance hit included, absolutely *is* needed. It needs to be paired with another callback that tells the CPU the address it just tried to access is now free, so it can try again. Because of timings, I guess it would need to check again whether the address is still free, in case something else grabbed it in the meantime.
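Here's a rough sketch of the pre-access hook plus the "address is free now" callback, including the re-check on wake-up. `StallingCpu`, `may_access`, `request` and `bus_freed` are all hypothetical names for illustration, not MAME's actual API. `may_access` only loosely mirrors the role SETOFFSET_MEMBER plays:

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical sketch; none of these names exist in MAME.
enum class AccessKind { MemRead, MemWrite, IoRead, IoWrite, OpcodeFetch };

class StallingCpu {
public:
    // Pre-access hook installed by the driver (default: always allowed).
    std::function<bool(AccessKind, std::uint16_t)> may_access =
        [](AccessKind, std::uint16_t) { return true; };

    // Called by the core before every access. Returns true if the access may
    // proceed; otherwise it is parked and the CPU stays frozen.
    bool request(AccessKind kind, std::uint16_t addr) {
        if (may_access(kind, addr)) return true;
        pending_ = Pending{kind, addr};
        return false;
    }

    // Second callback from the driver: "the address you wanted is free now".
    // The CPU re-checks, since something else may have claimed the bus in
    // between. Returns true if the CPU resumes and performs the access.
    bool bus_freed() {
        if (!pending_) return false;
        if (!may_access(pending_->kind, pending_->addr))
            return false;            // still blocked: remain frozen
        pending_.reset();
        return true;
    }

    bool frozen() const { return pending_.has_value(); }

private:
    struct Pending {
        AccessKind kind;
        std::uint16_t addr;
    };
    std::optional<Pending> pending_;
};
```

The driver would call `bus_freed()` when it releases the bus. If something else has taken the bus in the meantime, the re-check fails and the CPU simply stays frozen until the next notification.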
I guess this is also really why you'd want code generators producing the CPU .cpp code, because ideally you'd be able to generate fast and slow paths in the Z80 core. I don't really think you want this level of accuracy for the one embedded in the Dreamcast, for instance ;-) You'd want to generate one version with all the checks/callbacks and one without, then pick which code path to use depending on whether the driver installs callback handlers for address accesses.
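One way to get both code paths from a single source without an external generator is a template parameter. This C++17 sketch swaps in templates for the code generator idea. The `Core`, `CoreBase` and `make_core` names are made up, and the fixed 4-cycle cost is a placeholder. A real generator could emit the equivalent two variants as separate .cpp files instead:

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical illustration: with Checked == false, the per-access hook is
// compiled out entirely via `if constexpr`.
struct CoreBase {
    virtual ~CoreBase() = default;
    virtual int step(std::uint16_t addr) = 0;   // returns cycles consumed
};

template <bool Checked>
struct Core : CoreBase {
    std::function<bool(std::uint16_t)> pre_access;   // only used if Checked

    int step([[maybe_unused]] std::uint16_t addr) override {
        if constexpr (Checked) {
            if (pre_access && !pre_access(addr))
                return 0;            // stalled: no cycles spent, no access
        }
        return 4;                    // placeholder cost of one access
    }
};

// Choose the variant once, based on whether the driver installed a hook.
std::unique_ptr<CoreBase> make_core(std::function<bool(std::uint16_t)> hook) {
    if (hook) {
        auto core = std::make_unique<Core<true>>();
        core->pre_access = std::move(hook);
        return core;
    }
    return std::make_unique<Core<false>>();
}
```

The choice between the two variants is made once, when the core is created, so the unchecked variant never evaluates the hook at all.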
I suppose you'd call this 'signal accurate' rather than 'cycle accurate'?