Master cycles are based on memory access timing.

You won't find a better algorithm than mine. It even beats full and partial lookup table variants.

Code:
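//rom_speed = 6 if $420d.d0 is set (FastROM), 8 otherwise
//returns the number of master cycles for one access to addr (24-bit bus address)
unsigned sCPU::speed(unsigned addr) const {
  //banks $40-$7f and $c0-$ff (bit 22), or offsets $8000-$ffff in any bank (bit 15)
  if(addr & 0x408000) {
    //banks $80-$ff (bit 23) follow the FastROM setting
    if(addr & 0x800000) return rom_speed;
    return 8;
  }
  //what remains is banks $00-$3f and $80-$bf, offsets $0000-$7fff
  //offsets $0000-$1fff and $6000-$7fff
  if((addr + 0x6000) & 0x4000) return 8;
  //offsets $2000-$3fff and $4200-$5fff
  if((addr - 0x4000) & 0x7e00) return 6;
  //offsets $4000-$41ff
  return 12;
}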
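
If the bit tests are hard to read, here is a rough sketch of the same mapping written as plain range comparisons, plus a brute-force check that the two agree over the whole 24-bit address space. This is not code from bsnes. speed_bits, speed_ranges and speeds_agree are names made up for this example, and rom_speed is passed in as a parameter instead of being read from the sCPU state.

```cpp
#include <cstdint>

// Hypothetical free-function copy of the bit-test version above.
// rom_speed is 6 for FastROM, 8 otherwise.
unsigned speed_bits(std::uint32_t addr, unsigned rom_speed) {
  if(addr & 0x408000) {
    if(addr & 0x800000) return rom_speed;
    return 8;
  }
  if((addr + 0x6000) & 0x4000) return 8;
  if((addr - 0x4000) & 0x7e00) return 6;
  return 12;
}

// The same mapping as explicit range comparisons (illustrative only).
unsigned speed_ranges(std::uint32_t addr, unsigned rom_speed) {
  const unsigned bank = (addr >> 16) & 0xff;
  const unsigned offset = addr & 0xffff;
  // banks $00-$3f and $80-$bf have address bit 22 clear
  const bool low_banks = (bank & 0x7f) < 0x40;
  if(!low_banks || offset >= 0x8000) {
    // banks $80-$ff follow the FastROM setting, the rest are always 8
    return (bank & 0x80) ? rom_speed : 8;
  }
  if(offset < 0x2000) return 8;   // $0000-$1fff
  if(offset < 0x4000) return 6;   // $2000-$3fff
  if(offset < 0x4200) return 12;  // $4000-$41ff
  if(offset < 0x6000) return 6;   // $4200-$5fff
  return 8;                       // $6000-$7fff
}

// Compares both versions for every 24-bit address.
bool speeds_agree(unsigned rom_speed) {
  for(std::uint32_t addr = 0; addr <= 0xffffff; addr++) {
    if(speed_bits(addr, rom_speed) != speed_ranges(addr, rom_speed)) return false;
  }
  return true;
}
```

The range version is only there as a readable cross-check. It makes no claim about which of the two is faster in practice.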

As for breaking apart the S-CPU cycles, you should really use the bsnes source code. The W65C816S documentation explains it (only the older version does), but it has errors, especially in WAI and IRQ timing. It's also missing the IRQ edge case condition that converts an I/O cycle into a bus read.

Really, you'll save yourselves two years of fixing IRQ bugs in golf games if you port my S-CPU core instead of writing your own ... but whatever you guys want to do ;)

If you go your own route, you should decide on whether or not you will use cooperative threading. Save states are still possible with it. If you choose not to, you will wish for death upon attempting to implement bus hold delays and proper H/DMA bus synchronization in a state machine. I couldn't do it after 2 years, and the Snes9X team gave up on it as well.

Also, here's the code we have so far for the S-SMP TEST register:
http://board.byuu.org/viewtopic.php?p=12381#p12381

Another gem. There is absolutely no game known, not even in the demo scene, that ever writes anything at all to this register.