I also discovered an edge case with my implementation in v062.
It seems a lot of games are really good at parallelizing: they will write new operands to $4202, $4204 and $4205 immediately after writing to $4203 or $4206, while that multiply or divide is still in progress.
What happens is that when you write to $4203 or $4206, the hardware caches the current values (WRMPYA + WRMPYB or WRDIVA + WRDIVB). Each stage of the ALU doesn't actually employ a variable bit-shifter (obviously); it shifts those temporary values by one place each time. In other words, the internal values are left decimated once the computation is complete, hence the need for internal variables that are separate from the registers the game writes to.
I guess programmers realized this and took advantage of it to edge out a little more speed.
You'll need to support this, or games like Seiken Densetsu 3 and Winter Gold, to name two, will glitch in certain areas.
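To make the latching idea concrete, here's a rough sketch in Python of the multiply side only. It's a toy model under stated assumptions, not what any emulator actually ships: the class and helper names are made up, and the 8-step count with one bit handled per step is an assumption for illustration, not a timing claim. Division, and writes to $4203 while a multiply is already running, are left out.

```python
class LatchedMultiplier:
    """Toy shift-and-add multiplier with latched operands (hypothetical model)."""

    def __init__(self):
        self.wrmpya = 0   # register the game writes at $4202
        self.wrmpyb = 0   # register the game writes at $4203
        self.rdmpy = 0    # 16-bit product
        self._a = 0       # internal copy of the multiplicand, shifted left each step
        self._b = 0       # internal copy of the multiplier, shifted right each step
        self._steps = 0   # remaining ALU steps (assumed: 8, one per multiplier bit)

    def write(self, addr, data):
        data &= 0xFF
        if addr == 0x4202:
            # Only the visible register changes; a running multiply keeps its copy.
            self.wrmpya = data
        elif addr == 0x4203:
            self.wrmpyb = data
            # Writing $4203 starts the multiply: cache both operands now.
            self._a = self.wrmpya
            self._b = self.wrmpyb
            self.rdmpy = 0
            self._steps = 8

    def step(self):
        """Advance the ALU by one stage: add if the low bit is set, then shift by one."""
        if self._steps == 0:
            return
        if self._b & 1:
            self.rdmpy = (self.rdmpy + self._a) & 0xFFFF
        self._a <<= 1   # fixed one-place shift, no variable shifter
        self._b >>= 1
        self._steps -= 1


def run_overlapped_multiply(a, b, next_a):
    """Start a * b, then immediately overwrite $4202 the way the games do."""
    alu = LatchedMultiplier()
    alu.write(0x4202, a)
    alu.write(0x4203, b)        # computation starts here
    alu.write(0x4202, next_a)   # early write for the next multiply
    for _ in range(8):
        alu.step()
    return alu.rdmpy, alu.wrmpya
```

With this model, `run_overlapped_multiply(12, 10, 99)` returns `(120, 99)`: the product comes from the cached operands, and the early write to $4202 is preserved for the next multiply. The internal copies end up shifted away (the multiplier copy is 0 afterwards), which is why they can't double as the visible registers.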
Lastly, I understand the importance of adding these things, especially while my knowledge about them is strong and I can help. But things like MUL/DIV are only going to cause new bugs, and not fix any games. With the problems you guys are having with general CPU timing, PPU rendering, etc., it may not be the best idea to be going after these extreme edge cases, heh. I mean, there are hundreds of these things. Still, I'm happy to assist if you want this added now.