I ran into a similar problem while working on the Game Boy driver. It should eventually go away once the CPU core becomes sub-cycle accurate. I worked around it by making the execute_set_input method public and also calling it directly from the driver code.
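A minimal C++ sketch of that kind of workaround, under stated assumptions: only the method name execute_set_input comes from the message above. The classes `execute_interface_sketch`, `cpu_core_sketch` and `driver_sketch`, the `set_irq_line` helper, and the idea that the hook starts out as a protected virtual are all hypothetical stand-ins, not the real code base. It shows the CPU core re-declaring the hook as public so the driver can call it directly, in addition to whatever deferred path normally delivers the input.

```cpp
#include <cassert>

// Hypothetical base interface (assumption: the hook starts out as a
// protected virtual that only the scheduler path is meant to call).
class execute_interface_sketch {
public:
    virtual ~execute_interface_sketch() = default;

protected:
    virtual void execute_set_input(int inputnum, int state) = 0;
};

// Hypothetical CPU core: the workaround re-declares the override in the
// public section so code outside the class hierarchy can call it.
class cpu_core_sketch : public execute_interface_sketch {
public:
    void execute_set_input(int inputnum, int state) override {
        assert(inputnum >= 0);  // sanity check on the input line number
        m_input = inputnum;
        m_state = state;
    }

    int input() const { return m_input; }
    int state() const { return m_state; }

private:
    int m_input = -1;  // last input line changed (-1 = none yet)
    int m_state = 0;   // state written to that line
};

// Hypothetical driver: besides the normal (deferred) path, which is not
// modelled here, it calls the now-public method so the core sees the change
// immediately.
class driver_sketch {
public:
    explicit driver_sketch(cpu_core_sketch &cpu) : m_cpu(cpu) {}

    void set_irq_line(int line, int state) {
        // normal deferred delivery would be queued here (omitted)
        m_cpu.execute_set_input(line, state);  // direct call: the workaround
    }

private:
    cpu_core_sketch &m_cpu;
};
```

The cost is looser encapsulation: once the hook is public, any code holding a reference to the concrete core can call it. That is why it reads as a stopgap rather than a fix.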
I'm going to apply the same workaround -- have you committed yours?