There is no dynarec for rtype. It's probably just that the game's startup code incurs more reschedules from testing timers and such.
Would that kind of MAME activity cause a lot of time to be spent in libgcc?
I gather from ym2151 being expensive that the N900 doesn't have hardware floating point?
It does, and has both NEON and VFP modes, but it seems that gcc doesn't support them well. I've just discovered this post on FP optimizations on the Pandora hardware (~N900) and am trying out some of them. Previously I was just specifying -mcpu=cortex-a8 and -mfpu=neon.
After reading that page I am thinking it may be worthwhile to try adding ARM ASM helper functions after all, but it's daunting with no ARM ASM experience.
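On the libgcc question, one thing that may be worth checking before writing any ASM is whether the build is targeting hardware FP at all. A minimal sketch, assuming GCC's predefined ARM macros (__arm__, __SOFTFP__, __ARM_NEON__); the function name fp_mode is made up for illustration and is not part of MAME:

```c
/* Hypothetical helper: reports which floating-point mode the compiler
 * was configured to target, based on GCC's predefined macros. */
const char *fp_mode(void)
{
#if !defined(__arm__)
    /* Not a 32-bit ARM target, so the ARM macros below do not apply. */
    return "not an ARM build";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: FP arithmetic is done by library routines. */
    return "soft-float (FP math goes through libgcc)";
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
    /* FP instructions are emitted and NEON is enabled. */
    return "hardware FP, NEON enabled";
#else
    /* FP instructions are emitted, but without NEON. */
    return "hardware FP, VFP only";
#endif
}
```

If this reports soft-float when built with the same flags as MAME, then -mfpu=neon on its own is probably not doing anything, since the float ABI is controlled separately by GCC's -mfloat-abi option (soft, softfp or hard). Whether that is actually the case depends on how the toolchain was configured, so treat it as something to check rather than a diagnosis.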