So it's the same one that crept into arm7_read_32? :)
I tested that SUB PC,4 again and found that I had gotten the opcode wrong. In the end, this instruction takes 3 cycles, just as any jump would. Also, the barrel shifter should add .5 of a cycle to any instruction that takes its shift count from a register.

Code
diff -Nru old/arm7i.c new/arm7i.c
--- old/arm7i.c	2008-02-18 00:47:09.000000000 +0100
+++ new/arm7i.c	2008-02-18 00:48:03.000000000 +0100
@@ -335,6 +335,7 @@
 
   if (ARM7.kod & (1 << 4))
     {
+    s_cykle++;
     // shift count in Rs (8 lowest bits)
     if (Rm != ARM7_PC)
       w = ARM7.Rx [Rm];
@@ -890,7 +891,7 @@
     {
     if (Rd == ARM7_PC)
       {
-      s_cykle++;
+      s_cykle += 4;
       // copy current SPSR to CPSR
       ARM7_SetCPSR (ARM7.Rx [ARM7_SPSR]);
       }

This needs to be applied on top of the source I sent by mail. I think there were some Polish leftovers in that code, but only a line or two. I also had some duplicate definitions, and the compiler didn't flag them as errors. Let me remind you that ARM7_Execute now needs to be fed twice the number of cycles you want to run; this allows for all those .5 differences.
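
To make the half-cycle bookkeeping concrete, here is a rough sketch of how I think of it. None of these names exist in arm7i.c; UNITS_PER_CYCLE, cycles_to_units and instruction_units are made up for illustration, and the base costs you pass in are up to you. The only things taken from the patch are that the counter runs at two units per real cycle and that a register-specified shift costs one extra unit.

```c
/* Sketch only: every name here is hypothetical, not taken from arm7i.c. */

/* Counter units per real CPU cycle; 2 so that .5 cycle is one unit. */
#define UNITS_PER_CYCLE 2

/* Convert whole CPU cycles into the half-cycle units the core counts in. */
static int cycles_to_units(int cycles)
{
    return cycles * UNITS_PER_CYCLE;
}

/* Cost of an instruction in half-cycle units: the base cost in whole
   cycles, plus one unit (.5 cycle) when the shift count comes from a
   register, mirroring the s_cykle++ added by the patch. */
static int instruction_units(int base_cycles, int shift_by_register)
{
    int units = cycles_to_units(base_cycles);
    if (shift_by_register)
        units += 1;
    return units;
}
```

So if you want to run 1000 real cycles, the value to hand to ARM7_Execute would be cycles_to_units(1000), assuming it takes the cycle count as its argument.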