AWJ (Senior Member, joined Dec 2005, 330 posts)
Possible stupid question: How much of this optimization work (e.g. the stuff about cache line sizes), if any, is specific to the Core 2 architecture and how much is generically applicable? I mean, I know that MAMEDev is full of unapologetic Intel fanboys wink and I can sort of understand the attitude that "C2s are the only thing fast enough to run these drivers anywhere close to full speed, so C2s are all we care about right now" but what happens whe^H^H^Hif the Phenom comes out, kicks as much ass as K7 and K8 both did on their initial release, and MAME's multithreading implementation is grossly suboptimal on it?

Very Senior Member (joined Feb 2004, 2,628 posts, 339 likes)
64-byte synchronisation structures will improve performance on all current PowerPC chips. Also, my hand-tuned assembly should be good for any chips it runs on.

R (Very Senior Member, joined Mar 2001, 17,261 posts, 267 likes)
Nothing we're doing is specific to any chip, but you'll get best results running 64-bit [SDL]MAME on a 64-bit OS. And if Phenom were actually good I assume we would've seen leaks saying so by now, like we did with K8 and C2D.

S (Very Senior Member, joined Oct 2006, 1,017 posts, 21 likes)
I just wanted to say that you guys are kicking major ass and I love following this thread weekly. Good stuff.

R (Very Senior Member, joined Mar 2001, 17,261 posts, 267 likes)
Time to resurrect this thread again. I've got u2 compiling, but it's radically slow again - Blitz is 48% in-game instead of 105% in u1. That's more than a bit of a regression. Vas, could you take a look?

http://rbelmont.mameworld.info/sdlmame0120u2_pre1.zip

Very Senior Member (joined Feb 2004, 2,628 posts, 339 likes)
OK, will do.

Very Senior Member (joined Feb 2004, 2,628 posts, 339 likes)
Well, I'm not sure if I've fixed the performance regression, but I did manage to do a lot of cleanups.

First up, eminline.h only defined compare_exchange_ptr if PTR64 was defined. This is silly; we need it on 32-bit platforms, too. After fixing that, I couldn't get GCC to shut up about cast warnings without using a local variable to store the result before returning.

Second, eigccppc.h was all over the place - there were invalid opcodes scattered through it, and the constraints were often less than optimal, or even unsafe in some cases. So I fixed the opcodes, straightened the constraints out, and made the assembly neater and clearer (named operands and all that).

I also changed some #if directives to use GCC's predefined macros to determine what's available rather than PTR64 - I think it's a bit safer.

I do intend to fill in some of the TBD slots in eigccXXX.h, but I'll do that later. Anyway, here's a diff of my changes in the core:

Code
diff -ur sdlmame0120u2_pre1/src/emu/eigccppc.h sdlmame0120u2_pre1/src/emu/eigccppc.h
--- sdlmame0120u2_pre1/src/emu/eigccppc.h	2007-11-02 23:32:40.000000000 +1100
+++ sdlmame0120u2_pre1/src/emu/eigccppc.h	2007-11-03 18:04:49.000000000 +1100
@@ -47,11 +47,10 @@
 	INT32 result;
 
 	__asm__ (
-		" mulhw  %0,%1,%2   \n"
-	  : "=&r" (result),
-		"+r" (val1)
-	  : "r" (val2)
-	  : "xer"
+		" mulhw  %[result], %[val1], %[val2] \n"
+		: [result] "=r" (result)
+		: [val1]   "%r" (val1)
+		, [val2]   "r"  (val2)
 	);
 
 	return result;
@@ -70,11 +69,10 @@
 	UINT32 result;
 
 	__asm__ (
-		" mulhwu %0,%1,%2   \n"
-	  : "=&r" (result),
-		"+r" (val1)
-	  : "r" (val2)
-	  : "xer"
+		" mulhwu  %[result], %[val1], %[val2] \n"
+		: [result] "=r" (result)
+		: [val1]   "%r" (val1)
+		, [val2]   "r"  (val2)
 	);
 
 	return result;
@@ -95,26 +93,26 @@
 
 #if defined(__ppc64__) || defined(__PPC64__)
 	__asm__ (
-		" mulld  %0,%1,%2 \n"
-		" srd    %0,%0,%3 \n"
-	  : "=&r" (result)			/* result can go in any register */
-	  : "%r" (val1)				/* any register, can swap with val2 */
-		"r" (val2)				/* any register */
-		"r" (shift)				/* any register */
+		" mulld  %[result], %[val1], %[val2]    \n"
+		" srd    %[result], %[result], %[shift] \n"
+		: [result] "=&r" (result)	/* result can go in any register */
+		: [val1]   "%r"  (val1)		/* any register, can swap with val2 */
+		, [val2]   "r"   (val2)		/* any register */
+		, [shift]  "r"   (shift)	/* any register */
 	);
 #else
 	__asm__ (
-		" mullw  %0,%2,%3   \n"
-		" mulhw  %2,%2,%3   \n"
-		" srw    %0,%0,%1   \n"
-		" subfic %1,%1,0x20 \n"
-		" slw    %2,%2,%1   \n"
-		" or     %0,%0,%2   \n"
-	  : "=&r" (result),
-		"+r" (shift),
-		"+r" (val1)
-	  : "r" (val2)
-	  : "xer"
+		" mullw   %[result], %[val1], %[val2]    \n"
+		" mulhw   %[val1], %[val1], %[val2]      \n"
+		" srw     %[result], %[result], %[shift] \n"
+		" subfic  %[shift], %[shift], 0x20       \n"
+		" slw     %[val1], %[val1], %[shift]     \n"
+		" or      %[result], %[result], %[val1]  \n"
+		: [result] "=&r" (result)
+		, [shift]  "+r"  (shift)
+		, [val1]   "+r"  (val1)
+		: [val2]   "r"   (val2)
+		: "xer"
 	);
 #endif
 
@@ -136,26 +134,26 @@
 
 #if defined(__ppc64__) || defined(__PPC64__)
 	__asm__ (
-		" mulldu %0,%1,%2 \n"
-		" srd    %0,%0,%3 \n"
-	  : "=&r" (result)			/* result can go in any register */
-	  : "%r" (val1)				/* any register, can swap with val2 */
-		"r" (val2)				/* any register */
-		"r" (shift)				/* any register */
+		" mulld  %[result], %[val1], %[val2]    \n"
+		" srd    %[result], %[result], %[shift] \n"
+		: [result] "=&r" (result)	/* result can go in any register */
+		: [val1]   "%r"  (val1)		/* any register, can swap with val2 */
+		, [val2]   "r"   (val2)		/* any register */
+		, [shift]  "r"   (shift)	/* any register */
 	);
 #else
 	__asm__ (
-		" mullw  %0,%2,%3   \n"
-		" mulhwu %2,%2,%3   \n"
-		" srw    %0,%0,%1   \n"
-		" subfic %1,%1,0x20 \n"
-		" slw    %2,%2,%1   \n"
-		" or     %0,%0,%2   \n"
-	  : "=&r" (result),
-		"+r" (shift),
-		"+r" (val1)
-	  : "r" (val2)
-	  : "xer"
+		" mullw   %[result], %[val1], %[val2]    \n"
+		" mulhwu  %[val1], %[val1], %[val2]      \n"
+		" srw     %[result], %[result], %[shift] \n"
+		" subfic  %[shift], %[shift], 0x20       \n"
+		" slw     %[val1], %[val1], %[shift]     \n"
+		" or      %[result], %[result], %[val1]  \n"
+		: [result] "=&r" (result)
+		, [shift]  "+r"  (shift)
+		, [val1]   "+r"  (val1)
+		: [val2]   "r"   (val2)
+		: "xer"
 	);
 #endif
 
@@ -237,9 +235,9 @@
 	UINT32 result;
 
 	__asm__ (
-		" cntlzw %0,%1 \n"
-		: "=r" (result)		/* result can be in any register */
-		: "r" (value)		/* 'value' can be in any register */
+		" cntlzw  %[result], %[value] \n"
+		: [result] "=r" (result)	/* result can be in any register */
+		: [value]  "r"  (value)		/* 'value' can be in any register */
 	);
 
 	return result;
@@ -257,10 +255,10 @@
 	UINT32 result;
 
 	__asm__ (
-		" not %0,%1 \n"
-		" cntlzw %0,%0 \n"
-		: "=r" (result)		/* result can be in any register */
-		: "r" (value)		/* 'value' can be in any register */
+		" not     %[result], %[value]  \n"
+		" cntlzw  %[result], %[result] \n"
+		: [result] "=r" (result)	/* result can be in any register */
+		: [value]  "r"  (value)		/* 'value' can be in any register */
 	);
 
 	return result;
@@ -290,7 +288,7 @@
 		"   bne    2f                     \n"
 		"   sync                          \n"
 		"   stwcx. %[exchange], 0, %[ptr] \n"
-		"   bne- 1b                       \n"
+		"   bne-   1b                     \n"
 		"2:                                 "
 		: [result]   "=&r" (result)
 		: [ptr]      "r"   (ptr)
@@ -310,7 +308,7 @@
     return the previous value at 'ptr'.
 -------------------------------------------------*/
 
-#ifdef PTR64
+#if defined(__ppc64__) || defined(__PPC64__)
 #define compare_exchange64 _compare_exchange64
 INLINE INT64 _compare_exchange64(INT64 volatile *ptr, INT64 compare, INT64 exchange)
 {
@@ -320,7 +318,6 @@
 		"1: ldarx  %[result], 0, %[ptr]   \n"
 		"   cmpd   %[compare], %[result]  \n"
 		"   bne    2f                     \n"
-		"   sync                          \n"
 		"   stdcx. %[exchange], 0, %[ptr] \n"
 		"   bne--  1b                     \n"
 		"2:                                 "
@@ -352,7 +349,7 @@
 		"   sync                          \n"
 		"   stwcx. %[exchange], 0, %[ptr] \n"
 		"   bne-   1b                     \n"
-		: [result]      "=&r" (result)
+		: [result]   "=&r" (result)
 		: [ptr]      "r"   (ptr)
 		, [exchange] "r"   (exchange)
 		: "cr0"
@@ -373,15 +370,15 @@
 {
 	register INT32 result;
 
-	__asm __volatile__ (
-		"1: lwarx  %[result], 0, %[ptr]     \n"
+	__asm__ __volatile__ (
+		"1: lwarx  %[result], 0, %[ptr]           \n"
 		"   add    %[result], %[result], %[delta] \n"
-		"   sync                            \n"
-		"   stwcx. %[result], 0, %[ptr]     \n"
-		"   bne- 1b                         \n"
-		: [result]   "=&b" (result)
-		: [ptr]   "r"   (ptr)
-		, [delta] "r"   (delta)
+		"   sync                                  \n"
+		"   stwcx. %[result], 0, %[ptr]           \n"
+		"   bne-   1b                             \n"
+		: [result] "=&b" (result)
+		: [ptr]    "r"   (ptr)
+		, [delta]  "r"   (delta)
 		: "cr0"
 	);
 
diff -ur sdlmame0120u2_pre1/src/emu/eigccx86.h sdlmame0120u2_pre1/src/emu/eigccx86.h
--- sdlmame0120u2_pre1/src/emu/eigccx86.h	2007-11-02 09:15:36.000000000 +1100
+++ sdlmame0120u2_pre1/src/emu/eigccx86.h	2007-11-03 18:20:53.000000000 +1100
@@ -23,17 +23,18 @@
     multiply and return the full 64 bit result
 -------------------------------------------------*/
 
-#ifndef PTR64
+#ifndef __x86_64__
 #define mul_32x32 _mul_32x32
 INLINE INT64 _mul_32x32(INT32 a, INT32 b)
 {
 	INT64 result;
 
 	__asm__ (
-		"imull %2;"
-      : "=A,A" (result)			/* result in edx:eax */
-      : "0,0" (a),				/* 'a' should also be in eax on entry */
-        "r,m" (b)				/* 'b' can be memory or register */
+		" imull  %[b] ;"
+		: [result] "=A" (result)	/* result in edx:eax */
+		: [a]      "%a"  (a)		/* 'a' should also be in eax on entry */
+		, [b]      "rm"  (b)		/* 'b' can be memory or register */
+		: "%cc"						/* Clobbers condition codes */
 	);
 
 	return result;
@@ -47,17 +48,18 @@
     result
 -------------------------------------------------*/
 
-#ifndef PTR64
+#ifndef __x86_64__
 #define mulu_32x32 _mulu_32x32
 INLINE UINT64 _mulu_32x32(UINT32 a, UINT32 b)
 {
 	UINT64 result;
 
 	__asm__ (
-		"mull  %2;"
-      : "=A,A" (result)			/* result in edx:eax */
-      : "0,0" (a),				/* 'a' should also be in eax on entry */
-        "r,m" (b)				/* 'b' can be memory or register */
+		" mull  %[b] ;"
+		: [result] "=A" (result)	/* result in edx:eax */
+		: [a]      "%a"  (a)		/* 'a' should also be in eax on entry */
+		, [b]      "rm"  (b)		/* 'b' can be memory or register */
+		: "%cc"						/* Clobbers condition codes */
 	);
 
 	return result;
@@ -71,24 +73,21 @@
     result
 -------------------------------------------------*/
 
-#ifndef PTR64
 #define mul_32x32_hi _mul_32x32_hi
 INLINE INT32 _mul_32x32_hi(INT32 a, INT32 b)
 {
 	INT32 result;
 
 	__asm__ (
-		"imull %2;"
-		"movl  %%edx,%0;"
-      : "=a,a" (result)			/* result ends up in eax */
-      : "0,0" (a),				/* 'a' should also be in eax on entry */
-        "r,m" (b)				/* 'b' can be memory or register */
-      : "%edx", "%cc"			/* clobbers EDX and condition codes */
+		" imull  %[b] ;"
+		: [result] "=d"  (result)	/* result in edx */
+		, [a]      "+%a" (a)		/* 'a' should be in eax on entry (clobbered) */
+		: [b]      "rm"  (b)		/* 'b' can be memory or register */
+		: "%cc"						/* Clobbers condition codes */
 	);
 
 	return result;
 }
-#endif
 
 
 /*-------------------------------------------------
@@ -97,24 +96,21 @@
     of the result
 -------------------------------------------------*/
 
-#ifndef PTR64
 #define mulu_32x32_hi _mulu_32x32_hi
 INLINE UINT32 _mulu_32x32_hi(UINT32 a, UINT32 b)
 {
 	UINT32 result;
 
 	__asm__ (
-		"mull  %2;"
-		"movl  %%edx,%0;"
-      : "=a,a" (result)			/* result ends up in eax */
-      : "0,0" (a),				/* 'a' should also be in eax on entry */
-        "r,m" (b)				/* 'b' can be memory or register */
-      : "%edx", "%cc"			/* clobbers EDX and condition codes */
+		" mull  %[b] ;"
+		: [result] "=d"  (result)	/* result in edx */
+		, [a]      "+%a" (a)		/* 'a' should be in eax on entry (clobbered) */
+		: [b]      "rm"  (b)		/* 'b' can be memory or register */
+		: "%cc"						/* Clobbers condition codes */
 	);
 
 	return result;
 }
-#endif
 
 
 /*-------------------------------------------------
@@ -124,20 +120,20 @@
     result to 32 bits
 -------------------------------------------------*/
 
-#ifndef PTR64
+#ifndef __x86_64__
 #define mul_32x32_shift _mul_32x32_shift
 INLINE INT32 _mul_32x32_shift(INT32 a, INT32 b, UINT8 shift)
 {
 	INT32 result;
 
 	__asm__ (
-		"imull %2;"
-		"shrdl %3,%%edx,%0;"
-      : "=a,a,a,a" (result)		/* result ends up in eax */
-      : "0,0,0,0" (a),			/* 'a' should also be in eax on entry */
-        "r,m,r,m" (b),			/* 'b' can be memory or register */
-        "I,I,c,c" (shift)		/* 'shift' must be constant in 0-31 range or in CL */
-      : "%edx", "%cc"			/* clobbers EDX and condition codes */
+		" imull  %[b]                       ;"
+		" shrdl  %[shift], %%edx, %[result] ;"
+		: [result] "=a" (result)	/* result ends up in eax */
+		: [a]      "%0" (a)			/* 'a' should also be in eax on entry */
+		, [b]      "rm" (b)			/* 'b' can be memory or register */
+		, [shift]  "Ic" (shift)		/* 'shift' must be constant in 0-31 range or in cl */
+		: "%edx", "%cc"				/* clobbers edx and condition codes */
 	);
 
 	return result;
@@ -152,20 +148,20 @@
     result to 32 bits
 -------------------------------------------------*/
 
-#ifndef PTR64
+#ifndef __x86_64__
 #define mulu_32x32_shift _mulu_32x32_shift
 INLINE UINT32 _mulu_32x32_shift(UINT32 a, UINT32 b, UINT8 shift)
 {
 	UINT32 result;
 
 	__asm__ (
-		"imull %2;"
-		"shrdl %3,%%edx,%0;"
-      : "=a,a,a,a" (result)		/* result ends up in eax */
-      : "0,0,0,0" (a),			/* 'a' should also be in eax on entry */
-        "r,m,r,m" (b),			/* 'b' can be memory or register */
-        "I,I,c,c" (shift)		/* 'shift' must be constant in 0-31 range or in CL */
-      : "%edx", "%cc"			/* clobbers EDX and condition codes */
+		" mull   %[b]                       ;"
+		" shrdl  %[shift], %%edx, %[result] ;"
+		: [result] "=a" (result)	/* result ends up in eax */
+		: [a]      "%0" (a)			/* 'a' should also be in eax on entry */
+		, [b]      "rm" (b)			/* 'b' can be memory or register */
+		, [shift]  "Ic" (shift)		/* 'shift' must be constant in 0-31 range or in cl */
+		: "%edx", "%cc"				/* clobbers edx and condition codes */
 	);
 
 	return result;
@@ -195,30 +191,27 @@
     division, and returning the 32 bit quotient
 -------------------------------------------------*/
 
-#ifndef PTR64
-/* TBD - I get an error if this is enabled: error: 'asm' operand requires impossible reload */
-#if 0
+#ifndef __x86_64__
 #define div_32x32_shift _div_32x32_shift
 INLINE INT32 _div_32x32_shift(INT32 a, INT32 b, UINT8 shift)
 {
 	INT32 result;
 
 	__asm__ (
-	    "cdq;"
-	    "shldl %3,%0,%%edx;"
-	    "shll  %3,%0;"
-		"idivl %2;"
-      : "=a,a,a,a" (result)		/* result ends up in eax */
-      : "0,0,0,0" (a),			/* 'a' should also be in eax on entry */
-        "r,m,r,m" (b),			/* 'b' can be memory or register */
-        "I,I,c,c" (shift)		/* 'shift' must be constant in 0-31 range or in CL */
-      : "%edx", "%cc"			/* clobbers EDX and condition codes */
+	    " cdq                          ;"
+	    " shldl  %[shift], %[a], %%edx ;"
+		" shll   %[shift], %[a]        ;"
+		" idivl  %[b]                  ;"
+		: [result] "=&a" (result)	/* result ends up in eax */
+		: [a]      "0"   (a)		/* 'a' should also be in eax on entry */
+        , [b]      "rm"  (b)		/* 'b' can be memory or register */
+        , [shift]  "Ic"  (shift)	/* 'shift' must be constant in 0-31 range or in cl */
+		: "%edx", "%cc"				/* clobbers edx and condition codes */
 	);
 
 	return result;
 }
 #endif
-#endif
 
 
 /*-------------------------------------------------
@@ -227,22 +220,22 @@
     division, and returning the 32 bit quotient
 -------------------------------------------------*/
 
-#ifndef PTR64
+#ifndef __x86_64__
 #define divu_32x32_shift _divu_32x32_shift
 INLINE UINT32 _divu_32x32_shift(UINT32 a, UINT32 b, UINT8 shift)
 {
 	INT32 result;
 
 	__asm__ (
-	    "xorl  %%edx,%%edx;"
-	    "shldl %3,%0,%%edx;"
-	    "shll  %3,%0;"
-		"divl  %2;"
-      : "=a,a,a,a" (result)		/* result ends up in eax */
-      : "0,0,0,0" (a),			/* 'a' should also be in eax on entry */
-        "r,m,r,m" (b),			/* 'b' can be memory or register */
-        "I,I,c,c" (shift)		/* 'shift' must be constant in 0-31 range or in CL */
-      : "%edx", "%cc"			/* clobbers EDX and condition codes */
+	    " clr    %%edx                 ;"
+	    " shldl  %[shift], %[a], %%edx ;"
+		" shll   %[shift], %[a]        ;"
+		" divl   %[b]                  ;"
+		: [result] "=&a" (result)	/* result ends up in eax */
+		: [a]      "0"   (a)		/* 'a' should also be in eax on entry */
+        , [b]      "rm"  (b)		/* 'b' can be memory or register */
+        , [shift]  "Ic"  (shift)	/* 'shift' must be constant in 0-31 range or in cl */
+		: "%edx", "%cc"				/* clobbers edx and condition codes */
 	);
 
 	return result;
@@ -290,13 +283,13 @@
 	UINT32 result;
 
 	__asm__ (
-		"bsrl  %1,%0;"
-		"jnz   1f;"
-		"movl  $63,%0;"
-	  "1:xorl  $31,%0;"
-	  : "=r,r" (result)			/* result can be in any register */
-	  : "r,m" (value)			/* 'value' can be register or memory */
-	  : "%cc"					/* clobbers condition codes */
+		"   bsrl  %[value], %[result] ;"
+		"   jnz   1f                  ;"
+		"   movl  $63, %[result]      ;"
+		"1: xorl  $31, %[result]      ;"
+		: [result] "=r" (result)	/* result can be in any register */
+		: [value]  "rm" (value)		/* 'value' can be register or memory */
+		: "%cc"						/* clobbers condition codes */
 	);
 
 	return result;
@@ -314,15 +307,15 @@
 	UINT32 result;
 
 	__asm__ (
-		"movl  %1,%0;"
-		"notl  %0;"
-		"bsrl  %0,%0;"
-		"jnz   1f;"
-		"movl  $63,%0;"
-	  "1:xorl  $31,%0;"
-	  : "=r,r" (result)			/* result can be in any register */
-	  : "r,m" (value)			/* 'value' can be register or memory */
-	  : "%cc"					/* clobbers condition codes */
+		"   movl  %[value], %[result]  ;"
+		"   notl  %[result]            ;"
+		"   bsrl  %[result], %[result] ;"
+		"   jnz   1f                   ;"
+		"   movl  $63, %[result]       ;"
+		"1: xorl  $31, %[result]       ;"
+		: [result] "=r"  (result)	/* result can be in any register */
+		: [value]  "rmi" (value)	/* 'value' can be register, memory or immediate */
+		: "%cc"						/* clobbers condition codes */
 	);
 
 	return result;
@@ -347,7 +340,7 @@
 	register INT32 result;
 
 	__asm__ __volatile__ (
-		" lock ; cmpxchg %[exchange], %[ptr] ;"
+		" lock ; cmpxchg  %[exchange], %[ptr] ;"
 	  : [ptr]      "+m" (*ptr)
 	  , [result]   "=a" (result)
 	  : [compare]  "1"  (compare)
@@ -366,14 +359,14 @@
     return the previous value at 'ptr'.
 -------------------------------------------------*/
 
-#ifdef PTR64
+#ifdef __x86_64__
 #define compare_exchange64 _compare_exchange64
 INLINE INT64 _compare_exchange64(INT64 volatile *ptr, INT64 compare, INT64 exchange)
 {
 	register INT64 result;
 
 	__asm__ __volatile__ (
-		" lock ; cmpxchg %[exchange], %[ptr] ;"
+		" lock ; cmpxchg  %[exchange], %[ptr] ;"
 	  : [ptr]      "+m" (*ptr)
 	  , [result]   "=a" (result)
 	  : [compare]  "1"  (compare)
@@ -398,7 +391,7 @@
 	register INT32 result;
 
 	__asm__ __volatile__ (
-		" lock ; xchg %[exchange], %[ptr] ;"
+		" lock ; xchg  %[exchange], %[ptr] ;"
 		: [ptr]      "+m" (*ptr)
 		, [result]   "=r" (result)
 		: [exchange] "1"  (exchange)
@@ -419,10 +412,10 @@
 {
 	register INT32 result;
 
-	__asm __volatile__ (
-		" mov           %[delta],%[result] ;"
-		" lock ; xadd   %[result],%[ptr]   ;"
-		" add           %[delta],%[result] ;"
+	__asm__ __volatile__ (
+		" mov          %[delta],%[result] ;"
+		" lock ; xadd  %[result],%[ptr]   ;"
+		" add          %[delta],%[result] ;"
 	  : [ptr]    "+m"  (*ptr)
 	  , [result] "=&r" (result)
 	  : [delta]  "rmi" (delta)
diff -ur sdlmame0120u2_pre1/src/emu/eminline.h sdlmame0120u2_pre1/src/emu/eminline.h
--- sdlmame0120u2_pre1/src/emu/eminline.h	2007-11-02 21:40:10.000000000 +1100
+++ sdlmame0120u2_pre1/src/emu/eminline.h	2007-11-03 17:34:55.000000000 +1100
@@ -308,18 +308,19 @@
     return the previous value at 'ptr'.
 -------------------------------------------------*/
 
-#ifdef PTR64
 #ifndef compare_exchange_ptr
 INLINE void *compare_exchange_ptr(void * volatile *ptr, void *compare, void *exchange)
 {
 #ifdef PTR64
-	return (void *)compare_exchange64((INT64 volatile *)ptr, (INT64)compare, (INT64)exchange);
+	INT64 result;
+	result = compare_exchange64((INT64 volatile *)ptr, (INT64)compare, (INT64)exchange);
 #else
-	return (void *)compare_exchange32((INT32 volatile *)ptr, (INT32)compare, (INT32)exchange);
+	INT32 result;
+	result = compare_exchange32((INT32 volatile *)ptr, (INT32)compare, (INT32)exchange);
 #endif
+	return (void *)result;
 }
 #endif
-#endif
 
 
 /*-------------------------------------------------

The SDL code was a bit messy, too. It was still using some of the old osd_XXX names for functions in eminline.h, so I cleaned that up. Some stuff we still need had been removed from osinline.h, too. Also, some things in sdlsync.c weren't needed any more. Here are my changes to the SDL layer:

Code
diff -ur sdlmame0120u2_pre1/src/osd/sdl/osinline.h sdlmame0120u2_pre1/src/osd/sdl/osinline.h
--- sdlmame0120u2_pre1/src/osd/sdl/osinline.h	2007-11-02 09:24:44.000000000 +1100
+++ sdlmame0120u2_pre1/src/osd/sdl/osinline.h	2007-11-03 17:58:29.000000000 +1100
@@ -9,7 +9,66 @@
 
 #include "eminline.h"
 
+
+//============================================================
+//	INLINE FUNCTIONS
+//============================================================
+
+#if defined(__i386__) || defined(__x86_64__)
+
+
+INLINE void osd_yield_processor(void)
+{
+	__asm__ __volatile__ ( " rep ; nop ;" );
+}
+
+
+//============================================================
+//  osd_interlocked_increment
+//============================================================
+
+ATTR_UNUSED
+INLINE INT32 _osd_interlocked_increment(INT32 volatile *ptr)
+{
+	register INT32 ret;
+	__asm__ __volatile__(
+		" mov           $1,%[ret]     ;"
+		" lock ; xadd   %[ret],%[ptr] ;"
+		" inc           %[ret]        ;"
+		: [ptr] "+m"  (*ptr)
+		, [ret] "=&r" (ret)
+		: 
+		: "%cc"
+	);
+	return ret;
+}
+#define osd_interlocked_increment _osd_interlocked_increment
+
+
+//============================================================
+//  osd_interlocked_decrement
+//============================================================
+
+ATTR_UNUSED
+INLINE INT32 _osd_interlocked_decrement(INT32 volatile *ptr)
+{
+	register INT32 ret;
+	__asm__ __volatile__(
+		" mov           $-1,%[ret]    ;"
+		" lock ; xadd   %[ret],%[ptr] ;"
+		" dec           %[ret]        ;"
+		: [ptr] "+m"  (*ptr)
+		, [ret] "=&r" (ret)
+		: 
+		: "%cc"
+	);
+	return ret;
+}
+#define osd_interlocked_decrement _osd_interlocked_decrement
+
+
 #if defined(__x86_64__)
+
 //============================================================
 //  osd_exchange64
 //============================================================
@@ -30,7 +89,64 @@
 
 #endif /* __x86_64__ */
 
+
+#elif defined(__ppc__) || defined (__PPC__) || defined(__ppc64__) || defined(__PPC64__)
+
+
+INLINE void osd_yield_processor(void)
+{
+	__asm__ __volatile__ ( " nop \n nop \n" );
+}
+
+
+//============================================================
+//  osd_interlocked_increment
+//============================================================
+
+ATTR_UNUSED
+INLINE INT32 _osd_interlocked_increment(INT32 volatile *ptr)
+{
+	register INT32 ret;
+	__asm__ __volatile__(
+		"1: lwarx  %[ret], 0, %[ptr] \n"
+		"   addi   %[ret], %[ret], 1 \n"
+		"   sync                     \n"
+		"   stwcx. %[ret], 0, %[ptr] \n"
+		"   bne-   1b                \n"
+		: [ret] "=&b" (ret)
+		: [ptr] "r"   (ptr)
+		: "cr0"
+	);
+	return ret;
+}
+#define osd_interlocked_increment _osd_interlocked_increment
+
+
+//============================================================
+//  osd_interlocked_decrement
+//============================================================
+
+ATTR_UNUSED
+INLINE INT32 _osd_interlocked_decrement(INT32 volatile *ptr)
+{
+	register INT32 ret;
+	__asm__ __volatile__(
+		"1: lwarx  %[ret], 0, %[ptr]  \n"
+		"   addi   %[ret], %[ret], -1 \n"
+		"   sync                      \n"
+		"   stwcx. %[ret], 0, %[ptr]  \n"
+		"   bne-   1b                 \n"
+		: [ret] "=&b" (ret)
+		: [ptr] "r"   (ptr)
+		: "cr0"
+	);
+	return ret;
+}
+#define osd_interlocked_decrement _osd_interlocked_decrement
+
+
 #if defined(__ppc64__) || defined(__PPC64__)
+
 //============================================================
 //  osd_exchange64
 //============================================================
@@ -55,44 +171,6 @@
 #endif /* __ppc64__ || __PPC64__ */
 
 
-
-#ifndef YieldProcessor
-
-#if defined(__i386__) || defined(__x86_64__)
-
-INLINE void osd_yield_processor(void)
-{
-	__asm__ __volatile__ ( " rep ; nop ;" );
-}
-
-#elif defined(__ppc__) || defined (__PPC__) || defined(__ppc64__) || defined(__PPC64__)
-
-INLINE void osd_yield_processor(void)
-{
-	__asm__ __volatile__ ( " nop \n" );
-}
-
-#endif
-
-#else
-#define osd_yield_processor() YieldProcessor()
-#endif
-
-/*-----------------------------------------------------------------------------
-    osd_compare_exchange_ptr: INLINE wrapper to compare and exchange a
-        pointer value of the appropriate size
------------------------------------------------------------------------------*/
-#ifndef osd_compare_exchange_ptr
-INLINE void *osd_compare_exchange_ptr(void * volatile *ptr, void *compare, void *exchange)
-{
-#ifdef PTR64
-	INT64 result = compare_exchange64((INT64 volatile *)ptr, (INT64)compare, (INT64)exchange);
-	return (void *)result;
-#else
-	INT32 result = compare_exchange32((INT32 volatile *)ptr, (INT32)compare, (INT32)exchange);
-	return (void *)result;
-#endif
-}
 #endif
 
 #endif /* __OSINLINE__ */
diff -ur sdlmame0120u2_pre1/src/osd/sdl/sdlsync.c sdlmame0120u2_pre1/src/osd/sdl/sdlsync.c
--- sdlmame0120u2_pre1/src/osd/sdl/sdlsync.c	2007-11-02 09:24:24.000000000 +1100
+++ sdlmame0120u2_pre1/src/osd/sdl/sdlsync.c	2007-11-03 18:25:46.000000000 +1100
@@ -72,8 +72,6 @@
 	pthread_t			thread;
 };
 
-static osd_lock			*atomic_lck = NULL;
-
 INLINE pthread_t osd_compare_exchange_pthread_t(pthread_t volatile *ptr, pthread_t compare, pthread_t exchange)
 {
 #ifdef PTR64
@@ -395,26 +393,6 @@
 }
  
 //============================================================
-//  osd_atomic_lock
-//============================================================
-
-void osd_atomic_lock(void)
-{
-	if (!atomic_lck)
-		atomic_lck = osd_lock_alloc();
-	osd_lock_acquire(atomic_lck);
-}
-
-//============================================================
-//  osd_atomic_unlock
-//============================================================
-
-void osd_atomic_unlock(void)
-{
-	osd_lock_release(atomic_lck);
-}
-
-//============================================================
 //  osd_thread_create
 //============================================================
 
@@ -465,56 +443,10 @@
 	free(thread);
 }
 
-//============================================================
-//  osd_compare_exchange32
-//============================================================
-
-#ifndef osd_compare_exchange32
-
-INT32 osd_compare_exchange32(INT32 volatile *ptr, INT32 compare, INT32 exchange)
-{
-	INT32	ret;
-	osd_atomic_lock();
-
-	ret = *ptr;
-
-	if ( *ptr == compare )
-	{
-		*ptr = exchange;
-	}
-
-	osd_atomic_unlock();	
-	return ret;
-}
-
-#endif
-
-//============================================================
-//  osd_compare_exchange64
-//============================================================
-
-#ifndef osd_compare_exchange64
-
-INT64 osd_compare_exchange64(INT64 volatile *ptr, INT64 compare, INT64 exchange)
-{
-	INT64	ret;
-	osd_atomic_lock();
-
-	ret = *ptr;
-
-	if ( *ptr == compare )
-	{
-		*ptr = exchange;
-	}
-
-	osd_atomic_unlock();	
-	return ret;
-}
-
-#endif
 
 #else   // SDLMAME_OS2
 
+
 struct _osd_lock
 {
      HMTX   hmtx;
diff -ur sdlmame0120u2_pre1/src/osd/sdl/sdlsync.h sdlmame0120u2_pre1/src/osd/sdl/sdlsync.h
--- sdlmame0120u2_pre1/src/osd/sdl/sdlsync.h	2007-10-09 18:39:40.000000000 +1000
+++ sdlmame0120u2_pre1/src/osd/sdl/sdlsync.h	2007-11-03 18:26:04.000000000 +1100
@@ -198,41 +198,6 @@
 int osd_num_processors(void);
 
 
-/*-----------------------------------------------------------------------------
-    osd_atomic_lock: block other processes
-
-    Parameters:
-
-        None.
-
-    Return value:
-
-        None.
-    
-    Notes:
-        This will be used on certain platforms to emulate atomic operations
-        Please see osinclude.h 
------------------------------------------------------------------------------*/
-void osd_atomic_lock(void);
-
-/*-----------------------------------------------------------------------------
-    osd_atomic_unlock: unblock other processes
-
-    Parameters:
-
-        None.
-
-    Return value:
-
-        None.
-    
-    Notes:
-        This will be used on certain platforms to emulate atomic operations
-        Please see osinclude.h 
------------------------------------------------------------------------------*/
-void osd_atomic_unlock(void);
-
-
 #endif	/* SDLMAME_WIN32 */
 
 #endif	/* __SDL_SYNC__ */
diff -ur sdlmame0120u2_pre1/src/osd/sdl/sdlwork.c sdlmame0120u2_pre1/src/osd/sdl/sdlwork.c
--- sdlmame0120u2_pre1/src/osd/sdl/sdlwork.c	2007-11-02 09:17:24.000000000 +1100
+++ sdlmame0120u2_pre1/src/osd/sdl/sdlwork.c	2007-11-03 17:45:15.000000000 +1100
@@ -58,7 +58,7 @@
 //============================================================
 
 #if KEEP_STATISTICS
-#define add_to_stat(v,x)		do { osd_interlocked_add((v), (x)); } while (0)
+#define add_to_stat(v,x)		do { atomic_add32((v), (x)); } while (0)
 #define begin_timing(v)			do { (v) -= osd_profiling_ticks(); } while (0)
 #define end_timing(v)			do { (v) += osd_profiling_ticks(); } while (0)
 #else
@@ -153,17 +153,6 @@
 //  INLINE FUNCTIONS
 //============================================================
 
-#ifndef osd_exchange32
-INLINE INT32 osd_exchange32(INT32 volatile *ptr, INT32 exchange)
-{
-	INT32 origvalue;
-	do {
-		origvalue = *ptr;
-	} while (compare_exchange32(ptr, origvalue, exchange) != origvalue);
-	return origvalue;
-}
-#endif
-
 #ifndef osd_interlocked_increment
 INLINE INT32 osd_interlocked_increment(INT32 volatile *ptr)
 {
@@ -178,13 +167,6 @@
 }
 #endif
 
-#ifndef osd_interlocked_add
-INLINE INT32 osd_interlocked_add(INT32 volatile *ptr, INT32 add)
-{
-	return atomic_add32(ptr, add);
-}
-#endif
-
 INLINE void scalable_lock_init(scalable_lock *lock)
 {
 	memset(lock, 0, sizeof(*lock));
@@ -397,10 +379,10 @@
 
 	// reset our done event and double-check the items before waiting
 	osd_event_reset(queue->doneevent);
-	osd_exchange32(&queue->waiting, TRUE);
+	atomic_exchange32(&queue->waiting, TRUE);
 	if (queue->items != 0)
 		osd_event_wait(queue->doneevent, timeout);
-	osd_exchange32(&queue->waiting, FALSE);
+	atomic_exchange32(&queue->waiting, FALSE);
 
 	// return TRUE if we actually hit 0
 	return (queue->items == 0);
@@ -522,7 +504,7 @@
 		do
 		{
 			item = (osd_work_item *)queue->free;
-		} while (item != NULL && osd_compare_exchange_ptr((void * volatile *)&queue->free, item, item->next) != item);
+		} while (item != NULL && compare_exchange_ptr((void * volatile *)&queue->free, item, item->next) != item);
 
 		// if nothing, allocate something new
 		if (item == NULL)
@@ -555,7 +537,7 @@
 	scalable_lock_release(&queue->lock, lockslot);
 
 	// increment the number of items in the queue
-	osd_interlocked_add(&queue->items, numitems);
+	atomic_add32(&queue->items, numitems);
 	add_to_stat(&queue->itemsqueued, numitems);
 
 	// look for free threads to do the work
@@ -652,7 +634,7 @@
 	{
 		next = (osd_work_item *)item->queue->free;
 		item->next = next;
-	} while (osd_compare_exchange_ptr((void * volatile *)&item->queue->free, next, item) != next);
+	} while (compare_exchange_ptr((void * volatile *)&item->queue->free, next, item) != next);
 }
 
 
@@ -699,7 +681,7 @@
 			break;
 
 		// indicate that we are live
-		osd_exchange32(&thread->active, TRUE);
+		atomic_exchange32(&thread->active, TRUE);
 		osd_interlocked_increment(&queue->livethreads);
 
 		// process work items
@@ -726,7 +708,7 @@
 		}
 
 		// decrement the live thread count
-		osd_exchange32(&thread->active, FALSE);
+		atomic_exchange32(&thread->active, FALSE);
 		osd_interlocked_decrement(&queue->livethreads);
 	}
 	return NULL;

I hope that fixes the performance regression. If not, give me a yell and I'll see if I can't work out what's killing it.

W (Senior Member, joined Sep 2000, 258 posts)
Hey guys, I upgraded to Leopard (10.5) with Xcode 3.0.

I get almost a dozen of these warnings while linking:

ld64: warning: option -s is obsolete and being ignored

Thanks,

=will=
R (Very Senior Member, joined Mar 2001, 17,261 posts, 267 likes)
I think it's irritating that Xcode 3 doesn't run on Tiger. Anyway, I have Leopard on order and I'll clean up the warnings when I get it (which should be for u3).

Very Senior Member (joined Jan 2006, 3,694 posts)
Originally Posted by R. Belmont
I think it's irritating that Xcode 3 doesn't run on Tiger.

Do you think the new GCC will also appear for the old Xcode, or will we need to upgrade to Leopard to be able to play Virtua Racing on the Mac?
