Page 8 of 9
Joined: Dec 2009
Posts: 24
F
Member
Originally Posted by R. Belmont
There is no dynarec for rtype. Probably just that the startup code of the game incurs more reschedules due to testing timers and such.

Would that kind of MAME activity cause a lot of time to be spent in libgcc?

Quote
I gather from ym2151 being expensive that the N900 doesn't have hardware floating point?

It does, and has both NEON and VFP modes, but it seems that gcc doesn't support them well. I've just discovered this post on FP optimizations on the Pandora hardware (~N900) and am trying out some of them. Previously I was just specifying -mcpu=cortex-a8 and -mfpu=neon.
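A note on those flags, as a rough sketch rather than a diagnosis of this particular build: -mfpu=neon only selects the FPU, while -mfloat-abi decides whether gcc emits FP instructions at all, and its default depends on how the toolchain was configured. With -mfloat-abi=soft, FP arithmetic becomes library calls even when -mfpu=neon is given. The helper below (fp_build_mode is a made-up name) reports what the compiler was actually set up for, using gcc's predefined ARM macros.

```c
/* Hypothetical helper, not part of MAME: reports the floating-point mode
   this translation unit was compiled for, based on macros that gcc
   predefines for ARM targets. */
const char *fp_build_mode(void)
{
#if defined(__SOFTFP__)
    /* -mfloat-abi=soft: every FP operation is a library call. */
    return "soft-float: FP arithmetic goes through libgcc helpers";
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
    /* -mfpu=neon together with -mfloat-abi=softfp or hard. */
    return "hardware FP with NEON enabled";
#elif defined(__arm__)
    /* ARM build with hardware FP, but no NEON. */
    return "hardware FP (VFP), NEON not enabled";
#else
    return "not an ARM build";
#endif
}
```

Printing the result once at startup, built with exactly the flags used for the rest of the project, shows whether adding -mfloat-abi=softfp is worth trying.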

After reading that page I am thinking it may be worthwhile trying to add ARM ASM helper functions after all, but it's daunting with no ARM ASM experience.

Last edited by Flandry; 07/21/10 12:22 AM.
Joined: Mar 2001
Posts: 17,239
Likes: 263
R
Very Senior Member
Sorry, I misread your post, I thought you had fingered some reschedule-related function and you didn't. I don't know what's spending time in libgcc without seeing better data, but on most platforms that means either 64-bit integer or floating point being emulated in software. MAME does make extensive use of both, which generally is only an issue on ARM targets nowadays. You may be better off sticking with MAME4ALL on that target.
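To make the libgcc point concrete, here is a minimal sketch of the two kinds of C code that typically end up there on 32-bit ARM. The function names are invented for illustration and are not MAME's; the helper names in the comments are the ones defined by the ARM EABI.

```c
#include <stdint.h>

/* 32-bit ARM has no 64-bit divide instruction (and the Cortex-A8 has no
   integer divide instruction at all), so on an ARM EABI target gcc lowers
   this to a call to the libgcc helper __aeabi_uldivmod. */
uint64_t scale_ticks(uint64_t ticks, uint64_t divisor)
{
    return ticks / divisor;
}

/* Under a soft-float setting this multiply becomes a call to
   __aeabi_dmul instead of a single VFP instruction. */
double apply_gain(double sample, double gain)
{
    return sample * gain;
}
```

Compiling a file like this with `gcc -S` and looking for `bl __aeabi_...` lines in the assembly shows which operations are being routed through libgcc for a given set of flags.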

Joined: Dec 2009
Posts: 24
F
Member
Thanks.

It's true that MAME4ALL performs great. My (possibly perverse) goal is to get modern MAME running well on ARM, and I think it can do a lot better with some optimizations.

To move forward, I need a bit of guidance, if you please. You mention software emulation of FP. I'm still using osd/miniwork.c (NOASM) and am trying to see where optimizations might be made. Assuming a minimal core work function is the starting point, where should I be looking?

Joined: Mar 2001
Posts: 17,239
Likes: 263
R
Very Senior Member
I'd love to get modern MAME running well, I agree it can, but I don't have anything near the data I need. Valgrind's sample profiler can give you a complete call trace for hotspots - it's pretty much necessary to know that to understand why libgcc is eating all the CPU time.

Joined: Feb 2003
Posts: 168
Senior Member
Just my 2¢, but wouldn't a machine like the new Toshiba AC100 be more suitable hardware for porting MAME to ARM? I hear it is going to be priced around $500 USD, so it is not too expensive.

Joined: Mar 2001
Posts: 17,239
Likes: 263
R
Very Senior Member
Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware. It may simply require waiting for a stable version of Clang 2.0. Apple's slides from WWDC indicate that Clang's generated code averages 2 to 5 times faster for ARM targets (it's much less flashy for x86/x64, proving once again that GCC for non-x86 targets can get pretty dire).

Last edited by R. Belmont; 07/21/10 03:22 PM.
Joined: Jul 2006
Posts: 87
L
Member
2 to 5 times faster?!? Your bullshit detector should have warned you :)

I've tested recent versions of gcc for ARM and they are pretty good. I'd be surprised to see Clang beat them, except for one thing: IIRC Clang can use NEON floating-point instructions instead of standard FP ones. That would give a boost on Cortex-A8 (but not on Cortex-A9 parts such as Tegra2), but then you lose IEEE-754 compliance. So twice as fast for carefully chosen small FP loops, yes; for real programs even 10% would be nice. Even armcc isn't 10% faster than gcc...

Back to Flandry's issue: I don't think the Nokia SDK would rely on FP emulation. We'd need a real profile of MAME to see what's happening...

Joined: Feb 2004
Posts: 2,608
Likes: 315
Very Senior Member
I'd believe it - SunPRO gets 2 to 3 times the performance of GCC on SPARC because GCC's codegen is atrocious.

Joined: Feb 2007
Posts: 507
C
Senior Member
Originally Posted by R. Belmont
Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware.
Some of the classics have discrete sound emulation. This is 100% floating point. It does not, however, rely on IEEE compliance.
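As a rough illustration of what "100% floating point" looks like in this kind of code, here is a single-pole RC low-pass stage in single precision. This is a hypothetical sketch, not MAME's discrete sound code; the struct and function names are invented, and the coefficient uses a simple backward-Euler approximation. The inner step is one multiply and two add/subtracts per sample, and nothing in it depends on denormals or exact IEEE rounding, which is why this style of code can tolerate a non-compliant fast FP mode.

```c
/* Hypothetical single-pole RC low-pass stage (not taken from MAME). */
typedef struct {
    float alpha;  /* smoothing coefficient, dt / (RC + dt) */
    float state;  /* previous output sample */
} rc_lowpass;

/* Derive the coefficient from component values and the sample rate. */
void rc_lowpass_init(rc_lowpass *f, float r_ohms, float c_farads,
                     float sample_rate_hz)
{
    float dt = 1.0f / sample_rate_hz;
    float rc = r_ohms * c_farads;
    f->alpha = dt / (rc + dt);
    f->state = 0.0f;
}

/* Advance the filter by one sample and return the new output. */
float rc_lowpass_step(rc_lowpass *f, float input)
{
    f->state += f->alpha * (input - f->state);
    return f->state;
}
```

Feeding a constant input of 1.0 into a stage set up with 1 kΩ, 1 µF and a 48 kHz sample rate makes the output climb toward 1.0, as a real RC stage would.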

Joined: Jul 2006
Posts: 87
L
Member
Originally Posted by Vas Crabb
I'd believe it - SunPRO gets 2 to 3 times the performance of GCC on SPARC because GCC's codegen is atrocious.
I'm sorry, but I've never found armcc (ARM Ltd's own compiler) that much faster than gcc, except for some *very* specific things (e.g., detecting widening multiplications). So I won't believe it... until proven wrong :)
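For anyone unfamiliar with the widening-multiplication case, a minimal sketch follows (function names are made up for illustration). When both operands are known to be 32-bit and only the result is 64-bit, a compiler that detects the pattern can use a single SMULL on ARM; a general 64x64 multiply needs several instructions.

```c
#include <stdint.h>

/* Widening form: 32-bit operands, 64-bit result. The cast is applied to
   an operand before the multiply, so the full product is kept and the
   compiler can recognise the 32x32->64 pattern. */
int64_t mul_32x32_to_64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* General form: both operands are already 64-bit, so the compiler has to
   assume the high halves matter and emit a longer sequence. */
int64_t mul_64x64(int64_t a, int64_t b)
{
    return a * b;
}
```

Both functions return the same value when the inputs fit in 32 bits; the difference is only in the code the compiler is able to generate.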


Moderated by  R. Belmont 
