|
Joined: Dec 2009
Posts: 24
Member
In anticipation of the port to the two similar ARM Cortex-A8 devices running the Linux kernel (Pandora and N900), i thought i'd start a thread here for discussion. I am not very familiar with MAME internals, so i hope to get up to speed and get to the point that i can contribute upstream from the to-be Maemo package.
I grabbed the 0.135u2 release and built it for the host system (Kubuntu Karmic/AMD64) and tested a few ROMs. Building is so effortless and the project is so much more polished than last time i visited it--kudos to all the devs.
I have the scratchbox environment of the SDK for Debian-based Maemo 5 (Nokia's N900 OS) on an x86 virtual box. I tried to build for the x86 target, and had no trouble. However, running it gives this:
> ./mame sf2yyc -sdlvideofps -video soft -oslog -v
Build version: 0.135u2 (Dec 1 2009)
Build architecure: SDLMAME_ARCH=
Build defines 1: SDLMAME_UNIX=1 SDLMAME_X11=1 SDLMAME_LINUX=1
Build defines 1: LSB_FIRST=1 NDEBUG=1 DISTRO=generic SYNC_IMPLEMENTATION=tc
SDL/OpenGL defines: SDL_COMPILEDVERSION=1212 USE_OPENGL=1 USE_DISPATCH_GL=1
Compiler defines A: __GNUC__=4 __GNUC_MINOR__=2 __GNUC_PATCHLEVEL__=1 __VERSION__="4.2.1"
Compiler defines B: __unix__=1 __i386__=1
Compiler defines C: __USE_FORTIFY_LEVEL=0
SDL Device Driver : x11
SDL Monitor Dimensions: 800 x 480
And then a whole lot of this:
'maincpu' (0C066A): unmapped program memory word write to 180000 = 0000 & FFFF
'maincpu' (0C068E): unmapped program memory word write to 180000 = 0014 & FFFF
'maincpu' (000320): unmapped program memory word write to 8001FE = 0000 & FFFF
Nothing is ever displayed.
./mame gngt -v
Parsing mame.ini
Build version: 0.135u2 (Dec 1 2009)
Build architecure: SDLMAME_ARCH=
Build defines 1: SDLMAME_UNIX=1 SDLMAME_X11=1 SDLMAME_LINUX=1
Build defines 1: LSB_FIRST=1 NDEBUG=1 DISTRO=generic SYNC_IMPLEMENTATION=tc
SDL/OpenGL defines: SDL_COMPILEDVERSION=1212 USE_OPENGL=1 USE_DISPATCH_GL=1
Compiler defines A: __GNUC__=4 __GNUC_MINOR__=2 __GNUC_PATCHLEVEL__=1 __VERSION__="4.2.1"
Compiler defines B: __unix__=1 __i386__=1
Compiler defines C: __USE_FORTIFY_LEVEL=0
SDL Device Driver : x11
SDL Monitor Dimensions: 800 x 480
Using SDL single-window soft driver (SDL 1.2)
Keyboard: Start initialization
Input: Adding Kbd #1: System keyboard
Keyboard: Registered System keyboard
Keyboard: End initialization
Mouse: Start initialization
Input: Adding Mouse #1: System mouse
Mouse: Registered System mouse
Mouse: End initialization
Joystick: Start initialization
Joystick: End initialization
Audio initialized - driver: alsa, frequency: 48000, channels: 2, samples: 1024
sdl_create_buffers: creating stream buffer of 57344 bytes
ouput: unable to open output notifier file /tmp/sdlmame_out
AY-3-8910/YM2149 using legacy output levels!
AY-3-8910/YM2149 using legacy output levels!
Soft reset
'maincpu' (6036): unmapped program memory byte write to 3D01 = 00
'maincpu' (615C): unmapped program memory byte write to 3D01 = 01
This one shows a single frame just as it is killed (<ESC>), so i'm guessing i messed up something with the display. Should it be compiled without OGL at all? I couldn't see a setting for it.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
That looks like possibly an endian issue - is the N900 big-endian?
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Little endian. This is still on the SDK running in scratchbox. I'm taking it one step at a time. Host PC -> SDK scratchbox in x86 VM -> armel target compile -> N900 test. To clarify, the SDK isn't an emulator. It tries to replicate the OS and repos of the N900, but runs binaries compiled for the x86 target. Only made it to step 2. 
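Since the endianness question comes up with every new target, here is a minimal sketch of a runtime byte-order check that could be dropped into a scratch file on the device. The function names are made up for illustration and are not part of MAME; the second helper only assumes that the build's LSB_FIRST setting is passed in as 0 or 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    /* Copy out the byte stored at the lowest address of the word. */
    memcpy(&first_byte, &probe, sizeof(first_byte));
    return first_byte == 1;
}

/* Hypothetical sanity check: does the build-time setting (1 if LSB_FIRST
   was defined, 0 otherwise) agree with what the hardware actually does? */
static int lsb_first_matches_host(int lsb_first_defined)
{
    assert(lsb_first_defined == 0 || lsb_first_defined == 1);
    return host_is_little_endian() == lsb_first_defined;
}
```

On an armel target built with LSB_FIRST=1, lsb_first_matches_host(1) should return 1; a mismatch would point at a wrong build define rather than at the display code.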
Last edited by Flandry; 12/01/09 07:16 PM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Understood. The scratchbox is x86 (i.e., 32-bit) even though you're running on an x64 host?
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
The SDK supposedly has issues with AMD64, so for simplicity the virtual box image is Karmic x86. The host for that virtual box is indeed 64-bit. Whether that actually simplifies things i don't know, but it seemed a good idea at the time.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Understood. I was just trying to figure out if you needed PTR64=1, although normally having it set wrong results in many compiler errors 
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
I switched to armel target to see if it would build there. It didn't get very far:
Compiling src/osd/sdl/strconv.c...
cc1: warnings being treated as errors
In file included from src/emu/devintrf.h:19,
from src/emu/video.h:18,
from src/emu/mame.h:16,
from src/osd/sdl/strconv.c:21:
src/emu/memory.h: In function 'memory_decrypted_read_word':
src/emu/memory.h:1109: warning: cast increases required alignment of target type
src/emu/memory.h: In function 'memory_decrypted_read_dword':
src/emu/memory.h:1116: warning: cast increases required alignment of target type
src/emu/memory.h: In function 'memory_decrypted_read_qword':
src/emu/memory.h:1123: warning: cast increases required alignment of target type
src/emu/memory.h: In function 'memory_raw_read_word':
src/emu/memory.h:1152: warning: cast increases required alignment of target type
src/emu/memory.h: In function 'memory_raw_read_dword':
src/emu/memory.h:1159: warning: cast increases required alignment of target type
src/emu/memory.h: In function 'memory_raw_read_qword':
src/emu/memory.h:1166: warning: cast increases required alignment of target type
make: *** [obj/sdl/mame/osd/sdl/strconv.o] Error 1
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Right, so i turned off optimizations so it wouldn't treat warnings as errors and ran into the expected problem of no opengl support. How do i disable opengl for building, not just during execution?
I think going to software rendering first and then worrying about gles is the way to go.
|
|
|
|
Joined: May 2008
Posts: 4,930 Likes: 24
Very Senior Member
Change src/osd/sdl/sdl.mak and uncomment the line reading I've never tried it, though.
A mind is like a parachute. It doesn't work unless it's open. [Frank Zappa]
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Hmm, that option compiles on the host (where opengl.h is available, so that doesn't necessarily mean much), but in the armel target, it fails like this:
Linking mame...
obj/sdl/mame/libocore.a(sdlwork.o): In function `osd_work_queue_wait':
sdlwork.c:(.text+0x404): undefined reference to `osd_yield_processor'
obj/sdl/mame/libocore.a(sdlwork.o): In function `osd_work_item_wait':
sdlwork.c:(.text+0xbc8): undefined reference to `osd_yield_processor'
obj/sdl/mame/libocore.a(sdlwork.o): In function `worker_thread_entry':
sdlwork.c:(.text+0x10b0): undefined reference to `osd_yield_processor'
collect2: ld returned 1 exit status
make: *** [mame] Error 1
Progress, anyway--got as far as linking.
Last edited by Flandry; 12/04/09 10:19 PM.
|
|
|
|
Joined: Feb 2004
Posts: 2,625 Likes: 332
Very Senior Member
You're supposed to write those functions in assembly language.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Given we're talking about a single-processor target (at least for N900/Pandora - other ARM device mileage may vary) they can probably be relatively trivially stubbed in C.
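As a rough illustration of what "stubbed in C" could look like on a single-core target, here is a minimal sketch. The names yield_processor_stub and wait_until_set are hypothetical and this is not the SDLMAME source; it only assumes a POSIX system where sched_yield() is available.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical C stand-in for an assembly "yield" helper. With only one
   core there is no sibling CPU to spin against, so the useful thing to do
   is hand the time slice back to the scheduler. */
static void yield_processor_stub(void)
{
    sched_yield();
}

/* Example caller: wait for another thread to set a flag, yielding
   between polls instead of burning the only core in a tight loop. */
static void wait_until_set(volatile int *flag)
{
    assert(flag != NULL);
    while (!*flag)
        yield_processor_stub();
}
```

On a multi-core ARM device a real implementation would likely want something cheaper than a system call inside short spin loops, which is where hand-written assembly comes back in.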
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Thanks, i knew there was some stuff like that in there and rbelmont is working on it. I guess it's not all been given a C alternative yet in this version.
edit: oops i'm slow. I'll take a look at the code later and see if i can do so.
Last edited by Flandry; 12/04/09 10:51 PM.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
Wouldn't it be enough to define NO_THREAD_COOPERATIVE in the SDL makefile?
BTW, you'll probably be disappointed by the speed of MAME on Cortex-A8.
Last edited by ldesnogu; 12/08/09 11:13 AM.
|
|
|
|
Joined: Feb 2007
Posts: 507
Senior Member
"Wouldn't it be enough to define NO_THREAD_COOPERATIVE in the SDL makefile?"
That is completely unrelated to inlined assembler functions. Apart from the fact that the NO_THREAD_COOPERATIVE define is history now...
|
|
|
|
Joined: Sep 2004
Posts: 392 Likes: 4
Senior Member
Just use osdmini/miniwork.c instead of the one that comes with the SDL port and you should be good.
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Actually there's code to that effect if SDLMAME_NOASM is defined, and it looks like that will be defined if NOASM is defined (i'm guessing that's what DEFS += -DSDLMAME_NOASM does), but i don't see where NOASM should be defined. Perhaps a (commented out by default) line is missing from the user config section of the make files?
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
The user config section of the makefile is deprecated. Pass parameters directly on the command line (or define them in your environment with .bashrc or whatever).
% make NOASM=1
Last edited by R. Belmont; 12/10/09 05:37 PM.
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Ah, thanks.
Perhaps it should be removed or a comment noting that fact added, then. Anyway, presumably there's a reference if the makefile itself isn't it. Where can a reference of valid parameters be found?
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
With the noasm flag, it compiles and links for the armel target. It also compiles (as it did before) and runs (with working output now) on the x86 SDK, but veeeerrry slowly.
I'm still waiting for delivery of my N900 to do on-device testing, but this seems like a good start!
What bit of the project should i focus on to start optimizing it? If you give me a specific task and context, i can probably contribute; i just don't have any general understanding of the project as a whole.
Last edited by Flandry; 12/10/09 06:44 PM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
If the SDK's in a VM without accelerated video that's probably to be expected. Run in a window and shrink the window so it's exactly the size of the game to remove scaling from the equation and you should get the actual performance. In the case of the x86 VM I'd expect it to be probably 20-30% slower than native in that instance. Making the window bigger or running full-screen in the VM will degrade performance substantially.
ETA: I should mention that the Pandora's slipped to January now (*sigh*) so that limits my ability to do much towards this at the moment.
Last edited by R. Belmont; 12/11/09 04:11 PM.
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
I'm definitely not expecting the Pandora to be out before i want to have this on the N900 in working condition. Also, there's only so much i can do to test performance without putting it on the N900 and seeing how it runs. I am expecting mine by the end of the month. The rule of thumb on the #maemo channel is that most things that work on N900, don't in SDK, so as you can imagine, it's not a great way to predict performance.
So what i'd like to know is what bits of code that are optimized asm for x86 would most greatly benefit by having armel asm instead of C, and work on that. Unless there's another obvious optimization you are aware of...
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
The general assembly helpers used by the actual emulation are all in src/emu/eigccx86.h (x86 and x64) and eigccppc.h (PowerPC). There's math, I think some endian-swap helpers, and thread-sync primitives. Having an ARM version of that file is the first necessary step.
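To give an idea of the kind of helper involved, here is a sketch of portable C versions of a widening multiply, a byte swap and a narrowing divide. The function names are invented for this example and are not claimed to match MAME's headers; an ARM-specific file would replace bodies like these with inline assembly where the compiler does not already produce good code.

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> 64-bit signed multiply. A 32-bit ARM compiler can usually turn
   this into a single widening multiply instruction. */
static int64_t mul_32x32_c(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Byte-swap a 32-bit value using shifts and masks. */
static uint32_t byteswap32_c(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Divide a 64-bit value by a 32-bit value and keep the low 32 bits of
   the quotient; the caller must ensure the result fits. */
static int32_t div_64x32_c(int64_t a, int32_t b)
{
    assert(b != 0);
    return (int32_t)(a / b);
}
```

Starting from plain C like this and only then profiling on the device is probably the safest order, since it gives a known-good reference to compare any hand-written ARM version against.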
|
|
|
|
Joined: Feb 2004
Posts: 2,625 Likes: 332
Very Senior Member
There's some more sophisticated assembly in src/osd/sdl/sdlsync_ntc.c, but that should only be needed if you have multiple CPUs. But speaking of this, why is sdlsync_tc.c used on Linux and generic Unix? I would have thought sdlsync_ntc.c should still give better performance.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
"What bit of the project should i focus on to start optimizing it? If you give me a specific task and context, i can probably contribute; i just don't have any general understanding of the project as a whole."
You won't get much out of a chip with such small L1 (16KB + 16 KB) and L2 (256KB) caches running MAME. Your best bet is probably to optimize the CPU cores, where optimize means use some existing ARM assembly core (but beware of those, they probably aren't very accurate). The next step probably is to write a new DRC back-end. (All of that should only be done once you have profiled SDLMAME on the real target, of course.)
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
My N900 is supposed to be delivered around Christmas, so i'll be able to try it out and see what/how it works. Is there a WIP version of sdlmame being released soon? If not i'll make a package of this one to have available when i'm away from my dev box.
Re: profiling--is there a mame-specific way to do this?
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Various standard profilers have been used successfully on MAME including Valgrind and gprof.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
Sorry, valgrind can't be trusted... especially as it won't run on ARM. oprofile is pretty good, but I don't know how well it runs on Cortex-A8 (especially given that there are CPU errata affecting performance counters).
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
I thought the point of ARM was that the core's so simple there can't be errata ;-)
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
Nah, the point of ARM is to replace Intel, so they have to keep the number of errata high enough 
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Got my N900 this week and tried out the armel build of sdlmame. The program begins to load and the console screen pops up briefly but then it crashes. Here's the last bit of -verbose:
etting SDL_VIDEO_GL_DRIVER = '' ...
Build version: 0.135u3 (Dec 26 2009)
Build architecure: SDLMAME_ARCH=
Build defines 1: SDLMAME_UNIX=1 SDLMAME_X11=1 SDLMAME_LINUX=1 SDLMAME_NOASM=1
Build defines 1: LSB_FIRST=1 NDEBUG=1 DISTRO=generic SYNC_IMPLEMENTATION=tc
SDL/OpenGL defines: SDL_COMPILEDVERSION=1212 USE_OPENGL=0
Compiler defines A: __GNUC__=4 __GNUC_MINOR__=2 __GNUC_PATCHLEVEL__=1 __VERSION__="4.2.1"
Compiler defines B: __unix__=1
Compiler defines C: __USE_FORTIFY_LEVEL=0
SDL Device Driver : x11
SDL Monitor Dimensions: 800 x 480
Using SDL single-window soft driver (SDL 1.2)
Keyboard: Start initialization
Input: Adding Kbd #1: System keyboard
Keyboard: Registered System keyboard
Keyboard: End initialization
Mouse: Start initialization
Input: Adding Mouse #1: System mouse
Mouse: Registered System mouse
Mouse: End initialization
Joystick: Start initialization
Joystick: End initialization
Audio initialized - driver: pulse, frequency: 48000, channels: 2, samples: 512
sdl_create_buffers: creating stream buffer of 57344 bytes
ouput: unable to open output notifier file /tmp/sdlmame_out
Illegal instruction
|
|
|
|
Joined: Feb 2007
Posts: 507
Senior Member
Which game did you try to run?
Given the "Illegal instruction", I recommend compiling with "OPTIMIZE=0 DEBUG=1" to get you started.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
What, no GDB backtrace? 
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
That was the startup of the front end with no ROM specified, and compiled with OPTIMIZE=0. I'm guessing the console front end is a custom ROM (hence the bad instruction) and that suggests a pretty fundamental corruption of emulation. Perhaps it's due to the endless warnings on type casts from the earlier post. And yes, i guess it's time to break out the debugger.
The same build parameters work fine in the scratchbox (x86) target.
Last edited by Flandry; 12/26/09 11:17 PM.
|
|
|
|
Joined: May 2009
Posts: 2,225 Likes: 387
Very Senior Member
"I'm guessing the console front end is a custom ROM"
Err... what?
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
"I'm guessing the console front end is a custom ROM"
"Err... what?"
I.e. what goes on when mame is run without specifying a ROM. For it to get an illegal instruction without even beginning an emulation there must be something exceptional about that code (pun intended).
Anyway, building with DEBUG=1 fails.
Compiling src/osd/sdl/sdlos_unix.c...
src/osd/sdl/sdlos_unix.c: In function 'osd_break_into_debugger':
src/osd/sdl/sdlos_unix.c:117: warning: implicit declaration of function 'kill'
src/osd/sdl/sdlos_unix.c:117: error: 'SIGTRAP' undeclared (first use in this function)
src/osd/sdl/sdlos_unix.c:117: error: (Each undeclared identifier is reported only once
src/osd/sdl/sdlos_unix.c:117: error: for each function it appears in.)
make[1]: *** [obj/sdl/mamed/osd/sdl/sdlos_unix.o] Error 1
The debug will have to wait until i have time to see what's going on.
Last edited by Flandry; 12/26/09 11:22 PM.
|
|
|
|
Joined: Jan 2006
Posts: 3,694
Very Senior Member
"I'm guessing the console front end is a custom ROM"
"Err... what?"
"I.e. what goes on when mame is run without specifying a ROM. For it to get an illegal instruction without even beginning an emulation there must be something exceptional about that code (pun intended)."
When you launch MAME without specifying a game there is no custom ROM involved: it simply starts loading the empty driver from src/emu/drivers/empty.c (for your information, MAME could well support romless games with no problems, and indeed there are romless systems in MESS).
|
|
|
|
Joined: Feb 2007
Posts: 507
Senior Member
That's been mentioned before. Put a "#include <signal.h>" in sdlos_unix.c. Without a stack trace, honestly, there is not much hope for support on a target which most (all?) devs do not have access to.
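For context, the build error comes from calling kill() and using SIGTRAP without the header that declares them. A minimal standalone sketch of the same pattern follows; the function name break_into_debugger_sketch and the handler are illustrative only and are not taken from sdlos_unix.c. The handler is installed here purely so the demo does not terminate when no debugger is attached.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>      /* declares kill(), signal() and SIGTRAP */
#include <sys/types.h>
#include <unistd.h>      /* declares getpid() */

static volatile sig_atomic_t trap_seen;

/* Demo handler: record that SIGTRAP arrived instead of dying. */
static void on_sigtrap(int signo)
{
    if (signo == SIGTRAP)
        trap_seen = 1;
}

/* Hypothetical "break into debugger" helper: raise SIGTRAP on ourselves.
   Under gdb this stops at the call site; without a debugger and without
   a handler the default action would terminate the process. */
static void break_into_debugger_sketch(void)
{
    void (*prev)(int) = signal(SIGTRAP, on_sigtrap);
    assert(prev != SIG_ERR);
    (void)prev;
    kill(getpid(), SIGTRAP);
}
```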
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
Well, I got an N900 for Christmas, so I will probably give SDLMAME a try... once I have installed the SDK on my Fedora 11 64-bit, that is.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
I've heard the N900 SDK is sufficiently Ubuntu-centric that on Fedora you're better off installing an Ubuntu VM image and running the SDK in that. Good luck 
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
I got the SDK installed using this page: http://www.niemueller.de/blog/id/234. It seems to work, though I don't know what to do next. What I probably need is some trivial example to compile. More time to spend reading doc I guess. The last time I tried to install Maemo on CentOS 5.x, I got a coredump from Xephyr. It looks like this was fixed.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
I could reproduce Flandry's Illegal instruction error. gdb tells me the error happens @0x0072f37c
(gdb) disassemble 0x72f370 0x72f37f
Dump of assembler code from 0x72f370 to 0x72f37f:
0x0072f370 <floor+7473708>: mov r0, r4
0x0072f374 <floor+7473712>: bl 0x6fcaec <floor+7266728>
0x0072f378 <floor+7473716>: vldr s15, [r6, #52]
0x0072f37c <floor+7473720>: vcmp.f32 s15, #0.0
(gdb) info registers r6
r6 0x500b237 83931703
Looks bad: 0x0072f378 is doing an unaligned load... But the kernel is configured to fix that up, and the memory @r6 is accessible. Also, as far as I know the vcmp instruction is valid. To sum up, I don't know what the issue is. I'll rebuild with symbols.
EDIT: I forgot to say that the executable runs fine on the QEMU ARM provided with the SDK. No big surprise, QEMU isn't that accurate.
EDIT 2: I found the issue. Even though the kernel should fix up alignment issues, it fails to recognize the vldr instruction as a load instruction. So MAME will have to be fixed not to generate such loads. For info, the offending line is 1029 in sdlmame0136/src/emu/video.c.
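One common way to avoid depending on kernel fix-ups at all is to read possibly misaligned data through memcpy instead of dereferencing a cast pointer. The sketch below shows that general C idiom; it is not the change that was made to MAME, and read_float_unaligned is a name invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a float from an address that may not be 4-byte aligned.
   memcpy has no alignment requirement, so the compiler is free to use
   byte loads here, or a normal aligned load when it can prove the
   address is aligned. */
static float read_float_unaligned(const void *src)
{
    float value;

    assert(src != NULL);
    memcpy(&value, src, sizeof(value));
    return value;
}
```

The other option, making sure the data is aligned in the first place, avoids the per-access cost entirely and is usually preferable when the layout is under the program's control.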
Last edited by ldesnogu; 01/01/10 10:37 AM.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
The problem seems to be solved. The issue is that the inline_config field of _device_config is placed right after a tag whose length isn't necessarily a multiple of 4, and hence the data used isn't aligned. Something like this:
--- src/emu/devintrf.c~ 2010-01-01 07:20:48.000000000 +0100
+++ src/emu/devintrf.c 2010-01-01 12:17:08.771259165 +0100
@@ -134,6 +134,7 @@
device_config *device, *tempdevice;
tagmap_error tmerr;
UINT32 configlen;
+ UINT32 additionalalign;
assert(devlist != NULL);
assert(devlist->map != NULL);
@@ -146,8 +147,11 @@
/* get the size of the inline config */
configlen = (UINT32)devtype_get_info_int(type, DEVINFO_INT_INLINE_CONFIG_BYTES);
+ /* add some room to make sure the device structure is aligned to a multiple of 4 */
+ additionalalign = (4 - (strlen(tag) + 1)) & 3;
+
/* allocate a new device */
- device = (device_config *)alloc_array_or_die(UINT8, sizeof(*device) + strlen(tag) + configlen);
+ device = (device_config *)alloc_array_or_die(UINT8, sizeof(*device) + strlen(tag) + configlen + additionalalign);
/* add to the map */
tmerr = tagmap_add_unique_hash(devlist->map, tag, device, FALSE);
@@ -180,7 +184,7 @@
device->clock = device->owner->clock * ((device->clock >> 12) & 0xfff) / ((device->clock >> 0) & 0xfff);
}
device->static_config = NULL;
- device->inline_config = (configlen == 0) ? NULL : (device->tag + strlen(tag) + 1);
+ device->inline_config = (configlen == 0) ? NULL : (device->tag + strlen(tag) + 1 + additionalalign);
/* ensure live fields are all cleared */
device->machine = NULL;
Now I need to understand why SDL doesn't accept the keyboard return key, but given how lazy^Wbusy I am, I will perhaps leave that to someone else.
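The padding arithmetic in the patch can be checked in isolation. The sketch below uses a made-up helper name, tag_padding, and assumes the tag itself starts on a 4-byte boundary; it computes how many filler bytes are needed after a NUL-terminated tag so that whatever follows is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes of padding needed after a NUL-terminated tag so that the data
   stored behind it starts on a 4-byte boundary. */
static size_t tag_padding(const char *tag)
{
    size_t used;

    assert(tag != NULL);
    used = strlen(tag) + 1;          /* tag characters plus the NUL */
    return (4 - (used & 3)) & 3;     /* 0..3 bytes of filler */
}
```

For example, "cpu0" occupies 5 bytes with its terminator and needs 3 bytes of padding, while "maincpu" occupies 8 and needs none. The expression in the patch, (4 - (strlen(tag) + 1)) & 3, yields the same values because only the low two bits of the difference are kept.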
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Thanks for tracking that down. I haven't used gdb for years and then only briefly and was having trouble getting it working, or rather getting an armel binary with symbols. Anyway plenty more time to play with that ahead i think...
The keymap issue was easy to resolve. I get:
Average speed: 34.46% (84 seconds) in Ghosts and Goblins, autosave worked.
Average speed: 26.54% (77 seconds) in SF2
Sound is not really good but it's a start. Unfortunately the Fn key doesn't work as a key map but otherwise the control options seem pretty complete.
I'll grab the new version and put up the patched package.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Pandora's just been delayed again, but I'll be looking at ARM SDLMAME either way hopefully soonish since Palm just announced a native C/C++ SDK with SDL (!) and OpenGL for WebOS phones, and I have a Pre.
Last edited by R. Belmont; 01/07/10 08:27 PM.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
OMAP3 has been obsoleted by Tegra2 :-)
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Sure, if you can figure out some way to get a shell prompt in an Audi ;-)
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
No need for that  BTW nVidia T2 dev board is available and costs $400.
Last edited by ldesnogu; 01/08/10 10:53 PM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Can random people order it? Otherwise I'll wait for the inevitable OMAP4 version of the BeagleBoard =)
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
As long as you live in the US or Canada you can order now. You need a project too and probably a company name. http://tegradeveloper.nvidia.com/te...-developer-kit-and-how-much-does-it-cost
"Currently, you must apply for and be accepted as a registered NVIDIA developer. While we cannot guarantee that every application will be accepted, applications that thoroughly answer the questions in the registered developer application have a higher rate of acceptance. Applications with misleading or false information will be removed with no further review. If the questions require that you breach confidentiality, please contact tegradev@nvidia.com and request an NVIDIA representative contact your company as soon as possible.
The Tegra developer kits are currently priced at $399 and are currently only available in the US and Canada. We are working on making the developer kits internationally available where export laws allow, but we have no specific date yet."
EDIT: BTW the BeagleBoard OMAP4 will exist for sure, as I was told by the guy behind the project.
Last edited by ldesnogu; 01/08/10 11:24 PM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Nice. A9 sounds great on paper (shorter pipeline, out-of-order), can't wait to see real-world benchmarks.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
I don't have the right to quote numbers... The 25% speed increase in DMIPS over A8 quoted on ARM site isn't unusual. It's a really nice chip, but I'm very biased 
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Alright, finally got back to sdlmame. Feeling completely lost was a bit off-putting. I installed the oprofile kernel module and userspace tools on my N900 and figured out all i have to do to generate the missing symbols is SYMBOLS=1... I have recorded a few profiles and will paste them momentarily. Adding an extra ~80 MB of size to the executable noticeably slows things down on its own, but i guess it's a rough guide.
I was wondering if there's a simple way to disable certain drivers without breaking things. Nothing >1999 is going to run on the OMAP, anyway, and it would be nice to speed up building and reduce binary size. Also, do you have any suggestions on which ROMs are best for testing?
Here's the breakdown of usage by the mame executable. Notably, the second highest usage was by pulseaudio. Setting no sound in the cfg must not be enough?
369990 68.1283 mame-symbols
GPTIMER_CYCLES:16|
samples| %|
------------------
331149 89.5021 mame-symbols
28745 7.7691 no-vmlinux
3605 0.9744 libc-2.5.so
1745 0.4716 libpulsecommon-0.9.15.so
1219 0.3295 libpulse.so.0.8.0
1159 0.3133 libpthread-2.5.so
1045 0.2824 libm-2.5.so
721 0.1949 libSDL-1.2.so.0.11.1
59717 10.9960 pulseaudio
GPTIMER_CYCLES:16|
samples| %|
------------------
19093 31.9725 no-vmlinux
16969 28.4157 module-nokia-voice.so
5997 10.0424 libpulsecommon-0.9.15.so
4573 7.6578 libpulsecore-0.9.15.so
And here's the breakdown by function. Two big winners here are gfx related.
Counted GPTIMER_CYCLES events (32KiHz timer clock cycles between interrupts) with a unit mask of 0x00 (No unit mask) count 16
samples % image name symbol name
108496 29.3240 mame-symbols drawsdl_rgb565_draw_quad_palette16_none
102284 27.6451 mame-symbols get_texel_palette16_nearest
30087 8.1318 mame-symbols ay8910_update
28745 7.7691 no-vmlinux /no-vmlinux
14177 3.8317 mame-symbols drawsdl_rgb565_draw_rect
4520 1.2217 mame-symbols generate_resampled_data
4345 1.1744 mame-symbols chan_calc
4170 1.1271 mame-symbols drawsdl_rgb565_draw_quad_argb32_alpha
3939 1.0646 mame-symbols cpu_execute_m6809
3605 0.9744 libc-2.5.so /lib/libc-2.5.so
3343 0.9035 mame-symbols cpu_execute_z80
3277 0.8857 mame-symbols read_byte_generic
2827 0.7641 mame-symbols advance_eg_channel
This is for the Ghosts 'n Goblins ROM. I tried a few other older and slightly newer ROMs and every one runs at about 11% according to the message in xterm; i guess that's because those two drawing routines are overwhelmingly the bottleneck. Running the stripped binary from my last build it runs at about 33%. I'm building now with symbols and optimizations to see if it shows a different bottleneck.
Last edited by Flandry; 01/10/10 07:51 PM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
That's not at all surprising - it indicates that the software renderer is eating your lunch. You need to get OpenGL running.
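To see why a software path like this is expensive, consider what it has to do for every destination pixel: compute a source coordinate, fetch a palette index, look up the colour and store it. The sketch below is a simplified illustration of that per-pixel work for one scanline, written for this thread and not copied from SDLMAME's draw code; the function names and the 16.16 fixed-point stepping are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit R, G, B components into one RGB565 pixel. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Scale one row of 16-bit palette indices to dst_w RGB565 pixels using
   nearest-neighbour sampling with a 16.16 fixed-point source position.
   palette565 must have an entry for every index that appears in src. */
static void draw_row_palette16_nearest(uint16_t *dst, int dst_w,
                                       const uint16_t *src, int src_w,
                                       const uint16_t *palette565)
{
    uint32_t step, u = 0;
    int x;

    assert(dst_w > 0 && src_w > 0 && src_w < 65536);
    step = ((uint32_t)src_w << 16) / (uint32_t)dst_w;
    for (x = 0; x < dst_w; x++) {
        dst[x] = palette565[src[u >> 16]];   /* two dependent loads per pixel */
        u += step;
    }
}
```

At 800 x 480 that inner loop runs 384,000 times per frame, which fits with the two drawing routines dominating the profile; offloading the scaling to the GPU or to a hardware overlay removes most of that work from the CPU.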
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Things look different with -O3:
samples % image name symbol name
65701 46.2634 mame-o3-symbols drawsdl_rgb565_setup_and_draw_textured_quad
14508 10.2158 no-vmlinux /no-vmlinux
6459 4.5481 mame-o3-symbols ay8910_update
4897 3.4482 libc-2.5.so /lib/libc-2.5.so
4539 3.1961 mame-o3-symbols cpu_execute_z80
4522 3.1842 mame-o3-symbols ym2203_update_one
3393 2.3892 mame-o3-symbols drawsdl_rgb565_draw_rect
That was quite the optimization...
What exactly does this line mean?
Average speed: 34.91% (42 seconds)
I ask because it seemed to be running at full speed. Does that mean it was automatically dropping frames?
What bits of armel conversion are you working on?
Last edited by Flandry; 01/10/10 10:57 PM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Heh, yeah, trying to profile unoptimized MAME is probably not worthwhile. Press F11 during the emulation to see frameskips and actual speed - 34.91% means precisely that you were only getting 1/3rd speed.
|
|
|
|
Joined: Feb 2003
Posts: 168
Senior Member
"Here's the breakdown of usage by the mame executable. Notably, the second highest usage was by pulseaudio. Setting no sound in the cfg must not be enough?"
If I remember correctly, two or more years ago MAME changed the nosound flag so that it only prevents you from hearing it, but it is still rendered (so it still uses cpu cycles).
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Yeah, we had to do that because on a fair number of games the game won't run properly without the sound CPU (on Atari games it processes the coin inputs, for instance).
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
You don't need to set SYMBOLS to 1 to get symbols, that's a MAME oddity. Just add -g (and perhaps -fno-omit-frame-pointer for oprofile backtrace; I'm not sure whether it's needed or not) to CCOMFLAGS in the appropriate place. And also remove the stripping of the executable. As Arbee pointed out, you shouldn't waste time profiling unoptimized code.
Oh and BTW SDL is sometimes extremely crappy. On an emulator I was working on it was doing a stupid conversion that was eating 2/3 of the total time. Replacing the offending routine with my version made the blitting almost negligible. OTOH converting SDL OpenGL to OpenGL ES 2 probably is the way to go.
EDIT: contrary to a popular belief, adding -g to gcc doesn't change the quality of the generated code.
Last edited by ldesnogu; 01/11/10 07:21 PM.
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Hey exciting news with the closer tracking of upstream!
I found that using a YUV overlay gives pretty good performance, but the setup and drawing of the quad is still the largest single consumer of CPU time. Have you made a start on optimizing that for ARM or implementing GL ES support that i would be duplicating?
Most of the problems i have run into in the learning process here are due to weird/obsolete stuff in the makefile, so hopefully the new merged project will clean that process up. If not, i may submit some patches.
Ldesnogu: did you get .136 to build with the extra extern var declaration added? I couldn't get it to link.
|
|
|
|
Joined: Feb 2010
Posts: 2
Member
Hello, Flandry:
I have compiled MAME 0.136 for the s3c6410 SoC (ARM11 running Linux 2.6.28.6), but MAME runs very slowly. You said that using the YUV overlay gives pretty good performance. Can you tell me how to enable the YUV overlay in MAME?
Thank you!
zircon
|
|
|
|
Joined: May 2008
Posts: 4,930 Likes: 24
Very Senior Member
You probably want to set the 'scalemode' option:
$ mame64 -showusage | grep scalemode
-scalemode Scale mode: none, async, yv12, yuy2, yv12x2, yuy2x2 (-video soft only)
But I'm not sure if that makes it faster for you... or if that answers your question.
Last edited by qmc2; 02/28/10 11:05 AM.
A mind is like a parachute. It doesn't work unless it's open. [Frank Zappa]
|
|
|
|
Joined: Feb 2010
Posts: 2
Member
I have tried the none, async, yv12, yuy2, yv12x2, and yuy2x2 options, but with any of the y* options the speed is slower than with none.
I think the N900 has accelerated YUV overlay support in its framebuffer driver, but my Linux doesn't support it.
|
|
|
|
Joined: May 2008
Posts: 4,930 Likes: 24
Very Senior Member
Is it just a scaling problem? That is, how well does it perform in "native" resolutions? Try "-window -nomaximize"... if it's not fast enough either, then it's not an X driver issue anyway.
Or to just test the CPU, try running with "-str 30 -video none".
Did you even run with "-video soft"? The scalemode settings are only valid with software video (that's the default).
IIRC, the N900 has an SGX chip and the latest X.org fbdev drivers should at least contain parts of the hw-accel features, but I've no clue about the details...
Last edited by qmc2; 03/01/10 01:25 PM.
A mind is like a parachute. It doesn't work unless it's open. [Frank Zappa]
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
After running into problems with the 0.136 releases, 0.137 builds and runs on Maemo 5 with just a few tweaks to the src/osd/osdmini/miniwork.c file. I added a cast and changed the calls to free and malloc to osd_free and osd_malloc, respectively. Very nice work.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
I'm pretty sure changing those calls means you're going to trash memory or something, but couriersud knows the details and I don't.
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Eek, that doesn't sound good.
From looking at the .h file it seemed like the wrapper functions were the appropriate fixes. Hopefully he'll chime in if that's not the case.
It seems to work ok so far anyway.
Did you give up on your Pandora ever coming?
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Apparently the power bricks for the Pandora arrived at the assembly site in the UK but not the cases or PCBs yet. I'm pretty much just laughing it off at this point.
Palm's PDK (native-code devkit) is out but annoyingly it requires either Mac or Windows, which is funny since the actual devices run Linux, SDL, and OpenGL.
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Currently using 0.138 on Maemo 5 (N900). There's a pronounced slowdown at the beginning of rtype that lasts for about 30 seconds, and then it runs at ~90%.
Profiling shows that libgcc_s.so is consuming 21% of the cycles during this initial period and the highest mame routine (cpuexec_timeslice(running_machine*)) 17%.
After that initial slow period, the distribution of cycles changes, with ym2151_update_one consuming 11.6% and all the rest of the routines less than 10%.
What is going on here? Is there some kind of dynarec built in?
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
There is no dynarec for rtype. Probably just that the startup code of the game incurs more reschedules due to testing timers and such. I gather from ym2151 being expensive that the N900 doesn't have hardware floating point?
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
"There is no dynarec for rtype. Probably just that the startup code of the game incurs more reschedules due to testing timers and such."
Would that kind of MAME activity cause a lot of time to be spent in libgcc?
"I gather from ym2151 being expensive that the N900 doesn't have hardware floating point?"
It does, and has both NEON and VFP modes, but it seems that gcc doesn't support them well. I've just discovered this post on FP optimizations on the Pandora hardware (~N900) and am trying out some of them. Previously i was just specifying -mcpu=cortex-a8 and -mfpu=neon.
After reading that page i am thinking it may be worthwhile trying to add ARM ASM helper functions after all, but it's daunting with no ARM ASM experience.
Last edited by Flandry; 07/21/10 12:22 AM.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Sorry, I misread your post, I thought you had fingered some reschedule-related function and you didn't. I don't know what's spending time in libgcc without seeing better data, but on most platforms that means either 64-bit integer or floating point being emulated in software. MAME does make extensive use of both, which generally is only an issue on ARM targets nowadays. You may be better off sticking with MAME4ALL on that target.
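To make "emulated in software" a bit more concrete: on a 32-bit ARM EABI target, operations like the two below are typically not single instructions, and GCC emits calls into libgcc runtime helpers instead. This is only a sketch with made-up function names, not MAME code, and which helpers actually show up depends on the compiler flags.

```cpp
#include <cstdint>

// Hypothetical example: 64-bit integer division. There is no 64-bit
// divide instruction on these cores, so on ARM EABI GCC calls a libgcc
// helper (__aeabi_uldivmod) for this.
std::uint64_t ticks_to_seconds(std::uint64_t ticks, std::uint64_t ticks_per_second)
{
    return ticks / ticks_per_second;
}

// Hypothetical example: double-precision math. In a soft-float build
// this becomes calls to helpers such as __aeabi_dadd and __aeabi_dmul;
// with VFP enabled it compiles to FPU instructions instead.
double mix_sample(double a, double b, double gain)
{
    return (a + b) * gain;
}
```

Disassembling a hot object file with objdump -d and looking for branches to __aeabi_* symbols is one way to see which of these helpers the build is really leaning on.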
|
|
|
|
Joined: Dec 2009
Posts: 24
Member
Thanks.
It's true that MAME4All performs great. My (possibly perverse) goal is to get modern MAME running well on ARM and i think it can do a lot better with some optimizations.
To move forward, i need a bit of guidance if you please. You mention software emulation of FP. I'm still using osd/miniwork.c (NOASM) and am trying to see where optimizations might be made. Assuming a minimal core work function is the starting point, where should i be looking?
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
I'd love to get modern MAME running well, I agree it can, but I don't have anything near the data I need. Valgrind's sample profiler can give you a complete call trace for hotspots - it's pretty much necessary to know that to understand why libgcc is eating all the CPU time.
|
|
|
|
Joined: Feb 2003
Posts: 168
Senior Member
Just my 2¢, but wouldn't a machine like the new Toshiba AC100 be more suitable hardware for trying to port MAME to ARM? I hear it is going to be priced around $500 USD, so it is not too expensive.
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware. It may simply require waiting for a stable version of Clang 2.0. Apple's slides from WWDC indicate that Clang's generated code averages 2 to 5 times faster for ARM targets (it's much less flashy for x86/x64, proving once again that GCC for non-x86 targets can get pretty dire).
Last edited by R. Belmont; 07/21/10 03:22 PM.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
2 to 5 times faster?!? Your bullshit detector should have warned you.
I've tested recent versions of gcc for ARM and they are pretty good. I'd be surprised to see Clang beat them, except for one thing: IIRC Clang can use NEON floating-point instructions instead of standard FP ones. That would give a boost on Cortex-A8 (but not on Cortex-A9 such as Tegra2), but then you lose IEEE-754 compliance. So twice as fast for carefully chosen small FP loops, yes; for real programs even 10% would be nice. Even armcc isn't 10% faster than gcc...
Back to Flandry's issue: I don't think the Nokia SDK would rely on FP emulation. We'd need a real profile of MAME to see what's happening...
|
|
|
|
Joined: Feb 2004
Posts: 2,625 Likes: 332
Very Senior Member
I'd believe it - SunPRO gets 2 to 3 times the performance of GCC on SPARC because GCC's codegen is atrocious.
|
|
|
|
Joined: Feb 2007
Posts: 507
Senior Member
"Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware."
Some of the classics have discrete sound emulation. This is 100% floating point. It does however not rely on IEEE compliance.
|
|
|
|
Joined: Jul 2006
Posts: 87
Member
"I'd believe it - SunPRO gets 2 to 3 times the performance of GCC on SPARC because GCC's codegen is atrocious."
I'm sorry, but I've never found armcc (ARM Ltd's own compiler) that much faster than gcc except for some *very* specific things (e.g., detecting widening multiplications). So I won't believe it... until proven wrong.
|
|
|
|
Joined: May 2009
Posts: 2,225 Likes: 387
Very Senior Member
"Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware."
"Some of the classics have discrete sound emulation. This is 100% floating point. It does however not rely on IEEE compliance."
If Aaron is willing and you're willing, I wonder if there would be any interest in defining an "mFloat" data type. On platforms with hardware FP, it would be a #define to float. On platforms without hardware FP, it would fall back to a core class that overloads the necessary operators to implement decent fixed-point. Thoughts?
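A very rough sketch of what that could look like, purely to make the idea concrete. Everything here is hypothetical: the FixedFloat class, the 16.16 format, and the MFLOAT_USE_FIXED build switch are made up for illustration and are not anything in the MAME source. A typedef stands in for the #define.

```cpp
#include <cstdint>

// Hypothetical signed 16.16 fixed-point value that offers the same
// arithmetic operators a float would.
class FixedFloat
{
public:
    FixedFloat() : m_raw(0) {}
    FixedFloat(double v) : m_raw(static_cast<std::int32_t>(v * kOne)) {}

    double to_double() const { return static_cast<double>(m_raw) / kOne; }

    FixedFloat operator+(FixedFloat o) const { return from_raw(m_raw + o.m_raw); }
    FixedFloat operator-(FixedFloat o) const { return from_raw(m_raw - o.m_raw); }

    FixedFloat operator*(FixedFloat o) const
    {
        // Widen to 64 bits so the intermediate product cannot overflow,
        // then scale back down to 16.16.
        std::int64_t wide = static_cast<std::int64_t>(m_raw) * o.m_raw;
        return from_raw(static_cast<std::int32_t>(wide / kOne));
    }

    FixedFloat operator/(FixedFloat o) const
    {
        // Pre-scale the dividend so the quotient is already in 16.16.
        std::int64_t wide = static_cast<std::int64_t>(m_raw) * kOne;
        return from_raw(static_cast<std::int32_t>(wide / o.m_raw));
    }

    bool operator<(FixedFloat o) const { return m_raw < o.m_raw; }
    bool operator==(FixedFloat o) const { return m_raw == o.m_raw; }

private:
    static const std::int32_t kOne = 1 << 16;   // 1.0 in 16.16

    static FixedFloat from_raw(std::int32_t raw)
    {
        FixedFloat f;
        f.m_raw = raw;
        return f;
    }

    std::int32_t m_raw;
};

#if defined(MFLOAT_USE_FIXED)   // hypothetical build switch
typedef FixedFloat mFloat;      // no hardware FP: fixed-point fallback
#else
typedef float mFloat;           // hardware FP: plain float
#endif
```

Code written against mFloat would then compile either way. Note that the division above still goes through a 64-bit divide, which on 32-bit ARM is itself a library call, so a real implementation would want to keep it out of hot paths; range and precision (16 integer bits, 16 fractional bits) would also need checking against what the sound cores actually require.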
|
|
|
|
Joined: Mar 2001
Posts: 17,258 Likes: 267
Very Senior Member
Couriersud: you're correct of course, but e.g. Dig Dug runs quite well on the SDLMAME port someone did for the Wii, which is a 730 MHz PowerPC. So I think non-discrete sound games should be capable of running well on the sort of ARM hardware under discussion.
|
|
|
|
Joined: Feb 2007
Posts: 507
Senior Member
"Couriersud: you're correct of course, but e.g. Dig Dug runs quite well on the SDLMAME port someone did for the Wii, which is a 730 MHz PowerPC. So I think non-discrete sound games should be capable of running well on the sort of ARM hardware under discussion."
Of course. Anything using e.g. an 8910 should work. I just wanted to inject some "expectation management" into this thread so that people don't later wonder why e.g. Donkey Kong does not perform well on such a build.
|
|
|