Originally posted by Neverbirth: Could you post some notes or benchmarks about the speed differences between MAME and ZiNc with the dynarec disabled, the software plug-in, OpenGL, etc.?
Is dynarec really the main reason why ZiNc is so fast?
From what I heard, someone (can't remember who) tried using an interpreter with ZiNc and saw how it went. Apparently, it was still a lot faster than MAME, so the speed difference must be coming from somewhere else.
The dynarec actually doesn't do too much if you have a modern CPU (1 GHz and over, and not a Celeron) AND A DECENT GRAPHICS CARD! The CPU on the original arcade board is pretty slow and pretty simple.
The reason ZiNc is faster is still mainly the GPU plugins (OpenGL or DirectX) and memory routines that are more efficient than MAME's. MAME runs a lot more than just ZN-based hardware, so its memory routines are much more flexible, and this flexibility makes them slower.
The funny thing is that the memory routines in ZiNc are still not exactly optimal either. They could take some more radical, aggressive optimizations, but with ZiNc running fast enough on hardware that isn't even sold any longer, why bother? We'd still be limited by our GeForce MXes.
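To illustrate the flexible-versus-fixed memory routine point, here is a minimal Python sketch. It is not taken from MAME or ZiNc; the names `memory_map`, `read8_flexible`, `read8_fixed` and the tiny RAM size are all assumptions made for the example. The flexible version searches a table of address ranges on every access, while the fixed version assumes a single known layout and just masks the address.

```python
# Hypothetical sketch: neither routine is taken from MAME or ZiNc.

RAM_SIZE = 0x1000  # assumed power-of-two RAM size, for illustration only
ram = bytearray(RAM_SIZE)


def ram_read(offset):
    return ram[offset]


def unmapped_read(offset):
    return 0xFF


# Flexible style: a table of (start, end, handler) entries that is searched
# on every access, so any machine layout can be described at run time.
memory_map = [
    (0x0000, RAM_SIZE - 1, ram_read),
    (RAM_SIZE, 0xFFFF, unmapped_read),
]


def read8_flexible(address):
    for start, end, handler in memory_map:
        if start <= address <= end:
            return handler(address - start)
    return 0xFF  # nothing mapped at this address


# Fixed style: the layout is known up front, so one mask and one index are
# enough. Note that this mirrors RAM across the whole address space.
def read8_fixed(address):
    return ram[address & (RAM_SIZE - 1)]


ram[0x123] = 0x42
```

Both routines return the same byte for a RAM address, but the fixed one does no table search and no indirect call, which is the kind of saving an emulator for a single hardware family can afford.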
[RB: Yes, forbidden topics are even forbidden for DynaChicken ]
Originally posted by DynaChicken: The dynarec actually doesn't do too much if you have a modern CPU (1 GHz and over, and not a Celeron) AND A DECENT GRAPHICS CARD!
Maybe if I'm feeling funky I'll try putting together a frankenmame that has the ZiNc dynarec. I still need to put the MAME interpreter in ZiNc as well, but I cannot remember what rendering bug I was trying to track down (which I believe isn't a GTE or GPU issue, so is probably the same problem as Tekken 2...).
Mainly MAME is held back because I want to produce a 100% accurate emulator. I haven't achieved it yet, but it's likely to get slower when I do. The original plan was to optimise the software rendering & do an x86-32 dynarec once everything was known to be correct, so that it could be easily tested. However, I reckon it's more likely to be an x86-64 dynarec (unless they ramp clock speeds in the meantime). Time is on my side; when I started I had a P2-300MHz.
ZiNc and MAME have benefited from each other & there is no real reason for competing. MAME runs fast enough on my laptop for testing & I rarely actually play games anyway.
Why bother with x86-64 for a 33 MHz cache-starved CPU? Is it intentionally forcing people to adopt x86-64, or do you just want to play with a clean x86?
Even if you just do a dynarec with all calculations in EAX, ECX, EDX and no register caching, it is still easy to code and should be more than fast enough. x86-64 still doesn't have enough registers to hold all MIPS registers natively, and from what I found with the average block size, you are loading/unloading your register cache about as much as when it isn't cached at all.
My next experiment (if ever!) is going to have fixed register assignments: use EAX, ECX, EDX for calculations, and use EBX, EBP, ESI, EDI and then the SSE registers for *fixed* register storage. I don't think that 'MOV EAX, EDX' / 'MOV EDX, EAX' or the SSE equivalent really are that much overhead with the internal register renaming and out-of-order stuff going on in modern x86s. It also allows for 'free' MMU or memory exceptions, as you no longer have to (potentially) flush a register cache on each memory access.
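The fixed-assignment idea can be sketched roughly as follows. This is only an illustration in Python that emits assembly-like text, not working dynarec code: which MIPS registers get pinned, the `[regs + n]` spill syntax, and the function names are all assumptions. `MOVD` stands in for "the SSE equivalent" of a register-to-register `MOV`. The sketch also ignores the special case of MIPS register 0, which always reads as zero.

```python
# Hypothetical sketch of fixed register assignment. The choice of which
# MIPS registers are pinned is an assumption, not something from the post.

GPR_HOSTS = ["EBX", "EBP", "ESI", "EDI"]
SSE_HOSTS = ["XMM%d" % i for i in range(8)]


def build_fixed_map(hot_guest_regs):
    """Pin guest registers, in order, to host GPRs first, then SSE registers."""
    return dict(zip(hot_guest_regs, GPR_HOSTS + SSE_HOSTS))


def load(scratch, guest, fixed):
    """Emit text that copies a guest register into a scratch host register."""
    host = fixed.get(guest)
    if host is None:  # not pinned: read it from the register file in memory
        return "MOV %s, [regs + %d]" % (scratch, guest * 4)
    op = "MOVD" if host.startswith("XMM") else "MOV"
    return "%s %s, %s" % (op, scratch, host)


def store(guest, scratch, fixed):
    """Emit text that copies a scratch host register back to a guest register."""
    host = fixed.get(guest)
    if host is None:
        return "MOV [regs + %d], %s" % (guest * 4, scratch)
    op = "MOVD" if host.startswith("XMM") else "MOV"
    return "%s %s, %s" % (op, host, scratch)


def emit_addu(rd, rs, rt, fixed):
    """Emit ADDU rd, rs, rt using only EAX and EDX for the calculation."""
    return [
        load("EAX", rs, fixed),
        load("EDX", rt, fixed),
        "ADD EAX, EDX",
        store(rd, "EAX", fixed),
    ]
```

For example, `emit_addu(2, 4, 31, build_fixed_map([2, 4, 5, 29, 31, 8]))` produces four lines of text with no memory access at all. Because the mapping never changes, there is no cache state to flush before a memory access that might raise an exception.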
Originally posted by DynaChicken: Why bother with x86-64 for a 33 MHz cache-starved CPU? Is it intentionally forcing people to adopt x86-64, or do you just want to play with a clean x86?
I suspect that x86-64 will be established by the time I get to it anyway :-) I'm also intrigued by how it affects the GTE operations, as they can be calculated natively.
Recompiling has the added benefit that you aren't fetching opcodes repeatedly, which avoids going through another layer of code.
We're still getting odd issues that are probably down to PSX chipset emulation, so I doubt I'll get to it anytime soon (plus I've been playing with more interesting hardware like Pac-Man recently :-) ).
For PSX-based HW, it's all about getting rid of opcode decoding and replacing it with *whatever* else.
I am pretty sure that the author is completely done with ZiNc or PSX in general, but you never know.
There are some other things that he started that are interesting, and I will be working on them as well if he shows that he is actively continuing.
Since I'm back on full-time (paid) coding now, I don't feel the need to keep up my coding skills, but on the other hand it makes putting in an hour a day of hobby coding a bit easier (less "getting into coding").
For years I've been looking forward to x86-64 as a clean way to do recompilers, but now that it is finally starting to become mainstream I don't really see the need for it anymore for 32-bit emulation. More interesting for me would be to put graphics emulation on a thread (especially if it's software emulation) and take full advantage of upcoming dual-core and existing hyperthreading CPUs.
But who knows, if I finally buy a desktop it will be AMD-64, and if it's running Windows 64 I might be tempted to do something in 64-bit mode as well.
I think a caching interpreter (such as ElSemi uses for Model 2's i960 and DSPs in his standalone emulator) would be more than fast enough for PSXCPU on current hardware without the portability hassles of a full dynamic recompiler.
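As a rough illustration of what a caching interpreter does, here is a minimal Python sketch. It is not ElSemi's, MAME's or ZiNc's code; the class and function names are made up, and it handles only two MIPS opcodes (ADDIU and ADDU). Each instruction word is decoded once into a handler plus pre-extracted operands, and later executions at the same PC reuse that entry instead of decoding again. For simplicity the PC here is a word index into a list.

```python
# Hypothetical sketch of a caching (pre-decoding) interpreter for two MIPS
# opcodes. All names here are made up for the example.

MASK32 = 0xFFFFFFFF


def op_addiu(regs, rs, rt, imm):
    regs[rt] = (regs[rs] + imm) & MASK32


def op_addu(regs, rs, rt, rd):
    regs[rd] = (regs[rs] + regs[rt]) & MASK32


def decode(word):
    """Split a 32-bit MIPS word into (handler, operands). Done once per PC."""
    opcode = word >> 26
    rs = (word >> 21) & 31
    rt = (word >> 16) & 31
    if opcode == 0x09:  # ADDIU rt, rs, imm (16-bit sign-extended immediate)
        imm = word & 0xFFFF
        if imm & 0x8000:
            imm -= 0x10000
        return op_addiu, (rs, rt, imm)
    if opcode == 0 and (word & 0x3F) == 0x21:  # SPECIAL / ADDU rd, rs, rt
        rd = (word >> 11) & 31
        return op_addu, (rs, rt, rd)
    raise ValueError("opcode not handled in this sketch: %08x" % word)


class CachingInterpreter:
    def __init__(self, program):
        self.program = program  # list of 32-bit words; PC is a word index
        self.cache = {}         # PC -> (handler, operands)
        self.decodes = 0        # counts how often decode() actually ran
        self.regs = [0] * 32

    def run(self, start, count):
        pc = start
        for _ in range(count):
            entry = self.cache.get(pc)
            if entry is None:   # first visit: decode and remember
                entry = decode(self.program[pc])
                self.cache[pc] = entry
                self.decodes += 1
            handler, operands = entry
            handler(self.regs, *operands)
            self.regs[0] = 0    # MIPS register 0 always reads as zero
            pc += 1


cpu = CachingInterpreter([0x24010005, 0x00211021])  # ADDIU $1,$0,5 ; ADDU $2,$1,$1
cpu.run(0, 2)
cpu.run(0, 2)  # second pass reuses the cached decodes
```

A real implementation would also have to invalidate cache entries when the emulated program writes to code memory, which this sketch leaves out.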
How about a little experiment with putting the software GPU in MAME or ZiNc on a separate thread?
It should give hyperthreading/multicore CPUs a decent boost, since most of the processing is memory-bound anyway, which means it can switch between threads on cache misses and actually get something done in this 20+ stage traffic jam called the P4.
RB, you have one of these fancy P4s for sure, I'd love to hear if it actually helps.
PeteB, how about adding this to a new ZiNc GPU renderer and seeing if it helps? For sure it should help Pete's SoftGPU (does anyone really use that?)
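The separate-thread idea could be structured roughly like the Python sketch below. It is only meant to show the shape (a command queue between the CPU side and a GPU worker, with a sync point at the end of the frame). The rectangle-fill command format, the tiny framebuffer and all names are assumptions, and because of CPython's global interpreter lock this particular code would not actually run faster on a multicore machine.

```python
import queue
import threading

# Hypothetical sketch: command format and names are made up for the example.
WIDTH, HEIGHT = 8, 8
framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]
commands = queue.Queue()
STOP = object()  # sentinel that marks the end of the frame


def gpu_worker():
    """Software 'GPU': pull fill-rectangle commands and draw them."""
    while True:
        cmd = commands.get()
        if cmd is STOP:
            break
        x, y, w, h, colour = cmd
        for row in range(y, y + h):
            for col in range(x, x + w):
                framebuffer[row][col] = colour


def run_frame(draw_list):
    worker = threading.Thread(target=gpu_worker)
    worker.start()
    for cmd in draw_list:
        # In an emulator, CPU emulation would carry on between these puts
        # while the worker rasterises earlier commands.
        commands.put(cmd)
    commands.put(STOP)
    worker.join()  # frame sync point: wait until the GPU has caught up


run_frame([(0, 0, 2, 2, 7), (1, 1, 2, 2, 9)])
```

Commands are still drawn in the order they were issued, so the result matches the single-threaded case; the only new requirement is that the CPU side waits at the sync point before it reads the framebuffer back.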