Joined: Jun 2008
Posts: 205
B
Senior Member
Yeah, I'm not sure whether you're primarily targeting x86/amd64 or not, but in my experience branch reduction in the form of bit-twiddling is almost universally slower on x86, and usually an improvement on PPC and other archs. x86 is really, really good at minimizing branching costs.

I'd also avoid profiling anything on P4. That's a really slanted, dead architecture with many oddities that nothing modern exhibits.
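
To make the comparison concrete, here is a minimal sketch of the two styles being discussed: a plain conditional versus a bit-twiddled select. The function names are hypothetical and are not taken from MESS or any emulator source; which form is faster depends on the CPU and compiler, so profile rather than assume.

```c
#include <stdint.h>

/* Branching form: the compiler may emit a compare plus a jump, or a cmov. */
static int32_t select_branch(int32_t cond, int32_t a, int32_t b)
{
    return cond ? a : b;
}

/* Bit-twiddled form: turn cond into an all-ones or all-zeros mask,
 * then blend the two inputs without any conditional. */
static int32_t select_mask(int32_t cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0); /* 0xFFFFFFFF or 0 */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Both functions return `a` when `cond` is nonzero and `b` otherwise; only the generated instructions differ.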

Joined: Sep 2007
Posts: 40
A
Member
Originally Posted By angrylion

creating a separate 16-entry LUT for this?
That would be inserting another memory read instead of cmp/cmov.


This results in no profiler-detectable improvement or speed hit. Tested in Mario 64 and California Speed (I remember the latter used mirroring more often). It looks like reading from a LUT is on average a tiny bit slower, but still all differences are within the margin of error.
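
For readers wondering what the trade-off looks like, here is a hedged sketch. The thread does not show what the 16-entry table actually held, so this assumes a hypothetical 4-bit field `n` that selects a coordinate mask, with `n == 0` meaning "no masking". All names and values are illustrative only.

```c
#include <stdint.h>

#define NO_MASK 0xFFFFu  /* assumed meaning of n == 0 */

/* Compare form: typically compiles to cmp + cmov (or a short branch). */
static uint16_t mask_compare(unsigned n)  /* n in 0..15 */
{
    return n ? (uint16_t)((1u << n) - 1u) : (uint16_t)NO_MASK;
}

/* LUT form: trades the compare for one extra memory read. */
static const uint16_t mask_lut[16] = {
    0xFFFF, 0x0001, 0x0003, 0x0007, 0x000F, 0x001F, 0x003F, 0x007F,
    0x00FF, 0x01FF, 0x03FF, 0x07FF, 0x0FFF, 0x1FFF, 0x3FFF, 0x7FFF
};

static uint16_t mask_table(unsigned n)
{
    return mask_lut[n & 15u];  /* index clamped to the table size */
}
```

A table this small stays in L1 cache, which is consistent with the two versions landing within the margin of error of each other.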

Joined: May 2009
Posts: 2,071
Likes: 101
J
Very Senior Member
Originally Posted By byuu
Yeah, I'm not sure whether you're primarily targeting x86/amd64 or not, but in my experience branch reduction in the form of bit-twiddling is almost universally slower on x86, and usually an improvement on PPC and other archs. x86 is really, really good at minimizing branching costs.

I'd also avoid profiling anything on P4. That's a really slanted, dead architecture with many oddities that nothing modern exhibits.


I'm primarily targeting the i7-960 that I have. Quad-core, 3.2GHz.

Ultimately, I can't really dispute any particular performance profile on the P4, because I don't have one. I'm just at a loss as to how best to help people, since it seems odd that an optimization that made 20 seconds of Mario 64 run 0.5% faster overall (and around 30% faster based on tick counts for the texture pipeline-related functions) would end up being slower on a P4.

That said, I'm not sure that the N64 driver could even run at full speed on the P4 architecture. I've still got a 3.8GHz dual-core Pentium D, which I run my internal SVN server on, that I could try once hardware rendering and/or threaded rendering is implemented, but I doubt even that could really hack it. By contrast, I'm pretty confident it should be possible to get at least Mario 64 to run at full speed on my i7, which is why I target it.

Edit: Also, byuu, consider the fact that this is an inner rendering loop, as opposed to an opcode in a CPU core. Assuming roughly 2x overdraw, which I think is fair for an average scene in Mario 64 (the skybox, followed by the level geometry, followed by sprites), we'll be drawing roughly 153,600 pixels in a scene, i.e. twice the 76,800 pixels of a 320x240 frame.

Consider that in MESS, at least, from the invocation of a triangle being drawn to the first pixel being written to the framebuffer, we're looking at probably 10-20 functions being invoked and perhaps 3-4x that number of compare/branches. For each subsequent pixel we have maybe half that amount, but because there are so many pixels, probably 75% of our compare/branches and function invocations occur in that per-pixel path. In addition, certain inner-loop functions, such as pixel fetching and blending, are called twice in two-cycle mode, pushing certain functions up to 307,200 calls per frame. Ouch!
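
The figures above can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes a 320x240 framebuffer, which is not stated in the post but is the resolution that makes 2x overdraw come out to exactly 153,600; the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed framebuffer size; 320 * 240 = 76,800 pixels. */
enum { FB_WIDTH = 320, FB_HEIGHT = 240 };

/* Pixels drawn per frame for a given overdraw factor. */
static uint32_t pixels_per_frame(uint32_t overdraw)
{
    return (uint32_t)FB_WIDTH * (uint32_t)FB_HEIGHT * overdraw;
}

/* Calls per frame for a per-pixel function that runs once per cycle,
 * e.g. twice per pixel in two-cycle mode. */
static uint32_t calls_per_frame(uint32_t overdraw, uint32_t cycles_per_pixel)
{
    return pixels_per_frame(overdraw) * cycles_per_pixel;
}
```

With 2x overdraw, `pixels_per_frame(2)` gives 153,600, and `calls_per_frame(2, 2)` gives the 307,200 calls quoted for two-cycle mode.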

Also, just so it doesn't get lost from the shoutbox, I suspect that you're overstating the extent to which I used bit ops. While I exchanged a ~ for a ^, on the whole I took advantage of the fact that any integer multiplied by 1 will be itself, and any integer multiplied by 0 will be 0. Scalar ops do great nowadays, which I suspect is where some of the benefits came from.
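
As an illustration of the multiply-by-0-or-1 idea, here is a minimal sketch. It is not the actual MESS code; the function names and the `enabled` flag are hypothetical stand-ins for whatever condition the real texture pipeline tests.

```c
#include <stdint.h>

/* Branching form: add the term only when the flag is set. */
static int32_t add_if_branch(int32_t base, int32_t term, int enabled)
{
    if (enabled)
        base += term;
    return base;
}

/* Multiply form: normalize the flag to 0 or 1 and let the multiply
 * keep the term (x * 1 == x) or cancel it (x * 0 == 0). */
static int32_t add_if_multiply(int32_t base, int32_t term, int enabled)
{
    return base + term * (int32_t)(enabled != 0);
}
```

Both forms give the same result for any flag value; the multiply form simply has no conditional for the compiler to turn into a branch.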
