Page 1 of 3 1 2 3
Job in 30-60 days; air code grievances here #72774 08/27/11 11:29 AM
Joined: May 2009
Posts: 1,804
Just Desserts Offline OP
Very Senior Member
When we last left off, yours truly had little to no free time left to devote to MAME and MESS pursuits, putting HLSL et al aside in favor of trying to make a living after being one of a whole bunch of ex-Activision folks in an ex-Guitar Hero world.

Fast forward to today: Things are within a stone's throw of going into the black with the venture that a bunch of friends and I have founded, and with positive cash inflow comes free time!

With that in mind, I want to get together a strike list of action items for the HLSL system. I heard that nimitz got the faux scanline jitter working again. Hooray! That having been taken care of, here's what I've got on my list off the top of my head; please feel free to add to it:

1) (Near term) Clean up HLSL .ini support. It currently always generates a .ini for a game; instead, there should be -readhlslini and -writehlslini flags, one to explicitly enable use of a .ini and one to explicitly enable parameter writing.

2) (Long term) "Fast64" build of MESS. I've been all talk, no walk with some of my theories as to how the N64 driver can be accelerated while maintaining low-level emulation, and even then I've kept those theories mostly behind closed doors. For the sake of goading myself into actually implementing some of this stuff in a custom build of MESS, my plan for getting 60fps N64 in MESS on a commodity i7 is twofold, addressing the two major performance choke points in the driver.

3) Fast64, Preface: 2D games on the N64 driver are ostensibly a non-issue. On my non-Sandy Bridge i7, Bust-a-Move 2 runs on the high side of the 90% range, Rampage: World Tour tops out at about 60% because its higher resolution chokes the RDP more than the former's 320x240, and Namco Museum 64 does every last thing on the main CPU with a simple framebuffer, clearing several hundred percent when unthrottled, thus absolving the main CPU of any real hand in our performance woes.

With that in mind, I ran profiles across a number of 3D N64 games. Among them were Super Mario 64, Diddy Kong Racing, and Castlevania 64, each from a different company and from a roughly different time. Each ran at between 25% and 35% at most. Repeated gprof runs showed the worst performance offenders to be a tossup between the RSP's vector operations and the RDP's drawing operations. The same types of operations showed up regardless of game, with the individual ops and configurations varying on a per-game basis. On the RSP: vector multiply-and-accumulate, vector multiply, vector comparison, vector loads, and vector stores. On the RDP: blending, color combination, triangle setup, scanline rendering, and auxiliary functions such as Z evaluation.

In no case did a game hit all possible RSP vector opcodes or even approach rendering using all possible RDP configurations. That is an important point for compatibility regression testing, as it is imperative to find a combination of games that provides a more or less complete sweep of opcodes and RDP configurations. Performance bottlenecks shifted depending on scene triangle count (RSP load) and scene resolution (RDP load). This information is important for the following two points.

4) Fast64, RSP: SSSE3 is a godsend in this regard. One of the most terrible things for performance, given a lack of deep branch prediction, is cascading comparisons. The more complex operations in the RSP are essentially nothing but an 8-iteration loop of comparisons wrapped around simple operations on 16-bit words. This is terrible for performance, but if these simple operations can be made "horizontal" by way of equivalent vector opcodes on modern CPUs, then a great deal of performance can be gained back. The 128-bit (8x16-bit-word) XMM registers present on modern x86 and x64 chips, along with the SSE2 and SSE3 opcodes, are already useful in this regard, but on their own they incur unacceptable performance loss in the case of the RSP due to the strange way in which the RSP can perform indirection on elements in its vector opcodes. pshufhw and pshuflw in the SSE2 opcodes are essentially "close but no cigar"; they can only operate on the upper four or lower four words in a 128-bit XMM register, and cannot perform an arbitrary shuffle. Enter the pshufb opcode from SSSE3, which allows arbitrary permutation of the bytes in an XMM register. That is overkill, but it lets us perform a two-pass mix of the 16-bit words in two XMM registers. pshufb combined with the various SSE2 opcodes will likely allow greatly accelerated geometry calculations on the emulated N64.
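
To make the pshufb idea a bit more concrete, here is a minimal sketch of turning a per-lane 16-bit word selection into the 16-byte control mask that pshufb consumes. It is air code, not anything from the MESS tree: pshufb_model is a portable scalar stand-in for the real opcode (exposed as the _mm_shuffle_epi8 intrinsic in <tmmintrin.h>), and every name in it (make_word_shuffle_mask, shuffle_words, and so on) is made up for illustration. It also assumes little-endian byte order within each word, as on x86.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;   // one XMM register, viewed as bytes
using Words8  = std::array<std::uint16_t, 8>;   // the same register, viewed as 16-bit words
using Select8 = std::array<std::uint8_t, 8>;    // per-lane source word index, 0..7

// Expand eight word indices into a byte-level pshufb control mask:
// result word i is taken from source word sel[i].
Bytes16 make_word_shuffle_mask(const Select8& sel)
{
    Bytes16 mask{};
    for (int i = 0; i < 8; i++) {
        assert(sel[i] < 8);
        mask[2 * i]     = static_cast<std::uint8_t>(2 * sel[i]);      // low byte of the word
        mask[2 * i + 1] = static_cast<std::uint8_t>(2 * sel[i] + 1);  // high byte of the word
    }
    return mask;
}

// Scalar model of pshufb: a control byte with bit 7 set yields zero,
// otherwise its low four bits pick the source byte.
Bytes16 pshufb_model(const Bytes16& src, const Bytes16& mask)
{
    Bytes16 out{};
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    return out;
}

// Little-endian pack/unpack between the word view and the byte view.
Bytes16 to_bytes(const Words8& w)
{
    Bytes16 b{};
    for (int i = 0; i < 8; i++) {
        b[2 * i]     = static_cast<std::uint8_t>(w[i] & 0xFF);
        b[2 * i + 1] = static_cast<std::uint8_t>(w[i] >> 8);
    }
    return b;
}

Words8 to_words(const Bytes16& b)
{
    Words8 w{};
    for (int i = 0; i < 8; i++)
        w[i] = static_cast<std::uint16_t>(b[2 * i] | (b[2 * i + 1] << 8));
    return w;
}

// Arbitrary word shuffle of one register, built from a single byte shuffle.
Words8 shuffle_words(const Words8& src, const Select8& sel)
{
    return to_words(pshufb_model(to_bytes(src), make_word_shuffle_mask(sel)));
}
```

Passing a selection of all 3s broadcasts word 3 across all eight lanes, and a selection of 7 down to 0 reverses the register; neither is possible with a single pshufhw or pshuflw. In a real build the masks would presumably be precomputed once per selection pattern and fed to _mm_shuffle_epi8 directly.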

5) Fast64, RDP: Simplicity itself, though doing so in a core-compatible way will be a bit finicky. It would not be difficult, given the relatively high instruction count allowed by HLSL 3.0, programmable texture samplers, and a few sufficiently large pre-calculated lookup textures, to get even the most complex RDP setup implemented in HLSL. Certain aspects may need additional thinking - for example, performing three render passes per triangle: one for a coverage pre-pass, one for a Z pre-pass, and one for everything else. It would result in unacceptably high VRAM usage to compile the shaders for all possible combinations of RDP flags at once, but astute readers will recall the finale of point 3: no game uses every combination of RDP flags at any one time, and no game uses every combination over the course of the entire game in the first place. Thus, a decent caching system should keep the majority of (if not all) shader programs "hot" in memory without using terribly much of it while doing so. Lastly, many games - even first-party titles like Super Mario 64 - will intentionally run with a double-buffered rate of 30Hz rather than 60Hz, using the RDP to build the rendered frame across two frames. This being the case, it should not even be necessary to attain 60Hz rendering in most cases.
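
For the caching system, one plausible shape is a small least-recently-used cache keyed on the packed RDP flag combination. The sketch below is only that: ShaderCache, Shader, and compile are hypothetical names, a 64-bit key is an assumption about how the flags might be packed, and compile just records the key where a real build would invoke the HLSL compiler. LRU eviction is also my choice for illustration, not something settled in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>

// Stand-in for a compiled shader program; a real one would hold a GPU handle.
struct Shader {
    std::uint64_t built_for;   // the RDP flag combination it was compiled for
};

class ShaderCache {
public:
    explicit ShaderCache(std::size_t capacity) : m_capacity(capacity)
    {
        assert(capacity > 0);
    }

    // Return the shader for this flag combination, compiling it on a miss and
    // evicting the least recently used entry when the cache is full.
    // The returned reference stays valid only until that entry is evicted.
    const Shader& get(std::uint64_t rdp_flags)
    {
        auto it = m_index.find(rdp_flags);
        if (it != m_index.end()) {
            // Hit: move the entry to the front so it counts as most recently used.
            m_lru.splice(m_lru.begin(), m_lru, it->second);
            return it->second->second;
        }
        if (m_lru.size() == m_capacity) {
            // Full: drop the entry at the back, which is the coldest one.
            m_index.erase(m_lru.back().first);
            m_lru.pop_back();
        }
        m_lru.emplace_front(rdp_flags, compile(rdp_flags));
        m_index[rdp_flags] = m_lru.begin();
        m_compiles++;
        return m_lru.front().second;
    }

    std::size_t compiles() const { return m_compiles; }
    std::size_t size() const { return m_lru.size(); }

private:
    using Entry = std::pair<std::uint64_t, Shader>;

    // Placeholder for the expensive step (HLSL compile + upload).
    static Shader compile(std::uint64_t rdp_flags) { return Shader{rdp_flags}; }

    std::size_t m_capacity;
    std::size_t m_compiles = 0;
    std::list<Entry> m_lru;   // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> m_index;
};
```

With a capacity sized to the handful of configurations a game actually cycles through, the compile count should flatten out quickly after the first few frames of a scene, which is the behavior point 3 suggests we can rely on.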

Fire away!

Re: Job in 30-60 days; air code grievances here [Re: Just Desserts] #72775 08/27/11 11:36 AM
Joined: Sep 2009
Posts: 239
Dr. Spankenstein Offline
Senior Member
Thought it might be worth mentioning this thread (in case you didn't already know). :)

Re: Job in 30-60 days; air code grievances here [Re: Just Desserts] #72778 08/27/11 01:39 PM
Joined: Nov 1999
Posts: 655
Bletch Online Content
Senior Member
I'm in the process of a CoCo driver rewrite. This is an ideal opportunity to embrace HLSL.

Re: Job in 30-60 days; air code grievances here [Re: Just Desserts] #72785 08/27/11 04:59 PM
Joined: Apr 2011
Posts: 288
B2K24 Offline
Senior Member
Hi Just Desserts, glad things in RL are going better for you.

There appears to be a problem with HLSL and the Genesis system. When you have time, please view it at my Dropbox link.
If you were already aware of this then disregard the post. :)

Best of luck with your future projects and endeavors.

http://dl.dropbox.com/u/10573028/sonic%20%282%29.png




Re: Job in 30-60 days; air code grievances here [Re: Just Desserts] #72786 08/27/11 05:01 PM
Joined: Mar 2001
Posts: 16,300
R. Belmont Online Content
Very Senior Member
That's caused by dynamic resolution switching. I'm surprised nimitz or whoever didn't fix that already. :)

Re: Job in 30-60 days; air code grievances here [Re: R. Belmont] #72797 08/28/11 05:27 AM
Joined: Jul 2007
Posts: 4,625
Anna Wu Offline
Very Senior Member
Nice to see you are back here, JD. :)
I wish you all the best in your new job.

Re: Job in 30-60 days; air code grievances here [Re: Anna Wu] #72798 08/28/11 05:32 AM
Joined: May 2009
Posts: 1,804
Just Desserts Offline OP
Very Senior Member
Originally Posted By Anna Wu
Nice to see you are back here, JD. :)
I wish you all the best in your new job.

Thanks, Anna! :)

Just wondering, can anyone provide more info on the HLSL "crease" issue that is still being reported? I'm wondering if it's an issue that is specific to certain brands of GPU chipsets, as my HD 5970 doesn't appear to render anything out of the ordinary for, say, Gyruss.

Edit: Oh hey, Space Invaders looks all kindsa screwed-up on my machine. Time to fix!

Re: Job in 30-60 days; air code grievances here [Re: Just Desserts] #72802 08/28/11 06:54 AM
Joined: May 2009
Posts: 1,804
Just Desserts Offline OP
Very Senior Member
Well, I've got phosphorescence properly fixed again, and I've also added -hlsl_write_ini and -hlsl_read_ini. With the former you can tweak settings at runtime without obliterating your finely-crafted custom INI settings, and with the latter you can optionally forgo the custom INI in favor of the primary INI settings. Checking into the MAME baseline shortly.
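
For anyone trying to picture how the two switches interact, here is a tiny sketch of the decision logic as described in the post above. It is not the code being checked in: only the two option names come from the post, while HlslIniFlags, pick_settings_source, and should_write_custom_ini are hypothetical. Falling back to the primary INI when no custom INI exists is likewise an assumption.

```cpp
#include <cassert>

// Hypothetical option block; only the two flag names come from the post.
struct HlslIniFlags {
    bool read_ini;    // -hlsl_read_ini: use the custom INI when loading settings
    bool write_ini;   // -hlsl_write_ini: save runtime tweaks back to the custom INI
};

enum class SettingsSource { PrimaryIni, CustomIni };

// Decide where HLSL settings are loaded from at startup.
// Assumption: with no custom INI on disk, the primary INI settings are used.
SettingsSource pick_settings_source(const HlslIniFlags& flags, bool custom_ini_exists)
{
    return (flags.read_ini && custom_ini_exists) ? SettingsSource::CustomIni
                                                 : SettingsSource::PrimaryIni;
}

// Decide whether runtime tweaks get written out when the game exits.
bool should_write_custom_ini(const HlslIniFlags& flags, bool settings_changed)
{
    return flags.write_ini && settings_changed;
}

// Usage sketch: read the custom INI, tweak at runtime, leave the file alone.
void example()
{
    HlslIniFlags tweak_only{true, false};
    assert(pick_settings_source(tweak_only, true) == SettingsSource::CustomIni);
    assert(!should_write_custom_ini(tweak_only, true));
}
```

The point of keeping the two decisions separate is that reading and writing stay independent: turning one off never changes what the other does.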

Re: Job in 30-60 days; air code grievances here [Re: Just Desserts] #72803 08/28/11 07:33 AM
Joined: Sep 2008
Posts: 66
John IV Offline
Member
Does the addition of the new write and read INI options prevent the disappearance of the shadow_mask_texture aperture.png from the \hlsl\*.ini file? If you previously specified a file in mame.ini to write to, like hlslini foo, that foo.ini file would lose its shadow_mask_texture setting when a new game was encountered.

Last edited by John IV; 08/28/11 07:36 AM.
Re: Job in 30-60 days; air code grievances here [Re: John IV] #72804 08/28/11 07:51 AM
Joined: May 2009
Posts: 1,804
Just Desserts Offline OP
Very Senior Member
Originally Posted By John IV
Does the addition of the new write and read INI options prevent the disappearance of the shadow_mask_texture aperture.png from the \hlsl\*.ini file? If you previously specified a file in mame.ini to write to, like hlslini foo, that foo.ini file would lose its shadow_mask_texture setting when a new game was encountered.

That's fixed locally as well, and will be checked in along with the phosphor fix either this morning or tomorrow. :)
