|
Joined: May 2009
Posts: 2,208 Likes: 354
Very Senior Member
|
OP
Make this faster:
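INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    /* (A - B) * C + D, with D scaled up by 256 and 0x80 added so the shift rounds to nearest */
    INT32 color = (((*A-*B)* *C) + (*D << 8) + 0x80);
    color >>= 8;
    /* clamp the result to 0..255 */
    if (color > 255)
    {
        *out = 255;
    }
    else if (color < 0)
    {
        *out = 0;
    }
    else
    {
        *out = (UINT8)color;
    }
}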
|
Joined: Mar 2006
Posts: 1,079 Likes: 6
Very Senior Member
|
INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    INT32 color = (((*A-*B)* *C) + 0x80);
    color >>= 8;
    color += *D;
    if (color > 255)
    {
        *out = 255;
    }
    else if (color < 0)
    {
        *out = 0;
    }
    else
    {
        *out = (UINT8)color;
    }
}
This saves a shift, but it MIGHT behave differently because *D is now added after the shift instead of before it. Not sure. Do you have a testbench for this?
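A testbench for this can be small, because only the difference A-B, C and D affect the result. The following is a minimal sketch, not code from the emulator: the typedefs stand in for the project's UINT8/INT32, and clamp_u8, combine_before, combine_after and count_mismatches are hypothetical names. It assumes that right-shifting a negative INT32 is an arithmetic shift, which C leaves implementation-defined but which mainstream compilers provide.

```c
#include <stdint.h>

/* Stand-ins for the project's own types (assumption). */
typedef uint8_t UINT8;
typedef int32_t INT32;

/* Shared clamp to the 0..255 range. */
static UINT8 clamp_u8(INT32 color)
{
    if (color > 255) return 255;
    if (color < 0) return 0;
    return (UINT8)color;
}

/* Original form: D is scaled by 256 and added before the shift. */
static UINT8 combine_before(UINT8 a, UINT8 b, UINT8 c, UINT8 d)
{
    INT32 color = ((a - b) * c) + (d << 8) + 0x80;
    return clamp_u8(color >> 8);
}

/* Proposed form: D is added after the shift. */
static UINT8 combine_after(UINT8 a, UINT8 b, UINT8 c, UINT8 d)
{
    INT32 color = ((a - b) * c) + 0x80;
    return clamp_u8((color >> 8) + d);
}

/* Counts the inputs on which the two forms disagree.
   Every difference A-B in -255..255 is produced once, paired with
   every C and D, which is about 33 million cases. */
static long count_mismatches(void)
{
    long mismatches = 0;
    for (int diff = -255; diff <= 255; diff++)
    {
        UINT8 a = (UINT8)(diff > 0 ? diff : 0);
        UINT8 b = (UINT8)(diff < 0 ? -diff : 0);
        for (int c = 0; c < 256; c++)
            for (int d = 0; d < 256; d++)
                if (combine_before(a, b, (UINT8)c, (UINT8)d) !=
                    combine_after(a, b, (UINT8)c, (UINT8)d))
                    mismatches++;
    }
    return mismatches;
}
```

With an arithmetic shift the two forms should agree everywhere, since *D << 8 is a multiple of 256 and so passes through the shift unchanged.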
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"
|
Joined: Mar 2006
Posts: 1,079 Likes: 6
Very Senior Member
|
oh boy, i just thought of something truly gross which may be faster still:
INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    INT32 color = (((*A-*B)* *C) + (*D << 8) + 0x80);
    // at this point color is located in the 0x0000nn00 bits
    if (color&0x7FFF0000) *out=(UINT8)(~color>>23);
    else
    {
        *out = (UINT8)(color>>8);
    }
}
The trick is that if any of the bits 0x7FF00000 are set, the value must be below 0, and if only bits in 0x000F0000 are set, the value must be above 255. Because of the way the numbers are multiplied together to produce the result, the bits at 0x7F800000 will always be all zero (or all 1 if the number is negative) due to the insufficient range of the multipliers. Hence, if any of the tested bits were set, inverting bits 0x7F800000 and putting them in the result gives you instant clamping: 0x00 for a negative value, 0xFF for an overflow. Now, can I make it faster...
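To check the explanation against a plain clamp, here is a small sketch that walks the whole range the unshifted value can take, from -255*255 + 0x80 = -64897 up to 255*255 + (255 << 8) + 0x80 = 130433. The names clamp_bits, clamp_reference and trick_matches_reference are made up for illustration, the typedefs stand in for the project's types, and the code assumes an arithmetic right shift for negative values (implementation-defined in C).

```c
#include <stdint.h>

/* Stand-ins for the project's own types (assumption). */
typedef uint8_t UINT8;
typedef int32_t INT32;

/* Bit-trick clamp on the unshifted value (result lives in 0x0000nn00). */
static UINT8 clamp_bits(INT32 color)
{
    if (color & 0x7FFF0000)
        return (UINT8)(~color >> 23); /* negative -> 0x00, overflow -> 0xFF */
    return (UINT8)(color >> 8);
}

/* Plain shift-then-compare clamp, used as the reference. */
static UINT8 clamp_reference(INT32 color)
{
    color >>= 8;
    if (color > 255) return 255;
    if (color < 0) return 0;
    return (UINT8)color;
}

/* Returns 1 if both clamps agree over the full reachable range. */
static int trick_matches_reference(void)
{
    for (INT32 color = -64897; color <= 130433; color++)
        if (clamp_bits(color) != clamp_reference(color))
            return 0;
    return 1;
}
```

Three values show the three paths: 0x00007F80 has no tested bits set and yields 0x7F; 0x00012345 has bit 16 set, so its inverse is negative and the shift leaves all ones, giving 0xFF; and -1 inverts to 0, giving 0x00.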
Last edited by Lord Nightmare; 12/22/09 02:23 AM. Reason: add explanation pt 2.
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"
|
Joined: Sep 2004
Posts: 392 Likes: 4
Senior Member
|
Put the common case first and test for the outliers using a single unsigned compare. Also agree that you can move the addition of D after the shift.
INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    INT32 color = (((*A-*B)* *C) + 0x80);
    color >>= 8;
    color += *D;
    if ((UINT32)color < 256)
    {
        *out = color;
    }
    else if (color < 0)
    {
        *out = 0;
    }
    else
    {
        *out = 255;
    }
}
A bigger win would be to identify a few common cases (like C == 0x100, B == 0, D == 0, etc) and check for those in the outer loops so that you can avoid a bunch of the math in the innermost loop, and might even be able to avoid the clamping (if B and D are 0, for example).
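As a rough illustration of that last idea, the sketch below hoists the check out of the per-pixel loop. It assumes, purely for the example, that B and D are constant across a span while A and C vary per pixel; combine_generic, combine_b0_d0 and combine_span are hypothetical helpers, not functions from the driver. With B == 0 and D == 0 the largest possible value is (255*255 + 0x80) >> 8 = 254, so the fast path needs no clamp at all.

```c
#include <stdint.h>

/* Stand-ins for the project's own types (assumption). */
typedef uint8_t  UINT8;
typedef uint32_t UINT32;
typedef int32_t  INT32;

/* General case: common in-range result first, single unsigned compare. */
static UINT8 combine_generic(UINT8 a, UINT8 b, UINT8 c, UINT8 d)
{
    INT32 color = ((((a - b) * c) + 0x80) >> 8) + d;
    if ((UINT32)color < 256)
        return (UINT8)color;
    return (color < 0) ? 0 : 255;
}

/* Special case B == 0, D == 0: result is always in 0..254, no clamp. */
static UINT8 combine_b0_d0(UINT8 a, UINT8 c)
{
    return (UINT8)(((a * c) + 0x80) >> 8);
}

/* Picks the cheaper inner loop once per span instead of once per pixel. */
static void combine_span(UINT8 *out, const UINT8 *a, UINT8 b,
                         const UINT8 *c, UINT8 d, int count)
{
    int i;
    if (b == 0 && d == 0)
    {
        for (i = 0; i < count; i++)
            out[i] = combine_b0_d0(a[i], c[i]);
    }
    else
    {
        for (i = 0; i < count; i++)
            out[i] = combine_generic(a[i], b, c[i], d);
    }
}
```

Whether this pays off depends on how often the special cases actually occur in the outer loops, which only profiling the real renderer can tell.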
|
Joined: Mar 2006
Posts: 1,079 Likes: 6
Very Senior Member
|
I think so, by combining both ways:
INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    INT32 color = (((*A-*B)* *C) + 0x80);
    color >>= 8;
    color += *D;
    // at this point color is located in the 0x000000nn bits
    if (color&0x7FFFFF00) *out=(UINT8)(~color>>11);
    else
    {
        *out = (UINT8)(color);
    }
}
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"
|
Joined: Sep 2004
Posts: 392 Likes: 4
Senior Member
|
Cute, that should work even better. Still, put the common case first (i.e., swap the if/else).
|
Joined: Mar 2006
Posts: 1,079 Likes: 6
Very Senior Member
|
Like this?
INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    INT32 color = (((*A-*B)* *C) + 0x80);
    color >>= 8;
    color += *D;
    // at this point color is located in the 0x000000nn bits
    if ((color&0x7FFFFF00)==0) *out = (UINT8)color;
    else *out=(UINT8)(~color>>11);
}
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"
|
Joined: May 2009
Posts: 2,208 Likes: 354
Very Senior Member
|
OP
I'm going to have to profile against this, because I'm pretty sure I win:
In VIDEO_START(n64):
for(i = 0; i < (1 << 24); i++)
{
    UINT8 A = (i >> 16) & 0x000000ff;
    UINT8 B = (i >> 8) & 0x000000ff;
    UINT8 C = i & 0x000000ff;
    cc_lut1[i] = (INT16)((((((INT32)A - (INT32)B) * (INT32)C) + 0x80) >> 8) & 0x0000ffff);
}
for(i = 0; i < (1 << 16); i++)
{
    for(j = 0; j < (1 << 8); j++)
    {
        INT32 temp = (INT32)((INT16)i) + j;
        if(temp > 255)
        {
            cc_lut2[(i << 8) | j] = 255;
        }
        else if(temp < 0)
        {
            cc_lut2[(i << 8) | j] = 0;
        }
        else
        {
            cc_lut2[(i << 8) | j] = (UINT8)temp;
        }
    }
}
...
INLINE void COMBINER_EQUATION(UINT8 *out, UINT8 *A, UINT8 *B, UINT8 *C, UINT8 *D)
{
    /* The speedy, lookup table enabled version */
    *out = cc_lut2[(cc_lut1[(*A << 16) | (*B << 8) | *C] << 8) | *D];
}
Brings COMBINER_EQUATION from 13.98% of total execution time to 9.09%. ETA: Also, I'm not trying to show you guys up or anything, it's just my 'net access went down shortly after I made this post, so I had to come up with it on my own in the meantime.
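The post does not show how cc_lut1 and cc_lut2 are declared or allocated, so the following is only a self-contained sketch of the same two-stage lookup under stated assumptions. It assumes cc_lut1 holds unsigned 16-bit entries, so that a negative stage-one result becomes a non-negative index into cc_lut2, and that both tables are heap-allocated (about 32 MB and 16 MB). cc_lut_init and combine_lut are hypothetical names, and the typedefs stand in for the project's types.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the project's own types (assumption). */
typedef uint8_t  UINT8;
typedef uint16_t UINT16;
typedef int32_t  INT32;

/* Assumed declarations; the post does not show them. */
static UINT16 *cc_lut1; /* 1 << 24 entries, indexed by A:B:C */
static UINT8  *cc_lut2; /* 1 << 24 entries, indexed by stage-one result:D */

/* Builds both tables. Returns 0 on success, -1 if allocation fails. */
static int cc_lut_init(void)
{
    INT32 i, j;

    cc_lut1 = malloc((size_t)(1 << 24) * sizeof(*cc_lut1));
    cc_lut2 = malloc((size_t)(1 << 24) * sizeof(*cc_lut2));
    if (cc_lut1 == NULL || cc_lut2 == NULL)
        return -1;

    /* Stage one: ((A - B) * C + 0x80) >> 8, kept as a 16-bit pattern. */
    for (i = 0; i < (1 << 24); i++)
    {
        INT32 a = (i >> 16) & 0xff;
        INT32 b = (i >> 8) & 0xff;
        INT32 c = i & 0xff;
        cc_lut1[i] = (UINT16)((((a - b) * c) + 0x80) >> 8);
    }

    /* Stage two: reinterpret the 16-bit pattern as signed, add D, clamp. */
    for (i = 0; i < (1 << 16); i++)
    {
        INT32 s = (i < 0x8000) ? i : i - 0x10000;
        for (j = 0; j < (1 << 8); j++)
        {
            INT32 temp = s + j;
            cc_lut2[(i << 8) | j] = (UINT8)(temp > 255 ? 255 : (temp < 0 ? 0 : temp));
        }
    }
    return 0;
}

/* Two dependent table reads replace the multiply, shift, add and clamp. */
static UINT8 combine_lut(UINT8 a, UINT8 b, UINT8 c, UINT8 d)
{
    size_t stage1 = cc_lut1[((size_t)a << 16) | ((size_t)b << 8) | c];
    return cc_lut2[(stage1 << 8) | d];
}
```

Tables this large will not stay in cache, so the gain is workload-dependent and only a profile like the one quoted above can confirm it.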
|
Joined: Mar 2006
Posts: 1,079 Likes: 6
Very Senior Member
|
Yes that works, but you can optimize the table creation a little by using some stuff from the thread. Not that an 'only run once on startup' thing is a major speed loss.
LN
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"
|
Joined: May 2009
Posts: 2,208 Likes: 354
Very Senior Member
|
OP
Actually, your final suggestion performed worse by nearly 2% total execution time versus the baseline case...