With the help of blargg's step-based multiplication algorithm:
if((WRMPYA >> shift) & 1) RDMPY += WRMPYB << shift;

I have successfully reverse-engineered the ALU multiplication process. Details are posted here:
http://board.byuu.org/viewtopic.php?p=12235#p12235
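
For anyone who wants to poke at the multiplication step without reading the whole thread, here is a rough sketch of how that one-liner could be looped. This is only my reading of the step above, not a verified model of the hardware: the helper name alu_mul_partial, the 8-step count, and the low-bit-first step order are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not from any emulator source): run the first
   `steps` iterations of the step-based multiply and return the partial
   RDMPY value. Assumes 8 steps in total, starting from bit 0. */
uint16_t alu_mul_partial(uint8_t wrmpya, uint8_t wrmpyb, unsigned steps)
{
    uint16_t rdmpy = 0;

    if (steps > 8)
        steps = 8; /* an 8-bit multiplier has only 8 bits to test */

    for (unsigned shift = 0; shift < steps; shift++) {
        /* Same test as the one-liner: add the shifted multiplicand
           only when the corresponding multiplier bit is set. */
        if ((wrmpya >> shift) & 1)
            rdmpy += (uint16_t)(wrmpyb << shift);
    }
    return rdmpy;
}
```

With all 8 steps this gives the ordinary 8x8 product. Stopping early gives a partial product, which is the sort of value you would expect to see if RDMPY were read before the multiply had finished. How the steps line up with CPU cycles is not something this sketch tries to answer.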

Now I'm just hoping blargg or someone else can come up with the step-based division algorithm, so that we can finally emulate this behavior correctly!