With the help of blargg's step-based multiplication algorithm:
if((WRMPYA >> shift) & 1) RDMPY += WRMPYB << shift;

I have successfully reverse-engineered the ALU multiplication process. Details are posted here:

Now I'm just hoping blargg or someone else can come up with the step-based division algorithm, so that we can finally emulate this behavior correctly!
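
For anyone who wants to experiment with the multiplication step quoted at the top, here is a minimal C sketch. It rests on assumptions that are not stated in this post: that RDMPY starts at zero, that there are eight steps, and that shift runs from 0 to 7 with one step per iteration. The alu_t struct and the function names are made up for illustration, and nothing here models the actual cycle timing.

```c
#include <stdint.h>

/* Hypothetical container for the three registers involved. */
typedef struct {
    uint8_t  wrmpya;  /* multiplicand */
    uint8_t  wrmpyb;  /* multiplier */
    uint16_t rdmpy;   /* running product */
} alu_t;

/* One step of the shift-and-add multiply, exactly as quoted:
   if bit `shift` of WRMPYA is set, add WRMPYB << shift to RDMPY. */
static void alu_mul_step(alu_t *alu, unsigned shift) {
    if ((alu->wrmpya >> shift) & 1)
        alu->rdmpy += (uint16_t)(alu->wrmpyb << shift);
}

/* Run only the first `steps` steps (capped at 8) and return RDMPY.
   ASSUMPTION: RDMPY is cleared first and shift counts up from 0. */
static uint16_t alu_mul_partial(uint8_t a, uint8_t b, unsigned steps) {
    alu_t alu = { a, b, 0 };
    for (unsigned shift = 0; shift < steps && shift < 8; shift++)
        alu_mul_step(&alu, shift);
    return alu.rdmpy;
}
```

Calling alu_mul_partial(a, b, 8) gives the ordinary 8x8 product, while a smaller step count gives the partial sum you would see if the result were read before the multiply had finished, under the assumptions above.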