It's been a while, so I'll do my best.

Kirby's Dream Land 3 and Dragon Ball Z: Hyper Dimension are the easiest. They don't use any of the extra SA-1 features at all, IIRC. Just clone the S-CPU core, add in the MMIO regs that start and stop the SA-1, and you can have them running in 8 hours. Or at least, that's what it took me. For you guys, probably 2 hours :P
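To make the "clone the S-CPU, add start/stop regs" idea concrete, here's a rough sketch. It is not how bsnes does it; Cpu65816, SA1, mmioWrite and step are made-up names, the CPU is a stub, and the register layout ($2200 with bit 5 = reset and bit 6 = wait, reset vector at $2203/$2204, SA-1 held in reset at power-on) is from my memory of the public docs, so treat it as an assumption and check it before relying on it.

```cpp
#include <cstdint>

// Stand-in for a second 65816 core. A real emulator would reuse its
// existing S-CPU class here; this stub only counts executed "opcodes".
struct Cpu65816 {
    uint16_t pc = 0;
    uint64_t executed = 0;
    void reset(uint16_t vector) { pc = vector; }
    void step() { ++pc; ++executed; }  // placeholder for "run one opcode"
};

// Hypothetical SA-1 wrapper: the cloned CPU plus the control registers
// the S-CPU uses to start and stop it.
struct SA1 {
    Cpu65816 cpu;
    bool resetHeld = true;   // assumption: SA-1 powers on held in reset
    bool waiting = false;
    uint16_t resetVector = 0;

    // Writes coming from the S-CPU side.
    // Addresses and bit positions are assumptions, see the note above.
    void mmioWrite(uint16_t addr, uint8_t data) {
        switch (addr) {
        case 0x2200: {
            const bool newReset = (data & 0x20) != 0;
            // Releasing reset starts the SA-1 at its reset vector.
            if (resetHeld && !newReset) cpu.reset(resetVector);
            resetHeld = newReset;
            waiting = (data & 0x40) != 0;
            break;
        }
        case 0x2203:  // reset vector, low byte
            resetVector = static_cast<uint16_t>((resetVector & 0xff00) | data);
            break;
        case 0x2204:  // reset vector, high byte
            resetVector = static_cast<uint16_t>((resetVector & 0x00ff) | (data << 8));
            break;
        default:
            break;
        }
    }

    // Called by the scheduler; does nothing while the SA-1 is stopped.
    void step() {
        if (resetHeld || waiting) return;
        cpu.step();
    }
};
```

The point is just that the SA-1 side is a second CPU instance whose step() is gated by a couple of flags the S-CPU controls. Everything else (DMA, character conversion, IRQs) can be bolted on later for the games that need it.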

Marvelous makes use of standard SA-1 DMA for the sprites, but it'll "run" even without it.

Parodius makes use of IRQs, I believe. Super Mario RPG is very tough: it uses IRQs, plus character conversion mode 2 for the level-up text.

The two golf games use both character conversion DMA modes. Not even ZSNES or Snes9x support those yet.

SD Gundam G-NEXT uses character conversion mode 1 (CC1) DMA, and also has an expansion slot for BS-X cartridges.

Jumpin' Derby uses the NMI override feature that no other game touches, and I think it also uses the variable bit-length feature.

There are about a half-dozen features bsnes emulates that no known game even uses, such as the SA-1-side IRQ timers.

And worst of all, not a damn SA-1 game, out of all ~26 of them or whatever, even uses a fraction of the true power of the SA-1. Almost all of them could easily be made without the chip at all. I strongly suspect most of the time it was simply used as an anti-piracy device.

As always, take a look at my WIP forum. The SA-1 testing phase mentions many bugs I ran into and how to fix them.

Not sure if this test program is meaningful...

It looks fairly primitive, but may be pointing out some useful bugs. If that is a PD ROM, may I ask where you found it? If it's copyrighted, forget I said anything.