Today I found out, somewhat by accident, that optimization flags can have a huge impact on the performance of at least some drivers. As some of you might remember, I am packaging mame for Fedora. The package uses the standard Fedora OPTFLAGS, which currently are:
-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection

Today I compiled git master with -march=native (I am on a Ryzen 5 2600) and standard -O3. Here are the results (the first run is with the packaged 0.214, the second with -march=native -O3):
umk3 is about 12 % faster:
$ mame -window -noautosave -bench 120 umk3
Average speed: 546.00% (119 seconds)
$ ./mame64 -rompath /mnt/openmediavault/emu/mame/roms/  -window -nomaximize -bench 120 umk3
Average speed: 613.19% (119 seconds)

vf2, on the other hand, shows little difference (about 3 %):
$ mame -window -noautosave -bench 120 vf2
Average speed: 122.41% (119 seconds)
$ ./mame64 -rompath /mnt/openmediavault/emu/mame/roms/  -window -nomaximize -bench 120 vf2
Average speed: 126.37% (119 seconds)

Sadly, in my case the speed gain is not where it is needed most: on this machine vf2 hovers slightly above 90 % in-game. I just wanted to put it out there: if somebody needs to squeeze a couple of percent out of a driver to get full speed, recompiling with more aggressive optimization flags might just do the trick.
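
For anyone who wants to compare their own -bench runs the same way, here is a minimal sketch of the arithmetic in Python. It only assumes that the summary line looks like the "Average speed: ...%" lines quoted above. The helper names average_speed and relative_gain are made up for this example and are not part of mame.

```python
import re


def average_speed(line: str) -> float:
    """Extract the percentage from a line like 'Average speed: 546.00% (119 seconds)'."""
    match = re.search(r"Average speed:\s*([\d.]+)%", line)
    if match is None:
        raise ValueError(f"no average speed found in: {line!r}")
    return float(match.group(1))


def relative_gain(baseline: str, candidate: str) -> float:
    """Return by how many percent the candidate run is faster than the baseline run."""
    base = average_speed(baseline)
    return (average_speed(candidate) - base) / base * 100.0


# Summary lines copied from the runs above: (packaged 0.214, -march=native -O3 build).
runs = {
    "umk3": ("Average speed: 546.00% (119 seconds)",
             "Average speed: 613.19% (119 seconds)"),
    "vf2": ("Average speed: 122.41% (119 seconds)",
            "Average speed: 126.37% (119 seconds)"),
}

for name, (packaged, native) in runs.items():
    print(f"{name}: {relative_gain(packaged, native):+.1f} %")
```

With the numbers from this post it prints roughly +12.3 % for umk3 and +3.2 % for vf2. Keep in mind that the two binaries here are also different versions (0.214 versus git master), so not all of the difference is necessarily down to the flags.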