Ok, so the fundamental idea is simple: going through a memory space is expensive, so you don't want to go through two if you can avoid it. OTOH, modifying a memory map is expensive too (though I hope to reduce that cost one day), so you don't want to change it too often either.
In the case of laying out devices in memory, it's usually done only once or twice (firmware, OS) in a session's life, so I think the best way is to modify the main CPU memory map directly.
You can see an example of that point of view in the new PCI stuff (pci.h/pci.cpp and derivatives). Each PCI device publishes five address maps (four that go in the main CPU's memory, one for configuration). The root device (and bridge devices) take all these maps and install them in the appropriate address space with the correct offset and limits.
See pci_device::map_device, pci_bridge_device::map_device and pci_host_device::regenerate_mapping for how it can be done.
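
To make the "install with offset and limits" part concrete, here is a rough standalone sketch of the idea. It is not the pci.cpp code: published_map, address_space, mapped_device and rebuild_mapping are all hypothetical names invented for the illustration, and the real functions named above work differently in detail. The point it tries to show is that the device only describes its map, the host drops the handlers straight into the one flat space (so a read goes through a single lookup), and the whole layout is rebuilt only when something moves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical: one map published by a device. The handler takes an
// offset relative to the start of the map, not an absolute address.
struct published_map {
    std::string name;
    uint32_t size;
    std::function<uint8_t(uint32_t)> read;
};

// Hypothetical flat address space standing in for the main CPU's map.
class address_space {
public:
    void clear() { m_ranges.clear(); }

    // Install 'm' at 'base', clipped to the window [limit_lo, limit_hi].
    // Returns false if nothing of the map is visible through the window.
    bool install(const published_map &m, uint32_t base,
                 uint32_t limit_lo, uint32_t limit_hi) {
        assert(m.size > 0);
        // 64-bit arithmetic so base + size cannot wrap around.
        uint64_t start = base;
        uint64_t end = uint64_t(base) + m.size - 1;
        start = std::max<uint64_t>(start, limit_lo);
        end = std::min<uint64_t>(end, limit_hi);
        if (start > end)
            return false;
        m_ranges.push_back(range{uint32_t(start), uint32_t(end), base, m.read});
        return true;
    }

    // Single lookup: the most recently installed range wins on overlap.
    // Returns false when the address is unmapped.
    bool read(uint32_t addr, uint8_t &out) const {
        for (auto it = m_ranges.rbegin(); it != m_ranges.rend(); ++it) {
            if (addr >= it->start && addr <= it->end) {
                out = it->read(addr - it->base);
                return true;
            }
        }
        return false;
    }

private:
    struct range {
        uint32_t start, end, base;
        std::function<uint8_t(uint32_t)> read;
    };
    std::vector<range> m_ranges;
};

// Hypothetical: what the host remembers about each device's placement.
struct mapped_device {
    published_map map;
    uint32_t base;
    bool enabled;
};

// Rebuild the whole layout from scratch. This is the expensive step,
// so it should only run when a base address or enable bit changes.
void rebuild_mapping(address_space &space,
                     const std::vector<mapped_device> &devs,
                     uint32_t window_lo, uint32_t window_hi) {
    space.clear();
    for (const mapped_device &d : devs)
        if (d.enabled)
            space.install(d.map, d.base, window_lo, window_hi);
}
```

In this sketch, moving a device means changing its base and calling rebuild_mapping again. That is acceptable precisely because it happens once or twice per session, while every read in between stays on the single-lookup path.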