"These two C65s behave differently when executing this routine, which behavior must be considered correct?"
You find out why and emulate both models, assuming one of them isn't simply faulty due to a hardware failure.
This is far from specific to the C65: different ZX Spectrum models (even ones that look the same externally) behave slightly differently, and some games are incompatible with certain board revisions. The professional emulators emulate every timing detail and side effect of every official model at a low level.
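As a rough illustration of what "emulate both models" can look like, here is a minimal sketch where each board revision is a separate profile and the core picks its timing from the selected one. The names (`ModelProfile`, `rev_a`, `rev_b`, `frame_cycles`) and all the numbers are made up for the example; they are not measured values from any real machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Per-revision parameters. All values below are illustrative placeholders."""
    name: str
    cycles_per_line: int   # hypothetical cycles per scanline
    extra_wait: bool       # hypothetical quirk: one extra wait cycle per line


# One entry per board revision the emulator supports (hypothetical revisions).
PROFILES = {
    "rev_a": ModelProfile("rev_a", 224, True),
    "rev_b": ModelProfile("rev_b", 228, False),
}


def frame_cycles(profile: ModelProfile, lines: int = 312) -> int:
    """Total cycles in one frame for the given revision."""
    penalty = 1 if profile.extra_wait else 0
    return lines * (profile.cycles_per_line + penalty)


if __name__ == "__main__":
    for profile in PROFILES.values():
        print(profile.name, frame_cycles(profile))
```

The point is only that the difference lives in data selected at startup, so the same routine can be run against either revision and both behaviors are available to the user.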
"This C65 does something that does not make sense, and the manual says it should do something else: should the emulator reproduce the former behavior for hardware accuracy, or the latter for logical consistency?"
You do what the hardware does; what the manual says is irrelevant.
"This part is described in the manual but is missing in the actual machine: in order to achieve maximum accuracy, should I emulate it or not?"
If it's not there, you don't emulate it; if it's optional, you make it an option.
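To make "you make it an option" concrete, here is a small sketch of a machine whose optional part is only present when the configuration asks for it. `MachineConfig`, `Machine`, the memory sizes and the 0xFF value returned for unmapped reads are all assumptions made for the example, not facts about the C65.

```python
from dataclasses import dataclass

BASE_RAM_SIZE = 0x1000        # hypothetical size of the RAM every unit has
EXPANSION_RAM_SIZE = 0x1000   # hypothetical size of the optional expansion
UNMAPPED_VALUE = 0xFF         # assumption: what a read of absent hardware returns


@dataclass
class MachineConfig:
    ram_expansion: bool = False  # optional part, off unless the user enables it


class Machine:
    def __init__(self, config: MachineConfig) -> None:
        size = BASE_RAM_SIZE
        if config.ram_expansion:
            size += EXPANSION_RAM_SIZE
        self.ram = bytearray(size)

    def read(self, addr: int) -> int:
        # Absent hardware is simply not mapped; nothing is faked.
        if 0 <= addr < len(self.ram):
            return self.ram[addr]
        return UNMAPPED_VALUE

    def write(self, addr: int, value: int) -> None:
        # Writes to absent hardware are ignored.
        if 0 <= addr < len(self.ram):
            self.ram[addr] = value & 0xFF


if __name__ == "__main__":
    stock = Machine(MachineConfig())
    expanded = Machine(MachineConfig(ram_expansion=True))
    print(hex(stock.read(BASE_RAM_SIZE)), hex(expanded.read(BASE_RAM_SIZE)))
```

Software that probes for the expansion then sees the same thing it would see on a real unit with or without the part fitted, depending on the option.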
"The manual describes a feature, but does not say anything about how it's accessed on the actual machine: should I avoid implementing it and break compatibility with the actual machine, or code it the way I guess it's used and break compatibility with the actual machine?"
You run tests on the hardware to figure out how it works, or figure out how it works based on software that needs it. We do this all the time, for hundreds of machines.
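The workflow is roughly: capture what the real machine does, run the same inputs through the emulated version, and fix the emulation until nothing differs. A minimal sketch of the comparison step follows; the capture data, `find_mismatches` and the two register models are invented for the example.

```python
def find_mismatches(hardware_log, emulate):
    """Return (input, observed_on_hardware, emulated) for every disagreement.

    hardware_log is a list of (input, observed_output) pairs captured from
    the real machine; emulate is the emulator's model of the same operation.
    """
    mismatches = []
    for test_input, observed in hardware_log:
        got = emulate(test_input)
        if got != observed:
            mismatches.append((test_input, observed, got))
    return mismatches


# Made-up capture: value written to a register -> value read back on hardware.
hardware_log = [(0x00, 0x00), (0x7F, 0x7F), (0xFF, 0x7F)]


def guessed_register(value):
    """First guess, e.g. from the manual: all 8 bits are stored."""
    return value & 0xFF


def revised_register(value):
    """Revised after testing: the top bit is not stored."""
    return value & 0x7F


if __name__ == "__main__":
    print(find_mismatches(hardware_log, guessed_register))
    print(find_mismatches(hardware_log, revised_register))
```

When no hardware is available, the "capture" can instead come from what existing software expects to read back, which is the second approach mentioned above.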
In short, it can be done that way, but it's just not worth the hassle, which is why I did not do it that way.
This is your opinion. Trying to HLE everything is actually more hassle in the long run, and will only lead to people writing bad code for the system that doesn't stand a chance of running on any real machine.