Joined: Apr 2013
Posts: 75
Likes: 2
G
geecab Offline OP
Member
Hi All!

There have been lots of interesting discussions about contended memory on this site and unfortunately it has caught my interest.

With MAME in its current state, I ran the Nirvana game engine and the display looked corrupt (as expected because this game engine relies on contended memory in order for the sprites to display correctly). Then I hacked around with the z80.cpp execute_run() function, periodically eating extra tstates every 100 tstates that went by (As that is kind of what happens when you have contended memory). When I ran the Nirvana engine again, I could see a few multi-color effects trying to do-their-thing (but only as the sprites went past a certain area of the screen). After that I was hooked!

Then I decided to have a go at implementing a complete contended memory solution based on the technical information from the World of Spectrum (WOS) website. I focused mainly on 'getting something working' rather than worrying about how my changes would affect other drivers/clones in MAME (To be honest, I didn't think I would get very far so I wasn't thinking ahead). Anyway, as time went on I cleaned up my hack more and more to the point where I now have things working well. I'm not sure if my implementation is a step in the right direction for MAME, or just a step in a direction. Either way, it's been an interesting learning experience and if you think that with a few/many tweaks it could find its way into MAME then that's cool :)

About my implementation:

- The WOS information indicates which opcode groups are associated with which contended memory 'scripts' (For example, "LD A,I" and "LD I,A" both use the "pc:4,pc+1:5" script). In the same way that z80.cpp has tables relating opcodes to their uncontended cycle delays (cc_op[], cc_cb[] etc.), I added tables relating opcodes to their script (There are 37 scripts, which I defined as CM01 to CM37. These are used in tables cm_op_contended[], cm_cb_contended[] etc.).

- For each opcode processed, I keep a history of all the addresses that are read/written to.

- After the opcode has finished, I work out what script the opcode uses, run the script (Which processes all the read/write history), and adjust the tstate counter accordingly.

- Currently (For simplicity), the tstate counter adjustment is always carried out after the opcode has finished processing (I.e. After all the address read/writes for that opcode have been carried out). This could be improved so that the tstate counter is adjusted on each read/write. This would require a bit of a code change but I think it should be possible (Pre-determining the contended memory script prior to the opcode being run, and running parts of the script on each read/write). I might have a go at doing this next.

- I've changed spectrum_UpdateScreenBitmap() so the raster beam's pixel position is determined solely from the tstate counter. This function needs to be called more regularly than it currently is (Currently, it is called 'once every 224 cycles' which looks a bit 'unstable'. I found 'once every 16 cycles' to look better but this does have a slight detrimental effect on performance).

- Currently only works with the "spectrum" and "spec128" drivers.
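
As a rough illustration of the script idea above, here is a minimal sketch of how a script such as "pc:4,pc+1:5" could be walked. It is not taken from the diff: the names script_step, is_contended, ula_delay, run_script and ld_a_i_tstates are hypothetical, and the 48K memory map and timing constants (first contended tstate 14335, 224 tstates per line) are assumptions based on commonly documented values.

```cpp
#include <cstddef>
#include <cstdint>

// One step of a contention script such as "pc:4,pc+1:5": an address placed on
// the bus, followed by the tstates that access takes when uncontended.
struct script_step {
    std::uint16_t address;
    int tstates;
};

// Assumption: 48K memory map, where only 0x4000-0x7FFF is contended.
constexpr bool is_contended(std::uint16_t address)
{
    return address >= 0x4000 && address <= 0x7FFF;
}

// Assumption: 48K timing (first contended tstate 14335, 224 tstates per line,
// 192 display lines, 128 contended tstates at the start of each line).
constexpr int ula_delay(int tstate)
{
    constexpr int pattern[8] = {6, 5, 4, 3, 2, 1, 0, 0};
    const int offset = tstate - 14335;
    if (offset < 0 || offset / 224 >= 192 || offset % 224 >= 128)
        return 0;
    return pattern[(offset % 224) % 8];
}

// Walk the script in order; before each step, wait out the ULA if the address
// is contended. Returns the total tstates the opcode took.
template <std::size_t N>
constexpr int run_script(const script_step (&script)[N], int start_tstate)
{
    int tstate = start_tstate;
    for (const script_step& step : script) {
        if (is_contended(step.address))
            tstate += ula_delay(tstate);
        tstate += step.tstates;
    }
    return tstate - start_tstate;
}

// "LD A,I" using the WOS script "pc:4,pc+1:5" (hypothetical helper).
constexpr int ld_a_i_tstates(std::uint16_t pc, int start_tstate)
{
    const script_step script[] = {
        {pc, 4},
        {static_cast<std::uint16_t>(pc + 1), 5},
    };
    return run_script(script, start_tstate);
}
```

Under these assumptions, running the opcode from uncontended memory costs the plain 9 tstates, while running it from 0x4000 at the first contended tstate costs 19 (6 and 4 tstates of extra delay on the two steps).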


Here are some screenshots of it working:
screenshots

My diff is here (It's still a bit hacky at the moment. Please bear in mind it's early days):
diff

Just for completeness, I thought I'd mention that the ZX spectrum game engines capable of producing games that avoid colour clash (I've heard these engines described as providing 'Rainbow Graphics', 'Multicolor Graphics' and 'BiColor Graphics') are:-
ZXodus (Released in 2011)
BiFrost (Released in 2012)
Nirvana (Released in 2013)

These engines can be downloaded from various places on the web in '.TAP' format. The engines themselves, after you've loaded the TAP, run a little demo (So you can see the engine working without writing any code). I tried Nirvana and BiFrost out. I also played a game that uses the Nirvana engine called "SnakeEscape" which is a pretty cool puzzle game but I am stuck on level 9.

OMG I've written absolutely loads... Well done if you've got this far and not fallen asleep!

Thanks for reading!

Ben


Joined: Apr 2013
Posts: 75
Likes: 2
G
geecab Offline OP
Member
Hi again!

For anyone who is interested, I just thought I'd post my progress so far.

Here is my second attempt...
diff 2nd attempt

In this version:
- I've got rid of all my global variables
- The screen is updated using a callback (CPU usage is much lower now as I'm not thrashing the scanline timer)
- I've finished populating all my cc_**_contended[] tables.
- Fixed a bug with my logic when working out the amount of contention delay to apply based on the tstate. To elaborate - There was a bug with my cm_get_ula_sequence_delay() function that decides if the delay should be 6,5,4,3,2,1,0 or 0 based on tstate. Basically, even line numbers would have the correct sequence, odd line numbers wouldn't. I'm actually quite surprised that even with this bug, the visual results were still ok.
- Implemented a more accurate contention delay table. Whilst scratching my head looking into my cm_get_ula_sequence_delay bug, I discovered that the SinclairFAQWiki page appears to have an improved version of the contended memory script table (That takes into account the IR register for certain opcodes) compared to the table on the WorldOfSpectrum site.
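
For readers who haven't met the 6,5,4,3,2,1,0,0 sequence before, here is a minimal sketch of the kind of lookup a function like cm_get_ula_sequence_delay() performs. It is not the code from the diff: ula_timing, SPEC48K, SPEC128K and ula_sequence_delay are hypothetical names, and the per-model constants are assumptions based on commonly documented values. Working from the offset into the frame and dividing by the full line length keeps every scanline aligned to the same sequence, odd or even.

```cpp
// Assumed per-model timing constants (not taken from the diff).
struct ula_timing {
    int first_contended_tstate; // first contended tstate in a frame
    int tstates_per_line;       // length of one scanline in tstates
};

constexpr ula_timing SPEC48K{14335, 224};
constexpr ula_timing SPEC128K{14361, 228};

// Hypothetical stand-in for cm_get_ula_sequence_delay(): how many tstates the
// ULA would hold the CPU if it touched contended memory at `tstate`.
constexpr int ula_sequence_delay(int tstate, ula_timing timing)
{
    constexpr int pattern[8] = {6, 5, 4, 3, 2, 1, 0, 0};
    const int offset = tstate - timing.first_contended_tstate;
    if (offset < 0)
        return 0;                         // still in the top border
    const int line = offset / timing.tstates_per_line;
    const int column = offset % timing.tstates_per_line;
    if (line >= 192 || column >= 128)
        return 0;                         // bottom border, side border or retrace
    return pattern[column % 8];
}
```

With the spec128 constants, this gives 6 at tstate 16185, 3 at 16188, 4 at 16415 and 5 at 16422.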

I've now run the same Nirvana game (SnakeEscape) on MAME alongside emuzwin and can confirm that cycle for cycle, for all operations done in a complete frame, both emulators synchronize perfectly for 48K and 128K models (So I'm happy that my contended memory tables are correct).

There is, however, still a small problem (So I will be making a third attempt at this). The problem is that when running the Nirvana engine the first 8 to 16 pixels of every line look corrupt. I'll try to explain why with an example. Running SnakeEscape using the spec128 driver, the first screen pixel on the 8th row happens at tstate 16185 (Calculated by 14361+(228*8)). The first screen pixel on the 9th row happens at tstate 16413 (Calculated by 14361+(228*9)). The colour attribute at both these times should be 0x46 (which is bright yellow ink on black paper). My debugging below shows that opcode 0x31 is being processed between tstates 16181 and 16191 (I.e. around the time the raster starts displaying the pixels for row 8). Once the opcode has finished, the screen updates to the new tstate, and the attribute data 0x46 is used for the first 8 pixels on row 8 (All good so far)...

execute_run - TState=16181, LastOpcodeTstates=11 PC=DDB4
MEM TState=16181 ULA=0 - Addr=DDB4 Val=31 (C:0, N:4)
MEM TState=16185 ULA=6 - Addr=DDB5 Val=3E (C:0, N:3)
MEM TState=16188 ULA=3 - Addr=DDB6 Val=58 (C:0, N:3)
spectrum_UpdateScreenBitmap - y=8 x=0 attr=46 (ink=e pap=8)
spectrum_UpdateScreenBitmap - y=8 x=8 attr=7 (ink=7 pap=0)
execute_run - TState=16191, LastOpcodeTstates=10 PC=DDB7

Now the problem is when we come to the next row of pixels down. Opcode 0x22 is being processed between tstates 16405 and 16430 (I.e. around the time the raster starts displaying the pixels for row 9). Here, opcode 0x22 ("LD (5820H),HL", with HL=077BH) is trying to change the attribute at the start of row 9 from 0x46 to 0x7B. On a real spec128, the attribute 0x46 gets displayed because, at tstate 16413, the opcode hasn't had time to complete its job of writing 0x7B to address 5820H. On MAME currently, because I update the screen at the end of the operation, 0x7B gets displayed. So even though operation timing may be perfect, it is not perfectly synchronized to the raster...
execute_run - TState=16405, LastOpcodeTstates=7 PC=DDE5
MEM TState=16405 ULA=0 - Addr=DDE5 Val=22 (C:0, N:4)
MEM TState=16409 ULA=0 - Addr=DDE6 Val=20 (C:0, N:3)
MEM TState=16412 ULA=0 - Addr=DDE7 Val=58 (C:0, N:3)
MEM TState=16415 ULA=4 - Addr=5820 Val=7B (C:4, N:3)
MEM TState=16422 ULA=5 - Addr=5821 Val=07 (C:5, N:3)
spectrum_UpdateScreenBitmap - y=9 x=0 attr=7b (ink=b pap=f)
spectrum_UpdateScreenBitmap - y=9 x=8 attr=7 (ink=7 pap=0)
spectrum_UpdateScreenBitmap - y=9 x=16 attr=7 (ink=7 pap=0)
spectrum_UpdateScreenBitmap - y=9 x=24 attr=7 (ink=7 pap=0)
spectrum_UpdateScreenBitmap - y=9 x=32 attr=7 (ink=7 pap=0)
execute_run - TState=16430, LastOpcodeTstates=25 PC=DDE8

The only way round this is to process the contended memory script (and update the screen display) whilst the opcode is happening, rather than after the opcode has finished. So it's on to my third attempt then!
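
To make the difference concrete, here is a toy model of the row 9 case, using the tstates from the log above (attribute fetched at 16413, write to 5820H logged at 16415, opcode finished at 16430). It is only a sketch: toy_raster and attribute_shown are hypothetical names, and the real renderer obviously tracks far more than one byte.

```cpp
#include <cstdint>

// Toy raster that only cares about one attribute fetch.
struct toy_raster {
    int rendered_to;     // last tstate that has already been drawn
    int fetch_tstate;    // tstate at which the ULA reads the attribute
    std::uint8_t shown;  // attribute that ended up on screen (0 = not drawn yet)

    // Draw everything up to `tstate` using whatever is in memory right now.
    constexpr void catch_up(int tstate, std::uint8_t attr_in_memory)
    {
        if (rendered_to < fetch_tstate && fetch_tstate <= tstate)
            shown = attr_in_memory;
        if (tstate > rendered_to)
            rendered_to = tstate;
    }
};

// Returns the attribute displayed at the start of row 9.
constexpr std::uint8_t attribute_shown(bool render_before_write)
{
    std::uint8_t attr = 0x46;          // value in memory before the opcode
    toy_raster raster{16405, 16413, 0};
    const int write_tstate = 16415;    // tstate logged for the write to 5820H
    const int opcode_end = 16430;

    if (render_before_write)
        raster.catch_up(write_tstate, attr); // third-attempt style
    attr = 0x7B;                             // the write lands
    raster.catch_up(opcode_end, attr);       // end-of-opcode render
    return raster.shown;
}
```

Rendering only at the end of the opcode yields 0x7B in this model, while catching the raster up just before the write yields 0x46.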

OMG I've written loads again! Thanks for reading and hope it's been of interest!

BTW, I can now get past level 9 of SnakeEscape :)


Joined: Apr 2013
Posts: 75
Likes: 2
G
geecab Offline OP
Member
Hi again!

Here is my third attempt (And probably the last as I think everything is finished):
diff 3rd attempt

In this version:
  • Implemented 2 contended memory script tables to support all major ZX Spectrum models (The 'Sinclair' script table for 48K/128K/+2. The 'Amstrad' script table for +2A/+3).
  • Each contended memory script is run during the opcode's execution, rather than after the opcode has finished.
  • Added a macro (MCFG_Z80_CFG_CONTENDED_MEMORY) allowing you to configure how the z80 device should contend memory. If you don't call this macro, then the z80 will not contend memory and will rely on the cc_op[]/cc_ed[] etc. tables for eating cycles (Thus, it will work the old way).
  • The ZX spectrum clones (Timex, ATM, Scorpion, Pentagon - and anything else that inherits the ZX Spectrum machine configuration) all work as they did before (I.e. without contended memory and without using the raster callback).
  • Floating bus support now working.
  • Reduced the number of times the screen/border bitmaps are updated. Rather than rendering to the new raster beam position every time tstates are eaten, I realised that it was only necessary to render to the new raster beam position:
    • Just before memory is written to that could affect screen colour attributes
    • Just before memory is written to that could affect border colour
    • Always at the end of each scanline (To fix flickering/missing graphics issues that the SCANLINE timer previously fixed (Discussed in the 'firefly' thread)).
  • Tidied up all my code.
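
The three render triggers listed above could be expressed as a single predicate, roughly as sketched here. This is not the code from the diff: bus_event and needs_raster_catch_up are hypothetical names, the attribute range 0x5800-0x5AFF assumes the 48K screen layout, and treating any even port as a ULA (border) write is likewise a simplifying assumption.

```cpp
#include <cstdint>

// Kinds of event the CPU core might report to the driver (hypothetical).
enum class bus_event { memory_write, port_write, end_of_scanline, other };

// True when the screen should be rendered up to the current raster position
// before the event takes effect.
constexpr bool needs_raster_catch_up(bus_event event, std::uint16_t address)
{
    switch (event) {
    case bus_event::memory_write:
        return address >= 0x5800 && address <= 0x5AFF; // colour attribute area
    case bus_event::port_write:
        return (address & 1) == 0;                     // even port: ULA, may change border
    case bus_event::end_of_scanline:
        return true;                                   // always flush at line end
    case bus_event::other:
        return false;
    }
    return false;
}
```

Everything else (opcode fetches, reads, writes to ordinary RAM) leaves the raster alone, which is where the saving over rendering on every tstate change would come from.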


Regarding CPU usage: according to top, without my contended memory changes Amaurote runs at 22% CPU. With my contended memory changes it runs at 26%.

In terms of games/demos, this now means the following:
  • Nirvana (SnakeEscape/DreamWalker/Elstompo/MultiDude/Stormfinch/Sunbucket) MultiColour games look perfect on 48K and 128K Spectrums.
  • Shock Megademo looks perfect on 48K and 128K Spectrums. Note - You must use the '.TAP' file from WOS, don't use (like I did) the '.Z80' dump floating around on the net as it does not display correctly on 48K or 128K spectrums (The dodgy Z80 dump doesn't look right on Emuzwin or Fuse either).
  • Aquaplane - Horizon/Border position is correct (Contended Memory)
  • Darkstar - Hi-Score table border pattern position is correct (Contended Memory)
  • Sidewize - Now fully playable at the correct speed. It used to freeze just as you started the game (Floating Bus).
  • Zynaps - MultiColour text is correct. Speed is now perfect. Previously game play would speed up and slow down (Floating Bus).
  • Arkanoid - Original release, now fully playable. It used to freeze just as you started the game (Floating Bus).
  • Cobra - Original release, now fully playable at the correct speed (Floating Bus).
  • Short Circuit - Original release, Number 5 no longer flickers (Floating Bus).
  • Uridium - MultiColour text is correct.


Here is a selection of screen shots of the games/demos that I've tested:
screenshots

I've been thinking about whether or not I'd like to see this committed to MAME, and I'm really not sure. I've tried hard not to break anything and to make modifications that fit in with MAME 'as is'. I've done this, but at the expense of making the Z80 CPU code more complex, which has had a slight detrimental effect on MAME's CPU performance. Can the cost of my modifications be forgiven based on the titles that now work? In the end it's all down to you mamedev experts to decide. Either way, I totally understand if you decide not to go with it, so no pressure :)

As I said in an earlier post, making/seeing this stuff work is what it has been about for me. Take from it what you will, and I hope it helps :)

Joined: May 2004
Posts: 1,772
Likes: 34
H
Very Senior Member
Offline
just fwiw, this was discussed a bit in the shoutbox

however, from a MAME point of view it's still a bit 'backwards' and relying far too heavily on system specific stuff in a CPU core, rather than improving MAME / the CPU core as a whole

doing it properly will have a much more significant performance impact, so don't worry too much about that

since it will likely require a z80 rewrite, and the ability to slow down the CPU on the fly etc. a proper solution is likely some way off, so personally I'd quite like to see something like this as a temporary measure, but IMHO to stand any chance of being accepted you'll probably need to make some derived C++ 'specz80' CPU type and use virtual functions + overrides to keep all the speccy specific code as far out of the actual Z80 core as possible.

that said, there's a fair chance the devs just aren't going to accept this kind of solution at all, that would not be my say.

Joined: Mar 2001
Posts: 17,215
Likes: 234
R
Very Senior Member
Online Content
I'd accept it with the forked Z80, and OG sounded somewhat positive about that too. I guess the question is if it would bother Vas.

Joined: Feb 2004
Posts: 2,597
Likes: 300
Very Senior Member
Offline
I guess I'd tolerate a forked Z80 in the short term provided it isn't completely gross. A more granular Z80 is substantially more complicated than some other CPUs.

Joined: Apr 2013
Posts: 75
Likes: 2
G
geecab Offline OP
Member
Hi all & thanks for the responses!

I'm a bit confused by the term 'forking the Z80'. Do you mean create a new repository branch? Or do you mean create a copy of the Z80.cpp (Calling it, say, specz80.cpp, where I'd put the contended memory modifications) and have the zx spectrum driver family use that? Or do you mean (With regards to what Haze mentioned) deriving a specz80_device class from the z80_device class?

Joined: Mar 2001
Posts: 17,215
Likes: 234
R
Very Senior Member
Online Content
Derive specz80_device from the main Z80 and put the Spectrum mods in there (and of course have the Spectrum driver use it).
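
In outline, the shape being suggested might look something like the sketch below. It deliberately does not use MAME's real classes or signatures: toy_z80, toy_specz80, memory_access and contention_delay are hypothetical names for illustration, and the 48K constants in the override are assumptions. The point is only that the generic core calls a virtual hook that costs nothing by default, and the Spectrum-specific delays live entirely in the derived class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for a generic Z80 core.
class toy_z80 {
public:
    virtual ~toy_z80() = default;

    // Charge one memory access: any machine-specific delay plus the base cost.
    void memory_access(std::uint16_t address, int base_tstates)
    {
        const int extra = contention_delay(address, m_tstate);
        assert(extra >= 0); // a hook may only slow the CPU down
        m_tstate += extra + base_tstates;
    }

    int tstate() const { return m_tstate; }
    void set_tstate(int tstate) { m_tstate = tstate; }

protected:
    // Generic core: no machine-specific delays.
    virtual int contention_delay(std::uint16_t /*address*/, int /*tstate*/) const
    {
        return 0;
    }

private:
    int m_tstate = 0;
};

// Spectrum flavour: all contention knowledge stays in the derived class.
class toy_specz80 : public toy_z80 {
protected:
    int contention_delay(std::uint16_t address, int tstate) const override
    {
        if (address < 0x4000 || address > 0x7FFF)
            return 0;                           // assumed 48K contended range
        const int offset = tstate - 14335;      // assumed 48K first contended tstate
        if (offset < 0 || offset / 224 >= 192 || offset % 224 >= 128)
            return 0;
        const int pattern[8] = {6, 5, 4, 3, 2, 1, 0, 0};
        return pattern[(offset % 224) % 8];
    }
};
```

A driver that wants contention would instantiate the derived type, while every other Z80 machine keeps using the base class unchanged.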

Joined: Apr 2013
Posts: 75
Likes: 2
G
geecab Offline OP
Member
OK cool, thanks, I understand now :) I'll give it my best shot, back soon!

Joined: Mar 2013
Posts: 344
Likes: 3
I
Senior Member
Offline
Just wondering, would these changes not improve compatibility when applied to systems like the Amstrad CPC?
