There's an old strip poker game for the Amiga called Hollywood Poker Pro. The game had an introduction screen featuring the silhouette of a woman dancing to a catchy '80s tune. I wanted to rip that animation, and thought it could make an interesting read to see how it was done.
Just to get an idea of what I was up against, I loaded the game into WinUAE and took a full memory snapshot. Unfortunately, all I could find was the current frame, so it was obvious that the image was rendered in real time somehow.
I then decided to take a closer look at the game files. Since the diskette image was NDOS (i.e. not directly readable by AmigaDOS) I had to lift out the files somehow. I saved some time as I found the WHDLoad image of the game, with all files in a readable directory. One file in particular caught my interest, namely dance.dat.
I examined the dance.dat file in a hex editor, but it left me clueless. I rolled up my sleeves and prepared for some debugging to figure this out. I loaded the game in WinUAE with just 512 kB RAM, a 68000 CPU and OCS/ECS compatibility.
You see, in the good old days almost every program was driven by a run loop: a loop that calls several subroutines to get things going. To identify this loop, I fired up the WinUAE debugger by hitting SHIFT+F12 several times.
After a few iterations, I started to notice a pattern in where the debugger stopped. It landed in the $66Exx address space every time. Disassembling this memory neighbourhood revealed that the run loop started at $66E1C. Here is the run loop, and those of you who know Amiga assembly will recognize the classic btst #6,$bfe001 (test for left mouse button) at the end:
As you can see, there are three JSRs from here: to $66B0C, $67152 and $66EAC. Interestingly, each of these subroutines begins with the instruction NOP ("no operation", opcode $4E71), so by replacing it with RTS ("return from subroutine", opcode $4E75) the routine could be skipped without any hassle. Using this technique, I quickly found that $66EAC was responsible for the upscroller, $67152 drew the animation and $66B0C did the frame selection.
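The NOP-to-RTS trick can be sketched in C as a patch on a raw memory dump. This is only an illustration of the idea, not what I typed into the debugger: the function names, the byte buffer and the offset parameter are made up for the example. The two opcodes are the ones mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_NOP 0x4E71u /* 68000 NOP */
#define OP_RTS 0x4E75u /* 68000 RTS */

/* Read a big-endian 16-bit word, the byte order the 68000 uses. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: if the routine at `offset` starts with NOP,
 * overwrite it with RTS so the routine returns immediately.
 * Returns 1 if the patch was applied, 0 otherwise. */
static int stub_out_routine(uint8_t *mem, size_t mem_size, size_t offset)
{
    if (mem_size < 2 || offset > mem_size - 2)
        return 0;                      /* opcode would not fit in the buffer */
    if (read_be16(mem + offset) != OP_NOP)
        return 0;                      /* routine does not begin with NOP */
    mem[offset]     = (uint8_t)(OP_RTS >> 8);
    mem[offset + 1] = (uint8_t)(OP_RTS & 0xFF);
    return 1;
}
```

Stubbing out one routine at a time and watching what disappears from the screen is a quick way to tell which routine does what.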
Time to debug $67152. You may scroll past this section if you like; I'll try my best to explain it further down.
Okay, let's examine the code and try to figure out what's happening here. The first few lines are simple: the A0, A3 and A4 address registers are loaded with three different memory addresses. By examining the addresses, I realized that A0 was the destination screen, and A3 and A4 were two lookup tables which I will cover further down.
In the next piece of code you can see that we are loading the next byte from A1, and post-incrementing the address pointer. But what's in A1? As you may remember from the run loop above, A1 was set just before calling this subroutine. After further inspection, it turned out that A1 is indeed pointing to the data from dance.dat, although slightly offset.
The byte from A1 was loaded into D7, and the code now clears the contents of (A0) "D7 times" with post-increment, and then one more time without incrementing the address pointer. This is probably an attempt to clear the previous image before applying the next.
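In C, the clearing step described above might look roughly like this. It is a sketch under assumptions: I'm assuming each clear writes a 32-bit value, and the function name and return value are invented for the example.

```c
#include <stdint.h>

/* Hypothetical C rendering of the clear loop: clear `count` longwords
 * with post-increment, then clear one more longword without advancing.
 * Returns the pointer position after the loop. */
static uint32_t *clear_run(uint32_t *dst, uint8_t count)
{
    for (uint8_t i = 0; i < count; i++)
        *dst++ = 0;   /* clear and step to the next longword */
    *dst = 0;         /* final clear, pointer stays put */
    return dst;
}
```

With a count of 2, three longwords end up cleared and the pointer rests on the third one.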
The next byte from (A1) is loaded into D0. If it's zero, we'll jump ahead to $6720A, meaning we want to advance to the next scanline. If it's $ff (255) the code will jump to $67216, which is just an RTS - meaning the frame is finished.
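The dispatch on that byte can be sketched in C as below. The enum and function names are my own labels for this example; only the two special values, 0 and $ff, come from the disassembly.

```c
#include <stdint.h>

/* Hypothetical labels for the three cases the routine distinguishes. */
typedef enum {
    CMD_NEXT_LINE,  /* byte == $00: advance to the next scanline */
    CMD_END_FRAME,  /* byte == $ff: frame finished, routine returns */
    CMD_PIXEL_DATA  /* anything else: handled by the drawing code */
} stream_cmd;

static stream_cmd classify_byte(uint8_t b)
{
    if (b == 0x00)
        return CMD_NEXT_LINE;
    if (b == 0xFF)
        return CMD_END_FRAME;
    return CMD_PIXEL_DATA;
}
```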
This is where it gets interesting - and slightly confusing. First, D0 is copied to D1, and D1 is ANDed with $1F and shifted left by 2. In C, this would be D1=(D1&0x1f)*4;. D1 will later be used as an offset into the table stored in A3. The AND keeps the value in the range 0..31, and the multiplication by 4 makes the offset line up with the 32-bit entries in the table.
D0 is shifted left by 3 and then ANDed with $FC, which in C would be D0=(D0*8)&0xfc;. It's shifted left by 3 to multiply by 8, and ANDed with $FC to make sure it's not hitting an odd address - the 68000 doesn't like writing 16- and 32-bit values to odd addresses.
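To make the two bit manipulations above concrete, here is a small C sketch that applies both expressions, taken at face value from the text, to one input byte. The struct and function names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical container for the two values derived from one byte. */
typedef struct {
    uint32_t d1; /* table offset: (byte & 0x1f) * 4 */
    uint32_t d0; /* (byte * 8) & 0xfc */
} decoded_byte;

static decoded_byte decode_byte(uint8_t b)
{
    decoded_byte r;
    r.d1 = (uint32_t)(b & 0x1F) * 4;  /* 0..124 in steps of 4 */
    r.d0 = ((uint32_t)b * 8) & 0xFC;  /* always a multiple of 4 */
    return r;
}
```

For an input byte of $23, for example, this gives D1 = 12 (entry 3 of a table of 32-bit values) and D0 = 24. Both results are always multiples of 4, so neither can land on an odd address.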
Now it's time to check if D2 is equal to D0. If it is, it means we're ready to draw the 32-bit value from the A4 table to the screen right now. If the carry is clear, we'll jump to $671F6. If neither of these kicked in, we'll draw a full 32-bit value of pixels before looping again.
This is the last part of the code, and it draws graphics from the A4 table.
The tables pointed to by A3 and A4 are actually quite clever:
l_6051e:
    dc.l %11111111111111111111111111111111
    dc.l %01111111111111111111111111111111
    dc.l %00111111111111111111111111111111
    dc.l %00011111111111111111111111111111
    ; etc...
In other words, the above table is used for looking up when to switch on a pixel. The l_605a2 table was done in similar fashion:
l_605a2:
    dc.l %00000000000000000000000000000000
    dc.l %10000000000000000000000000000000
    dc.l %11000000000000000000000000000000
    dc.l %11100000000000000000000000000000
    ; etc...
Using this technique, one can "compress" a single line of (simple) graphics down to just two bytes - a start byte and a stop byte.
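Both tables follow a simple pattern, so they can be computed instead of stored. The C sketch below assumes the "etc..." continues the same way for all 32 entries. Combining the two masks with AND is my illustration of how a start and a stop position inside one 32-bit word could be turned into a run of pixels; it is not a transcription of the original routine, and the function names are invented.

```c
#include <stdint.h>

/* Entry i of the l_6051e-style table: the leftmost i pixels are off,
 * everything from pixel i onwards is on. */
static uint32_t start_mask(unsigned i)
{
    return (uint32_t)(0xFFFFFFFFu >> (i & 31));
}

/* Entry i of the l_605a2-style table: only the leftmost i pixels are on. */
static uint32_t stop_mask(unsigned i)
{
    return (uint32_t)~(0xFFFFFFFFu >> (i & 31));
}

/* Hypothetical combination: pixels from `start` up to, but not including,
 * `stop` within a single 32-bit word. */
static uint32_t span_mask(unsigned start, unsigned stop)
{
    return start_mask(start) & stop_mask(stop);
}
```

As an example, span_mask(3, 8) gives $1F000000, which is pixels 3 to 7 switched on and everything else off.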
So it all makes sense. I entered the code on my A600 and included the dance.dat file in my project. After correcting a few typos and inserting a few skipped lines, stuff started happening on my screen! We have a rendered frame. The next part was looking through the $66B0C subroutine to find the offset table, which was located at $3a000 if you ever need to find it. Turns out there are a total of 106 available frames in the animation - not half bad!
The original code is very slow, and leaves very little wiggle room for other routines. It spends anywhere between 60% and 100% of the CPU time in each frame rendering the graphics. I did some optimizations, such as using a few faster instructions, and double-buffering where the blitter clears one frame while the CPU renders into the next. I was able to get it down to 40-70% CPU, which is good enough. I have plans to use this animation in an Amiga demo I will release later. I will also rip the music, but I'll save that for later...
If you made it this far and understood all my ramblings, my hat's off to you.