Running pouët Demos
I recently tested some public domain gameboy demos from pouet.net with binjgb. They’re good to run as tests because they’re pretty fun, and often do weirder stuff than a commercial game would do.
One of my favorites is “Is That a Demo in Your Pocket” by snorpung and nordloef. It’s goes by pretty quick, but it has a bunch of nice effects:
As I went down the list of demos on pouet.net, I got to one called Man Meets Machine - Alternative Party 2009 by jumalauta. I ran it and got this:
H A N G S !
OK, so the first thing to determine is where it is hanging. I’ve recently been working on a debugger UI using dear imgui, so I ran the demo in the debugger and found where it was hanging:
Here’s what it’s doing:
- Read from
- Read from
- If they are not equal, jump back to step 4.
It seems like this is an infinite loop, right? It keeps reading
expecting it to change, but never changes it. Well, it turns out that
is actually the
LY memory-mapped register. It basically is used to tell the
program what horizontal line is currently being drawn, from 0 to 153. So this
loop is actually just waiting for the next line to start.
A bit of an aside about the number 153: the gameboy has a screen resolution of 160 by 144 pixels. So why 153? Lines 144 to 153 are for the vertical blank (vblank) period. On a CRT screen, the vertical blank is the time that it takes the raster gun to go from the bottom-right corner to the top-left corner. The gameboy has an LCD screen, so it doesn’t actually have a vblank, but I guess it was convenient to pretend that there was one because people expected it.
So why would it hang waiting for the next line? You’ve probably already guessed: if the initial read (step 1 above) reads 153, then it will wait for line 154 which never happens.
Let’s take a look at the registers and see if we’re right:
The registers are all listed in hexadecimal, so
E is 0x99, which 153. But
that means that we are waiting for line 153, not line 154. That should be fine,
Well, things are a little more complicated than they seem…
Each line doesn’t take the same amount of time. Lines 1 through 152 take 456 cycles. But lines 0 and 153 are special: line 153 only lasts for 4 cycles before looping back around to line 0 for cycles.
So our loop will only exit if we read from
0xFF44 (step 4) at exactly the
right time, when
LY is 153 for those 4 cycles.
How likely is this to happen? Let’s look at the timing of our loop:
0x80b2: f0 44 LD A, (FF44H) 0x80b4: bb CP E 0x80b5: 20 fb JR NZ, -5
Each instruction on the gameboy CPU takes a multiple of 4 cycles. Each memory read (including the read of the opcode) takes 4 cycles. Some instructions, like calls, jumps, and 16-bit operations, take more.
LD A, (FF44H)
The first instruction takes 12 cycles: 8 to read the instruction (the bytes
0x44) and 4 more to read from the address
The second instruction takes just 4 cycles.
JR NZ, -5
The final instruction takes 16 cycles if the branch is taken, 12 if it isn’t.
Here is the only scenario where we’ll exit the loop. The * marks when we
actually read the
|N + 0||LD A, (FF44H)||152|
|N + 4||LD A, (FF44H)||152|
|N + 8||LD A, (FF44H) *||153|
|N + 12||CP E||0|
|N + 16||JR NZ, -5||0|
|N + 20||JR NZ, -5||0|
|N + 24||JR NZ, -5||0|
|N + 28||JR NZ, -5||0|
We want to know if this will occur eventually. To find this out we can use the
greatest common divisor. Our loop is cycles, and our
frame takes cycles. The greatest common divisor of
70224 and 32 is 16. This means that LY=153 will only ever occur at two of the
cycle counts above:
N + 0 and
N + 16, or
N + 4 and
N + 20, or
N + 8
N + 24 or
N + 12 and
N + 28.
As we can see from the table above, we want LY=153 to occur at
N + 8.
I hope this makes sense, I’m having a hard time explaining it. Anyway, the point is that we only have to loop for two frames before we can know if we’ll ever read 153. Spoiler: we never do.
Testing on Hardware
Who knows, maybe my emulator got it right? We need to test on real hardware to know the truth. Fortunately, I bought a cart writer and a couple of writable cartridges.
And the truth is: my emulator got it wrong.
Now that we know the demo should work on hardware, we need to figure out where the emulator is going wrong. Here are the options:
- Waiting for line 153 is OK, the bug is that we never read it
- We shouldn’t be waiting for line 153, something screwed up before this
It’s hard to know which is right without more info. The debugger has some features to trace interrupts, I/O reads/writes, etc. Turning those and piping to a file, we get:
... >> LCD_STAT interrupt [...H] >> trigger STAT [LY: 142] [cy: 107863268] (0x44 [LY]) = 0x8e (0x44 [LY]) = 0x8e (0x44 [LY]) = 0x8f (0x43 [SCX], 0x1d) (0x42 [SCY], 0x83) >> LCD_STAT interrupt [...H] >> trigger STAT [LY: 143] [cy: 107863724] (0x44 [LY]) = 0x8f (0x44 [LY]) = 0x8f (0x44 [LY]) = 0x90 (0x43 [SCX], 0x1c) (0x42 [SCY], 0x83) >> VBLANK interrupt [frame = 1591] (0x46 [DMA], 0xce) >> LCD_STAT interrupt [...H] (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 0x98 (0x44 [LY]) = 00 (0x44 [LY]) = 00 ...
Before we go further, let me briefly explain interrupts. An interrupt is basically just a signal to the CPU to “interrupt” its normal behavior and do something different. The gameboy has a few different kinds of interrupts, generated by the hardware: VBLANK, STAT, TIMER, SERIAL, and JOYPAD. In our demo, we only care about the VBLANK and STAT interrupts.
The VBLANK interrupt is triggered when a vblank occurs, once every 70224 cycles, which is approximately 16.74ms.
The STAT interrupt can be triggered for a few different reasons, but in this case, I know that it is for a horizontal blank (hblank) interrupt. Just like the vertical blank, CRT monitors also have a horizontal blank period where the raster gun would move from the right side of the screen to the left side, one line down. The gameboy hblank interrupt is triggered when the current line has finished drawing. The hblank period doesn’t last very long. It’s variable, but it’s always less than or equal to 204 cycles.
When an interrupt is triggered, all that really happens is a bit is set in a
memory-mapped I/O register called
IF. The interrupt isn’t handled until the
CPU is ready to do so: for example, an interrupt might be triggered in the
middle of an instruction, but it won’t be handled until the instruction is
The CPU can also disable interrupts via the
DI instruction. In that case,
the interrupt won’t actually be handled until interrupts are enabled again
When an interrupt is finally handled, the CPU will branch to a specific
location in memory (address
0x40 for VBLANK,
0x48 for STAT, etc.) and start
executing code there.
Reading the Log
So coming back to the log above, we have a few different kinds of messages:
trigger STAT [LY: 142] [cy: 107863268]
This message means the graphics hardware has triggered a STAT interrupt at line 142.
cyis the current cycle count.
LCD_STAT interrupt [...H]
This particular message means we’ve started executing the hblank interrupt handler (
His for hblank).
VBLANK interrupt [frame = 1591]
This message means we’ve started executing the vblank interrupt handler, and this is the 1591st frame rendered.
(0x44 [LY]) = 0x8f
This message means the we read the
LYregister and got the value
This output shows exactly when everything goes wrong. Notice at the end that we
0x98 (152) from
LY, then wrap around to reading 0. But what happens
before that? Something strange happens right here:
>> VBLANK interrupt [frame = 1591] (0x46 [DMA], 0xce) >> LCD_STAT interrupt [...H]
The hblank interrupt only fires for lines 0 through 143. After that, we’re in the vblank period and no hblanks interrupts should occur. So why do we see the hblank interrupt handler executing after the vblank interrupt handler?
Handling Multiple Interrupts
Notice that the hblank interrupt is triggered before the vblank interrupt, but it isn’t handled until after. We can see something else strange looking at where the hblank interrupt is triggered:
>> LCD_STAT interrupt [...H] >> trigger STAT [LY: 143] [cy: 107863724]
Normally an interrupt is triggered, then is handled. But this is backward – the interrupt is being handled before it is triggered!
When an interrupt is handled, interrupts are disabled automatically, just as if
you executed the
DI instruction. If any other interrupts are triggered at
this point, they won’t be handled until the current interrupt handler
So here’s what actually is happening: for some reason, the hblank interrupt is being triggered while we’re already handling the previous hblank interrupt. This also explains why the last hblank interrupt occurs after the vblank interrupt: it’s the hblank interrupt for line 143!
Where Does It Go Wrong?
This answers some of our questions, but asks some new ones. Why is the hblank interrupt firing while we’re already handling the previous one? If we look back in the logs we find this:
>> trigger STAT [LY: 128] [cy: 107856884] >> LCD_STAT interrupt [...H] (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 >> trigger STAT [LY: 129] [cy: 107857340] (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x81 (0x44 [LY]) = 0x82 (0x43 [SCX], 0x1e) (0x42 [SCY], 0x7f) >> LCD_STAT interrupt [...H] >> trigger STAT [LY: 130] [cy: 107857796]
Aha! This is where we flip from trigger=>handle to handle=>trigger. In
between, we see that we must be in the same loop reading
LY waiting for 82.
But this one takes much longer than it normally does. Usually the loop only
takes 1 or 2 reads before it gets to the next line (aside from when it
To figure out why this happens, we need a bit more information. Here’s the real log output from the emulator:
107856424: :>> trigger STAT [LY: 127] [cy: 107856428] 107856428: :>> LCD_STAT interrupt [...H] 107856608: :(0x44 [LY]) = 0x7f 107856628: :(0x44 [LY]) = 0x80 ... 107856880: :>> trigger STAT [LY: 128] [cy: 107856884] 107856904: :>> LCD_STAT interrupt [...H] 107857084: :(0x44 [LY]) = 0x81 107857104: :(0x44 [LY]) = 0x81 107857132: :(0x44 [LY]) = 0x81 ...
The number on the right is the current cycle count. The first hblank interrupt in this example is handled very quickly, cycles after the interrupt is triggered.
The second hblank interrupt is handled cycles after it was triggered. Those extra 20 cycles mean that we miss the end of the previous line. Remember that our loop takes 32 cycles, so missing by 20 cycles is quite a lot.
OK, so now we know why the hblank interrupt is firing after the hblank
interrupt, and we know why the hblank interrupt is triggered after the previous
hblank interrupt is handled. But why does the interrupt handler take 4 cycles
some times and 24 cycles others? It all depends on what instruction is being
executed when the interrupt is triggered. As we learned above, some
instructions are fast (like
CP E above, 4 cycles) and some instructions are
CALL, 24 cycles). The interrupt is only handled when the
instruction has finished, so a long running instruction can delay the interrupt
While writing this, I had an interesting thought: what if I make the interrupt handler check for interrupts earlier than that? That is, what if we do something like this:
- Read initial byte of opcode
- If there are pending interrupts:
- Stop executing that instruction
- Jump to the interrupt handler and start executing the new instruction
- Execute the rest of the instruction
This is basically the same as triggering all interrupts 4 cycles earlier. Interestingly, when I run this:
W O R K S !
But it is the right fix? Well… that’s the tricky part. It fixes this hang, and some other tests, but breaks some other tests. It also doesn’t match the specified interrupt behavior of a Z80 CPU (the gameboy CPU is not a Z80, but is pretty close).
I started writing this blog post last week, expecting to start and finish it in a couple of days. Since I discovered this fix, I’ve been trying to figure out how to keep it working, and all the other tests, so I could have a really awesome finish to this blog post! But no such luck. I guess this is a more accurate ending, instead – things are a little better here, a little worse there; the never-ending struggle of accuracy-driven emulator development.