`DRAW` instruction timing

Asked on Retrocomputing, December 25, 2021

I asked a question here about CHIP-8 instruction timing, and this answer mentions that DRAW waits for vblank:

For Chip-8 code that draws anything, the limiting factor will be the wait-for-vblank that is built into all of the draw operations

Trying to dig deeper into this, I got this helpful comment:

@Cactus Yes. Look at memory address $00AC in this disassembly of COSMAC VIP’s CHIP-8 interpreter: http://web.archive.org/web/20190819144645/http://laurencescotford.co.uk/wp-content/uploads/2013/08/CHIP-8-Interpreter-Disassembly.pdf (interestingly, CLS does not wait for an interrupt)

What is still unclear to me is whether each individual DRAW instruction will wait for a full frame, or whether it will just stall execution until the vertical blanking period is reached.

In other words, suppose I have a sequence of DRAW instructions. Will the first one stall the bytecode interpreter until vblank is reached, after which the rest execute quickly? Or will the first one take a full 1/60th of a second, the second another 1/60th of a second, and so on? Does that mean it is impossible to change more than one 8×16 rectangle of the screen per frame?

Edit: in the hope of clarifying my question, suppose I have the following sequence of instructions:

DRAW v0 v1 1
DRAW v0 v1 1
...
DRAW v0 v1 1

60 times in total. Is this going to take a whole second on the original CHIP-8, or will it take only until the end of the current frame, plus however long it takes to change those 60 bytes?

(If it matters, my angle for this question is that I am working on a book where one of the chapters implements a CHIP-8 machine on an FPGA. So there are at least two goals here:

  • I’d like to describe the intended behaviour of the DRAW instruction properly
  • I’d like to implement DRAW such that it is compatible with the existing corpus of CHIP-8 software.)

One Answer

The WAIT instruction will wait until the next interrupt is received by the processor, and the CHIP-8 implementation shown is designed for use with a CDP1861 video timing control chip that generates an interrupt shortly before the beam reaches the area where the visible portion of the display should start. The interrupt handler needs to save some registers and set up register 0 so that, by the time the control chip asserts the DMA-read wire, it is pointing at the first line of video. This will clock out eight bytes, stalling the CPU for eight cycles out of a fourteen-cycle scan line (most instructions take two cycles). The six remaining cycles of the line (enough to run three instructions) will typically be used to reload register 0 with the address it had before clocking out those eight bytes, so the line will be displayed a second time. The next three instructions will reload the register a second time, and the next three will reload it a third time. The three instructions after that will check whether the display is done and loop if not, allowing register 0 to point to the data for the next line.
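
That arithmetic is easy to sanity-check. Below is a back-of-the-envelope sketch in C that just restates the figures from the paragraph above (fourteen cycles per scan line, eight stolen by DMA, two cycles per instruction, each byte row shown four times); it is not a cycle-accurate model of the 1802:

    #include <assert.h>
    #include <stdio.h>

    int main(void) {
        const int cycles_per_line  = 14;  /* machine cycles per scan line */
        const int dma_cycles       = 8;   /* stolen to clock out 8 bytes  */
        const int cycles_per_instr = 2;   /* most CDP1802 instructions    */
        const int repeats_per_row  = 4;   /* each byte row shown 4 times  */
        const int active_lines     = 128; /* scan lines with DMA active   */

        int free_cycles     = cycles_per_line - dma_cycles;    /* 6  */
        int instrs_per_line = free_cycles / cycles_per_instr;  /* 3  */
        int pixel_rows      = active_lines / repeats_per_row;  /* 32 */

        /* 8 bytes x 8 bits = 64 pixels wide by 32 rows: the familiar
           64x32 CHIP-8 display geometry. */
        assert(instrs_per_line == 3 && pixel_rows == 32);
        printf("instructions per scan line: %d\n", instrs_per_line);
        printf("distinct pixel rows:        %d\n", pixel_rows);
        return 0;
    }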

I'm not a fan of the design of the CDP1861, since the design is hard-coded to activate DMA for eight cycles on each of 128 scan lines, and it provides no useful information about where the display is being scanned other than an interrupt which occurs a hard-coded distance before the start of DMA, and a pulse which occurs a hard-coded distance before the first and last lines of DMA. If the DMA start and stop were under CPU control, and the device allowed the CPU to select when interrupts occurred, programs could trade off screen height for memory usage and CPU time.

Additionally, something like an eight-digit display could avoid having to spend memory on a bitmap holding the complete shapes of all eight digits. Instead, it could blank the display for a couple of scan lines while it prepares an eight-byte buffer with the first byte of each shape, show the buffer twice, blank for two scan lines while loading the buffer with the second byte of each shape, show the buffer twice, and so on. With a five-line font, this would make it possible to produce a six-line by eight-character display using 48 bytes to hold the text characters, rather than 256 bytes (half of some computers' memory!) for a 64×32 bitmap. Such tricks can be managed to some extent even with the 1861 as it exists, but being able to execute only three instructions per scan line instead of seven is a severe limitation.
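
As a rough illustration of that buffer trick, here is a minimal sketch in C; the font data, array sizes, and helper names are all invented for this sketch and are not taken from any real 1861 program. The idea is that the renderer rebuilds one eight-byte DMA buffer from 48 bytes of text during the blanked scan lines between groups of displayed lines:

    #include <stdint.h>
    #include <stdio.h>

    #define COLS   8  /* characters per display row        */
    #define ROWS   6  /* text rows on screen               */
    #define FONT_H 5  /* scan rows per glyph (5-line font) */

    /* Hypothetical two-glyph font, just enough to run the sketch;
       a real interpreter would keep a full table in ROM. */
    static const uint8_t font[2][FONT_H] = {
        {0x60, 0x90, 0x90, 0x90, 0x60},  /* "0" */
        {0x20, 0x60, 0x20, 0x20, 0x70},  /* "1" */
    };

    static uint8_t text[ROWS][COLS]; /* 48 bytes replace a 256-byte bitmap */
    static uint8_t line_buf[COLS];   /* the 8 bytes the 1861 clocks out    */

    /* During the blanked lines before each group of displayed lines,
       load the buffer with font row r of every character in a text row. */
    static void load_line(int row, int r) {
        for (int c = 0; c < COLS; c++)
            line_buf[c] = font[text[row][c]][r];
    }

    int main(void) {
        for (int row = 0; row < ROWS; row++)
            for (int c = 0; c < COLS; c++)
                text[row][c] = (row + c) & 1;  /* fill with 0s and 1s */

        for (int row = 0; row < ROWS; row++)
            for (int r = 0; r < FONT_H; r++) {
                load_line(row, r); /* on hardware this happens in blanking;
                                      the buffer is then shown twice       */
                for (int c = 0; c < COLS; c++)
                    for (int b = 7; b >= 0; b--)
                        putchar(line_buf[c] >> b & 1 ? '#' : ' ');
                putchar('\n');
            }
        return 0;
    }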

As it is, however, the design does make it rather simple to determine how long the system will wait, since there can only be one interrupt source (interrupt timing needs to be very precise for the system to work). Any WAIT instruction will stall the CPU until the beam reaches a spot a few lines above the displayed portion of the frame, at which point the CPU will run the interrupt handler until the beam has reached the bottom of the displayed portion, whereupon execution will resume with the instruction following the WAIT.
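
For an emulator or an FPGA testbench, the practical consequence is that every DXYN pays for its own frame: the draw stalls until the next interrupt, the frame is displayed, the sprite is drawn, and the following DXYN then waits for the next interrupt again. Here is a minimal sketch of those semantics, assuming a 60 Hz frame tick; the names and structure are illustrative, not taken from any real interpreter:

    #include <stdio.h>

    static unsigned frames;       /* counts 1861 interrupts (60 Hz)    */

    static void wait_for_vblank(void) {
        frames++;                 /* stand-in for idling until the IRQ */
    }

    static void op_draw(void) {
        wait_for_vblank();        /* stall first, as on the VIP...     */
        /* ...then XOR the sprite into the framebuffer (omitted).      */
    }

    int main(void) {
        for (int i = 0; i < 60; i++) /* the 60-DRAW sequence from the  */
            op_draw();               /* question                       */
        printf("frames consumed: %u (about one second at 60 Hz)\n", frames);
        return 0;
    }

On this reading, the 60-instruction sequence from the question takes about a whole second: each DRAW waits out its own frame, so changing more than one sprite-sized region of the screen per frame is indeed not possible on the original interpreter.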

Answered by supercat on December 25, 2021
