A multicore solution for a ZX Spectrum Emulator


I made a multicore bare-metal ZX Spectrum emulator for a dual-core ARM microcontroller (Cortex-M4 and Cortex-M0 cores). All this work is part of my programming training with the EduCIAA, an educational version of the (first) "Argentinian Open Industrial Computer (CIAA)" board.
Initially I used a low-cost TFT as the output device, and later a VGA screen. As no video hardware support is available, the VGA signals had to be generated in software, with no extra components.

I developed a multicore solution, running the emulator on the M4 core and generating the VGA signals with the M0 core (Asymmetric Multi Processing).
This technique can be useful for other projects, since the critical timings involved in VGA generation remain isolated from the emulation or any other task running on the other core.

This project involved many issues, like componentless COLOR VGA generation (with GPIO DMA), Inter Process Communication and bus sharing.
In this post you will find a description of each topic I solved and a link to the source code. Enjoy.


EduCIAA Board
EduCIAA Board - NXP LPC4337
I used the EduCIAA NXP version, an Argentinian-designed microcontroller board based on the NXP LPC4337, a dual-core (M4 & M0) 32-bit microcontroller.
The CIAA Project was born in 2013 as a joint initiative between the Argentinian academic and industrial sectors. The CIAA firmware is available here, with the work of people much more expert than I am.

Emulator
I found Aspectrum, an open source C coded Spectrum Emulator for Windows and Linux and adapted it for this project. The CIAA Firmware gives support for the emulator in a bare metal environment.

SPI Display

SPI TFT displays are slower than parallel ones, so obtaining a playable system (with an acceptable fps) was a challenge. The first code tests, which redrew the complete screen for each emulated frame, resulted in too low an fps, even without color.

The key here was to implement a differential drawing routine that only redraws modified pixels.
To do this, the emulator retains a copy of the last frame (Spectrum screen area) and compares it with the new frame (the result of 69888 cycles of Z80 emulation). Only the differences are then sent to the TFT RAM via SPI command/data transfers.
The result for non-scrolling games is very good and the system becomes playable.
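
The differential drawing idea can be sketched in a few lines of C. This is only an illustration, not the code of the project: it assumes one byte per pixel in both frame copies, and the names draw_frame_diff, put_pixel_fn and count_pixel are hypothetical. The callback stands in for the real SPI write to the TFT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: sends one changed pixel to the display. */
typedef void (*put_pixel_fn)(uint16_t x, uint16_t y, uint8_t color, void *ctx);

/* Stand-in for the real SPI write: it only counts how many pixels were sent. */
void count_pixel(uint16_t x, uint16_t y, uint8_t color, void *ctx)
{
    (void)x;
    (void)y;
    (void)color;
    (*(size_t *)ctx)++;
}

/*
 * Compare the new frame with the previous one and send only the pixels
 * that changed. `prev` is updated in place, so it is ready for the next
 * frame. Returns the number of pixels that were redrawn.
 * Assumption: both buffers hold one byte per pixel, row by row.
 */
size_t draw_frame_diff(uint8_t *prev, const uint8_t *next,
                       uint16_t width, uint16_t height,
                       put_pixel_fn put_pixel, void *ctx)
{
    assert(prev != NULL && next != NULL && put_pixel != NULL);

    size_t redrawn = 0;
    for (uint16_t y = 0; y < height; y++) {
        for (uint16_t x = 0; x < width; x++) {
            size_t i = (size_t)y * width + x;
            if (prev[i] != next[i]) {
                put_pixel(x, y, next[i], ctx);
                prev[i] = next[i];
                redrawn++;
            }
        }
    }
    return redrawn;
}
```

For a static screen the routine sends nothing at all, which is why non-scrolling games benefit the most.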

ILI9341
I adapted a great ILI9341 library written by Tilen Majerle for the STM32F to the EduCIAA (only trivial changes were needed, like GPIO pin selection and initialization).

VGA

Componentless VGA
For VGA signal generation I used the remaining M0 core, while the emulator runs on the M4.
The M0 core triggers a DMA transfer from the IPC RAM to display each video line and uses timers for sync generation.
The timings correspond to the 640x480 VGA resolution; the image size is fixed, based on the ZX Spectrum resolution and the DMA frequency.
The code is not optimized, so the signal generation and the timings involved are very explicit.
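
To give an idea of the numbers involved, here is a small sketch of the line and frame timing. It assumes the common 60 Hz variant of the 640x480 mode (25.175 MHz pixel clock, 800 pixel clocks per line, 525 lines per frame); the post does not state which variant the project uses. The helper vga_pixels_to_ticks and the 204 MHz timer clock mentioned afterwards are illustrative assumptions, not values taken from the project.

```c
#include <assert.h>
#include <stdint.h>

/* Industry-standard 640x480 @ 60 Hz timing, in pixel clocks and lines. */
enum {
    VGA_H_VISIBLE = 640, VGA_H_FRONT = 16, VGA_H_SYNC = 96, VGA_H_BACK = 48,
    VGA_V_VISIBLE = 480, VGA_V_FRONT = 10, VGA_V_SYNC = 2,  VGA_V_BACK = 33,
    VGA_H_TOTAL = VGA_H_VISIBLE + VGA_H_FRONT + VGA_H_SYNC + VGA_H_BACK,
    VGA_V_TOTAL = VGA_V_VISIBLE + VGA_V_FRONT + VGA_V_SYNC + VGA_V_BACK
};

#define VGA_PIXEL_CLOCK_HZ 25175000u

static_assert(VGA_H_TOTAL == 800, "a line must total 800 pixel clocks");
static_assert(VGA_V_TOTAL == 525, "a frame must total 525 lines");

/*
 * Convert a duration given in VGA pixel clocks to ticks of a timer
 * running at `timer_hz`, rounded to the nearest tick.
 */
uint32_t vga_pixels_to_ticks(uint32_t timer_hz, uint32_t pixels)
{
    uint64_t num = (uint64_t)timer_hz * pixels + VGA_PIXEL_CLOCK_HZ / 2u;
    return (uint32_t)(num / VGA_PIXEL_CLOCK_HZ);
}
```

With a timer clocked at 204 MHz (an example value), vga_pixels_to_ticks(204000000u, VGA_H_TOTAL) gives the timer period for one line and vga_pixels_to_ticks(204000000u, VGA_H_SYNC) gives the width of the horizontal sync pulse.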


IPC (Inter Process Communication)
A shared memory IPC was implemented: the emulator, running on the M4 core, writes screen data to a RAM area, and the VGA routines, running on the M0 core, read that memory for each video line drawn. As the cores run asynchronously, access conflicts appear, and they are quite frequent because they happen at bus level: the bus used to access the IPC RAM is also used to access many peripherals.
These access conflicts delay the IPC RAM reads and are noticeable on the screen (a dot becomes wider).
I reduced the conflicts by reducing the IPC RAM write accesses, using the same differential drawing routine used for the TFT screen. A more serious solution would be to hold the bus while the IPC RAM is being read.


DMA
As is known, ARM Cortex-M cores aren't real-time oriented (they are not deterministic). Take a look at the screen result for a C loop that turns the green signal on and off. I expected dots, but you can see heterogeneous lines, as pipelining and caching make the system non-deterministic.

CPU switching G signal; dots expected
The key for VGA generation was the use of DMA transfers.
DMA VGA generation
Color:
A very simple color scheme was implemented. Each color signal (R, G and B) is driven by an individual GPIO pin, which gives 8 possible colors (similar to the ZX Spectrum without the bright information).
But DMA transfers take complete bytes (or words), meaning that only 3 of the 8 bits in each transfer are color data, and the remaining 5 bits are discarded.
As each video line (196 pixels) involves a 196-byte DMA transfer, a (very inefficient) frame buffer is needed. It looks like this:

XXXXXRGB XXXXXRGB ... XXXXXRGB XXXXXRGB
.
.
.
XXXXXRGB XXXXXRGB ... XXXXXRGB XXXXXRGB
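
As an illustration of how such a buffer could be filled, the sketch below expands one Spectrum bitmap byte (8 pixels) and its attribute byte into 8 frame buffer bytes. It relies on the standard Spectrum attribute format (ink in bits 0-2, paper in bits 3-5, each color stored as G, R, B from bit 2 down to bit 0) and assumes, from the diagram above, that the buffer uses R in bit 2, G in bit 1 and B in bit 0. The function names are hypothetical and FLASH and BRIGHT are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a 3-bit Spectrum color (bit 2 = G, bit 1 = R, bit 0 = B)
 * to the XXXXXRGB layout (bit 2 = R, bit 1 = G, bit 0 = B).
 */
uint8_t spectrum_to_rgb_byte(uint8_t grb)
{
    uint8_t g = (uint8_t)((grb >> 2) & 1u);
    uint8_t r = (uint8_t)((grb >> 1) & 1u);
    uint8_t b = (uint8_t)(grb & 1u);
    return (uint8_t)((r << 2) | (g << 1) | b);
}

/*
 * Expand 8 pixels (one bitmap byte plus its attribute byte) into
 * 8 frame buffer bytes, one byte per pixel.
 */
void expand_pixels(uint8_t bitmap, uint8_t attr, uint8_t out[8])
{
    assert(out != NULL);

    uint8_t ink   = spectrum_to_rgb_byte(attr & 0x07u);
    uint8_t paper = spectrum_to_rgb_byte((uint8_t)((attr >> 3) & 0x07u));

    for (int i = 0; i < 8; i++) {
        /* Bit 7 of the bitmap byte is the leftmost pixel. */
        out[i] = (bitmap & (0x80u >> i)) ? ink : paper;
    }
}
```

The upper 5 bits of every output byte stay at zero here; in the real buffer they are simply "don't care" bits.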

As the selected GPIO pins are on the same port, they are accessed through a single (memory-mapped) register. Each video line is drawn with an 8-bit memory-to-memory DMA transfer from the (inefficient) buffer (in incrementing mode) to the (memory-mapped) GPIO register (in non-incrementing mode).
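
The two addressing modes can be pictured with a small host-side model. This is not LPC4337 DMA register code, and dma_model_t and dma_model_run are hypothetical names: it only shows what "incrementing source, fixed destination" means for a byte-wide transfer. On the real hardware the DMA controller does this work with regular timing, which is precisely what a CPU loop cannot guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one byte-wide transfer. */
typedef struct {
    const uint8_t    *src;            /* first source byte */
    volatile uint8_t *dst;            /* first destination byte */
    size_t            count;          /* number of bytes to move */
    bool              src_increment;  /* advance source after each byte */
    bool              dst_increment;  /* advance destination after each byte */
} dma_model_t;

/* Software model of the transfer: one byte moved per step. */
void dma_model_run(const dma_model_t *d)
{
    assert(d != NULL && d->src != NULL && d->dst != NULL);

    const uint8_t    *s = d->src;
    volatile uint8_t *t = d->dst;

    for (size_t i = 0; i < d->count; i++) {
        *t = *s;
        if (d->src_increment) {
            s++;
        }
        if (d->dst_increment) {
            t++;
        }
    }
}
```

For a video line, the source would be one row of the frame buffer with src_increment set, and the destination the GPIO port register with dst_increment cleared, so every byte of the row lands on the same three pins, one pixel after another.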



First ZX Spectrum Monochromatic VGA image obtained




First ZX Spectrum Color image
First Emulator generated video