AMP with Raspberry Pi: 6502 emulation

This is an example of "Asymmetric Multi Processing (AMP) with Raspberry Pi".
The previous steps are detailed in the series of articles
"AMP with Raspberry Pi: Cookbook".

Based on previous work with 6502 emulation, an AMP example app was developed:
AMP on Linux GUI (Raspbian)
A remote process (bare-metal) runs a 6502 emulator (fake6502) with a BASIC interpreter (EhBASIC), using two shared memory locations for data exchange: one as (keyboard) input for the BASIC interpreter and one for (monitor) output.
One local process (Linux) sends keystrokes to a shared memory location (remote keyboard).
A second local process (Linux) shows the ASCII data arriving at the other shared memory location (remote monitor).
The processes run asynchronously and the IPC (inter-process communication) is imperfect, as no signaling was implemented.



In order to get the example working you need:
AMP framework: (the same as previous posts) Linux on Cores 0,1,2 using lower RAM and Bare Metal on Core 3 using upper RAM (above 0x20000000). See Step 1 for details.

Get example files from git:
git clone https://github.com/telmomoya/AMP

Sufficient Linux privileges to access physical memory

In a Linux terminal, run:
cd /bare-metal
./start-metal.sh
cd ..
./monitor6510-char 0x20002ed1

This starts the bare-metal process and the Linux process that monitors the BASIC output.

Keep that terminal open and in another one type:
./keyboard6510-char 0x200058d4

Test the EhBASIC interpreter (just press Enter at the "Memory size" prompt).

You can stop the emulator with
./stop-metal.sh

And restart it (again with ./start-metal.sh), so you have life control management (LCM).
If you start EhBASIC with the Warm option you can list the BASIC program from your previous session, as it remains in memory.

Using bare-metal app via ssh


Under the hood

In the bare-metal folder you will find three scripts:

start-metal.sh
This script loads the img file (up-metal-6510.img) at 0x20000000 (the upper 512 MB) and points Core 3's mailbox 3 to that address to start its execution.

stop-metal.sh
This script stops the bare-metal execution.

build-up-metal.sh
Used for bare-metal compilation & linking (uses rpi.x linker script)

In the root folder there are two programs that run locally (on Linux).

keyboard6510-char
Waits for one line of text and sends its characters to the remote (bare-metal) process.

monitor6510-char
Polls the physical address of the mailbox variable and prints its value whenever it changes (no signaling is implemented; this should be improved).
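A Linux process can reach a physical address like the mailbox one by mapping /dev/mem. The following is only a sketch of that idea, not the source of monitor6510-char: the function names are hypothetical, and it assumes the process has root privileges and that /dev/mem exposes the upper RAM.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* mmap() only accepts page-aligned offsets, so a physical address is split
   into the base of its page and the offset inside that page. */
uint32_t page_base(uint32_t phys, uint32_t page_size)
{
    return phys & ~(page_size - 1u);
}

uint32_t page_offset(uint32_t phys, uint32_t page_size)
{
    return phys & (page_size - 1u);
}

/* Hypothetical helper: map the page that holds `phys` through /dev/mem and
   return a pointer to that exact byte, or NULL on failure. */
volatile char *map_phys_byte(uint32_t phys)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    uint32_t page = (uint32_t)ps;

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)page_base(phys, page));
    close(fd);                      /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;
    return (volatile char *)p + page_offset(phys, page);
}

/* One polling step: returns 1 and updates *last when the value changed. */
int mailbox_changed(volatile const char *mailbox, char *last)
{
    assert(mailbox != NULL && last != NULL);
    char now = *mailbox;
    if (now == *last)
        return 0;
    *last = now;
    return 1;
}
```

A monitor loop would call map_phys_byte() once with the address given on the command line and then print the mailbox value each time mailbox_changed() returns 1. Note that polling for changes cannot detect the same character sent twice in a row, which is one reason the IPC is described as imperfect.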


Life control management
When Linux boots, all unused cores are left in a loop polling their mailbox 3. When that mailbox is non-zero, the core jumps to the address it contains.
The linker script places the code from the startup assembler file armc-08-start.S at the start of execution (file listing and comments in this post).
That file prepares the environment (stack and variable initialization) and jumps to the C function kernel_main, located in 6510.c in this case.
When kernel_main returns, the assembler startup code (armc-08-start.S) enters a loop, similar to the initial one, polling mailbox 3.
So the restart procedure is identical to the initial start: write the execution address to mailbox 3.
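The parking loop described above can be sketched in C as follows. This is an illustration, not the code of armc-08-start.S: the mailbox is modeled as a plain memory word that the core clears by writing 0 (the real hardware mailbox has its own clear semantics), and the names poll_start_mailbox and park_core are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*entry_fn)(void);

/* One polling step of a parked core: returns the entry address found in the
   mailbox, or 0 if nothing was written yet. The mailbox is cleared after a
   successful read so that the next start needs a fresh write. */
uint32_t poll_start_mailbox(volatile uint32_t *mailbox)
{
    assert(mailbox != NULL);
    uint32_t entry = *mailbox;
    if (entry != 0u)
        *mailbox = 0u;
    return entry;
}

/* Parked-core loop: wait for an address, jump to it, and come back to the
   wait when the called code returns (as when kernel_main returns). */
void park_core(volatile uint32_t *mailbox)
{
    for (;;) {
        uint32_t entry = poll_start_mailbox(mailbox);
        if (entry != 0u)
            ((entry_fn)(uintptr_t)entry)();
    }
}
```

Because the code returns to the same wait loop, start and restart are the same operation from the Linux side: one write of the entry address.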

Now let's see how the stop process is implemented. Take a look at the kernel_main function in 6510.c:

volatile char live=0x1; //LCM flag
volatile char mailbox=0x20; //Emulator OUT (Monitor)
volatile char mailbox2=0x0; //Emulator IN(Keyboard)

#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "6502/6502.h"
#include "io/gpios.h"


void kernel_main( unsigned int r0, unsigned int r1, unsigned int atags )
{
    reset6502();
    while(live){
        step6502();
    }
}

Please note that live and the mailbox variables are declared volatile. This prevents the compiler from optimizing away their memory accesses, so the variables stay in memory where the Linux processes can reach them.

You can see that kernel_main keeps running only while "live" is non-zero. The idea is to set live=0 from Linux to stop execution (in fact, to return Core 3 to the loop polling mailbox 3 in armc-08-start.S).
To determine the physical address of the "live" variable we can list the symbols of the object file:

nm up-metal-6510.elf | grep live

The provided script stop-metal.sh does exactly that:

lcm_control=0x$(nm up-metal-6510.elf | grep 'D live' | awk '{print $1}')
./devmem $lcm_control b 0x00

It first obtains the physical address of the "live" variable and then writes 0 to it.
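The same lookup can be done from C by parsing the output of nm line by line. The sketch below is not part of the repository; nm_line_address is a hypothetical helper that assumes the usual "address type name" layout of nm output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of nm output ("<hex address> <type> <name>") and store the
   address if the symbol name matches exactly. Returns 1 on a match and 0
   otherwise (including lines that do not have the three fields). */
int nm_line_address(const char *line, const char *symbol, unsigned long *addr)
{
    unsigned long value;
    char type;
    char name[128];

    assert(line != NULL && symbol != NULL && addr != NULL);
    if (sscanf(line, "%lx %c %127s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Comparing the whole name avoids a pitfall of pattern matching: a pattern for one symbol also matches any longer symbol that starts with the same text, such as mailbox and mailbox2.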


IPC
Inter-process communication is implemented using shared memory, specifically the locations of the (volatile) mailbox and mailbox2 variables. Let's list the mailbox symbols:

nm up-metal-6510.elf | grep mailbox
20002ed1 D mailbox
200058d4 B mailbox2

On Linux you can run
./monitor6510-char 0x20002ed1

which prints to the terminal any change at that physical address, that is, at the bare-metal mailbox variable.

And in another terminal
./keyboard6510-char 0x200058d4

sends the typed characters to mailbox2.
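Since no signaling is implemented, characters can be lost when one side is faster than the other. One simple improvement, shown here only as a sketch and not as what the example apps do, is to treat the value 0 as "slot empty": the sender writes only into an empty slot and the receiver empties the slot after reading. The function names are hypothetical, and the sketch assumes one sender, one receiver and a memory location that both sides see coherently.

```c
#include <assert.h>
#include <stddef.h>

/* Sender side: store c only if the previous character was consumed.
   Returns 1 if the character was sent, 0 if the slot is still full. */
int mailbox_try_send(volatile char *slot, char c)
{
    assert(slot != NULL && c != 0);   /* 0 is reserved for "empty" */
    if (*slot != 0)
        return 0;
    *slot = c;
    return 1;
}

/* Receiver side: take the pending character, if any, and free the slot.
   Returns 1 if a character was received, 0 if the slot was empty. */
int mailbox_try_recv(volatile char *slot, char *out)
{
    assert(slot != NULL && out != NULL);
    char c = *slot;
    if (c == 0)
        return 0;
    *out = c;
    *slot = 0;
    return 1;
}
```

With this convention the sender retries until mailbox_try_send() returns 1, so repeated characters are no longer lost, at the cost of reserving the value 0.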

-------------------------------------------------------------
If you want to compile the provided source files, do:

Linux keyboard and monitor apps:
cd linux
gcc -o monitor6510-char monitor6510-char.c
gcc -o keyboard6510-char keyboard6510-char.c

Bare-metal emulator:
cd bare-metal
./build-up-metal.sh

A multicore solution for a ZX Spectrum Emulator


I made a multicore bare-metal ZX Spectrum emulator app for a dual-core ARM microcontroller (Cortex-M4 and Cortex-M0 cores). All this work is part of my programming training with the EduCIAA, an educational version of the (first) "Argentinian Open Industrial Computer (CIAA)" board.
Initially I used a low-cost TFT as the output device and later a VGA screen. As no video hardware support is available, the signals for componentless VGA had to be generated in software.

I developed a multicore solution, running the emulator on the M4 core and generating the VGA signals with the M0 core (Asymmetric Multi Processing).
This technique can be useful for other projects, since the critical timings involved in VGA generation remain isolated from the emulation or any other task running on the other core.

This project involved many issues, such as componentless color VGA generation (with GPIO DMA), inter-process communication and bus sharing.
In this post you will find a description of each topic and a link to the source code. Enjoy.


EduCIAA Board
EduCIAA Board - NXP LPC4337
I used the NXP version of the EduCIAA, a microcontroller board designed in Argentina and based on the NXP LPC4337, a dual-core (M4 and M0) 32-bit microcontroller.
The CIAA Project was born in 2013 as a joint initiative between the Argentinian academic and industrial sectors. The CIAA firmware is available here, with the work of people far more expert than I am.

Emulator
I found Aspectrum, an open-source Spectrum emulator written in C for Windows and Linux, and adapted it for this project. The CIAA firmware supports the emulator in a bare-metal environment.

SPI Display

SPI TFT displays are slower than parallel ones, so obtaining a playable system (with acceptable fps) was a challenge. The first code tests, which redrew the complete screen for each emulated frame, resulted in too low an fps, even without color.

The key here was to implement a differential drawing routine that only redraws modified pixels.
To do this, the emulator retains a copy of the last frame (Spectrum screen area) and compares it with the new frame (the result of 69888 cycles of Z80 emulation). Only the differences are then sent to the TFT RAM via SPI command/data.
The result for non-scrolling games is very good and the system becomes playable.
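The differential drawing idea can be sketched as below. This is not the project's routine: it assumes one byte per pixel, and draw_pixel_fn is a hypothetical callback standing in for the SPI command/data write of a single pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that sends one pixel to the display. */
typedef void (*draw_pixel_fn)(size_t index, uint8_t value, void *ctx);

/* Compare the new frame with the retained copy of the last one, draw only
   the pixels that differ and bring the copy up to date.
   Returns the number of pixels redrawn. */
size_t draw_frame_diff(const uint8_t *frame, uint8_t *last, size_t n,
                       draw_pixel_fn draw, void *ctx)
{
    size_t redrawn = 0;

    assert(frame != NULL && last != NULL && draw != NULL);
    for (size_t i = 0; i < n; i++) {
        if (frame[i] != last[i]) {
            draw(i, frame[i], ctx);
            last[i] = frame[i];
            redrawn++;
        }
    }
    return redrawn;
}

/* Example callback: it only remembers the last pixel it was asked to draw. */
struct last_draw { size_t index; uint8_t value; };

void record_draw(size_t index, uint8_t value, void *ctx)
{
    struct last_draw *d = (struct last_draw *)ctx;
    d->index = index;
    d->value = value;
}
```

The cost of a frame becomes proportional to the number of changed pixels instead of the screen size, which explains why static screens benefit most and scrolling games least.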

ILI9341
I adapted a great ILI9341 library written by Tilen Majerle for STM32F to the EduCIAA (only trivial changes were needed, like GPIO pin selection and initialization).

VGA

Componentless VGA
For VGA signal generation I used the remaining M0 core, while the emulator runs on the M4.
The M0 core triggers a DMA transfer from IPC RAM to display each video line and uses timers to generate the sync signals.
The timings correspond to the 640x480 VGA resolution, and the image size is fixed, based on the ZX Spectrum resolution and the DMA frequency.
The code is not optimized, so the signal generation and the timings involved are very explicit.
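For reference, the timing budget of the standard 640x480 at 60 Hz mode can be written down as a small table. The values below are the usual industry ones (25.175 MHz pixel clock); the project's code may round them differently, and the names used here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One axis of a video mode, in pixels (horizontal) or lines (vertical). */
struct vga_axis { uint32_t visible, front_porch, sync, back_porch; };

const struct vga_axis VGA_H = { 640u, 16u, 96u, 48u };  /* pixels per line */
const struct vga_axis VGA_V = { 480u, 10u,  2u, 33u };  /* lines per frame */

#define VGA_PIXEL_CLOCK_HZ 25175000u

/* Total length of an axis, blanking included. */
uint32_t vga_axis_total(const struct vga_axis *a)
{
    assert(a != NULL);
    return a->visible + a->front_porch + a->sync + a->back_porch;
}

/* Duration of one complete video line, expressed in ticks of a timer
   running at timer_hz (rounded down). */
uint32_t vga_line_ticks(uint32_t timer_hz)
{
    uint64_t t = (uint64_t)vga_axis_total(&VGA_H) * timer_hz;
    return (uint32_t)(t / VGA_PIXEL_CLOCK_HZ);
}
```

A line lasts 800 pixel clocks (about 31.8 microseconds) and a frame 525 lines, so the sync timers and the per-line DMA transfer have to fit in that window.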


IPC (Inter Process Communication)
A shared memory IPC was implemented: the emulator, running on the M4 core, writes screen data to a RAM area, and the VGA routines, running on the M0 core, read that memory for each video line drawn. As the cores run asynchronously, access conflicts appear, and they are quite frequent because they happen at bus level: the bus used to access the IPC RAM is also used to access many peripherals.
These access conflicts delay the IPC RAM reads and are noticeable on the screen (a dot becomes wider).
I reduced the conflicts by reducing the IPC RAM write accesses, using the same differential drawing routine used for the TFT screen. A more serious solution would be to hold the bus during IPC RAM reads.


DMA
As is well known, ARM Cortex-M cores are not real-time oriented (they are not deterministic). Take a look at the screen output of a C loop that turns the green signal on and off. I expected dots, but you can see uneven lines, as pipelining and caching make the system non-deterministic.

CPU switching G signal; dots expected
The key for VGA generation was the use of DMA transfers.
DMA VGA generation
Color:
A very simple color scheme was implemented. Each color signal (R, G and B) is driven by an individual GPIO pin, which gives 8 possible colors (similar to the ZX Spectrum without the bright information).
But DMA transfers move complete bytes (or words), meaning that only 3 bits of each 8-bit transfer are color data and the remaining 5 bits are discarded.
As each video line (196 pixels) involves a 196-byte DMA transfer, a (very inefficient) frame buffer is needed. It looks like this:

XXXXXRGB XXXXXRGB ... XXXXXRGB XXXXXRGB
.
.
.
XXXXXRGB XXXXXRGB ... XXXXXRGB XXXXXRGB

As the selected GPIO pins are on the same port, they are accessed through a single (memory-mapped) register. Each video line is drawn using 8-bit memory-to-memory DMA transfers from the (inefficient) buffer (in incrementing mode) to the (memory-mapped) GPIO register (in non-incrementing mode).
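The buffer layout and the transfer can be sketched in C. This is an illustration under assumptions: the bit order follows the XXXXXRGB diagram (R in bit 2, G in bit 1, B in bit 0), the function names are hypothetical, and the copy loop is a CPU stand-in for what the DMA controller does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_PIXELS 196u   /* pixels, and therefore bytes, per video line */

/* A ZX Spectrum color number holds G in bit 2, R in bit 1 and B in bit 0.
   Reorder it into the XXXXXRGB byte used by the line buffer. */
uint8_t spectrum_to_rgb_byte(uint8_t colour)
{
    uint8_t g = (colour >> 2) & 1u;
    uint8_t r = (colour >> 1) & 1u;
    uint8_t b = colour & 1u;
    return (uint8_t)((r << 2) | (g << 1) | b);
}

/* Build one line of the frame buffer: one whole byte per pixel. */
void fill_line(uint8_t *line, const uint8_t *colours, size_t n)
{
    assert(line != NULL && colours != NULL);
    for (size_t i = 0; i < n; i++)
        line[i] = spectrum_to_rgb_byte(colours[i]);
}

/* Stand-in for the DMA transfer: the source address increments while the
   destination (the GPIO port register) stays fixed. */
void copy_line_to_port(volatile uint8_t *port, const uint8_t *line, size_t n)
{
    assert(port != NULL && line != NULL);
    for (size_t i = 0; i < n; i++)
        *port = line[i];
}
```

The DMA performs the same byte sequence as copy_line_to_port() but at a fixed rate, which is what gives every pixel the same width on screen.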



First ZX Spectrum Monochromatic VGA image obtained




First ZX Spectrum Color image
First Emulator generated video