Uxn32 CPU Emulator Primer
Here’s an explanation and some notes about the Uxn CPU implementation in the Uxn32 project. It’ll serve both as notes to myself, and to anyone else interested in learning how it works. For their own project, or for curiosity.
The two source code files from the Uxn32 project we’re interested in are uxn_core.h
and uxn_core.c
.
uxn_core.h
defines a couple of struct
s, UxnStack
and UxnCore
, and one function, unsigned int UxnExec(UxnCore *u, unsigned int limit)
.
UxnStack and UxnCore
typedef struct UxnStack {
UxnU8 num, mem[255];
} UxnStack;
num
is the number of values on the stack. mem
is the array to hold the values.
typedef struct UxnCore {
UxnU8 *ram;
UxnStack *wst, *rst;
UxnU8 (*dei)(struct UxnCore *u,
unsigned int address);
void (*deo)(struct UxnCore *u,
unsigned int address,
unsigned int value);
UxnU16 pc, fault;
} UxnCore;
UxnCore
is the bundle of input parameters that UxnExec
needs to emulate the Uxn CPU. After UxnExec
returns, the state of the UxnCore
will have been modified, and it can be reused for subsequent calls to UxnExec
. There’s no hidden state or special care that needs to be taken to keep things proper across calls, except that all of the pointers need to be valid, and the memory pointed at by ram
has a minimum size requirement.
ram
should be a pointer to0x10001
count bytes.1wst
andrst
are pointers toUxnStack
s, which can be stored anywhere the caller wants, including aliased or in theram
memory region.dei
anddeo
are the device input and device output callbacks, which the caller ofUxnExec
uses to implement whatever IO devices are connected to the Uxn virtual machine.- My own personal preference is to avoid inverting control when feasible. In this case, I think it makes sense to use callbacks. The VM execution loop is pretty tight, and returning in and out of it repeatedly to handle virtual device IO is more annoying than using a pair of callbacks.
pc
is the program counter, or instruction pointer. It’s the location in Uxn RAM of the next instruction to be executed. Uxn RAM size covers the full 16-bit numeric range that this can be set to.fault
is a numeric code indicating the reasonUxnExec()
returned.
There are a few predefined fault codes:
#define UXN_FAULT_DONE 1
#define UXN_FAULT_STACK_UNDERFLOW 2
#define UXN_FAULT_STACK_OVERFLOW 3
#define UXN_FAULT_DIVIDE_BY_ZERO 4
The numbers
2
and higher mean some kind of error occurred in the virtual machine:2
: The stack needed to be popped, but it was empty.3
: The stack needed to be pushed, but it was already full.4
: Division by zero.
1
means execution ended normally, with theBRK
instruction being executed by the CPU.0
means the CPU still has more work to do, butUxnExec()
returned from execution anyway because the execution limit was reached.- If this happens, it means you need to call
UxnExec()
again to keep going with execution. But, you’ll probably want to take a break and handle other stuff in your Uxn emulator, like dispatching input events or sound output.
- If this happens, it means you need to call
Actually, there is one thing you need to do between calls to UxnExec()
: if fault
was set to something other than 0, set fault
back to 0.
Within the implementation of your deo
function, if you want to generate a fault from your virtual device, you can set the fault
field of the UxnCore
to any non-zero value, including custom ones. After the callback returns, UxnExec()
will break out of its execution loop the same as any of the fault codes listed above. This is useful for implementing things like debugging devices or handling virtual hardware faults.
UxnExec() declaration
Here’s the only function:
unsigned int UxnExec(UxnCore *u, unsigned int limit);
The parameter UxnCore *u
should make sense already.
unsigned int limit
is the upper limit on the number of Uxn instructions to be executed during the call. This limit exists to stop infinite loops from causing the emulator program to lock up forever. It also lets the emulator program take some breaks to do other work, like handle window events or process audio, in case the Uxn virtual machine is taking a long time to do something.
The Uxn Varvara design is event-based, so it’s expected that Uxn programs will issue a BRK
instruction when they’re done with work. This will return the virtual machine to an idle state, until the next event wakes it up.
The return value is limit - (number of instructions executed)
.
If u->fault
is still 0 after UxnExec()
returns, that means it needs to be called again to keep doing work, because a BRK
instruction hasn’t yet been executed. If u->fault
is 1, that means everything went OK. Anything else means some kind of fault or error occurred.
Implementation: uxn_core.c
At the top of uxn_core.c
:
#define PUSH8(s, x) { if(s->num == 0xFF) goto fault_3; s->mem[s->num++] = (x); } #define PUSH16(s, x) { if((j = s->num) >= 0xFE) goto fault_3; k = (x); s->mem[j] = k >> 8; s->mem[j + 1] = k; s->num = j + 2; } #define PUSH(s, x) { if(bs) PUSH16(s, (x)) else PUSH8(s, (x)) } #define POP8(o) { if(!(j = *snum)) goto fault_2; o = src->mem[--j]; *snum = j; } #define POP16(o) { if((j = *snum) <= 1) goto fault_2; o = src->mem[j - 1]; o += src->mem[j - 2] << 8; *snum = j - 2; } #define POP(o) { if(bs) POP16(o) else POP8(o) } #define POKE(x, y) { if(bs) { u->ram[(x)] = (y) >> 8; u->ram[(x) + 1] = (y); } else u->ram[(x)] = y; } #define PEEK16(o, x) { o = (u->ram[(x)] << 8) + u->ram[(x) + 1]; } #define PEEK(o, x) { if(bs) PEEK16(o, x) else o = u->ram[(x)]; } #define DEVR(o, x) { o = u->dei(u, x); if(bs) o = (o << 8) + u->dei(u, ((x) + 1) & 0xFF); } #define DEVW(x, y) { if(bs) { u->deo(u, (x), (y) >> 8); u->deo(u, ((x) + 1) & 0xFF, (y)); } else u->deo(u, x, (y)); } #define JUMP(x) { if(bs) pc = (x); else pc += (UxnI8)(x); }
These macros define the building blocks that we’ll use to implement the full Uxn instruction set. They’re hard to read. Let’s look at the first one, PUSH8
, in detail.
PUSH8
PUSH8(s, x)
is the 8-bit version of pushing a value to a UxnStack
. The value it pushes is x
, which is probably going to be the name of a variable, and the UxnStack
it pushes it to is s
. s
will usually be the name src
, which will be a pointer to a UxnStack
. But, one of the Uxn instructions we implement will need it to be dst
, so it needs to be a macro parameter.
The definition of PUSH8
is this:
#define PUSH8(s, x) {
if(s->num == 0xFF) goto fault_3;
s->mem[s->num++] = (x);
}
Breaking it down:
if(s->num == 0xFF)
If
s->num
is 255, then there’s not enough room in theUxnStack
for another byte. Interrupt execution and trigger fault code 3, stack overflow.s
is aUxnStack
, andnum
is the number of items in thatUxnStack
. 255 is the maximum.By convention, most code that interacts with Uxn’s CPU uses hexadecimal notation for numbers.
s->mem[s->num++] = (x);
Otherwise, there’s room on this
UxnStack
for another 8-bit value. Store it in the appropriate array location (counting up from 0) and increment the count of items in thatUxnStack
.To avoid ending up with an unexpected expression after macro expansion, C macro authors will habitually put parentheses around macro parameters.
You can probably imagine some problem scenarios with a parameter like
3 + 5 / 2
.It’s not actually needed in this particular instance for our code in this file, but that’s hard to safely recognize.
Some compilers and tools may issue warnings if you don’t do this.
PUSH16
Let’s take a look at PUSH8
’s big sis, PUSH16
:
#define PUSH16(s, x) {
if((j = s->num) >= 0xFE) goto fault_3;
k = (x);
s->mem[j] = k >> 8;
s->mem[j + 1] = k;
s->num = j + 2;
}
if((j = s->num) >= 0xFE) goto fault_3;
If there isn’t enough room for 2 items in this stack, interrupt execution and trigger fault code 3.
And during the comparison, assign
s->num
to the temporary scratch variablej
.I did it with a C syntax trick to save on source code size. Some people don’t like it. I’m evil, and I like it.
We’re going to be using
s->num
multiple times in the next few statements. To avoid reading through a pointer multiple times, we can read it once and assign it to a C stack value. This makes the job easier for the compiler, because it “knows” that writes through other pointers can’t affectj
, but it doesn’t “know” that abouts->num
.We never take the address of
j
(by doing&j
) in the function, and since it’s a C stack variable, that means it can’t be indirectly modified.- The C specification says that the layout of stack variables doesn’t have to be defined by the compiler. Therefore, adjusting the address of a pointer, like
my_pointer + 5
ormy_pointer++
, can’t make a pointer point at a different stack variable.
- The C specification says that the layout of stack variables doesn’t have to be defined by the compiler. Therefore, adjusting the address of a pointer, like
We do this to make it easier for the compiler to generate code that avoids redundant loads and stores from memory, and give it the opportunity to leave values in registers.
It will end up being useless in some cases, but sometimes it won’t. Uxn32 compiles with at least 5 different C compilers right now, and has no architecture-dependent code. Our goal is passable code generation while staying short and portable.
k = (x);
Assign whatever
x
is tok
, a temporary scratch variable.In case
k
is an expresison likea + b
, assigning it to a temporary variable avoids evaluating the expression multiple times.Later on, for readability reasons, we will actually be passing expressions like that to this macro.
s->mem[j] = k >> 8;
Store the high byte of the value in
k
to theUxnStack
.k
isunsigned int
, so the>>
expression stays as anunsigned int
. The target is anunsigned char
. No problems with this conversion.
s->mem[j + 1] = k;
- Store the low byte to the next-next location in the stack.
s->num = j + 2;
- Add 2 to the count of items in the array.
PUSH
Now, let’s look at the plain PUSH
, without an 8
or 16
in its name. All it does is wrap the 8
and 16
versions, and automatically pick which one to use:
#define PUSH(s, x) {
if(bs) PUSH16(s, (x))
else PUSH8(s, (x))
}
If bs
is a true value, then use PUSH16
. Otherwise, if it’s a false value, use PUSH8
. Wait, where is bs
defined? More on that later.
POP8
Let’s look at POP8
, which pops a number off of a UxnStack
and assigns it to something:
#define POP8(o) {
if(!(j = *snum)) goto fault_2;
o = src->mem[--j];
*snum = j;
}
#define POP8(o)
- The
POP
macro trio has a single parameter,o
, which is the variable name the output will be written to. There’s nos
variable for whichUxnStack
to use. That’s because thePOP
macros always use aUxnStack
with the namesrc.
- The
if(!(j = *snum)) goto fault_2;
Read the number of items in the
UxnStack
and store it in a temporary scratch variablej
.If it’s a false value, 0, interrupt execution and trigger fault code 2.
But, wait. If the
POP
s always use aUxnStack
namedsrc
, what’s this*snum
we’re reading from instead ofsrc->num
?*snum
is a pointer to eithersrc->num
or a throwaway variable, depending on which instruction is being executed. More on that later.
o = src->mem[--j];
- Assign the byte at
j - 1
to the variable named by the macro parametero
, and decrementj
by 1.
- Assign the byte at
*snum = j
- Assign the new value of
j
back to*snum
.
- Assign the new value of
You can probably guess how POP16
and POP
work. And, you can probably figure out how the rest of the macros work.
MODE
Let’s move on to the next chunk of macro code: the MODE
macro. Most of the Uxn instructions are modal, with 3 bit flags leading to 8 different modes. The MODE
macro takes a fragment of code and replicates it 8 times, using a different “environment” each time. The input code fragment will take on a different meaning in each environment once it’s expanded by the preprocessor, even though the fragment itself expands the same each time.
#define MODE(op, body)\ case 0x00|0x00|0x00|op: {enum{bs=0}; src = u->wst, dst = u->rst; snum = &src->num; body break;}\ case 0x00|0x00|0x20|op: {enum{bs=1}; src = u->wst, dst = u->rst; snum = &src->num; body break;}\ case 0x00|0x40|0x00|op: {enum{bs=0}; src = u->rst, dst = u->wst; snum = &src->num; body break;}\ case 0x00|0x40|0x20|op: {enum{bs=1}; src = u->rst, dst = u->wst; snum = &src->num; body break;}\ case 0x80|0x00|0x00|op: {enum{bs=0}; src = u->wst, dst = u->rst; knum = src->num, snum = &knum; body break;}\ case 0x80|0x00|0x20|op: {enum{bs=1}; src = u->wst, dst = u->rst; knum = src->num, snum = &knum; body break;}\ case 0x80|0x40|0x00|op: {enum{bs=0}; src = u->rst, dst = u->wst; knum = src->num, snum = &knum; body break;}\ case 0x80|0x40|0x20|op: {enum{bs=1}; src = u->rst, dst = u->wst; knum = src->num, snum = &knum; body break;}
It’s 8 different cases meant to go in a C switch
. Let’s look at the last line from that block:
case 0x80|0x40|0x20|op: {
enum{bs=1};
src = u->rst, dst = u->wst;
knum = src->num, snum = &knum;
body
break;
}
The 0x80|0x40|0x20|op
is a bitwise combination of op
(the lowest 5 bits of the instruction, 0x00
through 0x1F
) with 0x80
, 0x40
, and 0x20
.
Modes op ┌───────┐┌─────────────┐ Bit # │8 7 6││5 4 3 2 1│ └▲──▲──▲┘└─────────────┘ │ │ │ │ │ 0x20 (8 or 16 bit) │ │ │ 0x40 (Working or Return stack) │ 0x80 (Consume or keep)
The 3 mode bits control the behavior of the instruction. The bottom 5 bits are the instruction.
The case we showed above has every mode bit set to 1
instead of 0
by or
ing these 3 numbers together: 0x80|0x40|0x20
.
Which results in the “environment” for the instruction’s body
code being like this:
enum{bs=1};
- Declare an enum value named
bs
to be the value1
. This enum value will stay in scope for the rest of this switch case, and it can’t be modified, making it as easy as possible for a C compiler to eliminate unreachableif(bs)
branches.
- Declare an enum value named
src = u->rst, dst = u->wst;
- The names
src
anddst
are used by thePUSH
,POP
, and other macros defined above. In this line, we setsrc
to be the “return stack,” anddst
to be the “working stack.”
- The names
knum = src->num, snum = &knum;
This line of setup code handles the “keep mode” bit being enabled in this instruction. When the “keep mode” bit is enabled, the instruction won’t consume the input value on the stack like it normally would have.
To implement this in code, we’ll redirect the code that adjusts the number of items in the stack to instead operate on a scratch variable, making it have no effect on the actual
UxnStack
.knum
is a scratch variable, and we’re setting it to the current value ofsrc->num
.snum
is set to the address of the scratchknum
variable.snum
is the name used within thePOP
macros.
body
- This pastes in the code that defines the instruction.
This line had every mode bit enabled for the instruction. You can probably figure out how the other lines work – they’re just different combinations of bits being enabled and disabled.
UxnExec() implementation
unsigned int
UxnExec(UxnCore *u, unsigned int limit)
{
unsigned int a, b, c, j, k; UxnU16 pc = u->pc;
UxnU8 knum, *snum; UxnStack *src, *dst;
while(limit) {
limit--;
switch(u->ram[pc++]) {
Here’s the setup for the execution loop. There’s nothing tricky going on here. We create a local C stack copy of the pc
from the UxnCore
so that we can use it without having to read and write through the u
pointer over and over, which will probably generate smaller and faster code.
The switch
picks the behavior for the instruction being executed. Uxn instructions are 8 bits, so there are 256 cases to handle.
Special Instructions
Most of the cases will be generated by the MODE()
macro. But, the first 8 cases are special, and we’ll be handling them manually:
/* BRK */ case 0x00: u->fault = 1; goto done;
/* JCI */ case 0x20: snum = &u->wst->num, src = u->wst; POP8(b)
if(b) goto JMI; pc += 2; break;
/* JMI */ case 0x40: JMI: PEEK16(a, pc) pc += a + 2; break;
/* JSI */ case 0x60: PUSH16(u->rst, pc + 2) goto JMI;
/* LIT */ case 0x80: a = u->ram[pc++]; PUSH8(u->wst, a); break;
case 0xA0: PEEK16(a, pc) PUSH16(u->wst, a) pc += 2; break;
case 0xC0: a = u->ram[pc++]; PUSH8(u->rst, a); break;
case 0xE0: PEEK16(a, pc) PUSH16(u->rst, a) pc += 2; break;
The cases where the bottom 5 bits are set to 0 are handled differently than the other cases, because these instructions are special. For example, the LIT
instruction in Uxn doesn’t need a “keep mode” bit, because this instruction can only push stuff to a stack. It only cares about the “return mode” and “8-bit/16-bit mode” bits. The other instructions in this 0x0
group, BRK
, JCI
, JMI
, and JSI
, don’t care about the mode bits at all. This lets them be packed together without wasting space in the instruction set.
LIT
has 4 cases for it, for each combination of the “keep” and “8-bit/16-bit” bits. The other instructions only have 1 case.
Modal Instructions
And here we list the rest of the instructions, one at a time, while using the MODE()
macro:
/* INC */ MODE(0x01, POP(a) PUSH(src, a + 1))
/* POP */ MODE(0x02, POP(a))
/* NIP */ MODE(0x03, POP(a) POP(b) PUSH(src, a))
/* SWP */ MODE(0x04, POP(a) POP(b) PUSH(src, a) PUSH(src, b))
/* ROT */ MODE(0x05, POP(a) POP(b) POP(c) PUSH(src, b)
PUSH(src, a) PUSH(src, c))
/* DUP */ MODE(0x06, POP(a) PUSH(src, a) PUSH(src, a))
...
The 0x01
, 0x02
, etc. arguments are the first 5 bits of the instruction without the mode flags. The second argument is, perhaps surprisingly, a chunk of C code.
The MODE()
macro then duplicates that code 8 times, applying different combinations of the bit flags for the case, and a different “environment” (block scope) where the chunk of code is pasted into.
Now, you should be able to figure out what each instruction case does.
Wrapping Up
Here’s the bottom of the UxnExec()
function:
}
}
done: u->pc = pc; return limit;
fault_2: u->fault = 2; goto done;
fault_3: u->fault = 3; goto done;
There are goto
labels for setting fault codes 2 and 3 because they occur frequently in the macro code. These faults are going to happen in exceptional cases where execution will probably halt, anyway. Rather than have set the value through the u
pointer duplicated over and over again in the expanded macros, it’s smaller to jump to shared code that will set the fault code, and then jump to the exit code. At least, that’s what I saw when looking at the disassembly.
The done
label, which is also where control will fall through to naturally if the limit
variable reaches 0, sets the C stack’s copy of pc
back into the UxnCore
.
Comments & suggestions ➥ cancel@cancel.fm
There’s an extra byte because, in the Uxn design spec, a 16-bit read or write from the last byte of memory, at address
0xFFFF
, isn’t supposed to wrap the second byte back to0x0000
. Instead, the read or write accesses a secret extra byte of RAM. ↩︎