Uxn32 CPU Emulator Primer

2023-01-23

Here’s an explanation and some notes about the Uxn CPU implementation in the Uxn32 project. It’ll serve both as notes to myself, and to anyone else interested in learning how it works. For their own project, or for curiosity.

The two source code files from the Uxn32 project we’re interested in are uxn_core.h and uxn_core.c.

uxn_core.h defines a couple of structs, UxnStack and UxnCore, and one function, unsigned int UxnExec(UxnCore *u, unsigned int limit).

UxnStack and UxnCore

typedef struct UxnStack {
	UxnU8 num, mem[255];
} UxnStack;

num is the number of values on the stack. mem is the array to hold the values.

typedef struct UxnCore {
	UxnU8 *ram;
	UxnStack *wst, *rst;
	UxnU8 (*dei)(struct UxnCore *u,
	             unsigned int address);
	void  (*deo)(struct UxnCore *u,
	             unsigned int address,
				 unsigned int value);
	UxnU16 pc, fault;
} UxnCore;

UxnCore is the bundle of input parameters that UxnExec needs to emulate the Uxn CPU. After UxnExec returns, the state of the UxnCore will have been modified, and it can be reused for subsequent calls to UxnExec. There’s no hidden state or special care that needs to be taken to keep things proper across calls, except that all of the pointers need to be valid, and the memory pointed at by ram has a minimum size requirement.

ram should be a pointer to 0x10001 count bytes.¹
wst and rst are pointers to UxnStacks, which can be stored anywhere the caller wants, including aliased or in the ram memory region.
dei and deo are the device input and device output callbacks, which the caller of UxnExec uses to implement whatever IO devices are connected to the Uxn virtual machine.
- My own personal preference is to avoid inverting control when feasible. In this case, I think it makes sense to use callbacks. The VM execution loop is pretty tight, and returning in and out of it repeatedly to handle virtual device IO is more annoying than using a pair of callbacks.
pc is the program counter, or instruction pointer. It’s the location in Uxn RAM of the next instruction to be executed. Uxn RAM size covers the full 16-bit numeric range that this can be set to.
fault is a numeric code indicating the reason UxnExec() returned.

There are a few predefined fault codes:

#define UXN_FAULT_DONE 1
#define UXN_FAULT_STACK_UNDERFLOW 2
#define UXN_FAULT_STACK_OVERFLOW 3
#define UXN_FAULT_DIVIDE_BY_ZERO 4

The numbers 2 and higher mean some kind of error occurred in the virtual machine:
- 2: The stack needed to be popped, but it was empty.
- 3: The stack needed to be pushed, but it was already full.
- 4: Division by zero.
1 means execution ended normally, with the BRK instruction being executed by the CPU.
0 means the CPU still has more work to do, but UxnExec() returned from execution anyway because the execution limit was reached.
- If this happens, it means you need to call UxnExec() again to keep going with execution. But, you’ll probably want to take a break and handle other stuff in your Uxn emulator, like dispatching input events or sound output.

Actually, there is one thing you need to do between calls to UxnExec(): if fault was set to something other than 0, set fault back to 0.

Within the implementation of your deo function, if you want to generate a fault from your virtual device, you can set the fault field of the UxnCore to any non-zero value, including custom ones. After the callback returns, UxnExec() will break out of its execution loop the same as any of the fault codes listed above. This is useful for implementing things like debugging devices or handling virtual hardware faults.

UxnExec() declaration

Here’s the only function:

unsigned int UxnExec(UxnCore *u, unsigned int limit);

The parameter UxnCore *u should make sense already.

unsigned int limit is the upper limit on the number of Uxn instructions to be executed during the call. This limit exists to stop infinite loops from causing the emulator program to lock up forever. It also lets the emulator program take some breaks to do other work, like handle window events or process audio, in case the Uxn virtual machine is taking a long time to do something.

The Uxn Varvara design is event-based, so it’s expected that Uxn programs will issue a BRK instruction when they’re done with work. This will return the virtual machine to an idle state, until the next event wakes it up.

The return value is limit - (number of instructions executed).

If u->fault is still 0 after UxnExec() returns, that means it needs to be called again to keep doing work, because a BRK instruction hasn’t yet been executed. If u->fault is 1, that means everything went OK. Anything else means some kind of fault or error occurred.

Implementation: uxn_core.c

At the top of uxn_core.c:

#define PUSH8(s, x) { if(s->num == 0xFF) goto fault_3; s->mem[s->num++] = (x); }
#define PUSH16(s, x) { if((j = s->num) >= 0xFE) goto fault_3; k = (x); s->mem[j] = k >> 8; s->mem[j + 1] = k; s->num = j + 2; }
#define PUSH(s, x) { if(bs) PUSH16(s, (x)) else PUSH8(s, (x)) }
#define POP8(o) { if(!(j = *snum)) goto fault_2; o = src->mem[--j]; *snum = j; }
#define POP16(o) { if((j = *snum) <= 1) goto fault_2; o = src->mem[j - 1]; o += src->mem[j - 2] << 8; *snum = j - 2; }
#define POP(o) { if(bs) POP16(o) else POP8(o) }
#define POKE(x, y) { if(bs) { u->ram[(x)] = (y) >> 8; u->ram[(x) + 1] = (y); } else u->ram[(x)] = y; }
#define PEEK16(o, x) { o = (u->ram[(x)] << 8) + u->ram[(x) + 1]; }
#define PEEK(o, x) { if(bs) PEEK16(o, x) else o = u->ram[(x)]; }
#define DEVR(o, x) { o = u->dei(u, x); if(bs) o = (o << 8) + u->dei(u, ((x) + 1) & 0xFF); }
#define DEVW(x, y) { if(bs) { u->deo(u, (x), (y) >> 8); u->deo(u, ((x) + 1) & 0xFF, (y)); } else u->deo(u, x, (y)); }
#define JUMP(x) { if(bs) pc = (x); else pc += (UxnI8)(x); }

These macros define the building blocks that we’ll use to implement the full Uxn instruction set. They’re hard to read. Let’s look at the first one, PUSH8, in detail.

PUSH8

PUSH8(s, x) is the 8-bit version of pushing a value to a UxnStack. The value it pushes is x, which is probably going to be the name of a variable, and the UxnStack it pushes it to is s. s will usually be the name src, which will be a pointer to a UxnStack. But, one of the Uxn instructions we implement will need it to be dst, so it needs to be a macro parameter.

The definition of PUSH8 is this:

#define PUSH8(s, x) {
	if(s->num == 0xFF) goto fault_3;
	s->mem[s->num++] = (x);
}

Breaking it down:

if(s->num == 0xFF)
- If s->num is 255, then there’s not enough room in the UxnStack for another byte. Interrupt execution and trigger fault code 3, stack overflow.
- s is a UxnStack, and num is the number of items in that UxnStack. 255 is the maximum.
- By convention, most code that interacts with Uxn’s CPU uses hexadecimal notation for numbers.
s->mem[s->num++] = (x);
- Otherwise, there’s room on this UxnStack for another 8-bit value. Store it in the appropriate array location (counting up from 0) and increment the count of items in that UxnStack.
- To avoid ending up with an unexpected expression after macro expansion, C macro authors will habitually put parentheses around macro parameters.
- You can probably imagine some problem scenarios with a parameter like 3 + 5 / 2.
- It’s not actually needed in this particular instance for our code in this file, but that’s hard to safely recognize.
- Some compilers and tools may issue warnings if you don’t do this.

PUSH16

Let’s take a look at PUSH8’s big sis, PUSH16:

#define PUSH16(s, x) {
	if((j = s->num) >= 0xFE) goto fault_3;
	k = (x);
	s->mem[j] = k >> 8;
	s->mem[j + 1] = k;
	s->num = j + 2;
}

if((j = s->num) >= 0xFE) goto fault_3;
- If there isn’t enough room for 2 items in this stack, interrupt execution and trigger fault code 3.
- And during the comparison, assign s->num to the temporary scratch variable j.
  - I did it with a C syntax trick to save on source code size. Some people don’t like it. I’m evil, and I like it.
  - We’re going to be using s->num multiple times in the next few statements. To avoid reading through a pointer multiple times, we can read it once and assign it to a C stack value. This makes the job easier for the compiler, because it “knows” that writes through other pointers can’t affect j, but it doesn’t “know” that about s->num.
  - We never take the address of j (by doing &j) in the function, and since it’s a C stack variable, that means it can’t be indirectly modified.
    - The C specification says that the layout of stack variables doesn’t have to be defined by the compiler. Therefore, adjusting the address of a pointer, like my_pointer + 5 or my_pointer++, can’t make a pointer point at a different stack variable.
  - We do this to make it easier for the compiler to generate code that avoids redundant loads and stores from memory, and give it the opportunity to leave values in registers.
  - It will end up being useless in some cases, but sometimes it won’t. Uxn32 compiles with at least 5 different C compilers right now, and has no architecture-dependent code. Our goal is passable code generation while staying short and portable.
k = (x);
- Assign whatever x is to k, a temporary scratch variable.
  - In case k is an expresison like a + b, assigning it to a temporary variable avoids evaluating the expression multiple times.
  - Later on, for readability reasons, we will actually be passing expressions like that to this macro.
s->mem[j] = k >> 8;
- Store the high byte of the value in k to the UxnStack.
- k is unsigned int, so the >> expression stays as an unsigned int. The target is an unsigned char. No problems with this conversion.
s->mem[j + 1] = k;
- Store the low byte to the next-next location in the stack.
s->num = j + 2;
- Add 2 to the count of items in the array.

We could have done this, instead:

s->mem[j++] = k >> 8;
s->mem[j++] = k;
s->num = j;

I'm not sure which is better. I think most compilers (maybe not tcc) will generate the same machine code, but I didn't check the disassembly.

PUSH

Now, let’s look at the plain PUSH, without an 8 or 16 in its name. All it does is wrap the 8 and 16 versions, and automatically pick which one to use:

#define PUSH(s, x) {
	if(bs) PUSH16(s, (x))
	else PUSH8(s, (x))
}

If bs is a true value, then use PUSH16. Otherwise, if it’s a false value, use PUSH8. Wait, where is bs defined? More on that later.

POP8

Let’s look at POP8, which pops a number off of a UxnStack and assigns it to something:

#define POP8(o) {
	if(!(j = *snum)) goto fault_2;
	o = src->mem[--j];
	*snum = j;
}

#define POP8(o)
- The POP macro trio has a single parameter, o, which is the variable name the output will be written to. There’s no s variable for which UxnStack to use. That’s because the POP macros always use a UxnStack with the name src.
if(!(j = *snum)) goto fault_2;
- Read the number of items in the UxnStack and store it in a temporary scratch variable j.
- If it’s a false value, 0, interrupt execution and trigger fault code 2.
- But, wait. If the POPs always use a UxnStack named src, what’s this *snum we’re reading from instead of src->num?
- *snum is a pointer to either src->num or a throwaway variable, depending on which instruction is being executed. More on that later.
o = src->mem[--j];
- Assign the byte at j - 1 to the variable named by the macro parameter o, and decrement j by 1.
*snum = j
- Assign the new value of j back to *snum.

You can probably guess how POP16 and POP work. And, you can probably figure out how the rest of the macros work.

MODE

Let’s move on to the next chunk of macro code: the MODE macro. Most of the Uxn instructions are modal, with 3 bit flags leading to 8 different modes. The MODE macro takes a fragment of code and replicates it 8 times, using a different “environment” each time. The input code fragment will take on a different meaning in each environment once it’s expanded by the preprocessor, even though the fragment itself expands the same each time.

#define MODE(op, body)\
	case 0x00|0x00|0x00|op: {enum{bs=0}; src = u->wst, dst = u->rst; snum = &src->num; body break;}\
	case 0x00|0x00|0x20|op: {enum{bs=1}; src = u->wst, dst = u->rst; snum = &src->num; body break;}\
	case 0x00|0x40|0x00|op: {enum{bs=0}; src = u->rst, dst = u->wst; snum = &src->num; body break;}\
	case 0x00|0x40|0x20|op: {enum{bs=1}; src = u->rst, dst = u->wst; snum = &src->num; body break;}\
	case 0x80|0x00|0x00|op: {enum{bs=0}; src = u->wst, dst = u->rst; knum = src->num, snum = &knum; body break;}\
	case 0x80|0x00|0x20|op: {enum{bs=1}; src = u->wst, dst = u->rst; knum = src->num, snum = &knum; body break;}\
	case 0x80|0x40|0x00|op: {enum{bs=0}; src = u->rst, dst = u->wst; knum = src->num, snum = &knum; body break;}\
	case 0x80|0x40|0x20|op: {enum{bs=1}; src = u->rst, dst = u->wst; knum = src->num, snum = &knum; body break;}

It’s 8 different cases meant to go in a C switch. Let’s look at the last line from that block:

case 0x80|0x40|0x20|op: {
	enum{bs=1};
	src = u->rst, dst = u->wst;
	knum = src->num, snum = &knum;
	body
	break;
}

The 0x80|0x40|0x20|op is a bitwise combination of op (the lowest 5 bits of the instruction, 0x00 through 0x1F) with 0x80, 0x40, and 0x20.

        Modes        op
      ┌───────┐┌─────────────┐
Bit # │8  7  6││5  4  3  2  1│
      └▲──▲──▲┘└─────────────┘
       │  │  │
       │  │ 0x20 (8 or 16 bit)
       │  │
       │ 0x40 (Working or Return stack)
       │
      0x80 (Consume or keep)

The 3 mode bits control the behavior of the instruction. The bottom 5 bits are the instruction.

The case we showed above has every mode bit set to 1 instead of 0 by oring these 3 numbers together: 0x80|0x40|0x20.

Which results in the “environment” for the instruction’s body code being like this:

enum{bs=1};
- Declare an enum value named bs to be the value 1. This enum value will stay in scope for the rest of this switch case, and it can’t be modified, making it as easy as possible for a C compiler to eliminate unreachable if(bs) branches.
src = u->rst, dst = u->wst;
- The names src and dst are used by the PUSH, POP, and other macros defined above. In this line, we set src to be the “return stack,” and dst to be the “working stack.”
knum = src->num, snum = &knum;
- This line of setup code handles the “keep mode” bit being enabled in this instruction. When the “keep mode” bit is enabled, the instruction won’t consume the input value on the stack like it normally would have.
- To implement this in code, we’ll redirect the code that adjusts the number of items in the stack to instead operate on a scratch variable, making it have no effect on the actual UxnStack.
- knum is a scratch variable, and we’re setting it to the current value of src->num.
- snum is set to the address of the scratch knum variable. snum is the name used within the POP macros.
body
- This pastes in the code that defines the instruction.

This line had every mode bit enabled for the instruction. You can probably figure out how the other lines work – they’re just different combinations of bits being enabled and disabled.

UxnExec() implementation

unsigned int
UxnExec(UxnCore *u, unsigned int limit)
{
	unsigned int a, b, c, j, k; UxnU16 pc = u->pc;
	UxnU8 knum, *snum; UxnStack *src, *dst;
	while(limit) {
		limit--;
		switch(u->ram[pc++]) {

Here’s the setup for the execution loop. There’s nothing tricky going on here. We create a local C stack copy of the pc from the UxnCore so that we can use it without having to read and write through the u pointer over and over, which will probably generate smaller and faster code.

The switch picks the behavior for the instruction being executed. Uxn instructions are 8 bits, so there are 256 cases to handle.

Special Instructions

Most of the cases will be generated by the MODE() macro. But, the first 8 cases are special, and we’ll be handling them manually:

/* BRK */ case 0x00: u->fault = 1; goto done;
/* JCI */ case 0x20: snum = &u->wst->num, src = u->wst; POP8(b)
                     if(b) goto JMI; pc += 2; break;
/* JMI */ case 0x40: JMI: PEEK16(a, pc) pc += a + 2; break;
/* JSI */ case 0x60: PUSH16(u->rst, pc + 2) goto JMI;
/* LIT */ case 0x80: a = u->ram[pc++]; PUSH8(u->wst, a); break;
          case 0xA0: PEEK16(a, pc) PUSH16(u->wst, a) pc += 2; break;
          case 0xC0: a = u->ram[pc++]; PUSH8(u->rst, a); break;
          case 0xE0: PEEK16(a, pc) PUSH16(u->rst, a) pc += 2; break;

The cases where the bottom 5 bits are set to 0 are handled differently than the other cases, because these instructions are special. For example, the LIT instruction in Uxn doesn’t need a “keep mode” bit, because this instruction can only push stuff to a stack. It only cares about the “return mode” and “8-bit/16-bit mode” bits. The other instructions in this 0x0 group, BRK, JCI, JMI, and JSI, don’t care about the mode bits at all. This lets them be packed together without wasting space in the instruction set.

LIT has 4 cases for it, for each combination of the “keep” and “8-bit/16-bit” bits. The other instructions only have 1 case.

And here we list the rest of the instructions, one at a time, while using the MODE() macro:

/* INC */ MODE(0x01, POP(a) PUSH(src, a + 1))
/* POP */ MODE(0x02, POP(a))
/* NIP */ MODE(0x03, POP(a) POP(b) PUSH(src, a))
/* SWP */ MODE(0x04, POP(a) POP(b) PUSH(src, a) PUSH(src, b))
/* ROT */ MODE(0x05, POP(a) POP(b) POP(c) PUSH(src, b)
                     PUSH(src, a) PUSH(src, c))
/* DUP */ MODE(0x06, POP(a) PUSH(src, a) PUSH(src, a))
...

The 0x01, 0x02, etc. arguments are the first 5 bits of the instruction without the mode flags. The second argument is, perhaps surprisingly, a chunk of C code.

The MODE() macro then duplicates that code 8 times, applying different combinations of the bit flags for the case, and a different “environment” (block scope) where the chunk of code is pasted into.

Now, you should be able to figure out what each instruction case does.

Wrapping Up

Here’s the bottom of the UxnExec() function:

		}
	}
done: u->pc = pc; return limit;
fault_2: u->fault = 2; goto done;
fault_3: u->fault = 3; goto done;

There are goto labels for setting fault codes 2 and 3 because they occur frequently in the macro code. These faults are going to happen in exceptional cases where execution will probably halt, anyway. Rather than have set the value through the u pointer duplicated over and over again in the expanded macros, it’s smaller to jump to shared code that will set the fault code, and then jump to the exit code. At least, that’s what I saw when looking at the disassembly.

The done label, which is also where control will fall through to naturally if the limit variable reaches 0, sets the C stack’s copy of pc back into the UxnCore.

Comments & suggestions ➥ cancel@cancel.fm

There’s an extra byte because, in the Uxn design spec, a 16-bit read or write from the last byte of memory, at address 0xFFFF, isn’t supposed to wrap the second byte back to 0x0000. Instead, the read or write accesses a secret extra byte of RAM. ↩︎