The bytecode intepreter used by this project will expand on the techniques I developed for Zoom. Zoom is a Z-Code intepreter; amongst its features is a specialised bytecode interpreter[13], which enables Zoom to execute programs at around twice the speed of conventional interpreters.
Z-Code[8] was designed for machines with very low memory capacities. Consequently, the design attempts to pack as much data into the space available, usually by using bitfields. The obvious technique for obtaining the values in these bitfields is to use masks and shifts to store a given fields values in a variable. This requires multiple accesses to the value containing the bitfield - memory overhead, as well as several accesses to the location that stores the value - interpretation overhead (compiler optimisation will tend to remove the memory access overhead, but most optimisers will not remove the interpretation overhead). Zoom uses two automatic code generators to reduce both these overheads to a minimum. The first generates a specialised instruction evaluator. This takes an input file containing entries of the form:
OPCODE "add" 2OP:0x14 STORE VERSION all %{ store(stack, st, arg1+arg2); %}
and uses this entry to generate specialised code for every possible value of the first byte of an instruction that has the given opcode. For add, this code looks like:
case 0x14: /* add */ #ifdef DEBUG printf("add\n"); #endif arg1 = GetCode(pc+1); arg2 = GetCode(pc+2); st = GetCode(pc+3); pc += 4; goto op_add; case 0x34: /* add */ #ifdef DEBUG printf("add\n"); #endif arg1 = GetCode(pc+1); arg2 = GetVar(GetCode(pc+2)); st = GetCode(pc+3); pc += 4; goto op_add; case 0x54: /* add */ #ifdef DEBUG printf("add\n"); #endif arg1 = GetVar(GetCode(pc+1)); arg2 = GetCode(pc+2); st = GetCode(pc+3); pc += 4; goto op_add; case 0x74: /* add */ #ifdef DEBUG printf("add\n"); #endif arg1 = GetVar(GetCode(pc+1)); arg2 = GetVar(GetCode(pc+2)); st = GetCode(pc+3); pc += 4; goto op_add; case 0xd4: /* add - variable form */ #ifdef DEBUG printf("add (var form)\n"); #endif padding = zmachine_decode_varop(stack, &GetCode(pc+1), &argblock)-2; st = GetCode(pc+4+padding); pc += 5+padding; goto op_add; . . . #ifndef ZCODE_OP_add # define ZCODE_OP_add op_add: { #line 623 "../../src/zcode.ops" store(stack, st, arg1+arg2); } goto loop; #endif
As many ZCode operation types encode their argument types in the top few bits of an instruction byte, this technique removes the need to interpret these bits. The code generator uses goto to avoid the overhead associated with using a function call (which, on x86 architechture, is surprisingly large). Around 50% of Zoom's speed increase comes from this technique. A further bonus is that the instruction set becomes much easier to expand when a format file such as zcode.ops is used.
The second optimisation concerns ZCodes 'VAR ops', which are operations that can take 0-4 parameters of various types, with the types encoded in a single byte (a 2OP, such as the example 'add', may also have its paremeters encoded in this form). The type of each parameter is stored in a 2-bit field, so the usual technique is to shift each value out in turn. Zoom has another code generator which produces a specialised interpreter for these bytes. This makes the observation that there are only a maximum of 256 possible values for this specifier byte, and many of these are made illegal due to the design of the Z-Machine.
int zmachine_decode_varop(ZStack* stack, ZByte* param, ZArgblock* argblock) { switch (param[0]) { case 0x00: /* arg1 - LC arg2 - LC arg3 - LC arg4 - LC */ argblock->arg[0] = (param[1]<<8)|param[2]; argblock->arg[1] = (param[3]<<8)|param[4]; argblock->arg[2] = (param[5]<<8)|param[6]; argblock->arg[3] = (param[7]<<8)|param[8]; argblock->arg[7] = 0; argblock->n_args = 4; return 8; case 0x01: /* arg1 - LC arg2 - LC arg3 - LC arg4 - SC */ argblock->arg[0] = (param[1]<<8)|param[2]; argblock->arg[1] = (param[3]<<8)|param[4]; argblock->arg[2] = (param[5]<<8)|param[6]; argblock->arg[3] = param[7]; argblock->arg[7] = 0; argblock->n_args = 4; return 7; . . . default: zmachine_fatal("Illegal encoding of a VARop"); return 0; } }
The argblock->arg[7] = 0;
statement is used for
internal reasons by Zoom. This creates a considerable speed
increase when decoding varops, but many Z-Code programs avoid
using them as most interpreters execute them slowly. The final
speed increase produced by Zoom comes from its use of inlining
and macros. Some experimentation has revealed that inlining the
memory operations (reading and writing memory, stack operations
and variable operations) yields a good speed increase for a
minimal increase in code size.