Logical shift Project home

Analysis: Bytecode Interpreter Design

The bytecode intepreter used by this project will expand on the techniques I developed for Zoom. Zoom is a Z-Code intepreter; amongst its features is a specialised bytecode interpreter[13], which enables Zoom to execute programs at around twice the speed of conventional interpreters.

Zoom - a case study in interpreter speciliasation

Z-Code[8] was designed for machines with very low memory capacities. Consequently, the design attempts to pack as much data into the space available, usually by using bitfields. The obvious technique for obtaining the values in these bitfields is to use masks and shifts to store a given fields values in a variable. This requires multiple accesses to the value containing the bitfield - memory overhead, as well as several accesses to the location that stores the value - interpretation overhead (compiler optimisation will tend to remove the memory access overhead, but most optimisers will not remove the interpretation overhead). Zoom uses two automatic code generators to reduce both these overheads to a minimum. The first generates a specialised instruction evaluator. This takes an input file containing entries of the form:

OPCODE "add"          2OP:0x14 STORE  VERSION all
%{ store(stack, st, arg1+arg2); %}
    

Figure 3 - An entry from the 'zcode.ops' file used to generate Zoom's specialised interpreter

and uses this entry to generate specialised code for every possible value of the first byte of an instruction that has the given opcode. For add, this code looks like:

case 0x14: /* add */
#ifdef DEBUG
printf("add\n");
#endif
  arg1 = GetCode(pc+1);
  arg2 = GetCode(pc+2);
  st = GetCode(pc+3);
  pc += 4;
  goto op_add;

case 0x34: /* add */
#ifdef DEBUG
printf("add\n");
#endif
  arg1 = GetCode(pc+1);
  arg2 = GetVar(GetCode(pc+2));
  st = GetCode(pc+3);
  pc += 4;
  goto op_add;

case 0x54: /* add */
#ifdef DEBUG
printf("add\n");
#endif
  arg1 = GetVar(GetCode(pc+1));
  arg2 = GetCode(pc+2);
  st = GetCode(pc+3);
  pc += 4;
  goto op_add;

case 0x74: /* add */
#ifdef DEBUG
printf("add\n");
#endif
  arg1 = GetVar(GetCode(pc+1));
  arg2 = GetVar(GetCode(pc+2));
  st = GetCode(pc+3);
  pc += 4;
  goto op_add;

case 0xd4: /* add - variable form */
#ifdef DEBUG
printf("add (var form)\n");
#endif
    padding = zmachine_decode_varop(stack, &GetCode(pc+1), &argblock)-2;
  st = GetCode(pc+4+padding);
  pc += 5+padding;
  goto op_add;

.
.
.
#ifndef ZCODE_OP_add
# define ZCODE_OP_add
op_add:
  {
#line 623 "../../src/zcode.ops"
 store(stack, st, arg1+arg2); 
  }
  goto loop;
#endif
    

Figure 4 - The code generated by the first helper program used by Zoom

As many ZCode operation types encode their argument types in the top few bits of an instruction byte, this technique removes the need to interpret these bits. The code generator uses goto to avoid the overhead associated with using a function call (which, on x86 architechture, is surprisingly large). Around 50% of Zoom's speed increase comes from this technique. A further bonus is that the instruction set becomes much easier to expand when a format file such as zcode.ops is used.

The second optimisation concerns ZCodes 'VAR ops', which are operations that can take 0-4 parameters of various types, with the types encoded in a single byte (a 2OP, such as the example 'add', may also have its paremeters encoded in this form). The type of each parameter is stored in a 2-bit field, so the usual technique is to shift each value out in turn. Zoom has another code generator which produces a specialised interpreter for these bytes. This makes the observation that there are only a maximum of 256 possible values for this specifier byte, and many of these are made illegal due to the design of the Z-Machine.

int zmachine_decode_varop(ZStack* stack, ZByte* param, ZArgblock* argblock)
{

  switch (param[0])
    {
    case 0x00: /* arg1 - LC arg2 - LC arg3 - LC arg4 - LC */
      argblock->arg[0] = (param[1]<<8)|param[2];
      argblock->arg[1] = (param[3]<<8)|param[4];
      argblock->arg[2] = (param[5]<<8)|param[6];
      argblock->arg[3] = (param[7]<<8)|param[8];
      argblock->arg[7] = 0;
      argblock->n_args = 4;
      return 8;

    case 0x01: /* arg1 - LC arg2 - LC arg3 - LC arg4 - SC */
      argblock->arg[0] = (param[1]<<8)|param[2];
      argblock->arg[1] = (param[3]<<8)|param[4];
      argblock->arg[2] = (param[5]<<8)|param[6];
      argblock->arg[3] = param[7];
      argblock->arg[7] = 0;
      argblock->n_args = 4;
      return 7;
.
.
.
    default:
      zmachine_fatal("Illegal encoding of a VARop");
      return 0;
    }
}
    

Figure 5 - a section of code produced by Zoom's second code generator

The argblock->arg[7] = 0; statement is used for internal reasons by Zoom. This creates a considerable speed increase when decoding varops, but many Z-Code programs avoid using them as most interpreters execute them slowly. The final speed increase produced by Zoom comes from its use of inlining and macros. Some experimentation has revealed that inlining the memory operations (reading and writing memory, stack operations and variable operations) yields a good speed increase for a minimal increase in code size.


Andrew Hunter
Last modified: Sun Nov 12 18:19:56 GMT 2000