In optimizing code for performance, it’s crucial to know the critical path. Once you know which part of your code is most impacting performance, you know where to spend most of your effort optimizing. In Aeon, this means focusing on the instruction decoding, since every instruction must be decoded before it can be emulated. The x86 instruction set is a CISC architecture, so this means there’s a lot of instructions and a lot of data is packed into a small number of bits. For example, most of the x86 instructions accept multiple operands, with the operands consisting of any combination of CPU register, memory address, or immediate value. Add to this the fact that memory addressing has implicit register-based offsets encoded in the operand, with two completely different meanings depending on whether the instruction uses 16 or 32-bit addressing, and you have a lot to decode on potentially every instruction.
The more I learned about the x86 machine code, the more I wanted to decouple it from the actual instruction implementation. In other words, I wanted to be able to write code that would emulate the function of an instruction without worrying about the form its operands are in, and I wanted this to be fast. Eventually, I did succeed in this…mostly. Here’s what my 16-bit MOV instruction emulator looks like:
[Opcode("89/r rmw,rw|8B/r rw,rmw|8C/r rmw,sreg|8E/r sreg,rmw|A1 ax,moffsw|A3 moffsw,ax|B8+ rw,iw|C7/0 rmw,iw", AddressSize = 16 | 32)]
public static void MoveWord(VirtualMachine vm, out ushort dest, ushort src)
{
dest = src;
}
Before I get into that giant Opcode attribute on the method, let’s dissect the method itself. The first parameter is present on every instruction, and it contains a reference to the current emulated system – this is mainly used by instructions that need to cause certain side-effects, such as updating processor flags or accessing memory in a specific way; for the MOV instruction, it isn’t used, but is still required due to the way this method gets invoked. Following the VirtualMachine are the operands for the instruction; in this case, the first one is marked with the out keyword, meaning that the operand is only written-to by the method. The second operand is only read from, so it is passed by value. Both operands are 16-bit values, implying that this emulates the 16-bit operand-size version of MOV. Finally, the body of the method simply assigns the value of the second operand to the first operand. The dest and src parameters could be registers, values in emulated RAM, or immediates, but to MoveWord they are all the same.
One of the things I like about C# is that you can add custom metadata to things like methods and then inspect these at runtime using reflection. In this case, the custom OpcodeAttribute class specifies all of the opcodes and operand formats for each instruction implementation. The | symbol inside the long opcode string above basically means “or,” so all of the substrings between them specify the different MOV opcodes and their operands. Note that all of them take exactly two operands and all of them have a writable first operand (immediate values only ever show up as the second operand). I’ll break down the first form of MOV:
89 |
Specifies a 1-byte hexadecimal opcode |
/r |
The byte following the opcode specifies at least one register |
rmw |
The first operand is either a register or a value in memory of the current processor word-size (16-bit, in this case) |
rw |
The second operand is a register of the current processor word-size (16-bit) |
This information, along with the out keyword in the method signature, allows Aeon to generate code for decoding these operands, fetching initial values, and then writing any out or ref values back. Basically, when emulation begins for the first time, all of the opcode methods are enumerated and the OpcodeAttributes on them are parsed into more manageable objects. From this information, a small IL method is generated to decode, call the implementation method, and then write back values if necessary. Because the resulting generated method gets compiled by the JIT like everything else, simple instructions like MOV end up getting automatically inlined, despite the runtime binding.
Besides the speed, another benefit to this is that it’s really easy for me to add new instructions (provided they don’t encode some type of operand I haven’t dealt with yet). All I need to do is add a static method with the appropriate attribute and the runtime binding handles the rest. The downside is that writing this IL generator was a big pain, and it was very hard to test and nearly impossible to debug. Hmm… I guess it was worth it…