内容简介:In the previous parts we've covered some basics, examined the Windows x64 calling convention, and demonstrated the use of Flat Assembler's powerful macro system. It is now time for us to do something interesting with the tools that we've acquired.
Let's Learn x86-64 Assembly! Part 2 - We're Writing a Virtual Machine
This post is a part of a series on x86-64 assembly programming that I'm writing. Check outpart 0 andpart 1! .
In the previous parts we've covered some basics, examined the Windows x64 calling convention, and demonstrated the use of Flat Assembler's powerful macro system. It is now time for us to do something interesting with the tools that we've acquired.
I really like the idea of building emulators for weird fantasy hardware. Back when Notch published docs for DCPU-16 (RIP), I wrote one of the first emulators for it in C++. And in the unlikely event that you're a returning reader of this blog, you may have seen one of my old posts, " My Most Important Project Was a Bytecode Interpreter ", where I talk about my experiments with a stack-based virtual machine.
So I thought, why not build an emulator for a fantasy CPU for this series? It would give us a pretty nice overview to the x86 instructiuon set basics, and it's a simple enough task that we won't have to get too far into the weeds.
In this post, I'll describe the instruction set of the fantasy CPU we'll be emulating, lay the foundation for our emulator and implement a few of the instructions.
Introducing QBX
For this project, I've chosen to emulate a 16-bit, register-based machine (not too far from DCPU-16, actually). There will be no floating point operations. I've decided to name it QBX, a recursive acronym for "QBX Bytecode eXecutor".
QBX will have 4 16-bit general-purpose registers (GPRs): q0
, q1
, q2
and q3
.
In addition to the 4 GPRs, we'll have a few special registers:
-
qip
-the instruction pointer. This is the moral equivalent of the x64rip
register, it contains the address of the next instruction to be executed. -
qsp
- the stack pointer (QBX is going to have stack manipulation instructions). The equivalent ofrsp
in x64. -
qflags
- the "flags" register, contains a set of bits that depend on the outcome of prior instructions. Some instructions (like arithmetic operations) affect the flags register, while others (such as a conditional branch) depend on the values of certain bits in the flags register. The x64 counterpart register isrflags
.
In our implementation, rather than storing the contents of the "virtual" registers in some block of memory, each QBX register will map directly onto a real x86-64 register. This should eliminate a good number of redundant memory load/store operations.
Here is the mapping of QBX to x64 registers that I'm going to use:
QBX | x64 |
q0 | r12 |
q1 | r13 |
q2 | r14 |
q3 | r15 |
qsp | rbx |
qip | rsi |
qflags | r11 |
Note that this requirement has a very important implication: things like "write value from q0
to q1
" and "write value from q2
to q3
" have to be treated as entirely separate instructions, and not simply as different instances of the same instruction. Why is that? Why not have a generic implementation of an instruction, and give it parameters which determine the registers to use?
If the contents of QBX registers were stored in memory, we indeed could do that, and simply pass the memory address corresponding to the virtual register into the instruction implementation. But remember that one of our goals is to maintain direct correspondence between virtual QBX registers and actual x86-64 registers. Since there is no indirect way to refer to a register (there's no such thing as "pointer to register"!), a "generic" implementation for an instruction isn't possible. In this case, we're making a tradeoff: speeding up emulated programs at the expense of increasing the VM's executable size.
Thus, most QBX instructions will be encoded with a single 2-byte value, called the "instruction code" (there will be some exceptions, which we'll call out explicitly). Things like "write value from q0
to q1
" and "write value from q2
to q3
" shall be represented with distinct instruction codes.
QBX will follow the Von Neumann model, storing both instructions and data in the same memory. Since we're not going to implement any kind of protection measures, QBX programs can actually be self-modifying.
Let's now give a brief overview of the QBX instruction set. I'll be providing the exact details on what each instruction does as we go through implementing them. For now we'll just list the general categories:
-
Memory manipulation instructions
:
load
for reading data from memory into a register,stor
for writing data from a register into memory, and instructions for moving data between registers. -
Stack manipulation
-
push
,pop
- Flow control - unconditional jump, jump if zero flag is set, etc.
- Arithmetic - add, subtract, multiply, divide
- Logic - bitwise and, or, not, as well as bit shift operations
We will also eventually be adding the capability to draw characters to screen and read the keyboard status by means of memory-mapped I/O.
The source code of QBX will be posted to github as this series progresses.
Execution Loop Overview
Let's briefly examine how our implementation will be reading and executing instructions for the virtual machine.
We'll have a single readable-writeable chunk of memory (referred to as "QBX memory") in the data segment, for storing instructions and data belonging to the program being executed by QBX. Another chunk of memory, which we'll call the "jump table", will store a mapping from QBX instruction codes to their corresponding implementations.
On every iteration, we fetch the next instruction by reading the QBX memory at the offset specified by the value in the qip
register. The value in the qip
register is then changed to point to the next instruction to execute.
After that, the current instruction's code is used to find the address of the implementation for that instruction, using the jump table. The execution is transferred to that address.
The instruction implementation is a tiny piece of code that modifies parts of QBX state (registers, memory, flags) according to the instruction's definition, and transfers execution back to the instruction fetch step. This continues until the process either exits or crashes.
Helper Macros
We'll start by setting up a few helper macros, to manage some mundane and repetitive tasks. One of such tasks is invoking the Windows API functions - we will need them for exiting the process cleanly, as well as for input and output later. The file win64_helpers.inc contains all the relevant macros for creating import tables and invoking a function. Since I already described them in great detail in theprevious part, I will not be spending much time on them here.
Next, we'll create a special macro that will receive the name of another macro as an argument and apply it to the list of all QBX instructions. This will help us avoid having to list all QBX instructions multiple times. Since that list will eventually get pretty long, we'll put it into its own file, qbx_instructions.inc . For now, its contents will be:
; applies a macro to the list of all QBX instruction names. macro qbx_insns m, [arg*] { common m arg, \ noop, \ halt }
We will need a few more macros for dealing with instructions. Those will go into qbx_insn_helpers.inc . The first one, given a list of instruction names, will define an instruction code constant for each of them:
; define constants for each instruction code. macro define_icodes base, [insn*] { common local next_icode next_icode = base forward insn = next_icode next_icode = next_icode + 1 }
The next helper macro, given a list of instruction names, will define a jump table mapping instruction codes to their implementations:
macro define_jmp_table jmp_table_name*, [op*] { common next_opcode = 0 forward ; define constants for each instruction code. op = next_opcode next_opcode = next_opcode + 1 common ; jump table name jmp_table_name: forward ; IMPL_* are defined by the `insn` macro. dw IMPL_#op }
Reminder - if you're unfamiliar with the FASM macro system, I recommend reviewingpart 1.
Finally, the last two helper macros are:
; marks the beginning of the QBX instruction implementation. macro insn name { IMPL_#name = $ - insn_base } ; marks the end of a QBX instruction implementation. macro endinsn { ; jump back to the instruction fetch step jmp advance } ; a QBX instruction is defined like so: ; insn instruction_name ; ... implementation goes here... ; endinsn
The insn
macro defines a new assembly-time variable equal to the offset of the given instruction implementation from the insn_base
label (incidentally $
in FASM stands for "current address"). define_jmp_table
creates a table of those offsets. The point of storing offsets rather than absolute addresses is to simply conserve space - an absolute address requires a quad word (8 bytes) per table entry on a 64-bit architecture, but to store offsets to all instructions we only actually need 2 bytes per entry.
In qbx_registers.inc , we'll define a few aliases for mapping QBX registers onto x86-64 ones:
; general-purpose registers q0 equ r12w q1 equ r13w q2 equ r14w q3 equ r15w qaddr equ r12 ; needed for load/store insns q0b equ r12b q1b equ r13b q2b equ r14b q3b equ r15b ; stack ptr qsp equ rbx qspw equ bx qspb equ bl ; insn ptr qip equ rsi ; flags qflags equ r11
Implementing the Execution Loop and Our First Instructions
Let's move on to implementing our main execution loop. It's fairly small, but first we need to get some bootstrapping out of the way, like including files with our helper macros and defining import tables:
format PE64 NX GUI 6.0 entry start include 'win64_helpers.inc' include 'qbx_instructions.inc' include 'qbx_insn_helpers.inc' include 'qbx_registers.inc' section '.idata' import readable writeable import_directory_table KERNEL32, USER32 import_functions KERNEL32, \ AllocConsole, \ WriteConsoleOutputA, \ GetStdHandle, \ ExitProcess import_functions USER32, MessageBoxA
As a next step, let's define the constants for all instruction codes by applying the qbx_insns
macro to the define_icodes
macro:
; define constants for all instruction codes qbx_insns define_icodes, 0
Next comes our data section:
section '.data' data readable writeable ; define the jump table. qbx_insns define_jmp_table, qbx_jmp_table ; QBX memory qbx_mem dw noop dw halt db 1024 dup ?
Here, we're first defining our jump table using the helper macro from earlier. Next comes the chunk of QBX memory, labelled qbx_mem
. As you can see, I have hardcoded a program into it - one that does nothing for one cycle and then immediately exits. Later, we'll have better ways to place executable code into QBX memory.
Now, let's examine the part of the code that implements the execution loop. I've added comments inline:
section '.code' code readable executable start: ; entry point - program starts here int3 ; breakpoint for the debugger xor qip, qip ; zero out instruction pointer xor rdi, rdi ; rdi will hold the next instruction code advance: mov di, word [qbx_mem + qip] ; read the next instruction add qip, 2 ; advance instruction pointer movzx r10, word [qbx_jmp_table + rdi * 2] ; read offset from jump table add r10, insn_base ; compute address of insn implementation jmp r10 ; jump to insn implementation
We've seen some of this stuff, like int3 and xor, before. Let's pick apart the loop line-by-line:
-
mov di, word [qbx_mem + qip]
-qip
always holds the address of the next QBX instruction to execute. At the beginning of every loop we use that value to read the next instruction code into the x86-64 registerdi
. Every time you see square brackets in some instruction's operands, it means using the memory at the address given by the bracketed expression - essentially pointer dereferencing. In this case, we're reading from the memory pointed to by theqbx_mem
label, using an offset equal toqip
. Note that we have to specify the size of the data that we intend to read (word, or two bytes, in this case, since all QBX instruction codes are 2 bytes long). -
add qip, 2
- this advances the instruction pointer and makes it point to the next instruction to execute. -
movzx r10, word [qbx_jmp_table + rdi * 2]
- here we're reading the offset of the instruction implementation from the jump table.movzx
is a special form of mov that writes data from a smaller register (or region of memory) into a larger register, and zeroes out the unused bits in the target. -
add r10, insn_base
- since at this pointr10
contains the offset of the instruction implementation from theinsn_base
label, all we need to do is add the address denoted byinsn_base
to it, and we have the address of the instruction implementation. -
jmp r10
- this jump transfers execution to the address of the instruction implementation that we just calculated.
And finally, here are the implementations for our two first instructions:
insn_base: insn noop nop endinsn insn halt call64 [ExitProcess], 0 endinsn
noop
corresponds to actual x64 instruction nop
, and halt
simply calls ExitProcess
to cease execution.
I recommend building this program and stepping through it with WinDbg (check outpart 0 if you're unfamiliar with WinDbg) to see it in action now.
Implementing the Register and Memory Manipulation Instructions
Now that we have the basic machinery for fetching and processing QBX instructions set up, we can get to implementing more of them. We'll start with instructions that read/write data from/to memory and move data between registers.
On x86-64 processors, the mov
instruction is responsible for doing that. You can see the detailed documentation
for mov
on Félix Coutlier's excellent online x86 reference, which lists the many possible forms of the instruction. As you can see, mov
is very flexible: it can move differently sized pieces of data between memory and registers. However, there are some limitations, e.g. you can't mov
from memory to memory: you always have to go through a register. You also can't mov
between differently sized registers. There are special instructions for that: movzx
( MOVe with Zero Extend
) and movzx
( MOVe with Sign Extend
). These two copy a value from a smaller register or piece of memory to a larger register, and set the extra bits in the target register to zero (for movzx
) or the sign bit of the value (for movsx
). Note that none of these instructions affect any flags.
Let's start by implementing the QBX instructions that place a value into a register. For reasons I mentioned earlier in this post, we have to implement separate instructions for placing a value into registers q0
, q1
, q2
and q3
respectively. Luckily, with the help of FASM's macro system it won't be too much hassle.
We'll first add the new instructions to the list of all QBX instructions in qbx_instructions.inc:
macro qbx_insns m, [arg*] { common m arg, \ noop, \ halt, \ moviwq0, \ moviwq1, \ moviwq2, \ moviwq3 }
The mnemonic is "MOVe Immediate Word to qip
into the appropriate QBX register, and advance qip
by 2. Let's take a look at the implementation:
; move immediate word-sized value into register. rept 4 reg:0 { insn moviwq#reg mov q#reg, word [qbx_mem + qip] add qip, 2 endinsn }
Here, we're seeing a new feature of the FASM macro system - the rept
directive. rept
instructs the assembler to repeat a block of code a fixed number of times. The number of times to repeat is passed in as the first parameter of the directive. The second parameter of the directive is the name of the counter, which can be referred to from within the code block. By default the counter is 1-based, but here we're making it 0-based.
We can now add the "MOVe Immediate Byte" group of instructions, that operates in a similar fashion (don't forget to add them to qbx_instructions.inc as well!):
; move immediate byte-sized value into register. rept 4 reg:0 { insn movibq#reg mov q#reg#b, byte [qbx_mem + qip] add qip, 1 endinsn }
Slightly more involved is the group of instructions for moving values between registers. It requires two nested rept
s and a check to make sure we're not generating code for redundant moves (i.e. q0
to q0
).
; move word-sized value between registers. rept 4 tgt:0 { rept 4 src:0 \{ if ~(tgt eq \src) insn movwq#tgt#q\#src mov q#tgt, q\#src endinsn end if \} } ; move byte-sized value between registers. rept 4 tgt:0 { rept 4 src:0 \{ if ~(tgt eq \src) insn movbq#tgt#q\#src mov q#tgt#b, q\#src\#b endinsn end if \} }
Now that we've covered moving data between registers, let's turn our attention to reading and writing memory. The load
group of instructions reads memory, while the stor
group of instructions writes it. Every memory-related instruction requires the address that is being read or written, the register that is being loaded into or stored, as well as the size of data being read or written. The way the address is specified is determined by the instruction's addressing mode
. For simplicity, QBX has only two straightforward addressing modes:
- Direct - the absolute address at which to read or write memory is specified directly after the instruction code.
-
Indirect
- the absolute address at which to read or write memory is specified in the register
q0
.
q3
at address specified in q0
".
Below is the implementation for stores (loads are analogous). Note that qaddr
maps to r12
which is the 64-bit extension of the q0
register - we need 64-bit registers to calculate addresses.
; store word-sized value to direct address. rept 4 reg:0 { insn storwdq#reg movzx rcx, word [qbx_mem + qip] add qip, 2 ; note that we need to modify qip to skip the address. mov word [qbx_mem + rcx], q#reg endinsn } ; store byte-sized value to direct address. rept 4 reg:0 { insn storbdq#reg movzx rcx, word [qbx_mem + qip] add qip, 2 mov byte [qbx_mem + rcx], q#reg#b endinsn } ; store word-sized value to address in q0. rept 3 reg { insn storwiq#reg mov word [qbx_mem + qaddr], q#reg endinsn } ; store byte-sized value to address in q0. rept 3 reg { insn storbiq#reg mov byte [qbx_mem + qaddr], q#reg#b endinsn }
Implementing the Stack Manipulation Instructions
The stack is an area of memory where programs can put some intermediate data. As we've seen before, function return addresses, some local variables and function arguments live on the stack. The QBX virtual stack is distinct from our thread's actual stack.
The stack is characterized by its size, and a stack pointer. There are two operations we can do with it - push and pop.
Pushing data from a register onto the stack amounts to:
- decreasing the stack pointer by the size of the written data;
- writing the data to the stack.
Popping data from the stack into a register is:
- reading the required amount from the stack;
- increasing the stack pointer by the required amount.
Incidentally, the x86-64 stack manipulation instructions follow the same rules as well.
These operations are somewhat counter-intuitive because the stack pointer decreases when the stack grows . I've tried making some visual explanations for them . Here's one for push:
We'll implement push and pop for both word- and byte-sized values. Here's what the implementation looks like for word-sized values:
; push word-sized value onto the stack rept 4 reg:0 { insn pushwq#reg sub qsp, 2 mov word [qbx_mem + qsp + 1], q#reg endinsn } ; pop word-sized value from the stack. rept 4 reg:0 { insn popwq#reg mov q#reg, word [qbx_mem + qsp + 1] add qsp, 2 }
Writing a Test Program for QBX
It's time for us to finally write a program for QBX to test out the instructions we just implemented. We hardcoded our first program into the machine's memory by putting the corresponding machine code directly into our data section. We'll be hardcoding our programs into memory for a while, but we can make things a bit more convenient.
Create a file called input.asm with the following contents:
include 'qbx_insn_helpers.inc' include 'qbx_instructions.inc' qbx_insns define_icodes, 0 dw moviwq0, 0xbeef, \ moviwq1, 0xfeed, \ movbq2q1, \ movwq2q0, \ storbdq1, 1023, \ loadbdq2, 1023, \ moviwq0, 1022, \ storbiq1, \ pushwq0, \ pushwq1, \ popwq2, \ popwq3, \ halt
What's happening here is we're using some of our helper macros to introduce the instruction code constants, and then we're using FASM's data definition directives to write QBX bytecode in a human-friendly way. The syntax is a bit awkward, but the cool thing is that labels, macros and all other FASM features still totally work!
If you compile this file now, it will produce an input.bin
that can be directly loaded into QBX memory. For now, since we can't load the file dynamically, we'll include it at assembly time. Here's what our updated data section looks like:
section '.data' data readable writeable ; jump table mapping insn codes to insn implementations. qbx_insns define_jmp_table, qbx_jmp_table ; QBX memory. QBX_MEM_SIZE = 1024 qbx_mem file 'input.bin' CODE_SIZE = $ - qbx_mem dw QBX_MEM_SIZE - CODE_SIZE dup ?
(Incidentally, FASM can be really helpful if you need to quickly generate any sort of binary file!)
Since our VM has no I/O capabilities yet, the only way we can observe the effects of this program is through the debugger. You can load up your executable in WinDbg, run it to our hardcoded breakpoint, and then single-step it. You should be able to see the changes to the registers in the corresponding window (keep in mind the mapping between QBX registers and x86-64 ones).
Monitoring changes to memory is a bit more tricky. Let's see it on an example. The illustration below shows the state of the debugger just after the instruction pushwq0
was executed.
The command db
can be used to examine the raw bytes of memory at a particular virtual address. Note the mov
instruction just above the highlighted line in the Disassembly window - this is what writes to memory. The effective address that is used for the write is calculated from a base (which happens to be 0x40209d) and an offset stored in rbx
(recall that x86-64 rbx
register corresponds to QBX stack pointer!). By adding those two values together, we get the address of the top of QBX stack. If we look at it, we'll see that it contains exactly the bytes from the register that we just pushed (QBX q0
is x64 r12w
). The apparently reversed order of bytes is actually due to x86-64 little-endianness
.
That's it for this part. In the next installment, we'll learn about arithmetic and logic operations, so stay tuned!
Like this post? Follow this blog on Twitter for more!
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Ext JS源码分析与开发实例宝典
彭仁夔 / 电子工业出版社 / 2010-1 / 78.00元
《Ext JS源码分析与开发实例宝典》从Ext JS实现的基本功能开始讲解,从两个方面对Ext JS进行整体上的概述,让读者从宏观上去把握ExtJS框架。接下来讲解Ext JS核心基础知识,包括事件机制、模板模型、数据模型以及对类、函数、字符串、日期、数组及定时任务这6个类进行扩展。然后讲解Ext JS基于元素的开发,包括动画特效和拖曳实现等。最后深入讲解组件的开发,对布局、模型及4大组件一一进行......一起来看看 《Ext JS源码分析与开发实例宝典》 这本书的介绍吧!