Introduction
Welcome to the second part of my multi-part series on virtualization-based obfuscators. If you haven't read the first post, where I analyzed Tigress, I recommend starting there, as it introduces many concepts that are important for understanding this write-up. It also has a Terminology section that defines several frequently used terms. Here's a link to the post if you want it. I've also published all of the IDA databases, binaries, and diagrams from this write-up here.
Setup
VMProtect is a commercial obfuscator for software protection and is widely considered one of the best. VMProtect does offer a trial version, but the trial applies much simpler obfuscations that differ from those in the commercial version, so I chose to reverse the full version instead. In this write-up, I will be analyzing a simple "Hello World" binary virtualized with VMProtect v3.5.0.1274 (the latest version at the time of writing).
Even though VMProtect supports all kinds of Windows, Linux, and macOS binaries (and even .NET/C# applications), I decided to write a simple C "Hello World" Windows application to ease analysis. Inside the VMProtect software, I selected hello_world.exe as my target and started experimenting with the settings. VMProtect requires you to select the individual functions you want it to obfuscate, so I added helloWorld. I also made sure to enable only virtualization for the binary, as I didn't want to deal with mutation. Next, I turned off all of the extra protections, such as import protection and packing, in the options menu. These protections are not the focus of this write-up and would only hinder our analysis of the virtual machine.
After this, I pressed the compile button and had my obfuscated binary in seconds. From now on, when I refer to hello_world.exe, I will be referring to the VMProtect obfuscated version of the binary.
Analysis
I put the string "Main Function" inside the source code so that I could quickly locate the main function; using IDA's Strings view, it took about three seconds to find. Taking a peek at the assembly, it looks fairly normal (I renamed some of the functions to make it easier to read). The "Main Function" string is printed by printf, and then the helloWorld function is called, so that was where I needed to look for signs of obfuscation.
After entering the helloWorld function, a couple of isolated jumps led to the code seen above. As you can see, it pushes a seemingly random value onto the stack and then jumps to the VM entry point.
Static Obfuscations
One of the most obnoxious parts of reversing VMProtect is the number of static obfuscations it applies to the binary. Even with mutation disabled, VMProtect still uses dead store code, opaque branching, jump obfuscation, code duplication, and more to protect the internals of its virtual machine. While some of these are fairly difficult to eliminate without drastic measures, dead store code can be manually replaced with NOPs and hidden in IDA. I created an IDA plugin called NOPnHIDE for this exact purpose, and I will be using it to eliminate dead store code for the rest of the analysis.
Another annoying static obfuscation that is much harder to remove is control flow obfuscation. The most obvious form of this is how VMProtect splits small blocks of code into even smaller blocks connected by jmp instructions (opaque branching), which seriously clutters the IDA graph view and makes it much harder to analyze. On top of that, VMProtect's architecture is already hard to follow in IDA simply because of how many jumps it uses, and it also applies push+ret jump obfuscation to break IDA's control flow graphs. I won't cover either of these obfuscations in depth, but know that they hinder analysis and are fairly difficult to remove.
Looking back at the de-obfuscated code, the first thing to occur after entering the VM is all of the registers being pushed onto the stack. This is done so that the registers can be restored to their original states after the VM exits.
Bytecode Location
After the registers are pushed, the value 0x7FF695820000 is moved into rcx and pushed onto the stack. It seems to be some sort of base address that is used to compute jumps from RVAs. Keep this value in mind, as it will be used on the stack later. A pointer to the virtualized bytecode is also moved into rsi, which is now our instruction pointer for the VM. The pointer is then decrypted by a series of operations: a subtraction, a negation, an addition, and another negation. This transformation sequence is randomized by VMProtect for each VM it generates. Next, the value 0x7FF695820000 (in rcx) and then 0x100000000 are added to the decrypted pointer. While these look like random constants at the moment, the result is a valid pointer to the bytecode, so something is working 🙂.
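To make the decryption chain concrete, here is a minimal Python sketch of it. The sub/neg/add/neg order comes from the disassembly, but the constants k_sub and k_add are hypothetical placeholders, since VMProtect randomizes them per VM; only 0x7FF695820000 (the rcx value from this run) and 0x100000000 are taken from the trace.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register wraparound


def decrypt_bytecode_ptr(encrypted, image_base, k_sub, k_add):
    """Sketch of the rsi decryption chain; k_sub and k_add are hypothetical per-VM constants."""
    rsi = (encrypted - k_sub) & MASK64   # sub rsi, k_sub
    rsi = (-rsi) & MASK64                # neg rsi
    rsi = (rsi + k_add) & MASK64         # add rsi, k_add
    rsi = (-rsi) & MASK64                # neg rsi
    rsi = (rsi + image_base) & MASK64    # add rsi, rcx (0x7FF695820000 in this run)
    rsi = (rsi + 0x100000000) & MASK64   # add the 0x100000000 constant
    return rsi


ptr = decrypt_bytecode_ptr(0x2A0000, 0x7FF695820000, k_sub=0x1111, k_add=0x2222)
print(hex(ptr))  # 0x7ff795abcccd
```

Because negation distributes over addition, the sub/neg/add/neg sequence collapses to a single subtraction, -(-(e - a) + b) = e - a - b, so a deobfuscator only needs the combined constant a + b rather than emulating each step.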
Virtual Stack
Now that the pointer to the bytecode has been decrypted, the current value of rsp is moved into r10. After this, 0x180 bytes are allocated on the stack for the VM. About 0x40 of these bytes are for the virtual stack and the other 0x140 bytes are for the virtual register space. The register r10 is now our virtual stack pointer, since it still points to the original stack location just past the high end of the allocated bytes, and rsp now points to the start of the virtual register space. I will describe these structures in greater detail later. Finally, an and operation aligns rsp down to a 16-byte boundary. These operations also show that VMProtect dedicates specific physical registers to holding pointers to VM context data.
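Here is a small sketch of that frame setup, modelling addresses as plain integers. The 0x180 allocation, the 0x140-byte register space, and the 16-byte alignment come from the disassembly above; the starting rsp values in the example are made up. It also shows why the virtual stack gets only about 0x40 bytes: the alignment can add up to 0xF extra bytes of room below VSP.

```python
VM_FRAME_SIZE = 0x180    # sub rsp, 0x180
VREG_SPACE_SIZE = 0x140  # 40 eight-byte virtual registers


def setup_vm_frame(rsp):
    """Return (vsp, vregs, stack_room) for a given native rsp at VM entry."""
    vsp = rsp                               # mov r10, rsp: VSP starts at the old stack top
    vregs = (rsp - VM_FRAME_SIZE) & ~0xF    # sub rsp, 0x180 then and rsp, -16
    stack_room = vsp - (vregs + VREG_SPACE_SIZE)  # bytes between VSP and the register space
    return vsp, vregs, stack_room


print([hex(v) for v in setup_vm_frame(0x14FE00)])  # ['0x14fe00', '0x14fc80', '0x40']
```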
Self-Modifying Encryption Key
Now that the VM has initialized the virtual stack and register space, the current value of rsi is moved into r9 and put through some seemingly random calculations. While it isn't apparent yet, this register now stores the self-modifying encryption key. The specific numerical value has no significance, as VMProtect could use any value it wanted as the initial key; it seems they simply derived it from a value that was already on hand. We will see how this encryption key gets used in a moment.
Fetch, Decrypt, Jump (FDJ) Routine
To understand the next section of code, I will quickly explain how a significant portion of the VMProtect 3 architecture works. The most prominent difference between VMProtect and other virtual machines is the fact that it does not use opcodes or a handler table. Instead, it uses an offset that is decoded with the self-modifying decryption key and added to the address of the current handler. This may be slightly hard to visualize, so I'll show the next code segment to hopefully clear things up.
This routine is the Fetch, Decrypt, Jump routine, which I will refer to as FDJ from now on. It is incredibly important and can be found in every single handler in the VM, so pay attention 🙂. It starts by moving the address of the current handler into rdi; we'll need this address later, so keep it in mind. After this, the bytecode pointer is... reduced by four? I was initially confused by this, but after some investigation, it seems that this VM interprets its bytecode backward in memory. I also found that the direction is randomized by VMProtect, so other VMs it generates may read the bytecode forward, but this VM specifically reads it backward. Either way, the bytecode pointer now points to a 4-byte value included with every instruction: the handler jump offset. This value is loaded into ecx, which marks the start of the decryption sequence.
Next, the encrypted handler jump offset in ecx is XORed with the self-modifying key stored in r9. After this, four transformations are applied to the handler jump offset. In this case, it is negated, rotated left by one, incremented by one, and then negated again, but these transformations seem to be randomized for each handler. After all of this, the key itself (in r9) is XORed with the final decrypted handler jump offset. This is why it is self-modifying: every time the key decrypts a value, it XORs that decrypted value into itself. While this seems incredibly convoluted, the self-modifying key serves multiple purposes. The most obvious is that it makes reversing the VM incredibly confusing; that is the ultimate goal of VMProtect, and this specific encryption routine had me stumped for a while. It also protects the VM bytecode from being modified or hooked, since the key will be thrown off if any instructions are added, modified, or removed, and every value decrypted after that point will be wrong. This design is impressive and adds another layer of protection to an already incredibly complex VM.
After the handler jump offset is decrypted, it is added to rdi. From the start of the routine, we know that rdi holds the address of the handler that is currently executing (which is about to become the previous handler). Adding the handler jump offset to that address gives us the address of the next handler, which is jumped to immediately. This routine is repeated after every instruction handler until the VM exits.
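The whole FDJ routine fits in a few lines of Python. This is a sketch under several assumptions: the bytecode is modelled as a bytes object with rsi as an index into it, the key is tracked as a 32-bit value (the real r9 is 64 bits wide), the 32-bit offset is assumed to be sign-extended before it is added to rdi, and the neg/rol/inc/neg chain is the one from this particular handler (VMProtect randomizes it per handler). The encode_offset and build_bytecode helpers are hypothetical inverses written only to produce test input; they are not part of VMProtect.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def rol32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror32(x, n):
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def decode_offset(enc, key):
    x = (enc ^ key) & MASK32   # xor ecx, r9d
    x = (-x) & MASK32          # neg ecx
    x = rol32(x, 1)            # rol ecx, 1
    x = (x + 1) & MASK32       # inc ecx
    x = (-x) & MASK32          # neg ecx
    return x, key ^ x          # xor r9d, ecx: the key absorbs the decoded value


def encode_offset(plain, key):
    """Hypothetical inverse of decode_offset, used only to build test bytecode."""
    x = ror32(((-plain) - 1) & MASK32, 1)
    x = (-x) & MASK32
    return x ^ key, key ^ plain


def build_bytecode(offsets, key):
    chunks = []
    for off in offsets:
        enc, key = encode_offset(off & MASK32, key)
        chunks.append(struct.pack("<I", enc))
    return b"".join(reversed(chunks))  # stored backward: the first entry sits at the end


def fdj_step(bytecode, rsi, rdi, key):
    """One Fetch, Decrypt, Jump step; rdi holds the address of the current handler."""
    rsi -= 4                                          # bytecode is walked backward
    (enc,) = struct.unpack_from("<I", bytecode, rsi)  # mov ecx, dword ptr [rsi]
    off, key = decode_offset(enc, key)
    if off & 0x80000000:                              # assumed sign extension
        off -= 1 << 32
    return rsi, (rdi + off) & MASK64, key             # add rdi, rcx; jmp rdi


key0 = 0x1337BEEF  # hypothetical starting key; only the low 32 bits are modelled
code = build_bytecode([0x120, -0x48, 0x3000], key0)
rsi, rdi, key = len(code), 0x140001000, key0
for _ in range(3):
    rsi, rdi, key = fdj_step(code, rsi, rdi, key)
    print(hex(rdi))  # 0x140001120, then 0x1400010d8, then 0x1400040d8
```

Flipping a single bit in the first encoded entry changes the first decoded offset, which changes the key used for every later entry, so the rest of the chain decodes to garbage. That is the tamper resistance described above, in miniature.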
Now that we understand the FDJ routine, we can start analyzing some of the actual instruction handlers. The handler pictured above is the very first handler that the virtualized helloWorld function executes after initializing. It starts by moving the value that r10 points to into rbp. If you remember from earlier, r10 is the virtual stack pointer register, meaning this instruction is reading from the virtual stack. Afterward, r10 is increased by 8, moving the stack pointer toward higher addresses and shrinking the virtual stack. Together, these instructions pop a value off the virtual stack. After this, rsi is decremented by 1 and a byte is moved into eax; this is the operand for this instruction. Directly afterward, we can see a very familiar sequence of instructions. First, al (the low byte of eax) is XORed with the low byte of r9. Since r9 is the self-modifying encryption key, we can assume that this is the decryption sequence for the operand. The xor, not, neg, and rol instructions afterward are the same kind of transformations I talked about earlier. After the operand is decrypted, r9 is XORed with al, which again shows the self-modifying nature of the encryption key. As you can see, this key is used in more places than just handler jump offset decryption; it also decrypts the operands for the instructions.
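The operand decryption follows the same pattern as the handler jump offset, just on a single byte. In the sketch below, the XOR constant 0x5A and the rotate count 3 are hypothetical stand-ins for the randomized values in the real handler; what carries over from the disassembly is the shape: XOR with the low byte of r9, the xor/not/neg/rol chain, then XOR the result back into the low byte of r9.

```python
MASK8 = 0xFF


def rol8(x, n):
    n %= 8
    return ((x << n) | (x >> (8 - n))) & MASK8


def decode_operand(enc_byte, key, xor_const=0x5A, rol_count=3):
    """Decrypt one operand byte; xor_const and rol_count are hypothetical placeholders."""
    al = enc_byte ^ (key & MASK8)   # xor al, r9b
    al ^= xor_const                 # xor al, imm8
    al = (~al) & MASK8              # not al
    al = (-al) & MASK8              # neg al
    al = rol8(al, rol_count)        # rol al, imm8
    # xor r9b, al: an 8-bit write leaves the upper bits of r9 untouched
    key = (key & ~MASK8) | ((key ^ al) & MASK8)
    return al, key


print(decode_operand(0x00, 0x0))  # (218, 218)
```

One simplification falls out immediately: in two's complement, neg(not(x)) equals x + 1, so the not/neg pair in this chain is just an increment in disguise.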
Virtual Registers
After the operand is decrypted, it is used as an offset from rsp, and the value popped from the virtual stack (rbp) is stored there. This confirms that rsp is being used as the base of a virtual register space. Even though the VMProtect architecture is a stack machine, it still has registers for temporary storage, similar to how x86 has a stack even though it's a register-memory machine. Also, even though I have named them virtual registers, there are 40 of them (0x140 bytes of 8-byte slots), meaning they can also be used as temporary data storage and for local variables.
Going back to the handler, since we know r10 points to the virtual stack and rsp points to the virtual register space, we can conclude that this handler pops a value off of the virtual stack and stores it in the virtual register space. After it does this, the handler begins the FDJ routine and moves on to the next instruction. Looking at the handler dynamically, its purpose in the context of the original function starts to make sense. Since 0x7FF695820000 (the base address from earlier) was pushed onto the stack right before the virtual stack was set up, it is the value that is now being moved into the virtual register space, specifically at offset 0xB8. If I had to guess, this base address will likely be used to calculate the address of printf in the future.
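A toy model of the two structures makes the handler's effect easy to check. This sketch stores both the virtual register space and the virtual stack in one bytearray, with the register space starting at index 0 and VSP starting at the end of a 0x180-byte frame; that layout is a simplification of the real one (no alignment, no relocation). The method names pop_to_vreg and push_vreg are hypothetical labels for the PopStackToVRegister and PushVRegisterToStack behaviors recorded in the trace.

```python
import struct

FRAME_SIZE = 0x180
VREG_SPACE_SIZE = 0x140  # 0x140 / 8 = 40 eight-byte slots


class VmState:
    """Toy model: index 0 is the start of the register space (rsp); VSP (r10) starts at the frame end."""

    def __init__(self):
        self.mem = bytearray(FRAME_SIZE)
        self.vregs = 0
        self.vsp = FRAME_SIZE

    def read64(self, addr):
        return struct.unpack_from("<Q", self.mem, addr)[0]

    def write64(self, addr, value):
        struct.pack_into("<Q", self.mem, addr, value & 0xFFFFFFFFFFFFFFFF)

    def push(self, value):
        self.vsp -= 8                   # the virtual stack grows toward lower addresses
        self.write64(self.vsp, value)

    def pop_to_vreg(self, offset):
        value = self.read64(self.vsp)   # mov rbp, [r10]
        self.vsp += 8                   # add r10, 8
        self.write64(self.vregs + offset, value)  # store at rsp + decrypted operand

    def push_vreg(self, offset):
        self.push(self.read64(self.vregs + offset))


vm = VmState()
vm.push(0x7FF695820000)  # the base-address value pushed before the VM started
vm.pop_to_vreg(0xB8)     # trace row 0: VREGISTERS[0xB8] = [VSP]
print(hex(vm.read64(0xB8)), hex(vm.vsp))  # 0x7ff695820000 0x180
```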
Instruction Trace
To find that out, I'm going to painstakingly step through all of the instruction handlers and record the purpose of each in the table below. Since there are no opcodes in the VMProtect 3 architecture, it's fairly hard to classify the instruction handlers, but I'll do my best to give each handler a name based on its behavior, list its operands, and write down its location (I'll explain this shortly).
# | Behavior | Handler Name & Operands |
---|---|---|
0 | VREGISTERS[0xB8] = [VSP] (0x7FF695820000); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
1 | VREGISTERS[0x58] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
2 | VREGISTERS[0xA0] = [VSP] (0x7FF6C5239BC0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
3 | VREGISTERS[0x38] = [VSP] (0xE510467C0ADD0000); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
4 | VREGISTERS[0xB0] = [VSP] (0xFFFFFD7F); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
5 | VREGISTERS[0x30] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
6 | VREGISTERS[0x68] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
7 | VREGISTERS[0x28] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
8 | VREGISTERS[0x10] = [VSP] (0x7FFE431CCF24, ptr to ntdll:NtWriteFile+14); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
9 | VREGISTERS[0x0] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
10 | VREGISTERS[0x80] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
11 | VREGISTERS[0x40] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
12 | VREGISTERS[0x48] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
13 | VREGISTERS[0x18] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
14 | VREGISTERS[0x8] = [VSP] (0xD); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
15 | VREGISTERS[0x60] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
16 | VREGISTERS[0x90] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
17 | VREGISTERS[0x20] = [VSP] (0x7FF6C534330D, return address from vm_entry); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
18 | VREGISTERS[0x78] = [VSP] (0xFFFFFFFF80745CDA); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
19 | VSP -= 0x8; [VSP] = 0x140011890; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
20 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF690780000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
21 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF7D0791890); [VSP] = efl (0x206) | AddStack() |
22 | VREGISTERS[0xA8] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
23 | VREGISTERS[0xC0] = [VSP] (0x7FF7D0791890); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
24 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
25 | VREGISTERS[0xC8] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
26 | VSP -= 0x8; [VSP] = VREGISTERS[0x68] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
27 | VREGISTERS[0xD0] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
28 | VSP -= 0x8; [VSP] = VREGISTERS[0x80] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
29 | VREGISTERS[0xD8] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
30 | VSP -= 0x8; [VSP] = VREGISTERS[0x40] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
31 | VREGISTERS[0xE0] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
32 | VSP -= 0x8; [VSP] = VREGISTERS[0x18] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
33 | VREGISTERS[0xE8] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
34 | VSP -= 0x8; [VSP] = VREGISTERS[0x68] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
35 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
36 | VREGISTERS[0xC8] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
37 | VSP -= 0x8; [VSP] = 0x140011892; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
38 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF690780000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
39 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF7D0791892); [VSP] = efl (0x246) | AddStack() |
40 | VREGISTERS[0x70] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
41 | VREGISTERS[0xC0] = [VSP] (0x7FF7D0791892); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
42 | VSP -= 0x8; [VSP] = VREGISTERS[0x40] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
43 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
44 | VREGISTERS[0xC8] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
45 | VSP -= 0x8; [VSP] = 0x140011893; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
46 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF690780000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
47 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF7D0791893); [VSP] = efl (0x246) | AddStack() |
48 | VREGISTERS[0x50] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
49 | VREGISTERS[0xC0] = [VSP] (0x7FF7D0791893); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
50 | VSP -= 0x8; [VSP] = 0xE8; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
51 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
52 | VSP -= 0x8; [VSP] = 0x8; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
53 | [VSP + 8] = [VSP] + [VSP + 8] (dynamic stack addr.); [VSP] = efl (0x246) | AddStack() |
54 | VREGISTERS[0xA8] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
55 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
56 | VSP -= 0x8; [VSP] = 0x10; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
57 | [VSP + 8] = [VSP] + [VSP + 8] (dynamic stack addr.); [VSP] = efl (0x202) | AddStack() |
58 | VREGISTERS[0xA8] = [VSP] (0x202); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
59 | [VSP + 8] = AND(NOT([VSP]), NOT([VSP + 8])) (0x282); [VSP] = efl (0x282) | NotThenAnd() |
60 | VREGISTERS[0x98] = [VSP] (0x282); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
61 | [VSP + 8] = [VSP] + [VSP + 8] (0xFFFFFF540B28093F); [VSP] = efl (0x286) | AddStack() |
62 | VREGISTERS[0x78] = [VSP] (0x286); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
63 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
64 | [VSP] = [[VSP]] (0xFFFFFF03FC42052F) | ReadMemory() |
65 | [VSP + 8] = AND(NOT([VSP]), NOT([VSP + 8])) (dynamic stack addr.); [VSP] = efl (0x246) | NotThenAnd() |
66 | VREGISTERS[0x20] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
67 | VSP = [VSP]; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | ChangeVSPToCurrentStackValue() |
68 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
69 | VREGISTERS[0x78] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
70 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x282); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
71 | VSP -= 0x8; [VSP] = VREGISTERS[0x78] (0x282); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
72 | [VSP + 8] = AND(NOT([VSP]), NOT([VSP + 8])) (0xFFFFFFFFFFFFFD7D); [VSP] = efl (0x286) | NotThenAnd() |
73 | VREGISTERS[0x98] = [VSP] (0x286); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
74 | VSP -= 0x8; [VSP] = 0xFFFFFFFFFFFFF7EA; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
75 | [VSP + 8] = AND(NOT([VSP]), NOT([VSP + 8])) (0x0); [VSP] = efl (0x246) | NotThenAnd() |
76 | VREGISTERS[0x88] = [VSP] (0x246); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
77 | VSP -= 0x8; [VSP] = VREGISTERS[0x20] (0x202); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
78 | VSP -= 0x8; [VSP] = VREGISTERS[0x20] (0x202); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
79 | [VSP + 8] = AND(NOT([VSP]), NOT([VSP + 8])) (0xFFFFFFFFFFFFFDFD); [VSP] = efl (0x282) | NotThenAnd() |
80 | VREGISTERS[0x50] = [VSP] (0x282); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
81 | VSP -= 0x8; [VSP] = 0x815; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
82 | [VSP + 8] = AND(NOT([VSP]), NOT([VSP + 8])) (0x202); [VSP] = efl (0x202) | NotThenAnd() |
83 | VREGISTERS[0x70] = [VSP] (0x202); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
84 | [VSP + 8] = [VSP] + [VSP + 8] (0x206); [VSP] = efl (0x206) | AddStack() |
85 | VREGISTERS[0xA8] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
86 | VREGISTERS[0x70] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
87 | VSP -= 0x8; [VSP] = 0x14001189A; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
88 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF588180000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
89 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF6C819189A); [VSP] = efl (0x206) | AddStack() |
90 | VREGISTERS[0x50] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
91 | VREGISTERS[0xC0] = [VSP] (0x7FF6C819189A); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
92 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
93 | VSP -= 0x8; [VSP] = 0x20; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
94 | [VSP + 8] = [VSP] + [VSP + 8] (dynamic stack addr.); [VSP] = efl (0x206) | AddStack() |
95 | VREGISTERS[0xA8] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
96 | VREGISTERS[0x78] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
97 | VSP -= 0x8; [VSP] = VREGISTERS[0x78] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
98 | VREGISTERS[0xD0] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
99 | VSP -= 0x8; [VSP] = 0x14001189F; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
100 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF588180000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
101 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF6C819189F); [VSP] = efl (0x206) | AddStack() |
102 | VREGISTERS[0x68] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
103 | VREGISTERS[0xC0] = [VSP] (0x7FF6C819189F); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
104 | QWORD temp = VSP; VSP -= 8; [VSP] = temp; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVSPToVirtualStack() |
105 | VREGISTERS[0xA8] = [VSP] (dynamic stack addr.); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
106 | VSP -= 0x8; [VSP] = 0x1400118A2; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
107 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
108 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF6C81918A2); [VSP] = efl (0x202) | AddStack() |
109 | VREGISTERS[0x90] = [VSP] (0x202); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
110 | VREGISTERS[0xC0] = [VSP] (0x7FF6C81918A2); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
111 | VSP -= 0x8; [VSP] = 0x3A; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack_32(DWORD constant) |
112 | VREGISTERS[0x38] = [VSP] (0x3A); VSP += 0x8 | PopStackToVRegister_32(BYTE vRegisterNumber) |
113 | VSP -= 0x8; [VSP] = 0x0; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack_32(DWORD constant) |
114 | VREGISTERS[0x3C] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister_32(BYTE vRegisterNumber) |
115 | VSP -= 0x8; [VSP] = 0x1400118A7; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
116 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF588180000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
117 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF6C81918A7); [VSP] = efl (0x202) | AddStack() |
118 | VREGISTERS[0x98] = [VSP] (0x202); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
119 | VREGISTERS[0xC0] = [VSP] (0x7FF6C81918A7); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
120 | VSP -= 0x8; [VSP] = 0xCCCCCCCC; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack_32(DWORD constant) |
121 | VREGISTERS[0x8] = [VSP] (0xCCCCCCCC); VSP += 0x8 | PopStackToVRegister_32(BYTE vRegisterNumber) |
122 | VSP -= 0x8; [VSP] = 0x0; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack_32(DWORD constant) |
123 | VREGISTERS[0xC] = [VSP] (0x0); VSP += 0x8 | PopStackToVRegister_32(BYTE vRegisterNumber) |
124 | VSP -= 0x8; [VSP] = 0x1400118AC; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
125 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF588180000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
126 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF602BB18AC); [VSP] = efl (0x206) | AddStack() |
127 | VREGISTERS[0x90] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
128 | VREGISTERS[0xC0] = [VSP] (0x7FF602BB18AC); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
129 | VSP -= 0x8; [VSP] = 0x140123324; if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushConstantOntoStack(QWORD constant) |
130 | VSP -= 0x8; [VSP] = VREGISTERS[0xB8] (0x7FF4C2BA0000); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
131 | [VSP + 8] = [VSP] + [VSP + 8] (0x7FF602CC3324); [VSP] = efl (0x206) | AddStack() |
132 | VREGISTERS[0x98] = [VSP] (0x206); VSP += 0x8 | PopStackToVRegister(BYTE vRegisterNumber) |
133 | VSP -= 0x8; [VSP] = VREGISTERS[0x70] (0x206); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
134 | VSP -= 0x8; [VSP] = VREGISTERS[0x60] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
135 | VSP -= 0x8; [VSP] = VREGISTERS[0x8] (0xCCCCCCCC); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
136 | VSP -= 0x8; [VSP] = VREGISTERS[0x18] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
137 | VSP -= 0x8; [VSP] = VREGISTERS[0x48] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
138 | VSP -= 0x8; [VSP] = VREGISTERS[0xA8] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
139 | VSP -= 0x8; [VSP] = VREGISTERS[0x0] (0x246); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
140 | VSP -= 0x8; [VSP] = VREGISTERS[0x10] (0x7FFE431CCF24); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
141 | VSP -= 0x8; [VSP] = VREGISTERS[0x28] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
142 | VSP -= 0x8; [VSP] = VREGISTERS[0x78] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
143 | VSP -= 0x8; [VSP] = VREGISTERS[0x30] (0x0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
144 | VSP -= 0x8; [VSP] = VREGISTERS[0xB0] (0xFFFFFD7F); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
145 | VSP -= 0x8; [VSP] = VREGISTERS[0x38] (0x3A); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
146 | VSP -= 0x8; [VSP] = VREGISTERS[0xA0] (0x7FF602BB9BC0); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
147 | VSP -= 0x8; [VSP] = VREGISTERS[0x58] (dynamic stack addr.); if (VSP < (VREGISTERS + 0x140)): relocateVirtualRegisters() | PushVRegisterToStack(BYTE vRegisterNumber) |
148 | rsp = VSP; popAllRegisters(); retn (jmp to 0x7FF602CC3324) | vmExit() |
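The NotThenAnd handler in the trace computes AND(NOT(a), NOT(b)), which is a bitwise NOR, and the values recorded in rows 72, 75, 79, and 82 are consistent with that. The sketch below checks those rows. Because NOR is functionally complete, NOT, AND, and OR can all be derived from it; the helper names vnot, vand, and vor are hypothetical, and I haven't confirmed that this is why VMProtect uses the handler, although rows 72 through 82 do look like NOT and AND being built this way.

```python
MASK64 = (1 << 64) - 1


def not_then_and(a, b):
    """NotThenAnd: AND(NOT(a), NOT(b)) on 64-bit values, i.e. a bitwise NOR."""
    return (~a & ~b) & MASK64


# NOR is functionally complete, so the other bitwise operations can be derived from it.
def vnot(a):
    return not_then_and(a, a)


def vand(a, b):
    return not_then_and(vnot(a), vnot(b))


def vor(a, b):
    return vnot(not_then_and(a, b))


print(hex(not_then_and(0x282, 0x282)))  # row 72: 0xfffffffffffffd7d
```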
Keep in mind that there are a lot more virtual instructions than this implemented in VMProtect. This binary was fairly simple so it only used a couple of them, but you now know how to find more yourself. One of the most noticeable things I saw while stepping through the handlers was that there are tons of them doing the exact same thing. For example, the first 10 handlers all contain the same instructions (including the operand decryption transformations) yet their handlers are all in different locations. The handlers are eventually reused, but not immediately. For example, the handlers for PopStackToVRegister start getting reused after 11 cycles, so it seems like whatever algorithm VMProtect uses to generate handlers creates 11 duplicate handlers per behavior (as long as it is used inside of the VM at least 11 times). This is a fairly advanced form of code duplication and helps make the VM much more confusing.
Before I talk about the VM execution as a whole, I want to analyze the behavior of two specific handlers. The first interesting handler is AddStack, which seems like a normal handler at first, but it also does something slightly strange. While it does add the two values on the top of the stack, putting the result at [VSP + 8], it also puts the value of the eflags register into [VSP]. While this was strange at first, it seems to be the only viable way to get the eflags register into a virtual register (which is always done through a PopStackToVRegister instruction directly afterward) after the add instruction finishes executing. This register is only important for comparisons and control flow which we won't go into heavily in this writeup.
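Here is a sketch of AddStack's behavior with the flag computation spelled out, using a Python list as the virtual stack (the last element is [VSP]). The six arithmetic flags follow the standard x86 ADD rules; every other bit comes from a base_flags argument, which I'm assuming is 0x202 (the always-set bit 1 plus IF). Under that assumption it reproduces row 21 of the trace: 0x140011890 + 0x7FF690780000 = 0x7FF7D0791890 with efl = 0x206. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
CF, PF, AF, ZF, SF, OF = 0x1, 0x4, 0x10, 0x40, 0x80, 0x800


def add64_with_flags(a, b, base_flags=0x202):
    """64-bit ADD returning (result, rflags); base_flags supplies the non-arithmetic bits."""
    result = (a + b) & MASK64
    flags = base_flags & ~(CF | PF | AF | ZF | SF | OF)
    if a + b > MASK64:
        flags |= CF                                  # unsigned carry out of bit 63
    if bin(result & 0xFF).count("1") % 2 == 0:
        flags |= PF                                  # even parity of the low byte
    if (a ^ b ^ result) & 0x10:
        flags |= AF                                  # carry out of bit 3
    if result == 0:
        flags |= ZF
    if result >> 63:
        flags |= SF
    if ((a ^ result) & (b ^ result)) >> 63:
        flags |= OF                                  # signed overflow
    return result, flags


def add_stack(stack):
    """AddStack: [VSP + 8] = [VSP] + [VSP + 8]; [VSP] = efl. stack[-1] is [VSP]."""
    stack[-2], stack[-1] = add64_with_flags(stack[-1], stack[-2])


vstack = [0x140011890, 0x7FF690780000]  # row 19 pushed the constant, row 20 the base
add_stack(vstack)
print([hex(v) for v in vstack])  # ['0x7ff7d0791890', '0x206']
```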
The other interesting handler is PushConstantOntoStack. If you look at the pseudo-code I wrote out for this handler, you may notice a small if check with a function underneath it called relocateVirtualRegisters. This isn't actually a function in VMProtect; I labeled it as one because the pseudo-code for this routine is fairly long. In short, it relocates the virtual registers if the virtual stack grows too large. If you recall the virtual stack and register diagram from earlier in the write-up, the virtual register space sits directly on top of the virtual stack, so the virtual stack pointer could eventually run into the virtual register space. To prevent this, the check runs whenever a handler pushes a value onto the virtual stack. If the virtual stack does start to intrude on the virtual register space, the virtual stack grows by 0x40 bytes and the virtual register space pointer is moved up by 0x40 bytes. After this, a rep movsb instruction copies all of the virtual register values into their new location.
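The relocation check can be sketched as a toy memory model. The 0x140 threshold and the 0x40 growth come from the pseudo-code and the description above; the buffer layout, the initial pointers, and the size of the live register area (`REG_BYTES`) are assumptions made for illustration. The layout (registers at lower addresses, stack above them growing down) is chosen to be consistent with the `VSP < VREGISTERS + 0x140` check.

```python
THRESHOLD = 0x140   # from the pseudo-code: relocate when VSP < VREGISTERS + 0x140
GROWTH = 0x40       # the register space moves by 0x40 bytes per relocation
REG_BYTES = 0x100   # ASSUMPTION: size of the live virtual register area


class VirtualFrame:
    """Toy model: virtual registers at low addresses, virtual stack above them growing down."""

    def __init__(self, size=0x400):
        self.mem = bytearray(size)
        self.vregs = 0x100              # hypothetical base of the register space
        self.vsp = self.vregs + 0x200   # hypothetical initial virtual stack pointer

    def set_vreg(self, offset, value):
        base = self.vregs + offset
        self.mem[base:base + 8] = value.to_bytes(8, "little")

    def get_vreg(self, offset):
        base = self.vregs + offset
        return int.from_bytes(self.mem[base:base + 8], "little")

    def push(self, value):
        # Same order as the pseudo-code: decrement, store, then check.
        self.vsp -= 8
        self.mem[self.vsp:self.vsp + 8] = value.to_bytes(8, "little")
        if self.vsp < self.vregs + THRESHOLD:
            self.relocate_virtual_registers()

    def relocate_virtual_registers(self):
        old = self.vregs
        self.vregs -= GROWTH
        # The destination is lower than the source, so a forward byte copy (like
        # rep movsb) is overlap-safe; Python copies the source slice first anyway.
        self.mem[self.vregs:self.vregs + REG_BYTES] = self.mem[old:old + REG_BYTES]
```

With these hypothetical starting values, the 25th push is the first one to trigger a relocation. Register values survive the move because they are read relative to the updated base, and values already on the virtual stack are untouched.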
Multiple Virtual Machines
Now that we've analyzed some specific handlers, let's analyze the VM as a whole. I was originally confused by the instruction trace because there seemed to be no calls to printf or any mention of the "Hello World" string, but I quickly found out why. The VM seems to exit whenever it needs to execute an x86 instruction that has no virtual handler (such as rep stosd) or call a non-virtualized function. That seems normal at first, but once the instruction or call finishes, execution enters a brand-new VM instead of returning to the same one. This seems to be intentional on VMProtect's part; I believe they may have re-entered the same VM in past versions of the virtualizer and changed it to confuse analysts. While this was disheartening at first, I found that the first VM was the largest of them all and that the others were fairly simple to analyze by just observing the virtual stack and registers as they execute. There is also a clear pattern to when the executable enters and exits the separate VMs, which made it quite interesting to analyze.
All of these VMs have entry points, FDJ routines, decryption routines, and instruction handlers similar to those we discussed for the original VM. The only immediately apparent differences between them are the registers used for certain context variables (VIP, VSP, self-modifying encryption key, etc.) and the direction of bytecode execution (forward or backward; the original VM was backward). The table below highlights these differences.
VM # | Virtual Instruction Pointer | Virtual Stack Pointer | Virtual Register Pointer | Self-Modifying Encryption Key | Handler Jump Base | Bytecode Execution Direction |
---|---|---|---|---|---|---|
1 | rsi | r10 | rsp | r9 | rdi | Backward |
2 | rbp | r9 | rsp | rdi | r11 | Backward |
3 | r11 | rsi | rsp | rbx | rbp | Backward |
4 | rdi | rsi | rsp | rbp | rbx | Forward |
The most interesting difference is the bytecode execution direction. While the first three VMs all use the strange backward bytecode, the fourth VM increments the virtual instruction pointer and stores its instructions forward in memory (similar to Tigress). This suggests that VMProtect chooses the direction per VM, most likely at random, which is fairly impressive on their part. As for register usage, rsp is always the virtual register pointer, which makes sense given that the virtual register space is allocated on the stack. Other than that, the assignments seem fairly random between the VMs. Any register that is not used for one of the context variables serves as a general-purpose register in the handlers.
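The forward/backward difference only changes how the virtual instruction pointer advances. Here is a minimal sketch of an operand fetch that supports both directions. It assumes 32-bit little-endian operands and that a backward VM decrements VIP before each read, and it leaves out the rolling-key decryption entirely; `fetch_u32` is a hypothetical name.

```python
def fetch_u32(bytecode, vip, backward):
    """Read one 32-bit little-endian operand and return (value, new_vip).

    ASSUMPTION: a backward VM decrements VIP before reading; a forward VM
    reads at VIP and then increments it. Decryption is not modelled.
    """
    if backward:
        vip -= 4
        return int.from_bytes(bytecode[vip:vip + 4], "little"), vip
    value = int.from_bytes(bytecode[vip:vip + 4], "little")
    return value, vip + 4
```

The same operand sequence (1, then 2) is laid out as `01 00 00 00 02 00 00 00` for a forward VM starting at offset 0, and as `02 00 00 00 01 00 00 00` for a backward VM starting at the end of the buffer. A lifter that takes the direction as a per-VM parameter can share all of its other decoding logic between the four VMs.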
Lifecycle
Now that we have a general understanding of the four separate VMs, let's go back to analyzing each VM individually. The first thing every VM does is move all of the values on the stack (the physical registers that were pushed right before entering the VM) into virtual registers. This allows the VM to take parameters through registers as if it were a normal function. Using VM_1 as an example, instruction 0 puts a base address for future jump calculations (either to non-virtualized functions or to the next VM) into virtual register 0xB8. After this, instruction 17 puts the original return address of vm_entry (main+0x4A) into virtual register 0x20. If you remember the simplified control-flow graph of all the VMs, this value is used inside VM_4 for the final vmExit instruction.
After all of the physical register values are in virtual registers, we get to the actual meat of the VM. This varies quite heavily between the VMs, but some patterns appear in several of them. The first instructions to execute after the physical registers have been moved into virtual registers are fake address calculations. If a VM has to calculate a jump address for the next VM, a non-virtualized function, or a non-virtualized instruction, it will have a couple of these fake address calculations scattered throughout. Using VM_1 as an example, instruction 19 pushes a constant (an RVA to some random location) onto the stack. Next, instruction 20 pushes the jump base address (which instruction 0 put into virtual register 0xB8) onto the stack. Instruction 21 then adds the two values, and instructions 22 and 23 pop the two results left by AddStack (the eflags value and the sum) into virtual registers, leaving the full jump address in virtual register 0xC0. VM_1 uses this register throughout to store these fake address calculations. The addresses are never used once they are calculated and are overwritten shortly afterward by more fake calculations; I believe this is intentional obfuscation by VMProtect. If we keep following these address calculations, instructions 129, 130, and 131 follow the same process but don't store the address in a virtual register afterward. Instead, the address stays on the stack and is used shortly after to jump to the next VM. This is a real jump address calculation and should not be ignored.
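The fake and real address calculations can be replayed on a toy stack machine to see how they differ. This is only a sketch: the handler names follow this write-up, while the jump base (0x140000000), the two RVAs, and the virtual register that receives the flags (0x30) are hypothetical values, and the flags themselves are not computed.

```python
MASK64 = (1 << 64) - 1
FLAGS_VREG = 0x30  # hypothetical slot for the flags popped after AddStack


def run(program, vregs):
    """Execute (handler, operand) pairs on a list-based virtual stack (stack[-1] is [VSP])."""
    stack = []
    for handler, operand in program:
        if handler == "PushConstantOntoStack":
            stack.append(operand)
        elif handler == "PushVRegisterToStack":
            stack.append(vregs[operand])
        elif handler == "AddStack":
            stack[-2] = (stack[-1] + stack[-2]) & MASK64  # sum at [VSP + 8]
            stack[-1] = 0                                 # flags placeholder at [VSP]
        elif handler == "PopStackToVRegister":
            vregs[operand] = stack.pop()
        else:
            raise ValueError(f"unknown handler: {handler}")
    return stack


# Pattern of instructions 19-23: computed, parked in 0xC0, never used (fake).
FAKE_CALC = [
    ("PushConstantOntoStack", 0x1234),   # hypothetical RVA
    ("PushVRegisterToStack", 0xB8),      # jump base stored by instruction 0
    ("AddStack", None),
    ("PopStackToVRegister", FLAGS_VREG),
    ("PopStackToVRegister", 0xC0),
]

# Pattern of instructions 129-131: same steps, but the address stays on the stack (real).
REAL_CALC = [
    ("PushConstantOntoStack", 0x5678),   # hypothetical RVA
    ("PushVRegisterToStack", 0xB8),
    ("AddStack", None),
]
```

Starting from `vregs = {0xB8: 0x140000000}`, the fake sequence leaves the stack empty and only writes 0x140001234 into 0xC0, while the real sequence leaves 0x140005678 on the stack for the upcoming jump. That distinction (popped into a scratch register and later overwritten, versus left on the stack) is a simple heuristic for filtering out the fake calculations.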
Other than the jump address calculations, there aren't many similarities between the virtual instructions in the separate VMs. The fake address calculations aren't the only form of obfuscation or dead stores in the virtualized instructions, but the others are fairly easy to spot and ignore. Everything else is fairly consistent with the original function code, so let's move on to how the VM exits. Using VM_1 as an example, the VM has to push values from the virtual registers back onto the stack before the vmExit instruction executes (vmExit then pops the values off the stack back into their physical registers). This happens in instructions 133-146, where the VM chooses which virtual registers to restore into the physical registers. Through close analysis of the VM's entry and exit points, I noticed that some virtual registers are directly linked to physical registers. For example, at the entry point of the VM, the value in the physical register eax is put into virtual register 0x8, and at the exit point, the value in virtual register 0x8 is put back into eax. This holds for all of the physical registers except eflags, which receives the value of whichever virtual register the flags were most recently saved into (which happens when AddStack or a similar instruction executes). This finding is interesting because it may ease the process of translating the virtualized bytecode back into x86. However, other virtual registers are used for arbitrary purposes and temporary storage, so translation is still far from trivial. After all of the virtual registers have been restored into their physical registers, a retn instruction executes and the instruction pointer jumps to the next VM, non-virtualized function, or non-virtualized instruction.
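The register correlation described above suggests a simple heuristic for a devirtualizer: compare where each physical register goes at vmEntry with where each virtual register goes before vmExit. The sketch assumes you have already extracted those moves from a trace into dictionaries; `infer_bindings` is a hypothetical helper, and apart from the eax/0x8 pairing, the register names and slot numbers in the example are hypothetical.

```python
def infer_bindings(entry_moves, exit_moves):
    """Return the physical registers that leave the VM through the virtual register they entered in.

    entry_moves: {physical_register: virtual_register} observed at vmEntry
    exit_moves:  {virtual_register: physical_register} observed before vmExit
    """
    return {
        phys: vreg
        for phys, vreg in entry_moves.items()
        if exit_moves.get(vreg) == phys
    }
```

Given entry moves `{"eax": 0x8, "ecx": 0x10, "eflags": 0x60}` and exit moves `{0x8: "eax", 0x10: "ecx", 0x48: "eflags"}`, only eax and ecx come back as bound. eflags drops out because it is restored from the slot the flags were last saved into rather than from its entry slot, which matches the exception noted above.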
The entire process repeats when entering the next VM, meaning the separate VMs can only pass values to each other through the stack and physical registers. This behavior would be very interesting to analyze on larger and more complex functions, but that's for another time.
Comparison Data
These charts appear at the end of each write-up so you can easily compare the different obfuscators I've analyzed.
General
🝰 | VMProtect 3 |
---|---|
Obfuscation Layer | Compiled Binary |
Architecture Type | Stack Machine, Randomized Instruction Set |
Context Variables | Stored in random physical registers. Virtual stack and virtual register space are allocated on the stack. Includes a virtual instruction pointer, a virtual stack pointer, a virtual register space pointer, a self-modifying encryption key, and a base address for handler jumps. |
Dispatch Type | None, jumps using static offsets from the separate opcode handlers |
Local Variables | Put into the virtual register space |
Virtual Stack | Grows negatively like a normal stack, gains an extra 0x40 bytes in size when it hits the virtual register space |
Static Obfuscations | Dead store code, opaque branching, jump obfuscation, code duplication; fairly easy to overlook |
Bytecode Encryption | Rolling encryption key that decrypts both the jump handler offset (opcode) and the operands |
External Functions | Exits the VM, calls the function, enters a new VM with a brand-new instruction set and context |
Strengths and Weaknesses
Strengths | Weaknesses |
---|---|
Very Customizable, Simple to Use, Incredibly Unique Architecture, Compatible with Everything, Great Mitigations for Static Virtual Instruction Analysis and Hooking | Expensive, Weak Static Obfuscations, Simple Instruction Set, Tons of Signatures for Various Components of the VM, Correlation Between Virtual and Physical Registers |
Conclusion
In this write-up, we closely analyzed the architecture of the VMProtect virtual machines and virtual instructions. In the future, I'd love to revisit VMProtect by taking a look at the mutation feature and maybe some of the other protections it offers (memory protection, import protection, packing, etc.). I also want to take a look at how the software handles more complex programs, conditional statements, and calls to other virtualized functions. Thanks for reading, and I hope you learned something about the exceptionally complex virtualization-based obfuscator known as VMProtect.