Colin Mc Hugo

Security Engineer Manager & CEO at Quantum Infinite Solutions Group Ltd.

Under the hood of Wslink’s multilayered virtual machine

April 17, 2022

ESET researchers describe the structure of the virtual machine used in samples of Wslink and suggest a possible approach to see through its obfuscation techniques

ESET researchers recently described Wslink, a unique and previously undocumented malicious loader that runs as a server and that features a virtual-machine-based obfuscator. There are no code, functionality or operational similarities that suggest this is likely to be a tool from a known threat actor.

In our white paper, linked below, we describe the structure of the virtual machine used in samples of Wslink and suggest a possible approach to see through the obfuscation techniques used in the analyzed samples. We demonstrate our approach on chunks of code of the protected sample. We were not motivated to fully deobfuscate the code, because we discovered a non-obfuscated sample.

Obfuscation techniques are a kind of software protection intended to make code hard to understand and hence conceal its objectives; obfuscating virtual machine techniques have become widely misused for illicit purposes such as obfuscation of malware samples, since they hinder both analysis and detection. The ability to analyze malicious code and subsequently improve our detection capabilities is the driving force behind our motivation to overcome these techniques.

Virtualized Wslink samples do not contain any clear artifacts, such as specific section names, that easily link them to a known virtualization obfuscator. During our research, we were able to successfully design and implement a semiautomatic solution capable of significantly facilitating analysis of the underlying program’s code.

This virtual machine introduced a diverse arsenal of obfuscation techniques, which we were able to overcome to reveal a part of the deobfuscated malicious code that we describe in this blogpost. In the last sections of our white paper, we present parts of the code we developed to facilitate our research.

Our white paper also provides an overview of the internal structure of virtual machines in general, and introduces some important terms and frameworks used in our detailed analysis of the Wslink virtual machine.

In an earlier white paper, we described the structure of a custom virtual machine, along with our techniques to devirtualize the machine. That virtual machine contained an interesting anti-disassembly trick, previously used by FinFisher, spyware with extensive spying capabilities such as live surveillance through webcams and microphones, keylogging, and exfiltration of files. We additionally presented an approach for its deobfuscation.

This blogpost consists of excerpts from the Under the hood of Wslink’s multilayered virtual machine white paper; we encourage everyone interested in virtual machines and obfuscation techniques to go through the original white paper, as it contains detailed information on the various steps required to see through the obfuscation techniques used in Wslink.

Overview of virtual machine structures

Before diving into the analysis of Wslink’s virtual machine (VM), we provide an overview of the internal structure of virtual machines in general, describe known approaches to deal with such obfuscation, and introduce some important terms and frameworks used in our detailed analysis of the Wslink VM.

General structure of virtual machines

Virtual machines can be divided into two main categories:

  1. System virtual machines – support execution of complete operating systems (e.g., various VMware products, VirtualBox)
  2. Process virtual machines – execute individual programs in an OS-independent environment (e.g., Java, the .NET Common Language Runtime)

Here, we are interested only in the second category, process virtual machines, and we will briefly describe certain parts of their internal anatomy necessary to understand the rest of this paper.

Process virtual machines run as normal applications on their host OSes, and in turn run programs whose code is stored as OS-independent bytecode (Figure 1) that represents a series of instructions, an application, of a virtual instruction set architecture (ISA).

Figure 1. Illustration of bytecode, where all opcodes and operands are virtual

One can think about bytecode as a kind of intermediate representation (IR): an abstract representation of code consisting of a specific instruction set that resembles assembly more than a high-level language. It is also commonly known as intermediate language.

The use of IR is convenient in terms of code reusability: when one needs to add support for a new architecture or CPU instruction set, it is easier to convert it to the IR instead of writing all the required algorithms again. Another benefit is that it can simplify the application of some optimization algorithms.

One can generally translate both high- and low-level languages into an IR. Translation of a higher-level language is known as “lowering”, and similarly translation of a lower-level one, “lifting”.

The following example lifts an assembly block bb0 into a block with the pseudo-IR code irb0. Each assembly instruction is translated into a set of IR operations, and individual operations within a set do not affect one another; ZF stands for zero flag and CF for carry flag:

bb0:
  MOV R8, 0x05
  SUB AX, DX
  XCHG ECX, EDX
irb0:
  R8 = 0x05

  EAX[:0x10] = EAX[:0x10] - EDX[:0x10]
  ZF = EAX[:0x10] - EDX[:0x10] == 0x00
  CF = EAX[:0x10] < EDX[:0x10]
  …

  ECX = EDX
  EDX = ECX
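
The remark that operations within one set do not affect one another can be made concrete with a short Python sketch. It is only an illustration, not the output of any real lifter: the apply_ir_set helper, the dictionary-based register state, and the starting register values are all assumptions made for this example.

```python
MASK16 = 0xFFFF  # [:0x10] in the pseudo-IR selects the low 16 bits


def apply_ir_set(state, assignments):
    """Evaluate every right-hand side on the old state, then commit all results."""
    results = {dst: rhs(state) for dst, rhs in assignments.items()}
    new_state = dict(state)
    new_state.update(results)
    return new_state


# irb0 as a list of sets; each set corresponds to one assembly instruction of bb0.
irb0 = [
    {"R8": lambda s: 0x05},
    {
        "EAX": lambda s: (s["EAX"] & ~MASK16) | ((s["EAX"] - s["EDX"]) & MASK16),
        "ZF": lambda s: int(((s["EAX"] - s["EDX"]) & MASK16) == 0),
        "CF": lambda s: int((s["EAX"] & MASK16) < (s["EDX"] & MASK16)),
    },
    {"ECX": lambda s: s["EDX"], "EDX": lambda s: s["ECX"]},
]

# Hypothetical starting values chosen so that AX equals DX.
state = {"R8": 0, "EAX": 0x00010007, "ECX": 0x11, "EDX": 0x00020007, "ZF": 0, "CF": 0}
final = state
for ir_set in irb0:
    final = apply_ir_set(final, ir_set)
```

Because both right-hand sides of the last set read the old state, ECX and EDX really are exchanged; executing the two assignments one after the other would leave both registers holding the same value.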

Modern process VMs usually provide a compiler that can lower code written in a high-level language, one that is easy to understand and comfortable to use, into the respective bytecode.

A VM’s ISA generally defines the supported instructions, data types and registers, among other things, which naturally must be implemented by a virtual ISA as well.

Instructions consist of the following parts:

  • opcodes – operation codes that specify an instruction
  • operands – parameters of the instructions

ISAs often use two well-known virtual registers:

  • virtual program counter (VPC) – a pointer to the current position in the bytecode
  • virtual stack pointer – a pointer to pre-allocated virtual stack space used internally by the VM

The virtual stack pointer does not have to be present in all VMs; it is common only in a certain type of VM: stack-based ones.

We will refer to the instructions and their respective parts of a virtual ISA simply as virtual instructions, virtual opcodes, and virtual operands. We sometimes omit the explicit use of “virtual” when it is obvious that we are talking about the virtual representation.

An OS-dependent executable file, the interpreter (Figure 2), processes the supplied bytecode and sequentially interprets the underlying virtual instructions, thus executing the virtualized program.

Figure 2. Illustration of the relationship between bytecode and the VM’s interpreter

Transfer of control from one virtual instruction to the next during interpretation needs to be implemented by every VM. This process is generally known as dispatching. There are several documented dispatch techniques such as:

  • Switch Dispatch – the simplest dispatch mechanism where virtual instructions are defined as case clauses and a virtual opcode is used as the test expression (Figure 3)
  • Direct Call Threading – virtual instructions are defined as functions and virtual opcodes contain addresses of these functions
  • Direct Threading – virtual instructions are defined as functions again; however, in comparison to Direct Call Threading, addresses of the functions are stored in a table and virtual opcodes represent offsets to this table. Each function should indirectly call the following one according to the specification (Figure 4)

The body of a virtual opcode in the interpreter’s code is usually called a virtual handler because it defines the behavior of the opcode and handles it when the virtual program counter points to a location in the bytecode that contains a virtual instruction with that opcode.
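
To make the difference between the dispatch styles more tangible, here is a minimal Python sketch of a toy interpreter written twice: once with a switch-like chain of conditions, and once with handlers stored in a table that is indexed by the opcode. The three-instruction ISA (LOAD, ADD, HALT), the names, and the bytecode layout are invented for this example. Python has no indirect jumps, so the table variant returns the next handler to a small loop instead of calling it directly.

```python
# Toy ISA (hypothetical): every instruction is two bytes, opcode then operand.
OP_LOAD, OP_ADD, OP_HALT = 0, 1, 2


def run_switch(code):
    """Switch dispatch: one loop, the opcode selects a branch."""
    r0, vpc = 0, 0
    while True:
        opcode, operand = code[vpc], code[vpc + 1]
        vpc += 2
        if opcode == OP_LOAD:
            r0 = operand
        elif opcode == OP_ADD:
            r0 += operand
        elif opcode == OP_HALT:
            return r0
        else:
            raise ValueError(f"unknown opcode {opcode}")


def next_handler(ctx):
    """Fetch the opcode at the VPC and use it as an index into the handler table."""
    return HANDLER_TABLE[ctx["code"][ctx["vpc"]]]


def h_load(ctx):
    ctx["r0"] = ctx["code"][ctx["vpc"] + 1]
    ctx["vpc"] += 2
    return next_handler(ctx)  # each handler locates its own successor


def h_add(ctx):
    ctx["r0"] += ctx["code"][ctx["vpc"] + 1]
    ctx["vpc"] += 2
    return next_handler(ctx)


def h_halt(ctx):
    return None


HANDLER_TABLE = [h_load, h_add, h_halt]


def run_table(code):
    """Table-based dispatch in the spirit of Direct Threading."""
    ctx = {"code": code, "vpc": 0, "r0": 0}
    handler = next_handler(ctx)
    while handler is not None:
        handler = handler(ctx)
    return ctx["r0"]


PROGRAM = bytes([OP_LOAD, 5, OP_ADD, 7, OP_HALT, 0])
```

In the table variant there is no central place that lists all opcodes; an analyst has to recover the table and follow each handler to its successor, which is part of what makes such interpreters harder to read.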

By context, regarding VMs, we mean a kind of virtual process context: each time a process is removed from access to the processor during process switching, sufficient information on its current operating state, its context, must be stored such that when it is again scheduled to run on the processor, it can resume its operation from an identical position.

Figure 3. Illustration of Switch Dispatch, where R0 is a virtual register

Figure 4. Illustration of Direct Threading

Obfuscation techniques are a kind of software protection intended to make code hard to understand and hence conceal its objectives. Such techniques were initially developed to protect the intellectual property of legitimate software, e.g., to hamper reverse engineering.

Virtual machines used as obfuscation engines are based on process virtual machines, as described above. The primary difference is that they are not intended to run cross-platform applications and they usually take machine code compiled or assembled for a known ISA, disassemble it, and translate that to their own virtual ISA. It is also usually the case that the VM environment and the virtualized application code are contained all in one application, whereas traditional process VMs usually consist of a process that runs as a standalone application that loads separate, virtualized applications.

The strength of this obfuscation technique resides in the fact that the ISA of the VM is unknown to any prospective reverse engineer: a thorough analysis of the VM, which can be very time-consuming, is required to understand the meaning of the virtual instructions and other structures of the VM. Further, if performance is not an issue, the VM’s ISA can be designed to be arbitrarily complex, slowing its execution of virtualized applications, but making reverse engineering even more complex. Understanding of the VM is necessary for decoding the bytecode and making the virtualized code understandable.

Context has a slightly different meaning in regard to obfuscating virtual machines: each time we want to switch from the native to the virtual ISA or vice versa, sufficient information on the current operating state, the context, must be stored so that when the ISA needs to be switched back, execution can resume with only the relevant data and registers modified.

Additionally, obfuscating VMs usually virtualize only certain “interesting” functions: native context is mapped to the virtual one and bytecode, representing the respective function, is chosen beforehand. The built-in interpreter is invoked afterwards (Figure 5). Beginnings of the original functions contain code that prepares and executes the interpreter, the entry of the VM (vm_entry); the rest of their code is omitted in Figure 5.

The interpreter, bytecode, and virtual ISA code with data of obfuscating VMs are usually all stored in a dedicated section of the executable binary, along with the rest of the partially virtualized program.

Figure 5 shows the way a function, Function 1, in the original application targeting a standard ISA, can be virtualized for an obfuscating VM’s ISA. It needs to be converted into bytecode, for example using a generate_bytecode method. Its body is afterwards overwritten by a call into vm_entry and zeroes. The vm_entry function chooses the respective bytecode, for example, based on the calling function’s address, then conducts a context switch, and next interprets the bytecode. Finally, it returns to the code where the virtualized function, Function 1, would return.

Figure 5. Overview of the virtualization process

In VMs hosted on x86 architectures, such context switches usually consist of a series of PUSH and POP instructions. For example:

PUSH EAX
PUSH EBX
PUSH ECX

MOV ECX, context_addr
POP DWORD PTR [ECX]
POP DWORD PTR [ECX + 4]
POP DWORD PTR [ECX + 8]
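
The effect of this sequence can be replayed in a few lines of Python. The sketch below is a simulation under simple assumptions (a Python list as the stack, a bytearray as memory, made-up register values and context address); it is not taken from any sample. It shows that, because a stack is last-in first-out, the registers end up in the context in the reverse of the order in which they were pushed.

```python
import struct


def save_context(regs, memory, context_addr):
    """Replay the PUSH/POP sequence above on a simulated stack and memory."""
    stack = []
    for name in ("EAX", "EBX", "ECX"):        # PUSH EAX / PUSH EBX / PUSH ECX
        stack.append(regs[name])
    regs["ECX"] = context_addr                # MOV ECX, context_addr
    for offset in (0, 4, 8):                  # POP DWORD PTR [ECX + offset]
        struct.pack_into("<I", memory, regs["ECX"] + offset, stack.pop())


# Hypothetical register values and context location.
regs = {"EAX": 0x11111111, "EBX": 0x22222222, "ECX": 0x33333333}
memory = bytearray(0x40)
save_context(regs, memory, context_addr=0x10)
saved = struct.unpack_from("<3I", memory, 0x10)
```

Here saved holds the original ECX, EBX, and EAX values in that order; ECX can be reused as the context pointer only because its original value was pushed first.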

When the bytecode is fully processed, the virtual context is mapped back to the native context and execution continues in the non-virtualized code; however, another virtualized function could be executed in the same way, immediately.

Note that multiple context switches can occur in a single virtualized function, for example when a native instruction from the original ISA could not be translated to virtual instructions or an unknown function from the native API needs to be executed.

Wslink’s virtual machine entry – vm_entry

Let’s get to the analysis of Wslink’s VM now. There are several function calls that enter the VM, all of which are followed by some gibberish data that IDA attempts to disassemble; the data most likely just overwrites the function’s original code before virtualization (Figure 6).

Figure 6. Entry point to the virtual machine

The vm_entry of the VM:

  • calculates the actual base address by subtracting the expected relative virtual address from the actual virtual address of a place in the code
  • unpacks code and data related to the VM on the first run; it uses the calculated base address to determine the location of the packed VM and destination of the unpacked data
  • executes an initialization function, one of the vm_pre_init() functions to be described; the choice is based on the caller’s relative address, which is mapped to the respective vm_pre_init()
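
The first and last steps of this list boil down to simple address arithmetic, sketched below in Python. All concrete numbers, the dictionary used as a dispatch table, and the handler names are hypothetical; the real table layout is not described here.

```python
def actual_base(actual_va, expected_rva):
    """Base address = where a known spot really is, minus where it should be
    relative to the image start."""
    return actual_va - expected_rva


def select_pre_init(caller_va, base, dispatch_table):
    """Map the caller's relative address to its initialization function."""
    return dispatch_table[caller_va - base]


# Hypothetical values: a code location expected at RVA 0x1234 is observed
# at this virtual address at run time.
EXPECTED_RVA = 0x1234
observed_va = 0x7FEC91A01234
base = actual_base(observed_va, EXPECTED_RVA)

# Hypothetical dispatch table keyed by the caller's relative address.
dispatch_table = {0x2000: "vm_pre_init_A", 0x2040: "vm_pre_init_B"}
chosen = select_pre_init(base + 0x2040, base, dispatch_table)
```

Keying the table by relative rather than absolute addresses keeps the lookup valid wherever the image is loaded, which is presumably why the base address is computed first.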

Packer

Wslink’s VM is packed with NsPack to reduce the size of the large executable file; additional obfuscation is probably just a side effect. Similarities between Wslink’s unpacking code and ClamAV’s unspack() function are clearly visible (Figure 7 and Figure 8). Note that Ghidra has optimized out calculation of the base address.

Figure 7. Part of vm_entry of the virtual machine decompiled with Ghidra

Figure 8. Function used to unpack NsPack in ClamAV

The vm_pre_init_dispatch_table in Figure 7 is the structure that maps callers’ addresses of the vm_entry to the respective vm_pre_init() functions that are to be described.

Virtual machine initialization

Initialization of the VM consists of several steps, such as saving values of the native registers on the stack and later moving them to the virtual context, relocation of its internal structures, or preparation of bytecode. We cover these steps more thoroughly in the following subsections.

vm_pre_init() functions

vm_pre_init() functions are meant only to prepare parameters for another stage of initialization (Figure 9). These functions call a single vm_init() function (explained in the next section) with specific parameters. The supplied parameters are:

  • CPU flags, which are stored on the stack with a PUSHF instruction at the beginning of each function
  • hardcoded offset to a virtual instruction table that represents the first virtual instruction to be executed (its opcode)
  • hardcoded address of the bytecode to be interpreted

Figure 9. Miasm’s symbolic execution of a vm_pre_init() showing parameters supplied to vm_init()

vm_init() function

vm_init() pushes all the native registers and the supplied CPU flags from parameters (context) onto the stack. The native context will later be moved to the virtual one, which, in addition, holds several internal registers.

One of the internal registers determines whether another instance of the VM is already running: there is only one global virtual context and only one instance of the VM can run at a time. Figure 10 shows the part of the code busy-waiting for the virtual register, where RBP contains the address of the virtual context and RBX the offset of the virtual register; the internal register is stored in [RBX + RBP].

The entire function is summarized in Figure 11.

Figure 10. Busy-waiting for interpreter in vm_init()

The bytecode’s address, supplied in the parameters, is added to the virtual context along with the address of the virtual instruction table, which is hardcoded. Both have a dedicated virtual register.

The VM calculates the base address again in the same way as was described for vm_entry; in addition, it stores the address in another internal register that is used later, should an API be called. Then the base address is used to relocate the instruction table, its entries, and the bytecode’s address.

The calculated base address is simply added to all the function addresses if they have not already been relocated.
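
A relocation pass of this kind could look roughly like the following Python sketch. How the VM actually recognizes an entry that has already been relocated is not stated here, so the check used below (an entry smaller than the image size is still a relative address) is purely an assumption for illustration, as are the function name and the sample values.

```python
def relocate_entries(entries, base, image_size):
    """Add the base address to every entry that still looks relative.

    Assumption for this sketch: relative addresses are smaller than the
    image size, while relocated ones are not, so the pass is idempotent.
    """
    return [entry + base if entry < image_size else entry for entry in entries]


# Hypothetical instruction table with two relative handler addresses.
BASE = 0x7FEC91A00000
IMAGE_SIZE = 0x100000
table = [0x1000, 0x2000]

once = relocate_entries(table, BASE, IMAGE_SIZE)
twice = relocate_entries(once, BASE, IMAGE_SIZE)
```

Running the pass a second time leaves the table unchanged, which matters because vm_init() runs on every entry into the VM while the table only needs to be relocated once.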

Figure 11. vm_init() summary

Virtual instructions of the second virtual machine

We start by looking at the first few executed virtual instructions to observe the behavior of the second VM and then attempt to process the rest of them in a partially automated way.

The diagram in Figure 12 highlights in blue where the virtual instructions of the second VM are in the structure of the VMs.

Figure 12. Virtual instructions in the structure of the virtual machines

The first virtual instruction

The first virtual instruction is, exceptionally, not obfuscated, as can be seen in Figure 13. Finally, we can see some operations in the virtual context.

By inspecting the modified memory and calculated destination address of the instruction, it is clear that the instruction does three things:

  • Zeroes out a virtual 32-bit register at offset 0xB5 in the virtual context (highlighted in gray in Figure 13); the address of the context is stored in the RBP register
  • Increases a virtual 64-bit register at offset 0x28 by 0x04: it is the pointer to the bytecode, the virtual program counter. The size of the virtual instruction is hence four bytes (highlighted in red in Figure 13).
  • Prepares the next virtual instruction to be executed: the offset to the virtual instruction table, the virtual opcode, is fetched from the virtual program counter. The address of the virtual instruction table is at offset 0xA4 (highlighted in green in Figure 13). This means that the VM uses the Direct Threading Dispatch technique.

Figure 13. The initial virtual instruction of the second VM

Note that the size of the next instruction’s opcode is just two bytes and the remaining word is left unused. We can see that it is just a zero when we look at virtual operands (Figure 14). Sizes of the other instructions vary; it is not just padding that preserves the same size for all instructions.
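
The three effects of the first virtual instruction can be replayed on a flat byte array, as in the minimal Python sketch below. The context offsets 0x28, 0xA4, and 0xB5 come from the description above; the memory layout, the constant names, the opcode value, and the handler address are assumptions made only so the example can run.

```python
import struct

CTX_VPC = 0x28         # 64-bit virtual program counter
CTX_TABLE = 0xA4       # 64-bit address of the virtual instruction table
CTX_ZEROED_REG = 0xB5  # 32-bit register cleared by the first instruction


def read(mem, fmt, addr):
    return struct.unpack_from(fmt, mem, addr)[0]


def first_instruction(mem, ctx):
    """Apply the three effects and return the address of the next handler."""
    struct.pack_into("<I", mem, ctx + CTX_ZEROED_REG, 0)      # zero the register
    vpc = read(mem, "<Q", ctx + CTX_VPC) + 4                  # advance the VPC
    struct.pack_into("<Q", mem, ctx + CTX_VPC, vpc)
    opcode = read(mem, "<H", vpc)                             # two-byte opcode
    table = read(mem, "<Q", ctx + CTX_TABLE)
    return read(mem, "<Q", table + opcode)                    # table entry


# Hypothetical layout inside one flat buffer.
mem = bytearray(0x1000)
CTX, BYTECODE, TABLE = 0x000, 0x200, 0x400
struct.pack_into("<Q", mem, CTX + CTX_VPC, BYTECODE)
struct.pack_into("<Q", mem, CTX + CTX_TABLE, TABLE)
struct.pack_into("<I", mem, CTX + CTX_ZEROED_REG, 0xDEADBEEF)
struct.pack_into("<H", mem, BYTECODE + 4, 0x0010)   # opcode of the next instruction
struct.pack_into("<Q", mem, TABLE + 0x10, 0x4000)   # address of its handler

next_handler = first_instruction(mem, CTX)
```

The opcode is used directly as a byte offset into the table, so the handler address is read from table + opcode rather than from table + 8 * opcode.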

Figure 14. Bytecode of the virtual instruction

The second virtual instruction

The second virtual instruction does not do anything special; it just zeroes out several virtual registers and jumps to the next instruction (Figure 15).

Figure 15. Destination address and memory modified by the second virtual instruction

The third virtual instruction

The third virtual instruction stores the address of the stack pointer in a virtual register (Figure 16); the offset of the register is determined by one of the operands, and its offset is 0x0141 in our case.

Figure 16. Destination address and memory modified by the third virtual instruction

The fourth virtual instruction

The fourth instruction contains two immediately visible anomalies in comparison with previous instructions: the stack pointer’s delta is lower at the end of the function and it contains a conditional branch (Figure 17).

Figure 17. The conditional branch and delta of the stack pointer of the fourth virtual instruction

Symbolic execution of the first block reveals that a value is popped from the stack into a virtual register (Figure 18), which makes sense as the values of the native registers remain on the stack after being saved there by vm2_init(). They are now being moved to the virtual context: the context switch is partially performed by a number of virtual instructions, each of which pops one value off the stack into a different register.

Figure 18. Destination address and memory modified by the fourth virtual instruction

The virtual register, where the value of the native register is to be stored, is determined by an operand and two other virtual registers at offsets 0x0B and 0x70. However, their initial value is already known: they were set to zero by the second virtual instruction (Figure 15), which means that we can calculate the offset of the register and simplify the expressions; they are used just to obfuscate the code.

Rolling decryption

Analysis of other virtual instructions confirmed that the virtual registers at offsets 0x0B and 0x70 are meant just to encode operands. This technique is called rolling decryption and it is known to be used by the VMProtect obfuscator. However, it is the only overlap with that obfuscator and we are highly confident that this VM is different.

The obfuscation technique is certainly one of the reasons for the large number of virtual instructions: use of the technique requires duplication of individual instructions since each uses a different key to decode the operands.
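
The general idea of rolling decryption can be sketched in Python as follows. This is a generic illustration and not Wslink’s or VMProtect’s actual scheme: the XOR combination, the key update rule, the 16-bit operand width, and all values are hypothetical. What it does show is why a known initial key value (zero in the case above) is enough to decode every operand statically, and why each handler copy needs its own constant.

```python
def decode_operands(encoded, handler_key, rolling_key=0):
    """Decode 16-bit operands with a per-handler constant and a rolling key."""
    decoded = []
    for raw in encoded:
        plain = (raw ^ handler_key ^ rolling_key) & 0xFFFF
        decoded.append(plain)
        # Hypothetical update rule: the key depends on everything decoded so far.
        rolling_key = (rolling_key + plain) & 0xFFFF
    return decoded


def encode_operands(plain_values, handler_key, rolling_key=0):
    """Inverse of decode_operands, as an obfuscator would apply it."""
    encoded = []
    for plain in plain_values:
        encoded.append((plain ^ handler_key ^ rolling_key) & 0xFFFF)
        rolling_key = (rolling_key + plain) & 0xFFFF
    return encoded


# Hypothetical operands and handler constant.
operands = [0x0010, 0x0020]
blob = encode_operands(operands, handler_key=0xABCD)
recovered = decode_operands(blob, handler_key=0xABCD)
wrong_start = decode_operands(blob, handler_key=0xABCD, rolling_key=1)
```

Starting from the wrong key value yields different operands, so an analyst who jumps into the middle of the bytecode without tracking the rolling registers cannot decode it reliably.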

Simplification

The expressions can be simplified to the following when we apply the known values of the virtual registers:

IRDst = (-@16[@64[RBP_init + 0x28] + 0x4] ^ 0x3038 == @16[@64[RBP_init + 0x28] + 0x6])?(0x7FEC91ABD1C,0x7FEC91ABCF6)

@64[RBP_init + {-@16[@64[RBP_init + 0x28] + 0x4] ^ 0x3038, 0, 16, 0x0, 16, 64}] = @64[RSP_init]

Now let us take a look at the expression in the conditional block:

@64[RBP_init + {@16[@64[RBP_init + 0x28] + 0x6], 0, 16, 0x0, 16, 64}] = @64[RBP_init + {@16[@64[RBP_init + 0x28] + 0x6], 0, 16, 0x0, 16, 64}] + 0x8

We can now see that the virtual instruction is certainly POP: it moves a value off the top of the stack to a virtual register, whose offset is still obfuscated with a simple XOR; it additionally increases the stack pointer when the destination register is not the stack pointer.

As values in the bytecode are known too, we can apply them and simplify the instruction even further into the following final unconditional expressions:

IRDst = @64[@64[RBP_init + 0xA4] + 0x5A8]
@64[RBP_init + 0x28] = @64[RBP_init + 0x28] + 0x8
@64[RBP_init + 0x141] = @64[RBP_init + 0x141] + 0x8
@64[RBP_init + 0x12A] = @64[RSP_init]
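
Read as plain assignments, these four expressions describe a POP, which the following Python sketch replays on a flat byte array. The offsets 0x28, 0xA4, 0x5A8, 0x141, and 0x12A come from the expressions above; the helper names, the memory layout, and the sample values are assumptions made for this example. As in the IR, all right-hand sides are read before anything is written.

```python
import struct


def rd64(mem, addr):
    return struct.unpack_from("<Q", mem, addr)[0]


def wr64(mem, addr, value):
    struct.pack_into("<Q", mem, addr, value & 0xFFFFFFFFFFFFFFFF)


def virtual_pop(mem, ctx, rsp):
    """Apply the four simplified expressions; return the next handler address."""
    popped = rd64(mem, rsp)                                    # @64[RSP_init]
    next_handler = rd64(mem, rd64(mem, ctx + 0xA4) + 0x5A8)    # IRDst
    new_vpc = rd64(mem, ctx + 0x28) + 0x8
    new_sp = rd64(mem, ctx + 0x141) + 0x8
    wr64(mem, ctx + 0x28, new_vpc)      # advance the virtual program counter
    wr64(mem, ctx + 0x141, new_sp)      # saved stack pointer moves up by 8
    wr64(mem, ctx + 0x12A, popped)      # destination virtual register
    return next_handler


# Hypothetical layout inside one flat buffer.
mem = bytearray(0x2000)
CTX, BYTECODE, STACK, TABLE = 0x000, 0x400, 0x800, 0x1000
wr64(mem, CTX + 0x28, BYTECODE)
wr64(mem, CTX + 0xA4, TABLE)
wr64(mem, CTX + 0x141, STACK)
wr64(mem, STACK, 0x1122334455667788)   # a saved native register value
wr64(mem, TABLE + 0x5A8, 0x5000)       # made-up address of the next handler

handler = virtual_pop(mem, CTX, rsp=STACK)
```

After the call, the register at offset 0x12A holds the value that was on top of the stack, and both the virtual program counter and the stack pointer register have advanced by eight bytes.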

Automating analysis of the virtual instructions

As doing this for more than 1000 instructions would be very time consuming, we wrote a Python script with Miasm that collects this information for us so we can get a better overview of what is going on. We are particularly interested in modified memory and destination addresses.

Just as in the fourth virtual instruction, we will treat certain virtual registers as concrete values to retrieve clear expressions. These registers are dedicated to the rolling decryption and perform memory accesses that are relative to the bytecode pointer, e.g., [] = [ + 0x05] ^ 0xABCD.

Subsequently we concretize the pointer to the virtual instruction table too and, by the end of the virtual instruction, calculate addresses of the next ones, clear the symbolic state, and start with the following virtual instructions.

We additionally save aside memory assignments that are not related to the internal registers of the VM and gradually build a graph based on the virtual program counter (Figure 19).

Figure 19. Call graph generated from memory assignments and the VPC

We stop when we cannot unambiguously determine the next virtual instructions to be executed; one can automatically process most of the virtual instructions in this way.
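
The overall shape of such a loop can be sketched without any symbolic execution engine. The Python example below is not the Miasm script mentioned above and uses no Miasm API: it assumes that each virtual instruction has already been summarized as a pair of memory writes and possible successors, and only illustrates the bookkeeping (filtering out internal registers, building a graph keyed by the virtual program counter, and stopping where the successor is unknown). All names and summaries are hypothetical.

```python
def explore(summaries, entry_vpc, internal_registers):
    """Walk instruction summaries and build a graph keyed by the VPC.

    summaries maps a VPC to (writes, successors); successors is a list of
    VPCs, or None when the next instruction cannot be determined.
    """
    graph = {}
    pending = [entry_vpc]
    while pending:
        vpc = pending.pop()
        if vpc in graph:
            continue  # already processed, e.g. the target of a loop
        writes, successors = summaries[vpc]
        # Keep only assignments unrelated to the VM's internal registers.
        kept = {dst: src for dst, src in writes.items()
                if dst not in internal_registers}
        graph[vpc] = {"writes": kept, "next": successors}
        if successors is None:
            continue  # ambiguous successor: stop exploring this path
        pending.extend(successors)
    return graph


# Hypothetical summaries of four virtual instructions.
summaries = {
    0x00: ({"ctx+0x0B": "0", "ctx+0x12A": "@64[RSP]"}, [0x08]),
    0x08: ({"ctx+0x70": "key", "ctx+0x130": "ctx+0x12A + 1"}, [0x10, 0x20]),
    0x10: ({}, None),
    0x20: ({"ctx+0x138": "0"}, [0x08]),
}
internal = {"ctx+0x0B", "ctx+0x70"}
graph = explore(summaries, 0x00, internal)
```

In a real analysis the summaries would come from symbolically executing each handler with the rolling-decryption registers and the instruction table pointer concretized; the graph logic around them stays the same.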

Note that instructions featuring complex loops cannot be processed with certainty and need to be addressed individually because of the path explosion problem of symbolic execution, which is described for example in the paper Demand-Driven Compositional Symbolic Execution: “Systematically executing symbolically all feasible program paths does not scale to large programs. Indeed, the number of feasible paths can be exponential in the program size, or even infinite in presence of loops with unbounded number of iterations.”

For other actions related to virtual instructions and virtual machine initialization, please consult the ESET Research white paper Under the hood of Wslink’s multilayered virtual machine.

Conclusion

We have described internals of an advanced multilayered virtual machine featured in Wslink and successfully designed and implemented a semiautomatic solution capable of significantly facilitating analysis of the program’s code.

This virtual machine introduced several other obfuscation techniques such as junk code, encoding of virtual operands, duplication of virtual opcodes, opaque predicates, merging of virtual instructions, and a nested virtual machine to further hinder reverse engineering of the code that it protects, yet we successfully overcame them all.

To deal with the obfuscation, we modified a known technique that extracts the semantics of the virtual opcodes using symbolic execution with simplifying rules. Additionally, we made concrete the internal virtual registers for obfuscation along with memory accesses relative to the virtual program counter to automatically apply known values and deobfuscate semantics of the virtual instructions; this additionally broke down boundaries between individual virtual instructions.

Boundaries are necessary to prevent path explosion of the symbolic execution; we would lose track of the virtual program counter, our position in the interpreted code, without them.

We defined new boundaries by symbolizing the address of the virtual instruction table, since it is required to get the next instruction, and concretized it only when we needed to move to the following virtual instructions. We subsequently constructed a control flow graph of the original code in an intermediate representation from one of the bytecode blocks based on the virtual program counter, and extracted deobfuscated semantics of individual virtual instructions. We finally extended the approach to process both virtual machines at once by only concretizing the nested one. Again: for full details, see our white paper.
