Introduction

Hello, today we’ll be looking at how we can reverse engineer a lazy importer with a real life example. I’ll be using Roblox’s Hyperion as a base to reverse engineer as their implementation is a little bit different to how they’re normally implemented.

Background information

To begin, we’ll need to know what a lazy importer really is and what its use case is. A lazy importer will essentially create small stubs of code that will find a specific function that you’re looking to call and allow you to invoke it while also masking it from the import descriptor. When you have access to the import table of a binary file, it makes life easier to reverse engineer that file because we’d be able to directly see all xrefs (cross references) and understand exactly what’s going on in a good handful of places - this is very helpful when working against some form of anti-cheat/anti-tamper because these always rely on windows api specific functions. Here’s an example of a populated imports descriptor and how it would look like when reverse engineering. image|581x500, 75% If we were to check any xref’s to any of these, it would help us understand the pseudocode easier, for example. image|581x500, 75% We can easily infer that v9 is a handle and that the function before it must return some form of handle. This is just a really minimal example.

Lazy Importer

I’m not going to go into absolute detail but essentially, the way a lazy importer works is it creates stubs that iterates through the PEB of the local process and finds the specific module, the specific module name will also be hashed with two keys. There’s something which I would call an init key and a multiplier. The encryption used is a form of a rolling XOR. Let’s talk about how this algorithm works.

Rolling XOR

A rolling XOR will have two keys, the init key (as i mentioned previously) and the multiplier, the init key is pretty simple, it’s used as the starting value within the algorithm. Here’s the simple transformation.

mov rcx, character
mov rax, last_key
xor rax, rcx
imul rax, multip_key

or…

(character ^ last_key) * multip_key

You might be wondering why there’s a “last_key” and not the init key being used here, it’s called a rolling xor - the result is actually built up off of the previous entry each time.

Traversing the PEB

Now that we’ve talked about the algorithm used for the hashes, we’ll talk about how it typically will start out when it comes to invoking a lazily imported function. It all begins with three simple instructions

mov rax, gs:60h
mov rax, [rax+18h]
add rax, 0x10

or …

std::uint64_t peb = __readgsqword(0x60);
std::uint64_t ldr = *reinterpret_cast<std::uint64_t*>(peb + 0x18);
std::uint64_t InLoadOrderModuleList_addr = ldr + 0x10;

What this is doing is, the first mov will get the PEB. Every process has its own PEB and you can just look at it as some form of descriptor. The second instruction will get the LDR entry from the PEB. The LDR entry essentially contains every loaded module within the process, this is how they’re iterating over each one. After these instructions, there’s some more instructions related to the actual lazy importer but I’ll cover these later on in the section covering a live example, for now we’ll move onto these next instructions;

mov r9, rax
mov r9, [r9]

or…

std::uint64_t entry = *reinterpret_cast<std::uint64_t*>(InLoadOrderModuleList_addr);

We directly dereference the address to get a doubly linked list of loaded modules, we’re going to cast these to _LDR_DATA_TABLE_ENTRY.

movzx   r10d, word ptr [r9+58h]

or…

std::uint16_t unicode_name_length = *reinterpret_cast<std::uint16_t*>(entry + 0x58);

Preparing the DLL

The entry + 0x58 address will lead to the start of a unicode string for the BaseDllName, by dereferencing it, we actually get the length of the name.

shr     r10d, 1
mov     r11, [r9+60h]

or…

// Get the character length of the unicode string
// shift right by 1 is the same as dividing by two
unicode_name_length /= 2
std::uint64_t name_bytes = *reinterpret_cast<std::uint64_t*>(entry + 0x60);

What’s happening here is, we’re getting the actual length of the character’s in the string. Unicode strings length are essentially nm_of_characters * 2 since the Length field stores the amt of bytes. The name bytes is exactly as it says, it gets the pointer to the actual bytes of the dll name. Here’s where it starts to get more interesting though.

mov     ebx, r10d
and     ebx, 0FFFFFFFEh

or…

std::uint32_t pair_len = unicode_name_length - (unicode_name_length % 2);

What this is actually doing is, it splits the string into pairs of two. There’s some assembly before this which actually handles if it’s a odd sized string, it essentially still handles all the pairs but has an extra last case for the last character that was left out.

The hasher

Now, onto the final part of the hashing - the algorithm…

; First byte
movzx   r14d, word ptr [r11+rdi*2]
lea     r15d, [r14-41h]
mov     r12d, r14d
or      r12d, 20h
cmp     r15w, 1Ah
cmovnb  r12d, r14d
movzx   r14d, r12b
xor     r14, rsi
imul    r14, r8

; Second byte
movzx   esi, word ptr [r11+rdi*2+2]
lea     r15d, [rsi-41h]
mov     r12d, esi
or      r12d, 20h
cmp     r15w, 1Ah
cmovnb  r12d, esi
movzx   esi, r12b
xor     rsi, r14
imul    rsi, r8

add     rdi, 2

or…

std::uint64_t lazy_deporter::apply_character(std::uint8_t current_char,
                                             const std::uint64_t key,
                                             const std::uint64_t multip_magic,
                                             const bool requires_upper) const {
  // Ensure lowercase character, do not encrypt as uppercase if it's a module
  // name
  if (std::isupper(current_char) && requires_upper)
    current_char = std::tolower(current_char);
  /*
  mov reg1, key
  xor reg1, reg2
  imul reg1, multip_magic
  */
  return (current_char ^ key) * multip_magic;
}

std::expected<std::uint64_t, std::string>
lazy_deporter::get_hash(const char *entry, lazy_keys keys, bool is_mod) const {
  // Create local copy of current output, at start this is defaulted to the init
  // magic
  std::uint64_t encrypted_return = keys.lazy_init_key;
  // Hyperion splits the input string into streams of two characters, so we'll
  // do the same here. We need to ensure that we get the last tuple index.
  const std::size_t len = strlen(entry),
                    tuple_len = strlen(entry) - (strlen(entry) & 1);
  // Iterate through each pair
  std::size_t idx = 0;
  while (idx != tuple_len) {
    // Encrypt first character
    encrypted_return = this->apply_character(entry[idx], encrypted_return,
                                             keys.lazy_multip, is_mod);
    // Encrypt second character
    encrypted_return = this->apply_character(entry[idx + 1], encrypted_return,
                                             keys.lazy_multip, is_mod);
    // Move onto next tuple
    idx += 2;
  }

  // Check if odd sized string, if so we need to handle the last char that was
  // left out
  if (len & 1) {
    encrypted_return = this->apply_character(entry[len - 1], encrypted_return,
                                             keys.lazy_multip, is_mod);
  }

  // Return encrypted magic to user
  return encrypted_return;
}

That’s pretty much it, it essentially applies the rolling xor algorithm to each character in every pair, if it’s a module name it’ll enforce a lowercase input only which you can see happening in the assembly. The function hashing is exactly the same except it doesn’t enforce lowercase.

Functions

As said before, functions are basically identical except they’re retrieved from the export table of the module. I’ll save you the boring details but, after it matches the hash it will get the export directory of the dll. Once it does this, it’ll iterate through all the exported functions of the module, once it matches the function hash to an export, and directly invoke the function from the base + rva to the function found through the export table. This happens differently within Hyperion, hyperion will actually obfuscate the function address and deobfuscate it immediately after and invoke it. The obfuscation is built up off of some trivial byte transformation on EVERY byte within the address - it uses some sort of rng value in the transformation.

mov     rax, [rbp+940h+var_C0]
inc     al
and     al, 3
movzx   esi, al
movzx   r15d, byte ptr [rbp+940h+var_138]
xor     r15b, r8b
not     r15b
rol     r15b, 1
mov     r12d, r8d
shr     r12d, 8
xor     r12b, byte ptr [rbp+940h+var_140]
not     r12b
rol     r12b, 1
lea     r9, unk_7FF820F0E4B4
mov     [r9+rsi*8+1Ch], r15b
mov     [r9+rsi*8+1Dh], r12b
mov     r13d, r8d
shr     r13d, 10h
xor     r13b, byte ptr [rbp+940h+var_198]
not     r13b
rol     r13b, 1
mov     [r9+rsi*8+1Eh], r13b
mov     eax, r8d
shr     eax, 18h
xor     al, byte ptr [rbp+940h+var_1B0]
not     al
rol     al, 1
mov     [r9+rsi*8+1Fh], al
mov     r14, r8
shr     r14, 20h
movzx   ebx, byte ptr [rbp+940h+var_6E8]
xor     r14b, bl
not     r14b
rol     r14b, 1
mov     rcx, r8
shr     rcx, 28h
movzx   r11d, byte ptr [rbp+940h+var_B8]
xor     cl, r11b
not     cl
rol     cl, 1
mov     [r9+rsi*8+20h], r14b
mov     [r9+rsi*8+21h], cl
mov     rdx, r8
shr     rdx, 30h
movzx   edi, byte ptr [rbp+940h+var_80]
xor     dl, dil
not     dl
rol     dl, 1
mov     [r9+rsi*8+22h], dl
shr     r8, 38h
movzx   r10d, byte ptr [rbp+940h+var_88]
xor     r8b, r10b
not     r8b
rol     r8b, 1
mov     [r9+rsi*8+23h], r8b
mov     cs:byte_7FF820F0E4C0, sil
not     r15b
ror     r15b, 1
xor     r15b, byte ptr [rbp+940h+var_138]
not     r12b
ror     r12b, 1
xor     r12b, byte ptr [rbp+940h+var_140]
not     r13b
ror     r13b, 1
xor     r13b, byte ptr [rbp+940h+var_198]
not     al
ror     al, 1
xor     al, byte ptr [rbp+940h+var_1B0]
not     r14b
ror     r14b, 1
xor     r14b, bl
not     cl
ror     cl, 1
xor     cl, r11b
not     dl
ror     dl, 1
xor     dl, dil
not     r8b
ror     r8b, 1
xor     r8b, r10b

or…

std::uint64_t obfuscate_address(std::uint64_t func_addr,
                                std::uint8_t *xor_bytes,
                                std::uint8_t *rng_table) {
  std::uint8_t current_index = (rng_table[0xC] + 1) & 3;

  std::uint8_t obfuscated_bytes[8];

  for (int idx = 0; idx < 8; idx++) {
    std::uint8_t byte = (reinterpret_cast<std::uint8_t *>(&func_addr))[idx];
    byte ^= xor_bytes[idx];
    byte = ~byte;
    byte = (byte << 1) | (byte >> 7);

    obfuscated_bytes[idx] = byte;
    rng_table[current_index * 8 + 0x1c + idx] = byte;
  }

  rng_table[0xC] = current_index;

  std::uint64_t deobfuscated = 0;
  for (int idx = 0; idx < 8; idx++) {
    std::uint8_t byte = obfuscated_bytes[idx];

    byte = ~byte;
    byte = (byte >> 1) | (byte << 7);
    byte ^= xor_bytes[idx];

    (reinterpret_cast<std::uint8_t *>(&deobfuscated))[idx] = byte;
  }
  return deobfuscated;
}

This is, of course, directly invoked immediately after.

Hyperion Lazy Import overview

Everything we’ve currently discussed so far applies to hyperion, the only major difference is that they actually setup multiple different key sets, typically only one key set is used within a lazy importer but they’ve gone out of there way to add multiple - each set is different and are used to, most likely, mask all functions so that even if you find one, it won’t match the other.

Conclusion

I hope this helped you understand how a Lazy Importer is used and how we can reverse engineer it - Hyperion’s implementation is a really good public use case of this because it utilises multiple different keys as we talked about, they also obfuscate any return too. You can find a public repo dumping all of the lazily imported functions (If any are missing, it’s most likely that the pattern isn’t capturing all references but the core logic should be fine).

Github Repo