Learn how to fix PE magic numbers with Malduck

Malware often corrupts the Portable Executable (PE) header to hinder its analysis. By overwriting parts of the PE header, malware evades simple memory dumpers and thwarts proper loading by analysis tools. If we’re lucky then malware only overwrites the magic numbers of the PE header (MZ and PE) and leaves the rest of the header intact. We can fix such corrupted PE headers with ease. All we need is a little bit of knowledge about the PE format and the right tool to manipulate memory dumps.

First, we’ll learn how to identify such corrupted PE headers quickly with a hexeditor. Next, we’ll see how to manipulate memory dumps with Malduck. This section gives you a head start on how to use this great Python module effectively. Finally, we are putting it all together and write a script to fix PE magic numbers of a corrupted PE header.

Identifying overwritten magic numbers

In the following, I assume that you’ve got a basic understanding of the PE format. If not, I can recommend the article on osdev.org (check the overview graphic of the PE format) as an introduction.

There are two magic numbers in the PE header that are frequently overwritten by malware. First, the magic number of the DOS header (_IMAGE_DOS_HEADER), which is a two-byte or WORD constant (MZ). Second, the magic number of the PE header, which is a four-byte or DWORD constant (PE\x00\x00). This is illustrated in the following screenshot of a PE file opened in a hexeditor. Both magic numbers are colored in orange. The MZ magic is at offset 0x0 and the PE\x00\x00 is at offset 0x80.

Regular PE header with magic numbers (MZ + PE) highlighted.
Principal magic numbers of a PE file (orange)

Now that we’ve seen a perfectly fine PE header, let’s see how a slightly corrupted PE header looks like. The next screenshot shows a PE header with both (principal) magic numbers overwritten.

PE header with overwritten magic numbers.
Overwritten magic numbers (MZ + PE)

We can still identify that this is likely a PE file since the famous string This program cannot be run in DOS mode is still there. But we are missing these two magic numbers, which in turn hinders many analysis tools to properly load and analyze such a binary.

So, fixing this kind of corrupted PE header is straightforward. First, we have to restore the magic number MZ at offset 0x0. Second, we have to determine the offset to the PE header. The DOS header holds this offset in its field e_lfanew (see next code block with DOS header for reference). Therefore, we have to read the value of e_lfanew from the DOS header. e_lfanew resides at offset 0x3c. Third, we have to restore the PE header magic number PE\x00\x00 at the offset pointed to by e_lfanew. Finally, we should validate, if the file is now a valid PE file.

typedef struct _IMAGE_DOS_HEADER {                
    WORD  e_magic;      /* 00: MZ Header signature */       
    WORD  e_cblp;       /* 02: Bytes on last page of file */       
    WORD  e_cp;         /* 04: Pages in file */       
    WORD  e_crlc;       /* 06: Relocations */       
    WORD  e_cparhdr;    /* 08: Size of header in paragraphs */       
    WORD  e_minalloc;   /* 0a: Minimum extra paragraphs needed */       
    WORD  e_maxalloc;   /* 0c: Maximum extra paragraphs needed */       
    WORD  e_ss;         /* 0e: Initial (relative) SS value */       
    WORD  e_sp;         /* 10: Initial SP value */       
    WORD  e_csum;       /* 12: Checksum */        
    WORD  e_ip;         /* 14: Initial IP value */           
    WORD  e_cs;         /* 16: Initial (relative) CS value */          
    WORD  e_lfarlc;     /* 18: File address of relocation table */       
    WORD  e_ovno;       /* 1a: Overlay number */       
    WORD  e_res[4];     /* 1c: Reserved words */       
    WORD  e_oemid;      /* 24: OEM identifier (for e_oeminfo) */       
    WORD  e_oeminfo;    /* 26: OEM information; e_oemid specific */       
    WORD  e_res2[10];   /* 28: Reserved words */        
    DWORD e_lfanew;     /* 3c: Offset to extended header */       
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

Note that if you encounter a memory dump that starts with around 1000 / 0x400 zero bytes, followed by properly aligned code, then you are likely out of luck. What you are looking at is likely a completely overwritten PE header. Here I’ve encountered two scenarios throughout the years. First, the malware unpacks a perfectly fine PE file and overwrites the PE header, for instance, before injecting it into another process. If this is the case then go back to debugging and dump the PE file before the header is overwritten.

Second, the malware unpacks a PE file with an already corrupted PE header. In this case, you have to restore the PE header. If it is really necessary then you can try to build a PE file from scratch with LIEF. However, this is out of the scope of this blog post.

Manipulating memory dumps with Malduck

Malduck is a Python module that helps writing malware analysis scripts quickly. It is developed and maintained by CERT.pl. Malduck’s documentation is very decent. It is my default go-to tool to write malware analysis scripts (e.g. for aPLib decompression).

Open a memory dump with Malduck

Before we can manipulate memory dumps (of PE files), we have to open them with Malduck. The basis for all memory representations is the class Malduck.procmem (alias for malduck.procmem.procmem.ProcessMemory). The constructor takes three parameters:

  • buf: a buffer of the memory contents (as among others Python bytes)
  • base (optional): the base address of the memory dump, which defaults to 0x0
  • regions (optional): a list of regions of the memory dump, defaults to None (not relevant in the following since we’ll work with PE dumps)

We can get the length of a ProcessMemory instance with the method length and close it with close.

Apart from methods for reading and writing (see next sections), there are methods for searching (findmz, findp, findv, regexp, regexv), YARA scanning (yarap, yarav), and disassembling (disasmv). Note that methods may be suffixed with either v (virtual) or p (physical). Methods suffixed with v work on virtual addresses and methods suffixed with p work on physical addresses (raw offsets in the memory dump). The methods p2v and v2p translate from physical addresses to virtual and vice versa.

Based on malduck.procmem.procmem.ProcessMemory, there are four more memory representations in Malduck:

Since this blog post covers PE files, we will work with ProcessMemoryPE (alias procmempe) in the following. The constructor differs slightly from the constructor of ProcessMemory. The first two parameters are still buf and base. However, it does not take the parameter regions but takes two more parameters:

  • image (optional): indicate that this is a dump of a memory-mapped PE file, which defaults to False
  • detect_image (optional): heuristically detects if is a memory-mapped PE file, which defaults to False

A useful method of ProcessMemoryPE is is_valid, which checks if the imagebase of the memory dump points to a valid PE header.

I encourage you to read the documentation to find other hidden gems. There are further methods like extract that tries to extract a malware configuration from the memory dump. See Malduck’s static configuration extractor engine for more information.

Read ProcessMemoryPE

Malduck supports two ways to read from a ProcessMemory instance. First, it allows reading raw data chunks with readp, readv, and readv_until. While readp takes as input a raw offset and an optional length, readv takes as input a virtual addr. The method readv_until is useful when you want to read until a certain stop marker (e.g. end of configuration).

Second, Malduck supports reading various data types:

  • strings: asciiz and utf16z
  • signed integers: int8p/int8v, int16p/int16v, int32p/int32v, int64p /int64v
  • unsigned integers: uint8p/uint8v, uint16p/uint16v, uint32p/uint32v, uint64p /uint64v

Note that there is always a physical (p) and virtual (v) version. Internally, all utilize either readp or readv to read the data.

Write ProcessMemoryPE

The support for writing ProcessMemory instances is rudimentary when compared with the reading support. There are just two functions to know: patchp and patchv. Both accept a raw offset / virtual addr and a bytes buf. That’s it, pretty straight forward!

Putting it all together: fix PE magic numbers with Malduck

This section puts it all together: our theoretical knowledge about the PE format and our practical knowledge about memory dump manipulation with Malduck. The script fix_pe_magic_numbers.py takes a path to a dump of PE file with a corrupted header as input and outputs a fixed dump.

First, it loads the dump into a buffer data and opens it with malduck.procmempe (alias of malduck.procmem.procmempe.ProcessMemoryPE) in line 11. The method is_valid checks if a ProcessMemoryPE object is a valid PE file (line 13). Next, it patches the MZ magic number with the method patchp (line 17) and reads the DOS header field e_lfanew with the method uint32p. Again, e_lfanew resides at offset 0x3C. Afterwards, it patches the PE\x00\x00 magic number with the method patchp (line 24). Finally, it validates the PE file with the method is_valid (line 26). If it is valid, then it writes all bytes of the ProcessMemoryPE object to a file (line 29).

import sys
import malduck

def main(argv):
   if len(argv) != 2:
   print('Usage: fix_pe_magic_numbers.py PATH_TO_DUMP')     
        return

    with open(argv[1], 'rb') as f:     
        data = f.read()
        pe = malduck.procmempe(buf=data)     

        if pe.is_valid():         
            print('This file is already a valid PE file. Skipping...')         
            return     

        pe.patchp(0, b'MZ')     
        lfanew = pe.uint32p(0x3c)     
        if lfanew > len(data):         
            print('Bogus lfanew value ({hex(lfanew)}). Bailing out...')         
            return     
        
        print(f'lfanew: {hex(lfanew)}')     
        pe.patchp(lfanew, b'PE\x00\x00')     
        
        if pe.is_valid():         
            print('Fixed file successfully. Dumping to new file...')         
            with open(argv[1].replace('bin', '') + '_fixed_header.bin', 'wb') as g:
         g.write(pe.readp(0))         
                print('Done.')     
        else:         
            print('Could not fix file, still not a valid PE file.')     
        
        pe.close()

if name == 'main':
    main(sys.argv)

Let’s see how it works with our broken memory dump from the beginning:

> file memdump.bin
memdump.bin: data
> python fix_pe_magic_numbers.py memdump.bin
 lfanew: 0x80
 Fixed file successfully. Dumping to new file…
 Done.
> file memdump_fixed_header.bin
memdump_fixed_header.bin: PE32+ executable (console) x86-64, for MS Windows

et voilà! The script fixed the memory dump as we can see in the following screenshot:

Fix PE magic numbers with malduck.
Fixed PE header with MZ and PE magic numbers restored

Both magic numbers are located at their correct offsets: the magic number of the DOS header MZ resides at offset zero and the magic number of the File header resides at offset 0x80 (as indicated by e_lfanew). Now you can load the PE file with other analysis tools and continue your analysis.