The malware analyst’s guide to zlib compression

Malware often uses data compression such as zlib or aPLib. There are several reasons for this: first, it makes binaries smaller and network transfers faster. Second, it adds another layer of obfuscation, since the malware analyst has to identify the compression algorithm first. zlib is one of the compression libraries widely adopted in malware; the GhostRat family, for instance, uses zlib compression.

RFC1950: ZLIB

RFC1950 defines the ZLIB Compressed Data Format. zlib is an open-source, platform-independent compression library; at the time of writing, the current version is 1.2.11. It uses the Deflate compression algorithm. Since the Deflate algorithm itself is out of scope for this article, I recommend reading the Wikipedia article about it.

Detect if a binary statically links against zlib

Spotting zlib in malicious code is often very easy. Since many malware authors do not strip strings, the names of zlib's authors (Jean-loup Gailly and Mark Adler) and several error messages can be found in the binary:

  • “deflate 1.2.8 Copyright 1995-2013 Jean-loup Gailly and Mark Adler”
  • “inflate 1.2.8 Copyright 1995-2013 Mark Adler”
  • “incorrect header check”
  • “too many length or distance symbols”
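This check is easy to script. The following is a minimal sketch, assuming the sample's bytes are already loaded into memory. The function name find_zlib_strings and the marker list are my own choices for illustration; the patterns only cover the strings listed above, with the version number and copyright year left open so that other zlib releases match as well.

```python
import re

# Version-independent patterns for strings that a statically linked zlib
# typically leaves behind (illustrative list, not exhaustive).
ZLIB_MARKERS = [
    rb"deflate \d+\.\d+(?:\.\d+)* Copyright 1995-\d{4} Jean-loup Gailly and Mark Adler",
    rb"inflate \d+\.\d+(?:\.\d+)* Copyright 1995-\d{4} Mark Adler",
    rb"incorrect header check",
    rb"too many length or distance symbols",
]


def find_zlib_strings(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, matched string) pairs for zlib markers found in data."""
    hits = []
    for pattern in ZLIB_MARKERS:
        for match in re.finditer(pattern, data):
            hits.append((match.start(), match.group(0)))
    return sorted(hits)
```

An empty result does not prove that zlib is absent, since the strings may have been stripped or encrypted.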

Detect zlib compression with your bare eyes

RFC1950 states that a zlib stream has the following structure:

  0   1          
+---+---+          
|CMF|FLG|   (more-->)          
+---+---+

where CMF stands for Compression Method and Flags and FLG stands for FLaGs. In practice, these two bytes usually take one of the following values:

  • 0x7801, which stands for no compression or low compression
  • 0x789C, which stands for default compression
  • 0x78DA, which stands for best compression

So, if you spot these values at the beginning of a data stream, then chances are high that you are dealing with zlib compression.
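Instead of memorizing the individual values, the header can also be validated with the rules from RFC1950: the low nibble of CMF holds the compression method (8 for Deflate), the high nibble holds the window size information (at most 7), and CMF and FLG read together as a 16-bit big-endian number must be a multiple of 31. The sketch below implements just these three checks; the function name looks_like_zlib_header is hypothetical, and it ignores the preset dictionary flag.

```python
def looks_like_zlib_header(data: bytes) -> bool:
    """Check the two-byte zlib header (CMF, FLG) as described in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    compression_method = cmf & 0x0F  # CM: 8 means Deflate
    window_info = cmf >> 4           # CINFO: log2(window size) - 8, at most 7
    # FCHECK: (CMF * 256 + FLG) must be a multiple of 31.
    checksum_ok = ((cmf << 8) | flg) % 31 == 0
    return compression_method == 8 and window_info <= 7 and checksum_ok
```

Since only two bytes are inspected, random data passes this test every now and then. Treat a match as a hint and confirm it by actually decompressing the data.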

For instance, the following screenshot shows a zlib compressed PE file using best compression (level 9). Note the marked first two bytes 0x78DA.

zlib compressed PE file with best compression level
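The relation between compression level and header bytes can be reproduced with a few lines of Python. This is a small experiment rather than a reference: the payload is a made-up stand-in for a PE file, and the exact bytes depend on the zlib implementation your Python build links against.

```python
import zlib

payload = b"MZ" + bytes(256)  # stand-in for a real PE file

# Map each compression level to the two header bytes zlib emits for it.
headers = {level: zlib.compress(payload, level)[:2].hex() for level in range(10)}

for level, header in headers.items():
    print(f"level {level}: 0x{header.upper()}")
```

With the reference zlib implementation, level 6 (the default) should yield 0x789C and level 9 should yield 0x78DA.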

zlib decompression

In the following sections, I’ll review several ways to decompress zlib-compressed blobs. First, I’ll show how to decompress them on the command line. Next, I’ll show how to do zlib decompression with Python.

zlib decompression on the command line

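OpenSSL can decompress zlib-compressed blobs, provided it was built with zlib support. One of its cipher commands is zlib, and the -d flag switches it to decompression. The following example decompresses a zlib-compressed PE file with OpenSSL. Note that hexdump without options prints 16-bit words, so on a little-endian machine the header bytes 78 DA show up as da78.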

> file compressed.bin   
 compressed.bin: zlib compressed data
> hexdump compressed.bin | head -n 5
 0000000 da78 bdec 7809 c55b 30d5 f73c b16a d96c
 0000010 ec96 8e48 d89c 9289 c4d8 9b21 2ced 9024
 0000020 cb50 6c96 c889 9096 38e4 5366 5964 45b6
 0000030 4964 c968 5bc2 c04a d684 4375 8503 94be
 0000040 96d2 2db7 b42d 05b4 d65a 38b2 2584 2dd0
> openssl zlib -d < compressed.bin > decompressed.bin 
> file decompressed.bin 
 decompressed.bin: PE32+ executable (console) x86-64, for MS Windows
> hexdump decompressed.bin | head -n 5
 0000000 5a4d 0090 0003 0000 0004 0000 ffff 0000
 0000010 00b8 0000 0000 0000 0040 0000 0000 0000
 0000020 0000 0000 0000 0000 0000 0000 0000 0000
 0000030 0000 0000 0000 0000 0000 0000 0080 0000
 0000040 1f0e 0eba b400 cd09 b821 4c01 21cd 6854

Conversely, openssl zlib < INPUT > OUTPUT (without -d) compresses a file on the command line.

zlib decompression with Python

First of all, the Python standard library ships with an entire zlib module. That’s wonderful news, since we can handle all zlib compression and decompression tasks in Python.

The following code snippet shows how to decompress in case a zlib header is present.

# https://docs.python.org/2/library/zlib.html
# If BLOB (a bytes object) starts with a zlib header, it can be decompressed simply by calling
import zlib
data = zlib.decompress(BLOB)
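To see both the success case and the failure case, here is a self-contained round trip. The payload is a made-up stand-in for a PE file. Feeding uncompressed data to zlib.decompress raises zlib.error, and the message contains the same "incorrect header check" string that can be found in binaries linking against zlib.

```python
import zlib

original = b"MZ" + bytes(64)       # stand-in for a PE file
blob = zlib.compress(original, 9)  # zlib stream with header and Adler-32 trailer

# Round trip: a blob with an intact zlib header decompresses directly.
restored = zlib.decompress(blob)

# Data without a valid zlib header raises zlib.error instead.
try:
    zlib.decompress(original)
    error_message = None
except zlib.error as exc:
    error_message = str(exc)

print(restored == original)
print(error_message)
```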

However, a common obfuscation technique in malware is to strip the zlib header and store only the raw Deflate stream. The following code snippet shows how to deal with such blobs.

# Source: http://stackoverflow.com/questions/561607/identifying-algorithms-in-binaries       
# "From my experience, most of the times the files are compressed using plain old Deflate.        
# You can try using zlib to open them, starting from different offset to compensate for custom headers.        
# Problem is, zlib itself adds its own header. In python (and I guess other implementations has that feature as well),        
# you can pass to zlib.decompress -15 as the history buffer size (i.e. zlib.decompress(data,-15)),        
# which cause it to decompress raw deflated data, without zlib's headers."       
import zlib
data = zlib.decompress(BLOB, -15)  # negative wbits: raw Deflate stream, no zlib header or checksum
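The advice about trying different offsets can be automated. The following is a rough sketch, not a robust carving tool: the function name carve_deflate and its parameters max_offset and min_size are hypothetical, and it simply tries both a regular zlib stream and a raw Deflate stream at every offset. It uses zlib.decompressobj instead of zlib.decompress so that trailing bytes after the compressed stream do not cause an error.

```python
import zlib


def carve_deflate(data: bytes, max_offset: int = 64, min_size: int = 16):
    """Try zlib and raw Deflate decompression at each offset below max_offset.

    Returns (offset, wbits, decompressed bytes) for the first attempt that
    yields a complete stream of at least min_size bytes, or None otherwise.
    """
    for offset in range(min(max_offset, len(data))):
        # 15: stream with zlib header, -15: raw Deflate without header
        for wbits in (zlib.MAX_WBITS, -zlib.MAX_WBITS):
            decompressor = zlib.decompressobj(wbits)
            try:
                output = decompressor.decompress(data[offset:])
                output += decompressor.flush()
            except zlib.error:
                continue
            # eof is only set once the end of the compressed stream was reached.
            if decompressor.eof and len(output) >= min_size:
                return offset, wbits, output
    return None
```

Raw Deflate carries no checksum, so garbage can occasionally decode without an error. Always inspect the output, for example by checking for an MZ header, before trusting a hit.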