The malware analyst’s guide to zlib compression

Malware often uses data compression such as zlib or aPLib. There are several reasons for this: first, it saves space, which makes binaries smaller and network transfers faster. Second, it adds another layer of obfuscation, since the malware analyst first has to identify the compression algorithm. One of the most widely adopted compression libraries in malware is zlib. For instance, malware families like GhostRat use zlib compression.

RFC1950 defines the zlib compressed data format. zlib itself is an open-source, platform-independent compression library. At the time of writing, the current version is 1.2.11. It uses the Deflate compression algorithm, which is specified in RFC1951. Since the details of Deflate are beyond the scope of this article, I recommend reading the Wikipedia article on the algorithm.

Spotting zlib in malicious code is often very easy. Since many malware authors do not remove strings, the names of zlib’s authors (Jean-loup Gailly and Mark Adler) and several error messages can be found in the binary:

  • “deflate 1.2.8 Copyright 1995-2013 Jean-loup Gailly and Mark Adler”
  • “inflate 1.2.8 Copyright 1995-2013 Mark Adler”
  • “incorrect header check”
  • “too many length or distance symbols”
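The string check above is easy to automate. The following is a minimal sketch rather than a hardened detector: the name find_zlib_strings and the regular expressions are assumptions made for illustration, and the patterns only cover the four strings listed above, with the version and year left variable.

```python
import re

# Patterns for strings that a statically linked zlib leaves in a binary.
# Version numbers and copyright years differ between zlib releases,
# so they are matched loosely.
ZLIB_MARKERS = [
    re.compile(rb"deflate \d+\.\d+(?:\.\d+)* Copyright 1995-\d{4} Jean-loup Gailly"),
    re.compile(rb"inflate \d+\.\d+(?:\.\d+)* Copyright 1995-\d{4} Mark Adler"),
    re.compile(rb"incorrect header check"),
    re.compile(rb"too many length or distance symbols"),
]


def find_zlib_strings(data: bytes):
    """Return a sorted list of (offset, matched bytes) for zlib marker strings."""
    hits = []
    for pattern in ZLIB_MARKERS:
        for match in pattern.finditer(data):
            hits.append((match.start(), match.group(0)))
    return sorted(hits)
```

To use it, read the sample with open(path, "rb").read() and pass the bytes in. An empty result does not rule out zlib, since the strings may have been stripped or the library may be loaded dynamically.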

Detect zlib compression with your bare eyes

RFC1950 states that a zlib stream has the following structure:

  0   1          
|CMF|FLG|   (more-->)          

where CMF stands for Compression Method and Flags and FLG stands for FLaGs. For Deflate with the usual 32 KiB window, CMF is 0x78, and the two bytes together most commonly take one of the following values:

  • 0x7801, which stands for no compression or low compression
  • 0x789C, which stands for default compression
  • 0x78DA, which stands for best compression

So, if you spot these values at the beginning of a data stream, then chances are high that you are dealing with zlib compression.
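Beyond matching these values by eye, the two bytes can be checked against the rules in RFC1950: the low nibble of CMF must be 8 (Deflate), the high nibble encodes the window size and must not exceed 7, and CMF*256 + FLG must be a multiple of 31. The sketch below applies those rules; the function name parse_zlib_header and the layout of the returned dictionary are assumptions made for illustration.

```python
def parse_zlib_header(data: bytes):
    """Return a dict describing the zlib header at the start of data, or None.

    Only the two-byte CMF/FLG header is inspected; a match is a strong hint,
    not proof, that a valid zlib stream follows.
    """
    if len(data) < 2:
        return None
    cmf, flg = data[0], data[1]
    cm = cmf & 0x0F      # compression method, 8 = Deflate
    cinfo = cmf >> 4     # log2(window size) - 8
    if cm != 8 or cinfo > 7:
        return None
    if (cmf * 256 + flg) % 31 != 0:   # FCHECK rule from RFC1950
        return None
    return {
        "window_size": 1 << (cinfo + 8),
        "has_preset_dict": bool(flg & 0x20),   # FDICT bit
        "flevel": flg >> 6,   # 0 fastest, 1 fast, 2 default, 3 maximum
    }
```

For the three values above, the flevel field comes out as 0, 2, and 3 respectively. Any other pair of bytes that passes the same checks is a zlib header candidate as well.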

For instance, the following screenshot shows a zlib compressed PE file using best compression (level 9). Note the marked first two bytes 0x78DA.

Figure: zlib compression of a PE file with best compression (level 9)
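A blob like this one is easy to reproduce with Python's zlib module in order to see how the compression level ends up in the header. This is a small sketch: header_for_level is a hypothetical helper, the payload is a stand-in for a real PE file, and the header values assume Python is linked against the reference zlib implementation.

```python
import zlib


def header_for_level(level: int, payload: bytes = b"MZ" + bytes(1024)) -> str:
    """Compress payload at the given level and return the first two bytes as hex."""
    return zlib.compress(payload, level)[:2].hex()


# Levels 1, 6 and 9 correspond to low, default and best compression.
for level in (1, 6, 9):
    print(f"level {level}: header 0x{header_for_level(level)}")
```

The header only records a coarse hint about the level. Decompression does not depend on it.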

In the following sections, I’ll review several ways to decompress zlib-compressed blobs. First, I’ll show how to decompress them on the command line. Next, I’ll show you how to do zlib decompression with Python.

zlib decompression on the command line

OpenSSL can decompress zlib-compressed blobs, provided the OpenSSL build was compiled with zlib support. One of its cipher commands is zlib, and its -d flag tells it to decompress. The following example decompresses a zlib-compressed PE file with OpenSSL.

> file compressed.bin   
 compressed.bin: zlib compressed data
> hexdump compressed.bin | head -n 5
 0000000 da78 bdec 7809 c55b 30d5 f73c b16a d96c
 0000010 ec96 8e48 d89c 9289 c4d8 9b21 2ced 9024
 0000020 cb50 6c96 c889 9096 38e4 5366 5964 45b6
 0000030 4964 c968 5bc2 c04a d684 4375 8503 94be
 0000040 96d2 2db7 b42d 05b4 d65a 38b2 2584 2dd0
> openssl zlib -d < compressed.bin > decompressed.bin 
> file decompressed.bin 
 decompressed.bin: PE32+ executable (console) x86-64, for MS Windows
> hexdump decompressed.bin | head -n 5
 0000000 5a4d 0090 0003 0000 0004 0000 ffff 0000
 0000010 00b8 0000 0000 0000 0040 0000 0000 0000
 0000020 0000 0000 0000 0000 0000 0000 0000 0000
 0000030 0000 0000 0000 0000 0000 0000 0080 0000
 0000040 1f0e 0eba b400 cd09 b821 4c01 21cd 6854

Conversely, running openssl zlib < INPUT > OUTPUT without the -d flag compresses a file on the command line.
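One detail of the session above can be confusing: hexdump shows da78 as the first word, while the header to look for is 0x78DA. Without options, hexdump prints 16-bit words in host byte order, so on a little-endian machine each pair of bytes appears swapped (hexdump -C or xxd print the bytes in file order). The following sketch mimics both views; the two helper names are assumptions made for illustration.

```python
def bytes_in_file_order(data: bytes, count: int = 16) -> str:
    """Format the first count bytes exactly as they are stored in the file."""
    return " ".join(f"{b:02x}" for b in data[:count])


def as_hexdump_words(data: bytes, count: int = 16) -> str:
    """Format the first count bytes as 16-bit little-endian words,
    the way default hexdump output looks on a little-endian host."""
    chunk = data[:count]
    if len(chunk) % 2:
        chunk += b"\x00"   # pad an odd trailing byte with zero
    words = [int.from_bytes(chunk[i:i + 2], "little") for i in range(0, len(chunk), 2)]
    return " ".join(f"{w:04x}" for w in words)
```

Feeding the first six bytes of compressed.bin (78 da ec bd 09 78) to as_hexdump_words reproduces the "da78 bdec 7809" seen in the session.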

zlib decompression with Python

Python ships an entire zlib module in its standard library. That’s wonderful news, since we can handle all zlib compression and decompression tasks in Python.

The following code snippet shows how to decompress in case a zlib header is present.

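# If the blob has a zlib header, it can be decompressed simply by calling zlib.decompress().
import zlib
BLOB = open("compressed.bin", "rb").read()  # e.g. the file from the command-line example
decompressed = zlib.decompress(BLOB)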
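In practice the compressed stream is often embedded in a larger buffer, for example after a custom length field and followed by other data. In that case it helps to know where the zlib stream ends. The sketch below uses zlib.decompressobj(), which exposes the bytes left over after the end of the stream; the function name decompress_embedded and its offset parameter are assumptions made for illustration.

```python
import zlib


def decompress_embedded(buffer: bytes, offset: int = 0):
    """Decompress one zlib stream that starts at offset inside buffer.

    Returns (decompressed data, bytes following the stream).
    Raises zlib.error on corrupt data and ValueError on a truncated stream.
    """
    d = zlib.decompressobj()   # default wbits: expects zlib header and checksum
    data = d.decompress(buffer[offset:])
    if not d.eof:
        raise ValueError("zlib stream is truncated")
    return data, d.unused_data
```

A quick sanity check on the result is to look at the first bytes, for example data[:2] == b"MZ" for a PE file.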

However, a common obfuscation technique in malware is to omit the zlib header. The following code snippet shows how to deal with such blobs.

# Source:
# "From my experience, most of the times the files are compressed using plain old Deflate.
# You can try using zlib to open them, starting from different offset to compensate for custom headers.
# Problem is, zlib itself adds its own header. In python (and I guess other implementations has that feature as well),
# you can pass to zlib.decompress -15 as the history buffer size (i.e. zlib.decompress(data,-15)),
# which cause it to decompress raw deflated data, without zlib's headers."
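import zlib
BLOB = open("blob.bin", "rb").read()  # hypothetical file holding raw Deflate data
# A negative wbits value (-15) tells zlib to expect raw Deflate data without header or checksum.
decompressed = zlib.decompress(BLOB, -15)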
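The quoted advice also suggests trying different start offsets to skip over custom headers. A rough sketch of that idea follows. The function name find_raw_deflate and the parameters max_offset and min_size are assumptions made for illustration, and the size threshold is an arbitrary guard against tiny accidental matches.

```python
import zlib


def find_raw_deflate(buffer: bytes, max_offset: int = 64, min_size: int = 32):
    """Try to inflate raw Deflate data at each offset below max_offset.

    Returns a list of (offset, decompressed data) for every offset where a
    complete stream of at least min_size output bytes was found.
    """
    hits = []
    for offset in range(min(max_offset, len(buffer))):
        d = zlib.decompressobj(-15)   # negative wbits: raw Deflate, no header
        try:
            data = d.decompress(buffer[offset:])
        except zlib.error:
            continue   # not a valid Deflate stream at this offset
        if d.eof and len(data) >= min_size:
            hits.append((offset, data))
    return hits
```

Raw Deflate has no magic bytes and no checksum, so random data occasionally decompresses to garbage. Each hit should therefore be verified, for instance by checking for a known file signature in the output.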

About Thomas Barabosch

Thomas holds a PhD in computer science. He is passionately engaged in malware analysis, threat actor tracking, and bug hunting. Throughout the last years he has found numerous vulnerabilities in low-level software and participated in several botnet take-downs. In his blogs he tells techies and non-techies stories about his adventures in binary code wonderland.