This is all about malware analysis, reverse engineering and some cool stuff

Monday, 31 July 2017

Reverse Engineering of Python built executables


PyInstaller and py2exe bundle a Python application and all its dependencies into an executable file. The user can run the EXE file without installing a Python interpreter or any modules.
Python is an easy scripting language to work with, so malware authors often write their malware in Python and convert it into an EXE file using py2exe or PyInstaller.

In this post, I am going to explain how to reverse those binaries and recover the Python source code.

Case I :

Let's take this file d243ca34ec6a2f7995730747c6d73388 [VirusTotal][HybridAnalysis]
This file is compiled and built by py2exe.

How can you tell that it was generated by py2exe?
Let's have a look at the resources of the binary.

Fig 1 : Case-1 resources

You will find two resources in this binary. The first one is "PYTHON27.DLL", which is the embedded Python 2.7 interpreter, and the other one is "PYTHONSCRIPT", which is nothing but the compiled version of the Python script.
PYTHONSCRIPT starts with a header of size 0x10, and the first 4 bytes of that header are the magic number 12 34 56 78 (8 hex digits).
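To check a dumped PYTHONSCRIPT resource before unmarshalling it, the header can be parsed with a few lines of Python. This is only a sketch: the field layout (magic, optimize flag, unbuffered flag, size of the marshalled data) and the NUL-terminated archive name after the header are assumptions based on py2exe's source, and the name `parse_pythonscript_header` is made up for this example.

```python
import struct

# Stored little-endian, so the resource starts with the bytes 12 34 56 78
PY2EXE_MAGIC = 0x78563412

def parse_pythonscript_header(data):
    """Return (optimize, unbuffered, code_size, code_offset) for a PYTHONSCRIPT dump."""
    # Assumed layout: four 32-bit little-endian integers (0x10 bytes in total)
    magic, optimize, unbuffered, code_size = struct.unpack_from("<iiii", data, 0)
    if magic != PY2EXE_MAGIC:
        raise ValueError("not a py2exe PYTHONSCRIPT resource")
    # Assumption: a NUL-terminated archive name follows the header;
    # the marshalled code objects start right after that NUL byte.
    code_offset = data.index(b"\x00", 16) + 1
    return optimize, unbuffered, code_size, code_offset
```

With an empty archive name the returned offset is 17, so it can be passed to `f.seek()` instead of a hard-coded value.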

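How do I get the source code?
First, you have to dump the PYTHONSCRIPT resource.
The first 0x10 bytes are the header and the remaining bytes are marshalled (serialized) data, so we have to unmarshal it. In this sample one extra byte sits between the header and the marshalled data, so the data starts at offset 17.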

To unmarshal it, you can use the Python code below:

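import marshal, imp

# Run this with Python 2.7, the version this sample embeds (PYTHON27.DLL).
with open('PYTHONSCRIPT', 'rb') as f:
    f.seek(17)  # Skip the 0x10-byte header plus the one byte that follows it
    ob = marshal.load(f)  # sequence of code objects

# Write each code object as a .pyc: magic number + zeroed timestamp + marshalled code
for i, code in enumerate(ob):
    with open(str(i) + '.pyc', 'wb') as out:
        out.write(imp.get_magic() + '\0' * 4 + marshal.dumps(code))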

This script reads the dumped PYTHONSCRIPT file, skips the header, unmarshals the remaining data and saves the compiled Python scripts (.pyc).

In this case, you will get the three compiled Python scripts below.
0.pyc
1.pyc
2.pyc

Now that you have the .pyc files, you just need a Python decompiler to get the source code.
You can download the uncompyle6 decompiler from here and install it by running its setup.py file,
or just install it with pip from a terminal or cmd.

pip install uncompyle6

After installation, just run it on a .pyc file and you will get the source code of the final Python file.

Fig 2 : decompile python script


Case II :

Now, look at this file 38d795517e7aab20e3fb80e52a30aa5f [VirusTotal][HybridAnalysis]
If you check the resources of this binary, you will not find "PYTHON27.DLL" or "PYTHONSCRIPT" resource.
So, you can say this binary is not built by py2exe.

Now look at the overlay of this binary.

Fig 3 : Overlay of case II binary

The overlay starts with the bytes 78DA63FE. The first two bytes, 78 DA, are a zlib stream header, and the remaining data is Python modules compressed with zlib.
A zlib-compressed overlay like this is a strong hint that the binary was built by PyInstaller and contains a Python script.
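The same check can be done in code once the overlay has been dumped to a file. The sketch below is a heuristic only: it tests whether a buffer starts with a plausible zlib header (78 DA is one of several valid values) and then tries to inflate it. The function names are made up for this example, and a positive result does not prove that PyInstaller produced the file.

```python
import zlib

def looks_like_zlib(data):
    """Heuristic: True if data starts with a plausible zlib stream header."""
    if len(data) < 2:
        return False
    cmf, flg = bytearray(data[:2])
    # The low nibble of the first byte must be 8 (deflate) and the
    # two header bytes, read as a 16-bit number, must be a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0

def try_inflate(data):
    """Return the decompressed bytes, or None if data is not a valid zlib stream."""
    try:
        # decompressobj() tolerates extra bytes after the end of the stream
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return None
```

Both overlay prefixes seen in this post (78DA63FE and 78DA4D8E) pass `looks_like_zlib`, while a PE header starting with "MZ" does not.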

To extract the Python modules from the executable, we can use the extractor tool pyinstxtractor.
Just run it with the executable as its argument.


pyinstxtractor.py conversion_case2.exe

After extraction, you will get the files shown in Fig 4.
Fig 4 : Extracted modules of Case II

"out00-PYZ.pyz_extracted" contains python compiled scripts (pyc) of imported python modules.
We just need the main file, and here "conversion" is our main file.
It is a .pyc file without its header.
A Python 2.7 .pyc file starts with the 8 bytes 03F30D0A00000000 (the 4-byte magic number 03F30D0A followed by a 4-byte timestamp, zeroed here), so you just need to prepend these bytes to the "conversion" file and rename it to "conversion.pyc".
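If you would rather not patch the file in a hex editor, the header can be prepended with a short script. This is a minimal sketch that assumes the sample targets Python 2.7 (magic 03F30D0A); the helper name `add_pyc_header` is made up for this example.

```python
# Python 2.7 magic number followed by a zeroed 4-byte timestamp
PYC_HEADER_27 = b"\x03\xf3\x0d\x0a" + b"\x00" * 4

def add_pyc_header(raw_code):
    """Return raw_code prefixed with a Python 2.7 .pyc header, unless it already has one."""
    if raw_code.startswith(PYC_HEADER_27[:4]):
        return raw_code  # magic number already present, leave the data untouched
    return PYC_HEADER_27 + raw_code

# Example use on the extracted main file:
# with open("conversion", "rb") as src:
#     fixed = add_pyc_header(src.read())
# with open("conversion.pyc", "wb") as dst:
#     dst.write(fixed)
```

For a sample built with another Python version, the magic number in `PYC_HEADER_27` would have to be replaced with the one for that version.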

Now that you have a .pyc file, you can use the uncompyle6 decompiler as explained in Case I to decompile it.
After decompilation, you will get the final Python code (.py).


Case III:

Let's have a look at fe23fb462a9e6f730ee6e93daef27c5c [VirusTotal][HybridAnalysis]
This binary does not have the resources seen in Case I, so let's look at the overlay data.

Fig 5 : Overlay of Case III binary

Here the overlay starts with 78DA4D8E; the first two bytes are the same as in Case II.
Yes, it is similar to Case II; the only difference is in the extracted modules.

Just extract this binary with pyinstxtractor as we did in Case II.
After extraction, you will get the files shown in Fig 6.

Fig 6 : Extracted modules of Case III binary

In Case II, the "out00-PYZ.pyz_extracted" directory contains the .pyc files of the imported Python modules, but in this case you will find the files shown in Fig 7.

Fig 7 : Imported modules in Case III

These modules are encrypted with AES in CFB mode.
If you open "pyimod00_crypto_key" shown in Fig 6, you will find the AES encryption key.

Fig 8 : AES encryption key

In this case, "ThisIsForFun" is the AES encryption key, and the first 16 bytes of each file (one AES block) are the initialization vector (IV).

So, if you need the Python modules, you can use the script below to get them back.


from Crypto.Cipher import AES
import zlib

CRYPT_BLOCK_SIZE = 16

# Key obtained from pyimod00_crypto_key
key = 'ThisIsForFun'
# AES needs a 16-byte key. Assumption: PyInstaller's loader left-pads a
# shorter key with '0' characters (str.zfill); adjust if decryption fails.
key = key[:CRYPT_BLOCK_SIZE].zfill(CRYPT_BLOCK_SIZE)

# Run with Python 2.7 and PyCrypto
with open('_abcoll.pyc.encrypted', 'rb') as inf:  # encrypted input file
    iv = inf.read(CRYPT_BLOCK_SIZE)  # initialization vector
    ciphertext = inf.read()

cipher = AES.new(key, AES.MODE_CFB, iv)

# Decrypt and decompress
plaintext = zlib.decompress(cipher.decrypt(ciphertext))

with open('_abcoll.pyc', 'wb') as outf:  # output file
    outf.write('\x03\xf3\x0d\x0a\0\0\0\0')  # .pyc header
    outf.write(plaintext)  # decrypted data

To get the main Python program, prepend the .pyc header 03F30D0A00000000 to the "conversion" file shown in Fig 6 and rename it to "conversion.pyc". Then use the uncompyle6 decompiler to decompile this file and get the source code.

Case IV:

In some cases, the decompiler fails on a .pyc file because its bytecode has been manipulated to prevent it from being decompiled easily.
In this case, we have to disassemble the .pyc file using the Python disassembler, deobfuscate it and then decompile it.

I prefer to disassemble this kind of binary and work directly from the bytecode.
Bytecode is easy to understand, and you can get it with the following Python code.


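import dis, marshal
with open("compiled_python.pyc", "rb") as f:
    f.seek(8)  # skip the 8-byte .pyc header; dis.dis() needs a code object, not a file name
    dis.dis(marshal.load(f))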
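On Python 2.7, disassembling a module's code object shows nested functions and classes only as `LOAD_CONST <code object ...>`, so their bodies have to be disassembled separately. The sketch below walks the constants of a code object and disassembles every nested code object it finds. It assumes you already have the top-level code object (for example from `marshal.load`), and the helper names are made up for this example.

```python
import dis
import types

def iter_code_objects(code):
    """Yield code and every code object nested in its constants, depth first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            for nested in iter_code_objects(const):
                yield nested

def disassemble_all(code):
    """Print a disassembly of code and of each nested function or class body."""
    for obj in iter_code_objects(code):
        print("--- %s (line %d) ---" % (obj.co_name, obj.co_firstlineno))
        dis.disassemble(obj)
```

Calling `disassemble_all(code)` on the unmarshalled object prints one listing per function, which makes it easier to spot the junk instructions an obfuscator has inserted.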

You can learn more about Python bytecode instructions here.

That's it.
Thank you 😊

