How to analyze malicious PDF files

Open the PDF file in a text editor such as nano, gedit or SciTE. Some editors may not display every encoding, so if something seems to be missing, try opening the file in more than one editor. SciTE seems to handle this type of work best.

Look around for clear-text scripts and base64-encoded blobs, and decode them to see what they contain. Start by listing the blobs:

python /path/to/pdf/file

This gives you a list of all the blobs it finds. Start with the biggest and extract them one at a time to see what they contain:

python -s <ID number> -S
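If you want to script the same first pass yourself, here is a minimal Python sketch that lists base64-looking runs, biggest first, and decodes them. It assumes that any run of 20 or more base64 characters counts as a blob (an arbitrary threshold). The helper names `find_base64_blobs`, `decode_blob` and `report` are hypothetical, and this is not how the tool above is implemented.

```python
import base64
import re
from typing import Optional

# Assumption: a "blob" is any run of 20+ base64 characters, optionally padded.
# The threshold is a hypothetical choice; tune it for your samples.
BASE64_RUN = re.compile(rb"[A-Za-z0-9+/]{20,}={0,2}")


def find_base64_blobs(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, blob) pairs for base64-looking runs, biggest first."""
    blobs = [(m.start(), m.group()) for m in BASE64_RUN.finditer(data)]
    return sorted(blobs, key=lambda item: len(item[1]), reverse=True)


def decode_blob(blob: bytes) -> Optional[bytes]:
    """Decode one blob, or return None if it is not valid base64."""
    try:
        return base64.b64decode(blob, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


def report(path: str, limit: int = 10) -> None:
    """Print the largest blobs in a file with a preview of their decoded bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, blob in find_base64_blobs(data)[:limit]:
        decoded = decode_blob(blob)
        preview = decoded[:40] if decoded is not None else b"<not valid base64>"
        print(f"offset {offset}: {len(blob)} chars -> {preview!r}")
```

Calling `report("/path/to/pdf/file")` prints the ten largest candidates; a blob that fails to decode is often just a long identifier or a run of binary-looking text rather than base64.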

To look for embedded shellcode:

python -e pu /path/to/pdf/file
python -e bu /path/to/pdf/file
python -e hex /path/to/pdf/file

If you find something, dump it as binary with the -d option and redirect the output to a file.
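The `pu`, `bu` and `hex` encodings presumably correspond to `%uXXXX` escapes, `\uXXXX` escapes and plain hex strings, which are common ways of writing shellcode inside JavaScript. The sketch below decodes all three from a script you have already extracted. It assumes the 16-bit escape values are stored little-endian and that a plain hex run must be at least 16 bytes long to be interesting; both are heuristics, and `decode_unicode_run` and `extract_shellcode` are hypothetical names.

```python
import re

# Assumptions: %uXXXX and \uXXXX escapes encode 16-bit units stored
# little-endian (so "%u1234" becomes b"\x34\x12"), and a plain hex run must
# be at least 16 bytes long to be worth reporting. Both are heuristics.
PERCENT_U_RUN = re.compile(r"(?:%u[0-9A-Fa-f]{4})+")
BACKSLASH_U_RUN = re.compile(r"(?:\\u[0-9A-Fa-f]{4})+")
HEX_RUN = re.compile(r"(?:[0-9A-Fa-f]{2}){16,}")


def decode_unicode_run(run: str) -> bytes:
    """Turn a run of %uXXXX or \\uXXXX escapes into raw little-endian bytes."""
    out = bytearray()
    for unit in re.findall(r"[0-9A-Fa-f]{4}", run):
        out += int(unit, 16).to_bytes(2, "little")
    return bytes(out)


def extract_shellcode(text: str) -> list[bytes]:
    """Return candidate shellcode blobs found in a decoded script."""
    found = [decode_unicode_run(m.group()) for m in PERCENT_U_RUN.finditer(text)]
    found += [decode_unicode_run(m.group()) for m in BACKSLASH_U_RUN.finditer(text)]
    found += [bytes.fromhex(m.group()) for m in HEX_RUN.finditer(text)]
    return found
```

As with the `-d` dump, write any result to disk in binary mode (for example `open("shellcode.bin", "wb")`) and review it in a disassembler or hex editor rather than running it.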

PDF names (keywords) that can indicate malicious behaviour or intent:

  • /Launch (runs an external or embedded executable)
  • /EmbeddedFiles (embedded files, which may include executables)
  • /JS (embedded JavaScript)
  • /JavaScript (embedded JavaScript)
  • /XFA (embedded JavaScript)
  • /RichMedia (embedded Flash)
  • /URI (requests a resource from a site)
  • /SubmitForm (posts data to a site)
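A crude way to check a file for these names is to count them in the raw bytes after expanding `#xx` hex escapes, since a name such as `/JavaScript` can also be written as `/J#61vaScript` to dodge naive string matching. This is a minimal sketch with hypothetical helper names; it only sees names in uncompressed parts of the file, so anything inside compressed streams is missed.

```python
import re

# The names from the list above. Matching is exact, so /JS and /JavaScript
# are counted separately.
SUSPICIOUS_NAMES = ("/Launch", "/EmbeddedFiles", "/JS", "/JavaScript",
                    "/XFA", "/RichMedia", "/URI", "/SubmitForm")

# A PDF name runs until whitespace or a delimiter character.
PDF_NAME = re.compile(rb"/[^\s/\[\]()<>{}%]+")
# Names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Expand #xx escapes so obfuscated names compare equal to plain ones."""
    expanded = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return expanded.decode("latin-1")


def count_suspicious_names(data: bytes) -> dict[str, int]:
    """Count each suspicious name in the raw (uncompressed) bytes of a file."""
    counts = dict.fromkeys(SUSPICIOUS_NAMES, 0)
    for match in PDF_NAME.finditer(data):
        name = normalize_name(match.group())
        if name in counts:
            counts[name] += 1
    return counts
```

A non-zero count is a reason to look closer, not a verdict: /URI in particular shows up in plenty of benign documents.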

Streams can contain malicious (or benign) content. They are often compressed, so review them in decoded form rather than as raw bytes.
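Many streams are compressed with the /FlateDecode filter (zlib), which makes them unreadable in an editor. As a rough sketch, assuming Flate is the only filter in use, you can locate stream bodies by their `stream`/`endstream` keywords and try to inflate each one with Python's `zlib` module. The name `inflate_streams` is hypothetical, and the sketch ignores /Length, filter chains and /DecodeParms predictors.

```python
import re
import zlib

# Simplified: find stream bodies by their keywords instead of honouring
# /Length, then try zlib (FlateDecode) on each one. Other filters, filter
# chains and /DecodeParms predictors are not handled here.
STREAM_BODY = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list[bytes]:
    """Return the decompressed content of every stream that zlib accepts."""
    results = []
    for match in STREAM_BODY.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the compressed data
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate-compressed (or damaged): skip it
    return results
```

Searching the inflated output for strings like `eval`, `unescape` or `%u` is a quick way to decide which streams deserve a closer look.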

To get a quick list of objects in a PDF file:

python /path/to/file
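For a quick, dependency-free index of the objects in a file, you can scan for `N G obj ... endobj` pairs and note each object's first `/Type` entry. This is an approximate sketch with hypothetical names: it does not parse the cross-reference table, and objects stored inside compressed object streams (/ObjStm) will not show up.

```python
import re
from typing import Optional

# Rough object index: "N G obj ... endobj" pairs found by scanning the raw file.
# Objects packed inside compressed /ObjStm streams are not visible this way.
OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
TYPE_ENTRY = re.compile(rb"/Type\s*/(\w+)")


def list_objects(data: bytes) -> list[tuple[int, int, Optional[str]]]:
    """Return (number, generation, first /Type value or None) per object."""
    rows = []
    for match in OBJECT.finditer(data):
        type_entry = TYPE_ENTRY.search(match.group(3))
        type_name = type_entry.group(1).decode("latin-1") if type_entry else None
        rows.append((int(match.group(1)), int(match.group(2)), type_name))
    return rows
```

Object numbers that appear more than once usually point to incremental updates, where a later definition replaces an earlier one.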

If you spot any of the suspicious objects, run pdf-parser to search for JavaScript payloads or anything in your dirty words list:

python pdf-parser.py /path/to/file --search JavaScript

To search inside stream contents instead:

python pdf-parser.py /path/to/file --searchstream JavaScript
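The two searches look in different places: object dictionaries versus stream contents. The following sketch imitates that split in plain Python, assuming streams are either Flate-compressed or uncompressed (anything that fails to inflate is searched as-is). `search_objects` is a hypothetical name, not pdf-parser's implementation.

```python
import re
import zlib

OBJECT = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
STREAM_BODY = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def search_objects(data: bytes, keyword: bytes) -> dict[str, list[int]]:
    """Report which objects contain keyword in their dictionary or their stream."""
    hits = {"object": [], "stream": []}
    for match in OBJECT.finditer(data):
        number, body = int(match.group(1)), match.group(2)
        stream = STREAM_BODY.search(body)
        # Everything before the 'stream' keyword is the object's dictionary/value.
        head = body[: stream.start()] if stream else body
        if keyword in head:
            hits["object"].append(number)
        if stream:
            raw = stream.group(1)
            try:
                content = zlib.decompressobj().decompress(raw)
            except zlib.error:
                content = raw  # assume it was not compressed
            if keyword in content:
                hits["stream"].append(number)
    return hits
```

Running it once per entry in your dirty-words list gives a rough map of which objects to open next.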

To see the entire object:

python pdf-parser.py /path/to/file --object 123

If an object looks suspicious, dump its stream to a file and review it:

python pdf-parser.py /path/to/file --object 123 --filter --raw -d /path/to/output/file
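If you prefer to pull an object out programmatically, this sketch finds object N, inflates its stream when the dictionary names /FlateDecode, and writes the result to a file, roughly the job the `--filter` and `-d` options do above. It is a simplification with hypothetical helper names (`get_object`, `decoded_stream`, `dump_stream`): it returns the first definition of the object it finds, even though incremental updates can redefine objects later in the file, and it ignores every filter except Flate.

```python
import re
import zlib
from typing import Optional

STREAM_BODY = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def get_object(data: bytes, number: int) -> Optional[bytes]:
    """Return the body of the first 'number G obj ... endobj' found, if any."""
    pattern = re.compile(
        rb"(?<!\d)" + str(number).encode() + rb"\s+\d+\s+obj\b(.*?)\bendobj",
        re.DOTALL,
    )
    match = pattern.search(data)
    return match.group(1) if match else None


def decoded_stream(data: bytes, number: int) -> bytes:
    """Return the object's stream, inflated if its dictionary names /FlateDecode."""
    body = get_object(data, number)
    if body is None:
        raise KeyError(f"object {number} not found")
    stream = STREAM_BODY.search(body)
    if stream is None:
        raise ValueError(f"object {number} has no stream")
    content = stream.group(1)
    if b"/FlateDecode" in body[: stream.start()]:
        content = zlib.decompressobj().decompress(content)
    return content


def dump_stream(data: bytes, number: int, out_path: str) -> None:
    """Write the decoded stream to a file for offline review."""
    with open(out_path, "wb") as handle:
        handle.write(decoded_stream(data, number))
```

For example, `dump_stream(data, 123, "obj123.bin")`, where `data` holds the PDF's bytes, writes object 123's decoded stream to `obj123.bin`.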