Open the PDF file in a text editor such as nano, gedit or SciTE. Some editors do not display every encoding, so if something seems to be missing, try opening the file in more than one editor. SciTE tends to work best for this kind of work.
Look for clear-text scripts and base64-encoded blobs, and decode the blobs to see what they contain. This can be done with base64dump.py:
python base64dump.py /path/to/pdf/file
This will give you a list of all the blobs it’s found. Start with the biggest, and extract them to see what’s in them:
python base64dump.py -s <ID number> -S /path/to/pdf/file
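To make the idea behind this step concrete, here is a minimal Python sketch of blob hunting. It is not how base64dump.py is implemented; the function name find_base64_blobs and the minimum run length of 16 characters are assumptions made for this illustration only.

```python
import base64
import binascii
import re

# Runs of base64 alphabet characters, optionally padded. The minimum length
# of 16 is an arbitrary choice for this sketch, used to cut down on noise.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_blobs(data: bytes):
    """Return (offset, decoded_bytes) for each run that decodes cleanly."""
    blobs = []
    for match in B64_RUN.finditer(data):
        candidate = match.group()
        if len(candidate) % 4 != 0:
            continue  # not a whole number of 4-character groups
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue
        blobs.append((match.start(), decoded))
    # Largest first, matching the "start with the biggest" advice.
    blobs.sort(key=lambda item: len(item[1]), reverse=True)
    return blobs
```

Reading the file with open(path, "rb").read() and passing the bytes in gives a list you can walk from the largest blob down. Expect false positives: any long run of letters and digits with the right length will decode to something.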
To look for embedded shellcode:
python base64dump.py -e pu /path/to/pdf/file
python base64dump.py -e bu /path/to/pdf/file
python base64dump.py -e hex /path/to/pdf/file
If you find something, dump it as binary with the -d option and redirect the output to a file.
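As an illustration of what one of these encodings looks like once decoded, the sketch below handles the %uXXXX form (which, as far as I understand it, is what the pu encoding refers to). The function name decode_percent_u is hypothetical, and the little-endian byte order is an assumption based on how JavaScript unescape() payloads are usually laid out in memory.

```python
import re
import struct

# One or more consecutive %uXXXX escapes, as commonly passed to unescape().
PCT_U_RUN = re.compile(r"(?:%u[0-9A-Fa-f]{4})+")
PCT_U_WORD = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text: str):
    """Return one bytes object per run of %uXXXX escapes found in text."""
    results = []
    for run in PCT_U_RUN.finditer(text):
        words = PCT_U_WORD.findall(run.group())
        # Assumption: each 16-bit value is stored little-endian.
        results.append(b"".join(struct.pack("<H", int(w, 16)) for w in words))
    return results
```

The returned bytes can be written to a file opened in "wb" mode and then handed to a disassembler or shellcode emulator.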
Objects that indicate malicious behaviour/intent:
- /Launch (run external or embedded executable)
- /EmbeddedFiles (files embedded in the PDF, possibly executables)
- /JS (embedded JavaScript)
- /JavaScript (embedded JavaScript)
- /XFA (XML forms, which can carry embedded JavaScript)
- /RichMedia (Embedded Flash)
- /URI (request resource from site)
- /SubmitForm (Posts data to site)
Streams can contain malicious (or benign) content.
To get a quick list of objects in a PDF file:
python pdfid.py /path/to/file
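For a rough idea of what such a keyword count involves, here is a minimal sketch that counts the names from the list above in the raw bytes of a file. It is not pdfid's implementation: the function name count_suspicious_names is hypothetical, and this version does not handle obfuscated names (for example hex-escaped characters such as /J#61vaScript) or names hidden inside compressed object streams.

```python
import re

# Names taken from the list of suspicious objects above.
SUSPICIOUS_NAMES = [
    b"/Launch", b"/EmbeddedFiles", b"/JS", b"/JavaScript",
    b"/XFA", b"/RichMedia", b"/URI", b"/SubmitForm",
]


def count_suspicious_names(data: bytes) -> dict:
    """Count exact occurrences of each name in the raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops /JS from matching a longer name such as /JSX.
        pattern = re.compile(re.escape(name) + rb"(?![A-Za-z0-9])")
        counts[name.decode("ascii")] = len(pattern.findall(data))
    return counts
```

A non-zero count is only a hint that the object deserves a closer look; plenty of benign PDFs contain /URI or /JS entries.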
If you spot any of the suspicious objects, run pdf-parser to search for JavaScript payloads or anything in your dirty words list:
python pdf-parser.py /path/to/file --search JavaScript
or
python pdf-parser.py /path/to/file --searchstream JavaScript
To see the entire object:
python pdf-parser.py /path/to/file --object 123
If you see an object with something suspicious, dump it and review it:
python pdf-parser.py /path/to/file --object 123 --filter --raw -d /path/to/output/file
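Filtering a stream mostly means undoing its compression. The sketch below shows the idea for the common FlateDecode (zlib) case only. It is a simplification and not pdf-parser's code: the function name inflate_streams is hypothetical, it ignores the /Filter and /Length entries, and it will not handle other filters or chained filters.

```python
import re
import zlib

# Everything between "stream" and "endstream". The lookbehind keeps the
# pattern from matching the tail of an "endstream" keyword.
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes):
    """Return each stream's content, zlib-inflated where that works."""
    results = []
    for match in STREAM.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj leaves the end-of-line bytes that precede
            # "endstream" in unused_data instead of failing on them.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data: keep the raw bytes for manual review.
            results.append(raw.rstrip(b"\r\n"))
    return results
```

Once inflated, the content can be searched for the same indicators as the clear-text parts of the file.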