August 18, 2011 06:48 PM

PDF Malware Mitigation

Protect against a multitude of PDF vulnerabilities
Windows IT Pro
InstantDoc ID #139862

The PDF file format is very popular. This page-description language and the PDF reader applications that support it are designed to prevent arbitrary code execution. However, numerous vulnerabilities have been found in popular PDF readers and exploited by countless malicious PDF documents. In this article, I explain how malicious PDF documents can execute arbitrary code, as well as what you can do as an administrator to protect your users. Many of the mitigation techniques that I discuss also apply to other document types, such as Microsoft Office documents.

Figure 1 shows the page-description language (PDL) code for a very simple one-page PDF document with the text “Hello World.” I designed it to contain only the most essential elements that make up a PDF document and to use only ASCII characters, so that you can read the internals of the document.

A PDF document contains a tree structure of objects with all the instructions needed by the PDF reader to render the document’s pages. In our example, the root object is 1 (1 0 obj) and is found at absolute position 12. The root object refers to the collection of pages found in the PDF document (i.e., object 3). Our example document contains only one page, defined in object 4. The content of the page is defined in object 5; you can find the text Hello World between parentheses. (The other keywords define text properties, such as the font to be used and its location on the page.)
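As a rough illustration of this object tree, the following Python sketch assembles a comparable one-page ASCII document. It is not the document from Figure 1: the object numbering follows the description above (root 1, pages 3, page 4, content 5), but the outlines object (2), the font object (6), the page size, and the text position are my own assumptions, and the byte offsets it produces will differ from the position 12 mentioned above.

```python
def build_minimal_pdf(text="Hello World"):
    """Assemble a one-page, ASCII-only PDF with a cross-reference table.

    Illustrative sketch only; `text` must not contain parentheses or
    backslashes, because no string escaping is done here.
    """
    content = ("BT /F1 24 Tf 100 700 Td (%s) Tj ET" % text).encode("ascii")
    objects = [
        # 1: root (catalog), refers to the collection of pages (object 3)
        b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
        # 2: empty outlines (assumed filler object)
        b"<< /Type /Outlines /Count 0 >>",
        # 3: collection of pages, containing one page (object 4)
        b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
        # 4: the page, whose content lives in object 5
        b"<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] "
        b"/Contents 5 0 R /Resources << /Font << /F1 6 0 R >> >> >>",
        # 5: page content; the text sits between parentheses
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # 6: font referenced by the page (assumed)
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.1\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # absolute position of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the returned bytes to a file and opening it in a text editor shows the same kind of readable structure: numbered objects, followed by a cross-reference table that records the absolute position of each object.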

This PDF example is easy to understand, because it uses uncompressed text. Typically, PDF documents use compressed text and can’t be easily read without appropriate tools.
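To give an idea of what such a tool has to do, here is a minimal Python sketch that stores page content as a compressed stream and reads it back. It assumes the stream uses the FlateDecode filter (zlib/deflate), which is a common choice but not the only one; the helper names and the sample content are made up for illustration.

```python
import re
import zlib


def make_flate_stream_object(number, content):
    """Wrap `content` in a PDF stream object compressed with FlateDecode."""
    compressed = zlib.compress(content)
    header = b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (
        number,
        len(compressed),
    )
    return header + compressed + b"\nendstream\nendobj\n"


def read_flate_stream(obj):
    """Recover the original content from an object built by the helper above."""
    # /Length tells us how many bytes of compressed data follow "stream\n".
    length = int(re.search(rb"/Length (\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])


page_content = b"BT /F1 24 Tf 100 700 Td (Hello World) Tj ET"
stream_object = make_flate_stream_object(5, page_content)
recovered = read_flate_stream(stream_object)
```

The bytes between stream and endstream in stream_object are no longer readable text, but recovered holds the original content again. Real documents can chain several filters and use indirect /Length values, which this sketch doesn’t handle.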

The PDF language and most PDF readers support JavaScript. Scripts can be embedded inside a PDF document and executed by the JavaScript engine of the PDF reader. This engine is restricted in its interaction with the OS. For example, there are no JavaScript statements or functions that allow arbitrary files to be read from or written to. JavaScript in PDF documents is often used in form processing, such as in order forms to calculate totals and sales tax.
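Because scripts are embedded under well-known names, a quick first check is to count those names in the raw file. The Python sketch below looks for /JavaScript and /JS; the function name is hypothetical, and the approach is deliberately naive: it will miss names hidden inside compressed object streams or written with hexadecimal escapes, so a count of zero is not proof that a document is script-free.

```python
import re

# PDF names that indicate embedded scripts.
SCRIPT_NAMES = (b"/JavaScript", b"/JS")


def count_script_names(data):
    """Count occurrences of script-related names in raw PDF bytes."""
    counts = {}
    for name in SCRIPT_NAMES:
        # The lookahead rejects longer names that merely start the same way
        # (for example /JSON should not count as /JS).
        pattern = re.compile(re.escape(name) + rb"(?![A-Za-z0-9])")
        counts[name.decode("ascii")] = len(pattern.findall(data))
    return counts


sample = b"<< /Type /Action /S /JavaScript /JS (app.alert(1);) >>"
result = count_script_names(sample)
```

For the sample action dictionary, each of the two names is counted once. A document that contains these names isn’t necessarily malicious (think of the order forms mentioned above), but it deserves a closer look if it comes from an untrusted source.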

So, how can malware authors create PDF documents that will infect systems? They do so by exploiting bugs (vulnerabilities) that they actively research in popular PDF reader applications, such as Adobe Reader. These vulnerabilities are often found in the PDF engine or in the JavaScript engine. Back in 2008, one such vulnerability was found in Adobe Reader in the JavaScript util.printf function. (Adobe patched this vulnerability, and it doesn’t exist in recent versions of Adobe Reader.)

The util.printf function takes arguments and produces a formatted string according to them. But when util.printf is passed some very specific arguments, a bug in its internal code is triggered, and the code doesn’t behave as the programmers intended. Instead of formatting text and returning to the caller, execution jumps outside the program, to an address where no code exists. When a Windows program tries to execute code that doesn’t exist, an error is generated, and this error terminates the Adobe Reader process.

Passing program control to an arbitrary address in memory is the holy grail of malware authors and exploit writers, because it is what they need to make a vulnerable application execute their own code. Very skilled exploit writers can achieve total control of the address to which execution jumps. (This is called Extended Instruction Pointer, or EIP, control; EIP is the CPU’s instruction pointer, that is, the register that holds the address of the next instruction to be executed.) Exploit writers first place their own code at this address, then exploit the vulnerability so that program execution passes to this address.

However, it’s rare to find such exploits with total EIP control in malicious PDF documents in the wild. (Malware found “in the wild” is malware that’s spreading unrestricted on the Internet; this excludes proof-of-concept malware that isn’t spreading and malware used in very targeted attacks.) What’s often found in the wild is PDF malware with exploits that achieve partial EIP control: Malware authors can build an exploit to jump to a particular address in memory, outside the normal program execution, but they can’t choose that address freely. To compensate, they use a heap spray technique in JavaScript to plant their malicious code in memory: They fill the vulnerable program’s dynamic memory (the heap) with so many copies of their malicious shellcode that the address they can reach is likely to contain it. Shellcode is a small program written in machine language that can execute correctly anywhere in memory.
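To see why filling the heap helps when the jump target can’t be chosen freely, consider the toy model below. It is plain arithmetic on made-up numbers, not exploit code. It assumes (this detail isn’t covered above) that each sprayed block consists of harmless filler instructions followed by the shellcode, so that landing anywhere in the filler lets execution run forward into the shellcode; all names and sizes are hypothetical.

```python
def landing_outcome(address, spray_start, block_size, block_count, payload_size):
    """Classify where a fixed jump target falls relative to a sprayed region.

    Each block is modeled as (block_size - payload_size) bytes of filler
    followed by payload_size bytes of shellcode.
    """
    spray_end = spray_start + block_size * block_count
    if not spray_start <= address < spray_end:
        return "miss"  # nothing was planted here: the process crashes
    offset_in_block = (address - spray_start) % block_size
    if offset_in_block < block_size - payload_size:
        return "filler"  # execution runs through the filler into the shellcode
    return "payload"  # lands in the middle of the shellcode: likely a crash


# A small spray covers few addresses; a large one covers many more.
small = landing_outcome(3050, spray_start=1000, block_size=100,
                        block_count=10, payload_size=10)
large = landing_outcome(3050, spray_start=1000, block_size=100,
                        block_count=50, payload_size=10)
```

With 10 blocks the sprayed region ends at address 2000, so the fixed target 3050 is a miss; with 50 blocks it falls in the filler of a block. The larger the sprayed region and the smaller the shellcode relative to the filler, the more likely a fixed target is to work.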

Shellcode used in common malicious PDF documents is very small and typically does the following: It downloads an executable file from a web server on the Internet with an HTTP request, writes this file to the disk in the system32 folder, and executes the downloaded file. The shellcode has no real malicious payload; it’s simply a downloader program that downloads and executes the real Trojan from the Internet. (Downloading a Trojan from the Internet provides malware authors with more flexibility; they can change the Trojan on the web server after they release their malicious PDF document in the wild.) This Trojan is what ultimately infects your machine—for example, by making it a member of a botnet.

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.