August 06, 2009 12:00 AM

Tool Time: Export PDF Text with Pdftotext

Windows IT Pro
InstantDoc ID #102437

If you occasionally need to export text from PDF files, pdftotext might be a handy addition to your personal toolbox. Part of Foo Labs' free Xpdf package, pdftotext is a command-line tool that automates the export process.

Using pdftotext is straightforward. If you want to export the text from a file named vmware.pdf, you can use pdftotext like this:

pdftotext vmware.pdf

This command automatically creates a new file named vmware.txt in the same folder as vmware.pdf. Where possible, pdftotext will remove embedded hyphenation and line breaks. If you also want to omit the page breaks (form-feed characters) that pdftotext inserts between pages, add the -nopgbrk option:

pdftotext vmware.pdf -nopgbrk
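For scripting, it can help to predict which file name pdftotext will pick when you don't supply one. The Python function below is a minimal sketch of that default naming rule, assuming pdftotext behaves as follows: a trailing .pdf or .PDF is replaced with .txt, and any other name simply gets .txt appended. The function name default_text_name is hypothetical and not part of Xpdf.

```python
def default_text_name(pdf_name: str) -> str:
    """Approximate the output name pdftotext uses when no text file is given.

    Assumption: a trailing ".pdf" or ".PDF" (exact case only) is replaced
    with ".txt"; any other name just has ".txt" appended. Because only the
    end of the string changes, the folder part of a path is preserved, so
    the .txt file lands next to the PDF.
    """
    for ext in (".pdf", ".PDF"):
        if pdf_name.endswith(ext):
            return pdf_name[: -len(ext)] + ".txt"
    return pdf_name + ".txt"


if __name__ == "__main__":
    for name in ("vmware.pdf", "REPORT.PDF", "notes"):
        print(name, "->", default_text_name(name))
```

If you want a different name, pdftotext also accepts the output file as a second argument after the PDF file name, which makes the guesswork unnecessary.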

To send the text output to the screen (standard output) instead of a file, specify a hyphen (-) as the output file name at the end of the command:

pdftotext vmware.pdf -

You can use multiple parameters together as well:

pdftotext vmware.pdf -nopgbrk -
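Because the - output goes to standard output, a script can capture the text directly instead of reading a .txt file afterward. The Python sketch below assumes pdftotext is installed and on the PATH; the helper names pdftotext_command and extract_text are hypothetical. It places -nopgbrk before the file name, the order shown in pdftotext's own usage message (pdftotext [options] <PDF-file> [<text-file>]).

```python
from __future__ import annotations

import subprocess


def pdftotext_command(pdf_path: str, no_page_breaks: bool = False,
                      to_stdout: bool = False) -> list[str]:
    """Build the argument list for a pdftotext call."""
    cmd = ["pdftotext"]
    if no_page_breaks:
        cmd.append("-nopgbrk")  # omit form feeds between pages
    cmd.append(pdf_path)
    if to_stdout:
        cmd.append("-")  # "-" as the output file means standard output
    return cmd


def extract_text(pdf_path: str, encoding: str = "utf-8") -> str:
    """Run pdftotext and return the text instead of writing a .txt file.

    Assumes pdftotext is on the PATH; check=True raises
    subprocess.CalledProcessError if pdftotext exits with an error.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path, no_page_breaks=True, to_stdout=True),
        capture_output=True,
        check=True,
    )
    # Assumption: the output uses `encoding`; undecodable bytes are
    # replaced rather than raising an exception.
    return result.stdout.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Argument list equivalent to: pdftotext -nopgbrk vmware.pdf -
    print(pdftotext_command("vmware.pdf", no_page_breaks=True, to_stdout=True))
```

Calling extract_text("vmware.pdf") would return the same text that the command above prints to the screen. The decoding step is an assumption: the encoding pdftotext writes depends on the build and its text-encoding setting, so adjust the encoding argument if accented characters come out wrong.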

Pdftotext extracts only text that is stored as text in the PDF, so it can't export images, or scanned pages that haven't been processed with optical character recognition (OCR). Within that limitation, however, it works extremely well.

The Xpdf package contains several other tools that can be useful for manipulating PDF files. Pdftoppm and pdftops convert PDF files to the Portable Pixel Map (PPM) or PostScript format, respectively. Pdfimages extracts all images from a PDF file, pdfinfo returns general PDF metadata, and pdffonts diagnoses font-related problems with PDF files. If you work with PDF files and like command-line tools, Xpdf is well worth checking out.
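The other Xpdf tools can be scripted the same way. As one example, pdfinfo prints its metadata as "Key: value" lines, which the sketch below turns into a Python dictionary. It assumes pdfinfo is on the PATH and that its output keeps that one-field-per-line layout; parse_pdfinfo and pdf_info are hypothetical helper names.

```python
from __future__ import annotations

import subprocess
import sys


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dictionary."""
    info: dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only so values containing colons
        # (such as timestamps) stay intact.
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon at all
            info[key.strip()] = value.strip()
    return info


def pdf_info(pdf_path: str) -> dict[str, str]:
    """Run pdfinfo (assumed to be on the PATH) and parse its output."""
    result = subprocess.run(["pdfinfo", pdf_path], capture_output=True,
                            text=True, check=True)
    return parse_pdfinfo(result.stdout)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Print the page count of the PDF named on the command line.
        print(pdf_info(sys.argv[1]).get("Pages"))
```

Splitting on the first colon only is the key design choice here: fields such as the creation date contain colons of their own and would otherwise be cut apart.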

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.