What is a PDF file?
Today, most scanned paper documents are stored as digital images rather than as text. These scans are usually saved as PDF (Portable Document Format) files, a universal file format for document exchange. PDF files can be viewed, navigated, and printed on most computers and operating systems by anyone using the free Adobe Acrobat Reader™ or similar software.
Simple and Searchable PDF Files
PDF files make it possible to search, annotate, publish, and archive historical information in a digital environment. However, PDF files come in two types: Simple PDF files and Searchable PDF files.
Simple PDF files contain no searchable text; each page is saved as an image. These files are often used to record artwork, handwritten manuscripts, and light text documents.
Searchable PDF files add a text layer beneath the image. The file retains the look of the original page while enabling users to search the text contained in the document. This is made possible through Optical Character Recognition (OCR), which converts the printed characters in the page image into text. Use the Search feature of your PDF viewer to find words, phrases, or other strings of text or characters. As in a word processing program, you can highlight text, copy it, and paste it into another document.
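As a rough illustration of what searching the text layer involves, the short Python sketch below looks for a phrase in the text of each page. It assumes the text layer has already been extracted into one string per page; the sample page texts and the find_phrase function are hypothetical, and a real PDF viewer's Search feature may work differently.

```python
from typing import List


def find_phrase(pages: List[str], phrase: str) -> List[int]:
    """Return the 1-based numbers of pages whose text contains the phrase.

    Matching ignores upper/lower case and differences in spacing or
    line breaks, since OCR text often breaks lines in odd places.
    """
    needle = " ".join(phrase.lower().split())
    if not needle:
        return []
    hits = []
    for number, text in enumerate(pages, start=1):
        haystack = " ".join(text.lower().split())
        if needle in haystack:
            hits.append(number)
    return hits


# Hypothetical text layers for a three-page scanned document.
sample_pages = [
    "Minutes of the Town Council, held in the village hall.",
    "A list of the householders and their trades.",
    "The town\ncouncil resolved to repair the bridge.",
]

print(find_phrase(sample_pages, "town council"))
```

A Simple PDF file has no text layer, so there would be no page text to search in the first place.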
The OCR process provides text accuracy of 97 to 99 percent on clean documents set in modern type. However, when scanning historical documents, which are often torn or stained and set in faded or crooked type, the rate of accuracy is much lower. As a result, searches may miss words that OCR misread, and text pasted into another document may appear garbled or incomplete.
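To put these percentages in perspective, the sketch below estimates how many characters on a page OCR would get wrong at a given accuracy. The page length of 2,000 characters is an assumption, and the 90 percent figure is only an illustrative stand-in for a damaged historical document, not a measured rate.

```python
def expected_errors(char_count: int, accuracy: float) -> int:
    """Estimate how many characters are misread at a given accuracy (0 to 1)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return char_count - round(char_count * accuracy)


PAGE_CHARS = 2000  # assumed number of characters on one page

# 0.99 and 0.97 come from the range quoted above; 0.90 is illustrative only.
for accuracy in (0.99, 0.97, 0.90):
    errors = expected_errors(PAGE_CHARS, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors} misread characters per page")
```

Under these assumptions, even 99 percent accuracy leaves about 20 misread characters on a page, and 97 percent leaves about 60, which is why pasted text usually needs proofreading.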