A searchable PDF lets you find any word in a scanned document with Ctrl+F. Without OCR, the same document is a stack of images that has to be read page by page. Modern office MFPs include OCR at the scan stage, embedding searchable text in the PDF without a separate processing step. Each brand puts the setting in a different menu.
Optical Character Recognition (OCR) analyses scanned images and identifies the characters, words and sentences they contain. The recognised text is then stored as an invisible layer behind the original image. The PDF still looks identical to the scan, but a text search now finds every occurrence of a word, and the text can be copied and pasted into other documents.
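To make the "invisible layer" concrete, here is a minimal Python sketch of the page content stream a searchable PDF might contain. The scan is painted as an image, then each recognised word is drawn in PDF text render mode 3, which neither fills nor strokes the glyphs, so the text is invisible but still searchable. The resource names `/Im0` and `/F1`, the word positions and the helper names are hypothetical, and a real file also needs the surrounding objects (page, image, font, cross-reference table) that are omitted here.

```python
# Sketch of a searchable-PDF page content stream. The resource names /Im0
# (the scanned image) and /F1 (a font) are hypothetical and would have to be
# declared in the page's resource dictionary, which is not built here.


def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def searchable_page_content(
    words: list[tuple[float, float, str]],
    page_width: float,
    page_height: float,
    font_size: float = 10.0,
) -> bytes:
    """Build a content stream: the scan as an image, then invisible OCR text.

    words holds (x, y, text) with positions in PDF points from the
    bottom-left corner of the page.
    """
    ops = [
        # Scale the image to fill the page and paint it.
        "q",
        f"{page_width:g} 0 0 {page_height:g} 0 0 cm",
        "/Im0 Do",
        "Q",
        # Text render mode 3 neither fills nor strokes glyphs: invisible,
        # yet still extractable and searchable.
        "BT",
        "3 Tr",
        f"/F1 {font_size:g} Tf",
    ]
    for x, y, word in words:
        # Position each word where it appears in the scanned image.
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm")
        ops.append(f"({escape_pdf_string(word)}) Tj")
    ops.append("ET")
    # Latin-1 keeps the sketch simple; other scripts need a proper font encoding.
    return "\n".join(ops).encode("latin-1")


if __name__ == "__main__":
    stream = searchable_page_content([(72, 700, "Invoice"), (140, 700, "total")], 612, 792)
    print(stream.decode("latin-1"))
```

In a complete file built around such a stream, the only visible output is the scanned image, yet a viewer's search still finds "Invoice" at the position where it appears on the page.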
OCR accuracy on modern office MFPs runs at around 95 to 99% on clean printed text. Handwriting, low-resolution scans and skewed pages all reduce it. The recognised text is searchable but may not be accurate enough to quote from; treat OCR output as a search index, not a perfect transcription.
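A quick back-of-envelope calculation shows why OCR output works better as a search index than as a transcription. This Python sketch assumes the quoted accuracy is a per-character rate, that errors hit characters independently, and that a dense page holds about 3,000 characters; none of these figures come from a device specification.

```python
# Back-of-envelope sketch. Assumptions (not from any device spec): the quoted
# accuracy is a per-character rate, and errors strike characters independently.


def expected_char_errors(chars_on_page: int, char_accuracy: float) -> float:
    """Average number of misrecognised characters on one page."""
    return chars_on_page * (1.0 - char_accuracy)


def word_match_probability(word_length: int, char_accuracy: float) -> float:
    """Chance that every character of a word is recognised, so an exact search finds it."""
    return char_accuracy ** word_length


if __name__ == "__main__":
    page_chars = 3000  # assumed: a dense page of printed text
    for acc in (0.95, 0.99):
        errors = expected_char_errors(page_chars, acc)
        print(f"{acc:.0%}: ~{errors:.0f} wrong characters per page")
    print(f"8-letter word at 97%: {word_match_probability(8, 0.97):.0%} chance of an exact hit")
```

Under those assumptions, a page at 95% accuracy carries about 150 wrong characters against about 30 at 99%, and an eight-letter word at 97% has roughly a 78% chance of being recognised exactly, so a single search can miss some occurrences.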
Most office MFPs default to Copy mode. Switch to Scan to Email or Scan to Folder. Both destinations support OCR processing.
The naming varies by brand. Canon labels it "Searchable PDF", Ricoh uses "OCR PDF" and Xerox uses "Text PDF". Any PDF option whose name includes "OCR", "Searchable" or "Text" produces the right output.
Most office MFPs sold in Spain default to Spanish OCR. For documents in English, French, German or Catalan, change the OCR language to match the document. Mixed-language documents work, but with reduced accuracy.
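The three settings above can be written down as a pre-flight checklist. The Python sketch below models them with a hypothetical `ScanJob` record; no MFP exposes such an API, and the keyword list simply mirrors the brand labels quoted above, so treat it as a way of structuring the checks rather than something to run against a device.

```python
from dataclasses import dataclass

# Hypothetical model of the panel settings; no MFP exposes this API.
# The keywords mirror the brand labels quoted in the text.
OCR_FORMAT_KEYWORDS = ("ocr", "searchable", "text pdf")
SCAN_DESTINATIONS = ("scan to email", "scan to folder")


@dataclass
class ScanJob:
    mode: str               # e.g. "Copy", "Scan to Email", "Scan to Folder"
    file_format: str        # format label shown on the panel
    ocr_language: str       # language the OCR engine is set to
    document_language: str  # language the document is written in


def preflight(job: ScanJob) -> list[str]:
    """Return a list of problems; an empty list means the job is set up for OCR."""
    problems = []
    if job.mode.strip().lower() not in SCAN_DESTINATIONS:
        problems.append(f"switch from {job.mode} to Scan to Email or Scan to Folder")
    if not any(k in job.file_format.lower() for k in OCR_FORMAT_KEYWORDS):
        problems.append(f"'{job.file_format}' has no OCR; pick a Searchable, OCR or Text PDF option")
    if job.ocr_language.lower() != job.document_language.lower():
        problems.append(f"set OCR language to {job.document_language} (currently {job.ocr_language})")
    return problems


if __name__ == "__main__":
    for problem in preflight(ScanJob("Copy", "PDF", "Spanish", "English")):
        print("-", problem)
```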
OCR accuracy degrades below 300 DPI. Most office MFPs default to 200 DPI for scanning, which is enough for visual reproduction but too low for reliable OCR. Raise the resolution to 300 DPI for any document where OCR matters.
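The reason 200 DPI falls short is easy to quantify: resolution fixes how many pixels the OCR engine gets per character. This Python sketch converts a font size in points (72 to the inch) into scanned pixels and computes the pixel dimensions of an A4 page (210 × 297 mm); the 10 pt example size is an assumption.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72


def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Height of one em of a font, in scanned pixels."""
    return font_size_pt / POINTS_PER_INCH * dpi


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


if __name__ == "__main__":
    for dpi in (200, 300, 400):
        w, h = page_pixels(210, 297, dpi)  # A4
        print(f"{dpi} DPI: 10 pt em = {em_height_px(10, dpi):.1f} px, A4 = {w} x {h} px")
```

Moving from 200 to 300 DPI makes a 10 pt em about 42 pixels tall instead of about 28 (lower-case letters are roughly half that, depending on the font) and raises the pixel count of an A4 page by a factor of roughly 2.25 before compression, which is why higher resolutions inflate file size.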
Skewed pages reduce OCR accuracy because the engine struggles with text that runs at an angle. Skew correction straightens each page automatically. Blank page removal drops the empty backs of single-sided originals scanned in duplex, so the OCR engine does not waste time on pages with nothing on them.
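Both features can be sketched with simple image statistics. The Python below shows one common approach to each, not necessarily what any MFP firmware does: blank detection by the fraction of dark pixels, and skew estimation by the projection-profile method, which tries candidate angles and keeps the one where dark pixels pile into the fewest rows. The thresholds, the ±5° search range and the 0.25° step are illustrative assumptions.

```python
import math

# Sketch of two common techniques; MFP firmware may use others.
# Pages are greyscale rows of values from 0 (black) to 255 (white).


def is_blank(page: list[list[int]], dark_level: int = 128,
             max_ink_fraction: float = 0.005) -> bool:
    """Treat a page as blank when under max_ink_fraction of its pixels are dark.

    Both thresholds are illustrative; the small allowance absorbs dust and
    bleed-through from the other side of the sheet.
    """
    total = sum(len(row) for row in page)
    dark = sum(1 for row in page for v in row if v < dark_level)
    return total == 0 or dark / total < max_ink_fraction


def estimate_skew(dark_points: list[tuple[float, float]],
                  max_angle: float = 5.0, step: float = 0.25) -> float:
    """Projection-profile skew estimate, in degrees.

    For each candidate angle, rotate the dark pixels back by that angle and
    count how many land in each pixel row. Text lines that have become
    horizontal pile into a few rows, so the sum of squared row counts peaks
    at the true skew angle.
    """
    def score(angle_deg: float) -> int:
        a = math.radians(angle_deg)
        rows: dict[int, int] = {}
        for x, y in dark_points:
            r = round(-x * math.sin(a) + y * math.cos(a))
            rows[r] = rows.get(r, 0) + 1
        return sum(c * c for c in rows.values())

    n = round(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=score)
```

To straighten a page, rotate it back by the estimated angle. Squaring the row counts rewards concentration: the same number of dark pixels scores far higher when packed into a few rows than when smeared across many.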
Open the resulting PDF and press Ctrl+F to search for a clearly printed word visible in the document. If the search finds it, OCR is working. If it finds nothing, the OCR step most likely did not run; check the file format setting.
A PDF page can hold its text in one of three states: invisible OCR text behind a scanned image (searchable), visible text drawn directly on the page, as in a PDF exported from a word processor (searchable), or images alone with no text layer (not searchable). On screen the three can look the same; a text search is the quickest way to tell an image-only page from a searchable one.
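The same distinction can be checked outside a PDF viewer. The Python sketch below is a standard-library heuristic, not a full PDF parser: it inflates Flate-compressed streams and looks for the text-showing operators `Tj`/`TJ`, treating text render mode `3 Tr` (invisible) as the sign of an OCR layer, which is a common convention but not a universal one. It does not handle encrypted files or other stream filters, and binary image data could occasionally produce a false match; a proper PDF library is the reliable route.

```python
import re
import zlib

# Heuristic sketch using only the standard library. It scans stream bodies,
# inflating Flate-compressed ones, and looks for text-showing operators.
# It cannot read encrypted files or streams using other filters, and binary
# data that happens to contain these byte patterns can fool it.
STREAM = re.compile(rb"\bstream\r?\n(.*?)endstream", re.S)
SHOW_TEXT = re.compile(rb"\bT[jJ]\b")        # Tj / TJ show text
INVISIBLE_MODE = re.compile(rb"\b3\s+Tr\b")  # render mode 3 = invisible text


def _inflate(data: bytes) -> bytes:
    try:
        # decompressobj tolerates trailing bytes such as the newline before endstream.
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return data  # not Flate-compressed; inspect the raw bytes


def classify_pdf(pdf: bytes) -> str:
    """Guess which of the three text states a PDF's pages are in."""
    has_text = has_invisible = False
    for body in STREAM.findall(pdf):
        content = _inflate(body)
        if SHOW_TEXT.search(content):
            has_text = True
            if INVISIBLE_MODE.search(content):
                has_invisible = True
    if has_invisible:
        return "invisible OCR text (searchable)"
    if has_text:
        return "visible text (searchable)"
    return "image only (not searchable)"
```

For a file on disk, read it as bytes first, for example `classify_pdf(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name.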
A scan saved as a plain PDF without OCR is image only and not searchable. Saving it as Searchable PDF at the device, or running it through OCR afterwards in a tool such as Adobe Acrobat, adds the text layer. For routine office work, run OCR at the device rather than as a separate step after the fact.
| Setting | For OCR accuracy | For file size |
|---|---|---|
| Resolution | 300 DPI minimum, 400 for poor originals | 300 DPI; higher inflates file size |
| Colour mode | Black and white for text-only documents | Black and white produces the smallest files |
| Compression | Medium | High; balance against OCR accuracy |
| Skew correction | On | No file size impact |
| Blank page removal | On | Reduces file size by 5 to 10% |
Three document types resist OCR processing.
OCR is calibrated for printed text. Handwriting accuracy runs around 60 to 80% even on neat samples. Handwritten notes, signatures and informal markings remain visible in the scan, but the OCR text layer for those areas is unreliable.
Faded carbon copies, old fax printouts and pencil writing on cream paper produce low-contrast scans that the OCR engine struggles to parse. Pre-processing (increasing contrast at the scan stage) helps but does not fully solve the problem.
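Contrast enhancement of this kind is often a linear stretch: the darkest and lightest levels actually present are mapped to full black and full white. The Python sketch below applies that to a flat list of 0 to 255 greyscale values, clipping the extreme 1% at each end so a few stray specks do not set the range; the percentiles and the list representation are assumptions, and the device's own contrast control may work differently.

```python
def stretch_contrast(pixels: list[int], low_pct: float = 1.0,
                     high_pct: float = 99.0) -> list[int]:
    """Linearly map the low_pct..high_pct percentile range onto 0..255."""
    if not pixels:
        return []
    ordered = sorted(pixels)
    last = len(ordered) - 1
    lo = ordered[int(last * low_pct / 100)]   # approximate low percentile
    hi = ordered[int(last * high_pct / 100)]  # approximate high percentile
    if hi <= lo:
        return list(pixels)  # flat image: nothing to stretch
    scale = 255 / (hi - lo)
    # Values outside the percentile range are clipped to pure black or white.
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]
```

On a faded original where the ink reads as level 120 and the paper as 180, the stretch maps ink to 0 and paper to 255, giving the OCR engine the full tonal range to separate the two. It cannot recover strokes that the scanner never captured, which is why pre-processing only partly helps.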
Multi-column layouts, sidebars, footnotes and tables produce OCR text that may not preserve the original reading order. Search still works, but text copied and pasted from the OCR layer may come out with its paragraphs in the wrong order.
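The reading-order problem comes from sorting recognised words by position alone. This Python sketch contrasts a naive top-to-bottom sort with one that finishes each column first. It assumes the column gutters are already known (the hypothetical `column_edges` parameter), whereas a real OCR engine has to detect the layout itself, which is where multi-column pages go wrong.

```python
import bisect

# (x, y, text) for each recognised word; y grows downwards, as in image coordinates.
Word = tuple[float, float, str]


def naive_order(words: list[Word]) -> list[str]:
    """Read strictly top to bottom, then left to right, ignoring columns."""
    return [w[2] for w in sorted(words, key=lambda w: (w[1], w[0]))]


def column_order(words: list[Word], column_edges: list[float]) -> list[str]:
    """Finish each column before moving right.

    column_edges holds the sorted x positions of the gutters between columns
    (assumed known here; real engines must detect them).
    """
    def column(x: float) -> int:
        return bisect.bisect_right(column_edges, x)

    return [w[2] for w in sorted(words, key=lambda w: (column(w[0]), w[1], w[0]))]


if __name__ == "__main__":
    page = [(10, 0, "Left1"), (300, 0, "Right1"), (10, 20, "Left2"), (300, 20, "Right2")]
    print(naive_order(page))
    print(column_order(page, [200]))
```

The naive sort interleaves the two columns line by line, which is exactly the scrambled order that appears when OCR text from a multi-column page is copied and pasted.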
Office MFP OCR scanning suits documents up to roughly 500 pages per session. OCR adds 0.5 to 1 second per page, or 50 to 100 seconds for a 100-page document, which therefore scans and OCRs in around 3 to 4 minutes including the scan itself. Larger volumes work but tie up the device for extended periods; consider scheduling large archive batches to run overnight.
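The time estimate can be reproduced with simple arithmetic. The Python sketch below assumes a hypothetical scan speed of 50 pages per minute and that OCR time is added to each page rather than overlapped with scanning; both assumptions vary by device.

```python
def batch_seconds(pages: int, scan_ppm: float, ocr_s_per_page: float) -> float:
    """Estimated session time: scanning at scan_ppm plus OCR time per page.

    Assumes OCR time adds to scan time rather than overlapping it.
    """
    return pages / scan_ppm * 60 + pages * ocr_s_per_page


if __name__ == "__main__":
    SCAN_PPM = 50  # hypothetical device speed, pages per minute
    for pages in (100, 500):
        fast = batch_seconds(pages, SCAN_PPM, 0.5)
        slow = batch_seconds(pages, SCAN_PPM, 1.0)
        print(f"{pages} pages: {fast / 60:.1f} to {slow / 60:.1f} minutes")
```

With those figures, 100 pages take 170 to 220 seconds (roughly 3 to 4 minutes) and a 500-page session about 14 to 18 minutes, which is why very large batches are better left to run overnight.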
PDF/A is an archival format with specific embedding rules for fonts and metadata. Many office MFPs offer Searchable PDF and PDF/A as separate options. PDF/A does not by itself guarantee a text layer, so confirm that OCR is also enabled for PDF/A output; a PDF/A file with OCR carries the same invisible text plus the archival constraints. For routine office work, Searchable PDF suffices; for long-term archiving that conforms to ISO 19005, PDF/A is the correct choice.
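To see whether an existing file at least claims PDF/A conformance, look at its XMP metadata, where PDF/A files record their part and conformance level in the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below only reads that claim with regular expressions, inflating Flate streams in case the metadata is compressed; it does not validate the file against ISO 19005, which needs a dedicated validator.

```python
import itertools
import re
import zlib
from typing import Optional

# Reads only the conformance *claim* in the XMP metadata; it does not check
# that the file actually meets ISO 19005. Handles both the element form
# (<pdfaid:part>2</pdfaid:part>) and the attribute form (pdfaid:part="2").
PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
CONFORMANCE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")
STREAM = re.compile(rb"\bstream\r?\n(.*?)endstream", re.S)


def _inflate(data: bytes) -> bytes:
    try:
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return data  # not Flate-compressed


def declared_pdfa(pdf: bytes) -> Optional[str]:
    """Return e.g. 'PDF/A-2B' if the file claims PDF/A conformance, else None."""
    # Check the raw bytes first, then each inflated stream body.
    chunks = itertools.chain([pdf], (_inflate(s) for s in STREAM.findall(pdf)))
    for chunk in chunks:
        part = PART.search(chunk)
        if part:
            level = CONFORMANCE.search(chunk)
            suffix = level.group(1).decode().upper() if level else ""
            return f"PDF/A-{part.group(1).decode()}{suffix}"
    return None
```

A `None` result means no claim was found, not that the file fails PDF/A; a non-`None` result means only that the file says it conforms.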