Optical character recognition converts pixel-level images of typed or printed text into machine-readable character strings. This guide walks through the five-stage OCR pipeline, accuracy expectations on modern engines, and the practical difference between OCR-on and OCR-off scanning.
1. Capture: the MFP scans the page at 300 dpi or higher, producing a raw bitmap of the document.
2. Pre-processing: deskewing, denoising, contrast adjustment, and binarisation prepare the page for character analysis.
3. Layout analysis: the engine identifies columns, paragraphs, tables, and image regions to segment the page logically.
4. Character recognition: a neural-network classifier matches each glyph against trained patterns, producing character predictions with confidence scores.
5. Output: the recognised characters are assembled into a searchable PDF, Word document, or plain-text file that preserves the original layout.
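The pre-processing stage is the easiest to illustrate in isolation. The sketch below binarises a grayscale page with Otsu's method, a classic global-threshold technique. This guide does not say which binarisation method MFP firmware actually uses, so treat Otsu as an illustrative stand-in, and treat the page as a toy list of 0-255 grayscale values rather than a real scanner bitmap.

```python
def otsu_threshold(pixels: list[list[int]]) -> int:
    """Return the 0-255 threshold that maximises between-class variance (Otsu)."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_score = 0, -1.0
    weight_dark = 0  # number of pixels with intensity <= t
    sum_dark = 0     # intensity sum of those pixels
    for t in range(256):
        weight_dark += histogram[t]
        sum_dark += t * histogram[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark class yet
        if weight_light == 0:
            break     # no light class left
        mean_dark = sum_dark / weight_dark
        mean_light = (weighted_total - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold


def binarise(pixels: list[list[int]], threshold: int) -> list[list[int]]:
    """Mark ink as 1 (intensity <= threshold) and paper as 0."""
    return [[1 if value <= threshold else 0 for value in row] for row in pixels]


# Toy 3x4 "scan": dark ink strokes (20) on light paper (230).
page = [
    [230, 230, 20, 230],
    [230, 20, 20, 230],
    [230, 230, 230, 230],
]
t = otsu_threshold(page)
print(t)                  # 20
print(binarise(page, t))  # [[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

A single global threshold keeps the example short; engines handling unevenly lit or shadowed pages often use adaptive (local) thresholds instead.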
OCR is the technology that converts the pixel image of a scanned document into machine-readable text. A document scanned without OCR is functionally an image: it can be searched only by filename, copied only by retyping, and used only for visual reading. The same document scanned with OCR becomes a fully indexed asset that is searchable by any word it contains, copy-pasteable as text, and parsable by downstream automation. The technology has matured substantially over the past decade and is now built into most modern office MFPs, where it is typically switched on or off with a single setting at scan time.
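To make the searchability difference concrete, here is a minimal sketch of how recognised text feeds a full-text index. It assumes each document's OCR output is already available as a string; the document names, the tokeniser, and the simple inverted-index design are hypothetical and do not represent any MFP's or document-management system's actual implementation. An image-only scan contributes no words, so it can be found only by its filename.

```python
import re
from collections import defaultdict

# Hypothetical tokeniser: lower-case runs of letters and digits count as words.
WORD = re.compile(r"[a-z0-9]+")


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Example archive: two OCR-enabled scans and one image-only scan with no text layer.
archive = {
    "invoice_0412.pdf": "Invoice 0412 from Acme Ltd, total due 1,250.00",
    "contract.pdf": "Service contract between Acme Ltd and the client",
    "scan_0003.pdf": "",  # OCR off: nothing to index
}
index = build_index(archive)
print(sorted(search(index, "acme ltd")))  # ['contract.pdf', 'invoice_0412.pdf']
print(search(index, "invoice acme"))      # {'invoice_0412.pdf'}
```

The query uses AND semantics, so adding words narrows the result set; the image-only `scan_0003.pdf` never appears in any result.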
The five-stage pipeline above describes what happens inside the MFP during an OCR-enabled scan. Individual stages are fast, and the full pipeline typically completes in 1 to 3 seconds per page on modern hardware. Depending on the destination workflow, the result is delivered as a searchable PDF (PDF/A, the archival variant of PDF, is the usual choice for long-term retention), a Word document, an Excel spreadsheet for tabular content, or plain text.
On modern MFP hardware, running OCR costs little: typically 1 to 3 seconds added to the scan cycle per page. The downstream benefit is substantial, because every scanned document becomes searchable, indexable, and processable by automated workflows. Leaving OCR on by default therefore trades a small per-scan delay for permanent full-text retrievability of everything the office scans, a trade-off that favours OCR in almost every office workflow.
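Because the per-page cost scales linearly with page count, the time side of the trade-off is easy to estimate. The sketch below simply multiplies the 1 to 3 second range quoted above by a page count. The 200-pages-per-day volume is a hypothetical example, not a figure from this guide, and real per-page times vary with the device, scan resolution, and page complexity.

```python
def ocr_overhead(pages: int, low_s: float = 1.0, high_s: float = 3.0) -> tuple[float, float]:
    """Return the (low, high) seconds OCR adds to a batch of `pages` pages.

    Defaults use the 1-3 s per-page range quoted in this guide; replace
    them with figures measured on your own device.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low_s, pages * high_s


if __name__ == "__main__":
    low, high = ocr_overhead(200)  # hypothetical daily scan volume
    print(f"200 pages/day: {low / 60:.1f} to {high / 60:.1f} extra minutes")
    # 200 pages/day: 3.3 to 10.0 extra minutes
```

At that hypothetical volume, OCR adds roughly 3 to 10 minutes of machine time per day across the whole office.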
For offices that have not deliberately enabled OCR in their MFP scan defaults, the single simplest configuration change is to turn the setting on. The benefit starts with the first scan and grows with the archive, since every document scanned from then on is full-text searchable. Related articles in this series cover enabling searchable-PDF output specifically, scanning directly to Word and Excel, and the trade-off between dedicated OCR software (such as ABBYY FineReader) and the engines built into MFPs.