How to make a scanned PDF searchable with OCR in your browser

Make searchable runs optical character recognition (OCR) on a scanned PDF entirely in your browser and writes an invisible text layer underneath the page images — the pages stay pixel-identical, but the result is selectable and searchable in any PDF reader. The file is never uploaded.

Before you start

  • Make searchable is free and doesn't need an account.
  • Everything runs locally: the OCR engine (Tesseract, compiled to WebAssembly) executes inside your browser, so your PDF never leaves your device.
  • OCR is the heaviest tool AttachKit ships — expect roughly 5–60 seconds per page depending on page size, scan density, and your computer. One run is capped at 200 pages; split longer documents first with Pages.
  • If the PDF asks for a password to open, OCR can't read it. Remove the password first with Unlock (you need to know the password), then come back.
  • Know what language the document is written in. English is the default; 24 languages are available, from Arabic to Vietnamese, and your choice is remembered for next time.
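The page cap and timing figures above lend themselves to a quick planning calculation. The sketch below is illustrative only: `planRuns` and `estimateMinutes` are hypothetical helper names, not part of AttachKit, and the constants simply restate the 5–60 seconds per page and 200-page figures from this article.

```typescript
// Figures quoted in this article: roughly 5-60 seconds per page, 200 pages per run.
const MIN_SECONDS_PER_PAGE = 5;
const MAX_SECONDS_PER_PAGE = 60;
const MAX_PAGES_PER_RUN = 200;

interface PageRange {
  first: number; // 1-based, inclusive
  last: number;  // 1-based, inclusive
}

// Split a document into consecutive page ranges that each fit under the cap.
function planRuns(totalPages: number, cap: number = MAX_PAGES_PER_RUN): PageRange[] {
  if (Math.floor(totalPages) !== totalPages || totalPages < 1) {
    throw new RangeError("totalPages must be a positive integer");
  }
  const runs: PageRange[] = [];
  for (let first = 1; first <= totalPages; first += cap) {
    runs.push({ first, last: Math.min(first + cap - 1, totalPages) });
  }
  return runs;
}

// Rough lower and upper bounds, in minutes, for OCR on a given number of pages.
function estimateMinutes(pages: number): { low: number; high: number } {
  return {
    low: (pages * MIN_SECONDS_PER_PAGE) / 60,
    high: (pages * MAX_SECONDS_PER_PAGE) / 60,
  };
}
```

For a 450-page scan, `planRuns(450)` gives three ranges (1–200, 201–400, 401–450) that you could split off with Pages, and `estimateMinutes(200)` suggests a full 200-page run could take anywhere from about 17 to 200 minutes.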

Steps

  1. Open Make searchable.
  2. Drop your scanned PDF onto the drop zone, or click it to pick a file. No PDF handy? Use the Try with a sample link.
  3. Choose the OCR language. The first use of a language downloads about 5 MB of training data to your browser cache. English and Russian are served from AttachKit's own servers with zero third-party requests; other languages fetch their training data — plain data, not executable code — from a public CDN.
  4. Click Make searchable. A progress bar tracks the run page by page, with a time estimate once a few pages are done.
  5. Wait — or don't. You can click Cancel OCR at any time. Progress is checkpointed to encrypted storage on your device after every completed page, so you can come back later and pick up from the tool's home screen under Resume from where you left off.
  6. When the run finishes, the searchable PDF downloads automatically as yourfile-searchable.pdf, and a status line reports how many words were indexed.
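The cancel-and-resume behaviour in step 5 can be pictured as a loop that saves a checkpoint after every completed page. This is a minimal sketch under stated assumptions, not AttachKit's actual code: the `Checkpoint` shape, the `runOcr` function, and the in-memory `CheckpointStore` are all hypothetical, and the encryption of the stored data is left out entirely.

```typescript
// Hypothetical shape of a saved checkpoint; the real storage format is not documented here.
interface Checkpoint {
  totalPages: number;
  completedPages: number; // pages 1..completedPages are done
  wordsPerPage: number[]; // recognized word count for each completed page
}

// Stand-in for on-device storage: a plain in-memory object keyed by file name.
interface CheckpointStore {
  [fileKey: string]: Checkpoint | undefined;
}

// Process pages from the last checkpoint onward, saving after every completed page.
// `recognizePage` is a placeholder for the OCR call and returns a word count;
// `shouldCancel` models the Cancel OCR button and is checked before each page.
function runOcr(
  store: CheckpointStore,
  fileKey: string,
  totalPages: number,
  recognizePage: (page: number) => number,
  shouldCancel: () => boolean
): Checkpoint {
  const saved = store[fileKey];
  // Resume from a copy of the saved state if it matches this document, else start fresh.
  const state: Checkpoint =
    saved !== undefined && saved.totalPages === totalPages
      ? {
          totalPages: saved.totalPages,
          completedPages: saved.completedPages,
          wordsPerPage: saved.wordsPerPage.slice(),
        }
      : { totalPages: totalPages, completedPages: 0, wordsPerPage: [] };

  while (state.completedPages < totalPages && !shouldCancel()) {
    const page = state.completedPages + 1;
    state.wordsPerPage.push(recognizePage(page));
    state.completedPages = page;
    // Store a copy so later work cannot alter the saved checkpoint.
    store[fileKey] = {
      totalPages: state.totalPages,
      completedPages: state.completedPages,
      wordsPerPage: state.wordsPerPage.slice(),
    };
  }
  return state;
}
```

Because the checkpoint is written only after a page completes, a cancelled run loses at most the page that was in progress; calling `runOcr` again with the same store picks up at the next unfinished page.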

The progress bar moves through these stages:

Stage | What's happening
Loading OCR engine | The Tesseract WASM worker and language data load (cached after first use).
Rendering page | The current page is rasterized in your browser for recognition.
Recognising text | Tesseract reads the page image and locates every word.
Done | All pages are finished and the output file is being built.
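The time estimate mentioned in step 4 appears only once a few pages are done. One plausible way to compute such an estimate is to average the time spent on completed pages and multiply by the pages remaining. The sketch below assumes that approach; the function name and the threshold of three pages are illustrative guesses, not documented behaviour.

```typescript
// Assumption: "a few pages" is modeled as 3; the real threshold is not documented.
const MIN_PAGES_FOR_ESTIMATE = 3;

// Returns the estimated seconds remaining, or null while there is too little data.
function estimateRemainingSeconds(
  pageDurationsSeconds: number[],
  totalPages: number,
  minPages: number = MIN_PAGES_FOR_ESTIMATE
): number | null {
  const done = pageDurationsSeconds.length;
  if (done === 0 || done < minPages) {
    return null;
  }
  let sum = 0;
  for (let i = 0; i < done; i++) {
    sum += pageDurationsSeconds[i];
  }
  const meanPerPage = sum / done;
  // Never report a negative remainder if more pages finished than expected.
  return meanPerPage * Math.max(totalPages - done, 0);
}
```

A simple mean works reasonably when pages are similar; a scan that mixes dense and sparse pages would make the estimate jump around, which is one reason to wait for several pages before showing it.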

Result

You get a copy of your PDF named yourfile-searchable.pdf — the original file is untouched. Each recognized word is written underneath the page image as invisible text (PDF Text Rendering Mode 3, the same technique Acrobat uses for its searchable OCR output), which means:

  • The pages look exactly like the original scan.
  • You can select and copy the text, and search it with Ctrl+F (Cmd+F on a Mac), in any PDF reader.
  • Non-Latin scripts (Cyrillic, Chinese, Japanese, Korean, Arabic and more) work too — AttachKit embeds a Unicode font automatically when the recognized text needs it.
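To make the invisible-text technique concrete, the sketch below builds the PDF content-stream operators for a single recognized word. `BT`/`ET`, `Tr`, `Tf`, `Td`, and `Tj` are standard PDF text operators, and `3 Tr` selects rendering mode 3 (neither fill nor stroke). Everything else is an assumption for illustration: the `OcrWord` shape, the `/F1` font resource name, and the choice to use the box height as the font size. It handles simple Latin text only; non-Latin scripts need an embedded Unicode font with a different string encoding, and a production implementation would also stretch each word to match the width of its box.

```typescript
// A recognized word with its bounding box in image pixels (origin at the top-left).
interface OcrWord {
  text: string;
  x0: number; // left
  y0: number; // top
  x1: number; // right
  y1: number; // bottom
}

// Escape the characters that are special inside a PDF literal string.
function escapePdfString(s: string): string {
  return s.replace(/\\/g, "\\\\").replace(/\(/g, "\\(").replace(/\)/g, "\\)");
}

// Build content-stream operators that place one word as invisible text.
// pageHeightPt is the page height in PDF points; dpi is the resolution the page was rendered at.
// "/F1" is a hypothetical font resource name that the page would have to define.
function invisibleWordOps(word: OcrWord, pageHeightPt: number, dpi: number): string {
  const scale = 72 / dpi; // pixels to points (72 points per inch)
  const x = word.x0 * scale;
  const y = pageHeightPt - word.y1 * scale; // flip the y axis: PDF's origin is bottom-left
  const fontSize = (word.y1 - word.y0) * scale; // assumption: box height as font size
  return [
    "BT", // begin text object
    "3 Tr", // text rendering mode 3: invisible
    "/F1 " + fontSize.toFixed(2) + " Tf", // select font and size
    x.toFixed(2) + " " + y.toFixed(2) + " Td", // move to the word's lower-left corner
    "(" + escapePdfString(word.text) + ") Tj", // show the text
    "ET", // end text object
  ].join("\n");
}
```

For a word detected at pixels (300, 150)–(600, 300) on a US Letter page (792 points tall) rendered at 300 DPI, the text is positioned at (72.00, 720.00) points with a 36-point font, directly over the same region of the scan.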

After the download, AttachKit offers to hand the result straight to Fill ("Fill in this form next?"), which is useful when the scan is a form you need to complete. OCR also unlocks other tools. Redact can only find and cover text it can read, so if you drop a raw scan there, it offers to run this same OCR first. Alternatively, make the scan searchable here and then redact the result.

Related

  • Searchable troubleshooting: encrypted files, no text found, slow runs, the 200-page cap
  • Pages: split a long scan into chunks the 200-page OCR cap can handle
  • Unlock: remove an open password so the PDF can be OCR'd