How to scan a PDF for fakes and AI-generated content
Step-by-step guide to scanning any PDF for metadata anomalies, suspicious producer fingerprints, and AI-generation tells — free, entirely in your browser, with no upload.
Scan checks any PDF for metadata anomalies, suspicious producer fingerprints, and statistical signs of AI-generated content. The analysis runs entirely in your browser — the file is never uploaded anywhere.
Before you start
- Scan is free and needs no account — there is no plan gate on this tool.
- The drop zone accepts files up to 100 MB. For anything larger, split out the pages you need with Pages first.
- Password-protected PDFs can't be analyzed: the metadata Scan reads lives inside an encrypted stream. Remove the password first — Unlock can do this in your browser if you know the password — then re-scan.
- Know what you're getting: these checks are heuristics, not cryptographic proof. A motivated forger can spoof every field Scan reads by re-saving the file through Adobe Acrobat. Scan catches lazy fakes — AI-generated lease agreements, photoshopped pay stubs, recycled bank statements — and gives you a structured report to make a human judgment call.
Steps
- Open Scan.
- Drag your PDF onto the Drop a PDF here zone, or click Choose PDF and pick a file. An Analyzing… status appears while the report is computed — usually a second or two.
- Read the verdict banner at the top of the report. It shows one of three states plus a Suspicion score from 0 to 100 — higher means more suspicious. The score is a weighted tally of the signals below, not a probability.
- Review the signal rows under the banner. Each row has a severity icon, a one-line explanation of why the signal matters, and the raw value (for example, the exact producer string) when there is one.
- Click Scan another to reset and check a different file.
| Verdict banner | What it means |
|---|---|
| No automated red flags | No warning- or high-level signals fired. The document was probably authored by the tool its metadata claims — this does not rule out tampering by a motivated forger. |
| Suspicious — worth a closer look | At least one warning-level signal. Consider asking the source for the original file, the authoring tool, or signed provenance. |
| Likely fake or AI-generated | At least one strong tell, such as a known AI-generator name in the producer string. Treat the contents as unverified. |
What the signals check
- Producer fingerprint — the PDF's /Producer string is classified in order of precedence. Known AI and LLM tool names (ChatPDF, ChatGPT, GPT, Claude, Gemini, Copilot, Perplexity) are a high-severity tell. Generic AI-vendor fragments (OpenAI, Anthropic, Mistral, Llama, DeepSeek, Groq, xAI, phrases like "AI-generated") also score high, even when wrapped around a familiar tool name. Known online re-processors (iLovePDF, Smallpdf, PDF24, Sejda, PDF2Go and similar) raise a warning, because the document has been re-rendered at least once and the original layout or signatures may have been altered. Well-known authoring tools (Microsoft, Adobe Acrobat, LiveCycle Designer, pdfTeX, Ghostscript, LibreOffice, macOS Quartz and others) count as a good sign. Anything unrecognized is shown as informational, not suspicious.
- Missing metadata — a missing producer string is a warning, since real PDFs almost always carry one. If the metadata simply lives in the modern XMP stream instead of the legacy /Info dictionary — common for InDesign and recent Acrobat exports — Scan notes that as informational instead.
- Creation and modification dates — a missing creation date is a warning, and a modification date more than a year after creation is a warning when the producer is unknown. Both downgrade to informational when the producer is a known authoring tool, because old templates re-saved years later are normal: the IRS W-9 template dates back to 1996 and is re-issued every year.
- Document structure — page count, plus Creator, Author, and Subject metadata for cross-reference. Fillable AcroForm form fields are a good sign: AI text generators rarely produce them.
- Embedded images — Scan extracts embedded JPEG images (the first 6 are analyzed) and measures color diversity and pixel-noise smoothness. Real photographs carry sensor grain; AI-generated images cluster at unnaturally low color counts and unnaturally smooth gradients. An image must trip both tells at once — and in multi-image documents, at least half of the analyzed images must agree — before the report raises it, which keeps false alarms low on photo-heavy real documents.
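For readers who want to see the precedence order spelled out, here is a minimal Python sketch of the producer and date rules described above. It is an illustration, not Scan's actual code: the function names (`classify_producer`, `date_signals`), the severity labels, the plain substring matching, and the 365-day cutoff are all assumptions, and the name lists contain only the examples given in this article. The sketch also assumes that only a known authoring tool triggers the downgrade of date warnings to informational.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Example names copied from the bullets above; the tool's real lists are longer.
AI_TOOL_NAMES = ("chatpdf", "chatgpt", "gpt", "claude", "gemini", "copilot", "perplexity")
AI_VENDOR_FRAGMENTS = ("openai", "anthropic", "mistral", "llama", "deepseek",
                       "groq", "xai", "ai-generated")
REPROCESSORS = ("ilovepdf", "smallpdf", "pdf24", "sejda", "pdf2go")
AUTHORING_TOOLS = ("microsoft", "adobe acrobat", "livecycle designer", "pdftex",
                   "ghostscript", "libreoffice", "quartz")


def classify_producer(producer: Optional[str], has_xmp: bool = False) -> str:
    """Return 'high', 'warning', 'good', or 'info' for a /Producer string."""
    if not producer or not producer.strip():
        # Missing producer is a warning, unless the metadata lives in XMP instead.
        return "info" if has_xmp else "warning"
    text = producer.lower()
    # Order matters: AI names win even when wrapped around a familiar tool name.
    if any(name in text for name in AI_TOOL_NAMES + AI_VENDOR_FRAGMENTS):
        return "high"
    if any(name in text for name in REPROCESSORS):
        return "warning"
    if any(name in text for name in AUTHORING_TOOLS):
        return "good"
    return "info"  # unrecognized: informational, not suspicious


def date_signals(created: Optional[datetime], modified: Optional[datetime],
                 producer_class: str) -> List[Tuple[str, str]]:
    """Return (signal, severity) pairs for the creation/modification date checks."""
    # Assumption: only a known authoring tool ('good') downgrades to informational.
    severity = "info" if producer_class == "good" else "warning"
    signals = []
    if created is None:
        signals.append(("missing creation date", severity))
    elif modified is not None and modified - created > timedelta(days=365):
        signals.append(("modified more than a year after creation", severity))
    return signals
```

Because the AI checks run first, a producer string such as "ChatGPT via Adobe Acrobat" is classified as high severity even though it also names a trusted authoring tool. Plain substring matching is a simplification that can produce false matches on short fragments like "gpt"; a real classifier would likely need stricter patterns.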
Scoring: each high-severity signal adds 40 points, each warning adds 18, each good sign subtracts 5, and the total is clamped to 0–100. Any high-severity signal makes the verdict Likely fake or AI-generated; otherwise one or more warnings make it Suspicious — worth a closer look.
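The scoring rule can be worked through in a few lines of Python. This is a sketch of the arithmetic as stated above, not the tool's source: the function name `score_report`, the severity labels, and the short verdict keys are hypothetical, and it assumes informational signals contribute zero points, which the article does not state explicitly.

```python
from typing import Iterable, Tuple

# Points per signal, as stated in the scoring paragraph.
# Assumption: informational signals add nothing.
POINTS = {"high": 40, "warning": 18, "good": -5, "info": 0}


def score_report(severities: Iterable[str]) -> Tuple[int, str]:
    """Return (suspicion score clamped to 0-100, verdict key)."""
    severities = list(severities)
    raw = sum(POINTS[s] for s in severities)
    score = max(0, min(100, raw))  # clamp the weighted tally to 0-100
    if "high" in severities:
        verdict = "likely_fake"    # banner: Likely fake or AI-generated
    elif "warning" in severities:
        verdict = "suspicious"     # banner: Suspicious, worth a closer look
    else:
        verdict = "no_red_flags"   # banner: No automated red flags
    return score, verdict


print(score_report(["high", "good"]))                # (35, 'likely_fake')
print(score_report(["warning", "warning", "good"]))  # (31, 'suspicious')
print(score_report(["good", "good", "info"]))        # (0, 'no_red_flags')
```

Note that the verdict depends on which severities fired, not on the score: one high-severity signal offset by a good sign scores only 35 but still yields the strongest verdict.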
Result
You get a structured, explainable report — and because nothing is stored and no bytes leave your browser, you can safely scan an applicant's pay stub or a counterparty's invoice before acting on it. Remember the limit: a clean verdict means no automated red flags, not proof of authenticity. For verifiable authenticity, ask the source to sign the document with Sign and check the signature yourself at Verify.
Related
- Verify a signed PDF — cryptographic certainty instead of heuristics, for AttachKit-signed documents.
- Unlock a PDF — remove a password (when you know it) so Scan can read the metadata.
- Pages — split an over-100 MB file before scanning.