Removing metadata from documents: Word, PDF, and photos
A document contains more than you see. Beyond the visible content, Word files, PDFs, photos, and other files automatically store hidden information: who created them, when, on which device, and sometimes even deleted text that is still present in the file structure.
This isn’t theoretical. In 2003, the revision history embedded in a Word file of the British government's Iraq dossier exposed the names of the officials who had edited it. In 2012, a hacker linked to Anonymous was traced via GPS coordinates embedded in a photo he posted online.
Who this guide is for
This guide is mainly for readers who share files in contexts where hidden metadata can create real professional, legal, or personal risk.
It is especially relevant for:
- journalists and sources handling sensitive documents
- lawyers, healthcare professionals, and other professionals with confidentiality obligations
- anyone sharing photos or files where location, authorship, or device data should not leak
For ordinary casual sharing, not every file needs forensic-grade treatment. This guide becomes most important when the file itself may move beyond your control after you send it.
What you gain, and what it costs
If you remove metadata properly, you usually gain:
- less accidental leakage of authorship, device, time, and location data
- a safer file-sharing baseline in sensitive contexts
- fewer hidden clues that undermine anonymity or confidentiality
But it costs something:
- some extra time before sharing
- occasional tooling friction, especially across document formats
- the need to verify results instead of assuming one cleanup step did everything
That is usually a reasonable trade when files may be redistributed or scrutinized. It becomes overkill only when you treat every normal everyday file as a covert document instead of matching the effort to the real risk.
What is metadata?
Metadata is information about a file, not the content of the file itself.
In Word and Office files:
- Author name and initials (from your Windows/Mac account settings)
- Organisation name
- Edit timestamps — who changed what and when
- Fast save residue — deleted text fragments still in the file
- Comments and tracked changes, even after “accepting”
- Name of the template used
In PDF files:
- Creator (which software generated the PDF)
- Author and organisation
- Creation and modification date
- Revision history if generated from Word
- Sometimes: embedded hidden layers, invisible text
In photos (EXIF data):
- GPS coordinates — exact location where the photo was taken
- Camera or phone make and model
- Date and time of capture
- Camera settings (shutter speed, ISO, focal length)
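To make the GPS entry concrete: EXIF stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere letter, and the seconds field can carry fractions. The sketch below converts such a triple to decimal degrees. The function name and the sample values are hypothetical, chosen only for illustration; since one arcsecond of latitude is roughly 31 metres, fractional seconds can encode a position to within a few metres, although the real accuracy depends on the device's GPS fix.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere
    reference ('N', 'S', 'E', 'W') to signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical coordinates, not taken from any real photo.
latitude = dms_to_decimal(52, 22, Fraction(1234, 100), "N")
longitude = dms_to_decimal(4, 53, Fraction(4321, 100), "E")
print(f"{latitude:.6f}, {longitude:.6f}")
```

Pasting the printed pair into any map service is enough to locate the spot, which is why GPS tags are usually the first thing to remove.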
How to check what’s in a file
Before removing, it’s useful to know what’s actually there.
Online (for non-sensitive files only): Jeffrey’s EXIF Viewer shows all EXIF data in a photo you upload. Anything you upload leaves your machine, so inspect sensitive files locally instead.
Locally — ExifTool:
exiftool filename.pdf
exiftool photo.jpg
exiftool document.docx
This shows all metadata the file contains.
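If ExifTool is not installed, the author fields of an Office file can also be read with the Python standard library, because .docx, .xlsx, and .pptx files are ZIP archives containing XML. The following is a minimal sketch, assuming the properties live at the conventional path docProps/core.xml; the function name and the selection of fields are illustrative, and it reads far less than ExifTool does.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative subset of properties; the file may hold more.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(path):
    """Return a dict of core properties found in an Office file."""
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # the package has no core properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling `read_core_properties("document.docx")` returns something like a dictionary with `author` and `last_modified_by` keys when those fields are set, and an empty dictionary when they are not.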
Removing metadata
Word and Office (.docx, .xlsx, .pptx)
Method 1 — Built into Word:
Go to File → Info → Check for Issues → Inspect Document.
Enable all categories you want to remove:
- Comments, revisions, versions, annotations
- Document properties and personal information
- Hidden text
- Embedded documents
Click Inspect, then Remove All per category.
Important: Save the document again after inspecting. Keep a copy of the original if you still need the revision history yourself.
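Because the result should be verified rather than assumed, a quick check of the saved file can help. The sketch below is a heuristic, not a replacement for the Document Inspector: it assumes a .docx whose main body is word/document.xml and whose markup uses the `w:` prefix that Word normally writes, and it only looks for a comments part, tracked insertions and deletions, and hidden-text markers. The function name is hypothetical.

```python
import re
import zipfile

# Byte patterns for WordprocessingML elements that carry review data.
# The trailing character class avoids matching longer names such as
# w:insideH (table borders) or w:delText.
PATTERNS = {
    "tracked insertions": rb"<w:ins[ >]",
    "tracked deletions": rb"<w:del[ >]",
    "hidden text": rb"<w:vanish[ />]",
}


def leftover_review_data(path):
    """Return a list of review-data types still present in a .docx."""
    findings = []
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        if "word/comments.xml" in names:
            findings.append("comments part")
        body = b""
        if "word/document.xml" in names:
            body = package.read("word/document.xml")
    for label, pattern in PATTERNS.items():
        if re.search(pattern, body):
            findings.append(label)
    return findings
```

An empty list means none of these markers were found; it does not prove the file is clean, since headers, footers, and embedded objects are not scanned here.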
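Method 2 (MAT2, command line):
mat2 document.docx
This writes a cleaned copy, document.cleaned.docx, next to the original. ExifTool can read metadata from Office files (exiftool document.docx) but cannot write .docx, .xlsx, or .pptx files, so exiftool -all= does not work on them. MAT2 is described in the PDF section below.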
Method 3 — Export as PDF: When sharing a final version, export from Word to PDF via File → Export → PDF/XPS. In the export dialog: click Options and uncheck “Document properties”. This gives more control over what is and isn’t included.
PDF files
ExifTool:
exiftool -all= document.pdf
Important: ExifTool does not permanently remove PDF metadata. The data is hidden, but it can technically be restored with exiftool -pdf-update:all=. Use MAT2 for real removal, or combine ExifTool with qpdf for a deeper cleanup.
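One commonly suggested way to combine the two tools is to let ExifTool blank the metadata on a working copy and then have qpdf rewrite the file with --linearize, so that objects no longer referenced are not carried into the output. The sketch below wraps those two commands in Python. It assumes exiftool and qpdf are installed and on the PATH; the function names are hypothetical, and the output should still be inspected afterwards.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_commands(work_copy, output):
    """Return the two command lines as argument lists (nothing is run here)."""
    return [
        # Blank all writable metadata in place on the working copy.
        ["exiftool", "-all=", "-overwrite_original", str(work_copy)],
        # Rewrite the PDF into a fresh file.
        ["qpdf", "--linearize", str(work_copy), str(output)],
    ]


def clean_pdf(source, output):
    """Clean a copy of `source` and write the result to `output`."""
    source, output = Path(source), Path(output)
    with tempfile.TemporaryDirectory() as tmp:
        work_copy = Path(tmp) / source.name
        shutil.copy(source, work_copy)  # the original is never modified
        for command in build_commands(work_copy, output):
            subprocess.run(command, check=True)  # raises if a tool fails
    return output
```

Running `clean_pdf("document.pdf", "document.clean.pdf")` and then `exiftool document.clean.pdf` shows what, if anything, remains.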
MAT2 (Metadata Anonymisation Toolkit):
MAT2 is an open-source tool built specifically for metadata removal. It covers Office formats more thoroughly than ExifTool and runs from the command line.
Install on Debian/Ubuntu:
sudo apt install mat2
Use:
mat2 document.pdf
mat2 photo.jpg
mat2 document.docx
MAT2 creates a clean file alongside the original (document.cleaned.pdf). The original is left untouched.
Verify the result by listing any metadata MAT2 still detects in the cleaned file:
mat2 --show document.cleaned.pdf
Photos
ExifTool — remove everything:
exiftool -all= photo.jpg
ExifTool — remove GPS only:
exiftool -gps:all= photo.jpg
ExifTool — multiple photos at once:
exiftool -all= *.jpg
MAT2:
mat2 photo.jpg
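To show what these tools do to a JPEG at the file level, here is a simplified sketch that drops the segments where metadata usually lives: APP1 (EXIF and XMP, including GPS and the embedded thumbnail), APP13 (IPTC/Photoshop), and COM (comments). It is an illustration under stated assumptions, not a substitute for ExifTool or MAT2: it assumes a well-formed baseline JPEG, ignores other places metadata can hide, and the function name is hypothetical. Removing EXIF also removes the orientation tag, so some photos may display rotated afterwards.

```python
import struct

# Marker bytes of the segments to remove: APP1, APP13, COM.
DROP_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without common metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing start-of-image marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"unexpected byte at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest as-is.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in DROP_MARKERS:
            out += data[pos:end]
        pos = end
    raise ValueError("no start-of-scan marker found")


# Usage (file names are placeholders):
# from pathlib import Path
# cleaned = strip_jpeg_metadata(Path("photo.jpg").read_bytes())
# Path("photo.clean.jpg").write_bytes(cleaned)
```

Writing to a new file rather than overwriting keeps the original available in case the result needs to be checked or redone.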
On phone (iOS): Settings → Privacy & Security → Location Services → Camera → Never. This prevents new photos from containing GPS.
For existing photos there are two methods:
- Remove permanently (iOS 17+): open the Photos app → open the photo → swipe up or tap the info icon → tap Adjust next to the location name → select No Location. This removes the location from the photo itself.
- Remove for one share action: tap Share → tap Options at the top of the share sheet → turn Location off. This removes the location only for that specific share action.
Or use an app like Metapho (removing location during sharing is free; editing features require a subscription — check current price in the App Store).
On phone (Android): In Google Photos, open the photo, tap the three-dot menu to open the info panel, tap the pencil icon next to the location, and remove the location. Or use a file manager with an EXIF editor.
Printer steganography — a different kind of metadata
Colour laser printers from most major manufacturers (HP, Canon, Xerox, Brother) print invisible yellow microdots on every page. This pattern encodes the printer’s serial number and the print date.
This cannot be removed via software — it’s in the printer’s firmware. Relevant if you print documents that must not be traceable to your printer (and therefore your location or organisation):
- Print on a public printer (library, print shop)
- Use a black-and-white printer — steganography only exists in colour printers
- Scan an already-printed document back in as PDF instead of printing directly
When is this relevant?
Always:
- Photos you share via social media or email (GPS data)
- Documents you send to journalists, lawyers, or other sensitive recipients
Required for:
- Journalists protecting sources — every document that gets forwarded
- Lawyers sharing privileged information
- Whistleblowers submitting documents anonymously — use SecureDrop
Useful in:
- Business contexts where author name must not appear externally
- Research situations where your identity must not be in the file
Tools summary
| Tool | Formats | Platform | Use |
|---|---|---|---|
| ExifTool | Photos, PDF, audio, video; reads Office files | Linux, Mac, Windows | Command line, widest format support |
| MAT2 | PDF, Office, images, audio, video, and more | Linux | Command line |
| Word Document Inspector | .docx, .xlsx, .pptx | Windows, Mac | Built-in, no install |
| Metapho | Photos | iOS | App Store, location removal free / edits via subscription |
Next step
Go further
- SecureDrop guide — submit documents or source material anonymously
Profiles
- Profile: journalist or activist — why source protection starts with metadata
- Profile: lawyer or politician — metadata risks for professional privilege