CTF Forensics: Uncovering Hidden Flags in PDF Metadata

November 17, 2025 at 5:24 AM
#picoCTF #CTF #Decoding

How I solved a picoCTF challenge by extracting and decoding hidden data from PDF document properties


The Challenge

In this Capture The Flag (CTF) forensics challenge, we were given a PDF file called confidential.pdf with the simple instruction: "Find the flag." The flag format for picoCTF is typically picoCTF{...}.

Initial Approach

Understanding PDF Structure

PDF files are complex containers that can hide information in various ways:

  • Document metadata (author, title, subject, etc.)
  • Embedded files and attachments
  • Compressed streams
  • JavaScript code
  • Invisible layers and annotations

First Steps

I started with strings, the most basic forensic tool available on nearly any Linux system:

strings confidential.pdf

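The strings command extracts runs of printable characters (four or more by default) from binary files, which makes it ideal for initial reconnaissance on unknown files. It only sees raw bytes, so text inside compressed PDF streams will not show up in its output.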
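To make it concrete what strings is doing under the hood, here is a minimal Python sketch of the same idea. It only approximates the real tool: it assumes plain printable ASCII and a minimum run length of four. The function name extract_strings and the sample bytes are made up for illustration.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_len long."""
    # Printable ASCII spans 0x20 (space) through 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up binary blob: two readable runs, one run that is too short ("ab").
sample = b"\x00\x01/Author (alice)\xff\xfeab\x00/Producer (PyPDF2)\x00"
print(extract_strings(sample))
# ['/Author (alice)', '/Producer (PyPDF2)']
```

The short run "ab" is dropped because it falls below the minimum length, which is how the real tool filters out most binary noise.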

The Breakthrough

While the initial strings output was extensive, I focused on the most common hiding spot in CTF challenges: document metadata.

strings confidential.pdf | grep -i "subject\|author\|title\|creator\|producer"

This command revealed:

  • Producer: PyPDF2 (indicating the PDF was generated with a Python library)
  • Author: cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=

The Author field contained a long string of characters that immediately stood out due to:

  1. Its unusual length for a typical author name
  2. The character set (only mixed-case letters, digits, and a trailing =, all within the base64 alphabet of A-Z, a-z, 0-9, +, and /)
  3. The = padding at the end - a classic indicator of base64 encoding
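The three clues above can be turned into a quick heuristic check. This is only a sketch: the helper name looks_like_base64 and the 16-character minimum are my own assumptions, and a long plain word with the right length would still pass, so treat a True result as "worth trying to decode" rather than proof.

```python
import re

# Standard base64 alphabet, with up to two '=' padding characters at the end.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def looks_like_base64(text: str, min_len: int = 16) -> bool:
    """Heuristic: long enough, valid alphabet, length divisible by 4."""
    return (
        len(text) >= min_len
        and len(text) % 4 == 0
        and BASE64_RE.fullmatch(text) is not None
    )

author = "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="
print(looks_like_base64(author))      # True
print(looks_like_base64("Jane Doe"))  # False
```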

Decoding the Flag

Base64 is a common encoding method used in CTF challenges to obscure text while keeping it easily reversible. The decoding process was straightforward:

echo "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=" | base64 -d

Result:

picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
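If base64 is not available on your system, the same decoding can be done with Python's standard library. A minimal sketch, using the Author value from this challenge:

```python
import base64

encoded = "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="

# validate=True makes b64decode raise on characters outside the base64
# alphabet instead of silently discarding them.
decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
print(decoded)
# picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
```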

The attached screenshot shows the command-line session with the commands I ran. As a side note, when working on Linux it is always a good idea to check that your system clock is in sync with the real current time, which reduces fetch errors from package utilities. And lastly, remember to upgrade your system regularly.

Why This Worked

Technical Analysis

  • PDF metadata fields such as Author and Producer are stored in the document information dictionary defined by the PDF specification
  • These fields are easily accessible but often overlooked
  • The PyPDF2 reference in the Producer field was a subtle hint about the file's origin
  • Base64 encoding makes the flag less obvious while remaining easily decodable
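As a rough illustration of how these fields sit in the file, the sketch below pulls metadata entries out of raw PDF bytes with a regular expression. It assumes the values are stored as uncompressed literal strings in parentheses, with no nested or escaped parentheses, which is a simplification of what the format allows. The helper name info_fields and the sample bytes are made up and are not taken from the challenge file.

```python
import re

# Matches entries such as "/Author (some text)" in an uncompressed
# document information dictionary. Hex strings (<...>), escaped or nested
# parentheses, and object streams are deliberately not handled.
FIELD_RE = re.compile(rb"/(Title|Author|Subject|Creator|Producer)\s*\(([^()]*)\)")

def info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Map each metadata key found in the raw bytes to its string value."""
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in FIELD_RE.findall(pdf_bytes)
    }

# Made-up fragment shaped like an information dictionary.
sample = b"1 0 obj\n<< /Producer (PyPDF2) /Author (ZXhhbXBsZQ==) >>\nendobj\n"
print(info_fields(sample))
# {'Producer': 'PyPDF2', 'Author': 'ZXhhbXBsZQ=='}
```

For anything beyond a quick look, a real PDF parser is the safer choice, since it handles the cases this pattern skips.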

Common CTF Hiding Techniques

This challenge demonstrates several common CTF concepts:

  1. Data Obfuscation: Using base64 to hide plaintext
  2. Unconventional Storage: Using standard document fields in unexpected ways
  3. Forensic Mindset: Looking beyond the visible content of documents

Comprehensive PDF Analysis Methodology

For future reference, here's a three-phase methodology for analyzing suspicious PDFs:

Phase 1: Quick Reconnaissance

# Basic strings extraction
strings file.pdf

# Metadata examination
strings file.pdf | grep -i "author\|title\|subject\|creator\|producer"
exiftool file.pdf 2>/dev/null

# Common flag patterns (braces escaped so they are matched literally)
strings file.pdf | grep -E "flag\{.*\}|ctf\{.*\}|picoCTF\{.*\}"

Phase 2: Structural Analysis

# PDF object analysis
pdf-parser.py file.pdf 2>/dev/null

# Embedded file detection
pdfdetach -list file.pdf 2>/dev/null

# JavaScript content
strings file.pdf | grep -i "javascript"

Phase 3: Advanced Techniques

# Metadata via PyPDF2 (reads the document information dictionary)
python3 -c "import PyPDF2; pdf=PyPDF2.PdfReader('file.pdf'); print(pdf.metadata)"

# Hex analysis for hidden data
xxd file.pdf | head -100

# Check for appended files
binwalk file.pdf

Key Takeaways

  1. Start Simple: Always begin with basic tools like strings and grep
  2. Check Metadata: Document properties are prime real estate for hidden data
  3. Recognize Encoding: Learn to identify common encodings (base64, hex, rot13, etc.)
  4. Context Matters: The "Producer: PyPDF2" was a hint about the file's creation method
  5. Systematic Approach: Follow a structured methodology to avoid missing clues
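For the "Recognize Encoding" point, a small helper that tries several decodings at once can save time when you are unsure which one applies. This is a sketch under my own assumptions: the function name try_decodings is hypothetical, it only covers base64, hex, and rot13, and it only looks for the picoCTF{...} flag shape.

```python
import base64
import binascii
import codecs
import re

FLAG_RE = re.compile(r"picoCTF\{[^}]*\}")

def try_decodings(candidate: str) -> dict[str, str]:
    """Return every decoding of candidate that yields a flag-shaped string."""
    attempts = {}
    try:
        attempts["base64"] = base64.b64decode(candidate, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        pass  # not valid base64, or not text once decoded
    try:
        attempts["hex"] = bytes.fromhex(candidate).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        pass  # not valid hex, or not text once decoded
    # rot13 never fails on text, so it is always attempted.
    attempts["rot13"] = codecs.decode(candidate, "rot_13")
    return {name: text for name, text in attempts.items() if FLAG_RE.search(text)}

author = "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="
print(try_decodings(author))
# {'base64': 'picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}'}
```

Filtering on the flag pattern keeps the output quiet: decodings that succeed but produce nothing flag-shaped are simply dropped.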

Tools Used

  • strings - Basic text extraction
  • grep - Pattern searching
  • base64 - Encoding/decoding
  • Built-in Linux commands

This challenge perfectly illustrates how CTF competitions teach valuable digital forensics skills through practical, hands-on exercises. The techniques used here are directly applicable to real-world incident response and forensic investigations.

Flag:

picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}

Happy hacking! 🔍