CTF Forensics: Uncovering Hidden Flags in PDF Metadata
How I solved a picoCTF challenge by extracting and decoding hidden data from PDF document properties
The Challenge
In this Capture The Flag (CTF) forensics challenge, we were given a PDF file called confidential.pdf with the simple instruction: "Find the flag." The flag format for picoCTF is typically picoCTF{...}.
Initial Approach
Understanding PDF Structure
PDF files are complex containers that can hide information in various ways:
- Document metadata (author, title, subject, etc.)
- Embedded files and attachments
- Compressed streams
- JavaScript code
- Invisible layers and annotations
First Steps
I started with the most basic forensic tool available on any Linux system: strings
strings confidential.pdf
The strings command extracts runs of printable characters (four or more by default) from binary files, which makes it well suited to initial reconnaissance on unknown files.
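To make the behaviour of strings concrete, here is a minimal Python sketch of roughly what it does: scan the raw bytes and keep runs of printable ASCII above a minimum length. The function name extract_strings and the sample bytes are illustrative assumptions, not part of the challenge.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    # Space through tilde covers printable ASCII; this approximates the
    # default behaviour of the strings utility, not every option it supports.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical sample bytes standing in for a slice of a PDF file.
sample = b"\x00\x01%PDF-1.4\x00\xff/Author (alice)\x02ab\x03"
print(extract_strings(sample))
```

Like strings itself, this only sees text stored uncompressed in the file; anything inside a compressed stream will not show up.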
The Breakthrough
The initial strings output was extensive, so I narrowed it down to the most common hiding spot in CTF challenges: document metadata.
strings confidential.pdf | grep -i "subject\|author\|title\|creator\|producer"
This command revealed:
- Producer: PyPDF2 (indicating the PDF was generated with a Python library)
- Author: cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=
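The grep filter above amounts to a case-insensitive keyword match over each line of strings output. A minimal Python sketch of the same idea follows; the helper name grep_metadata and the layout of the sample lines are assumptions for illustration, not the actual output of the challenge file.

```python
import re

# Same keywords as the grep command, matched case-insensitively (like grep -i).
METADATA_KEYS = re.compile(r"subject|author|title|creator|producer", re.IGNORECASE)


def grep_metadata(lines):
    """Keep only the lines that mention one of the metadata keywords."""
    return [line for line in lines if METADATA_KEYS.search(line)]


# Hypothetical lines, shaped like what strings might print for a PDF.
sample_lines = [
    "%PDF-1.3",
    "/Producer (PyPDF2)",
    "/Author (cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=)",
    "endobj",
]
for line in grep_metadata(sample_lines):
    print(line)
```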
The Author field contained a long string of characters that immediately stood out due to:
- Its unusual length for a typical author name
- The character set (mixed-case letters and digits, all within the base64 alphabet of A-Z, a-z, 0-9, +, and /)
- The = padding at the end, a classic indicator of base64 encoding
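The three clues above can be turned into a rough automated check. The sketch below is only a heuristic: the function name looks_like_base64 and the min_len threshold are assumptions, and ordinary words whose length happens to be a multiple of four will also pass.

```python
import re

# Base64 alphabet followed by at most two '=' padding characters.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(text, min_len=16):
    """Heuristic: long enough, length a multiple of 4, base64 alphabet only."""
    return (
        len(text) >= min_len
        and len(text) % 4 == 0
        and BASE64_RE.fullmatch(text) is not None
    )


print(looks_like_base64("cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="))  # True
print(looks_like_base64("Jane Doe"))  # False
```

A positive result only means the string is worth trying to decode; the decode itself is the real test.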
Decoding the Flag
Base64 is a common encoding method used in CTF challenges to obscure text while keeping it easily reversible. The decoding process was straightforward:
echo "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=" | base64 -d
Result:
picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
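The same decoding step can be done with Python's standard base64 module, which is handy when scripting many candidates. This is a minimal sketch; the helper name decode_base64 is an assumption, and it returns None rather than raising when the input is not valid base64 or not UTF-8 text.

```python
import base64


def decode_base64(text):
    """Return the decoded text, or None if decoding fails."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:
        # Covers binascii.Error and UnicodeDecodeError, both ValueError subclasses.
        return None


author = "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="
print(decode_base64(author))  # picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
```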
The attached image shows the command-line session in which I ran these commands. When working on Linux, it is also worth checking that the system clock is in sync with the actual current time, which reduces fetch errors in utilities. Lastly, remember to run upgrades every day.
Why This Worked
Technical Analysis
- PDF Metadata Fields are part of the PDF specification's document information dictionary
- These fields are easily accessible but often overlooked
- The PyPDF2 reference in the Producer field was a subtle hint about the file's origin
- Base64 encoding makes the flag less obvious while remaining easily decodable
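To illustrate why these fields are so easy to reach, the sketch below pulls entries such as /Author (...) straight out of raw PDF bytes. It assumes the values are stored as simple, uncompressed literal strings, which is why strings could see them in this challenge. The function name parse_info_entries and the sample bytes are hypothetical.

```python
import re

# Matches entries such as "/Author (Jane Doe)". Only plain literal strings
# without nested or escaped parentheses are handled in this sketch.
INFO_ENTRY = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*\(([^()\\]*)\)"
)


def parse_info_entries(data):
    """Return a dict of metadata keys to values found in the raw bytes."""
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in INFO_ENTRY.findall(data)
    }


# Hypothetical object, shaped like a document information dictionary.
sample = b"1 0 obj\n<< /Producer (PyPDF2) /Author (Jane Doe) >>\nendobj\n"
print(parse_info_entries(sample))
```

Real files may store these values as hex strings, in UTF-16, or inside compressed object streams, so a dedicated tool such as exiftool remains the more reliable option.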
Common CTF Hiding Techniques
This challenge demonstrates several common CTF concepts:
- Data Obfuscation: Using base64 to hide plaintext
- Unconventional Storage: Using standard document fields in unexpected ways
- Forensic Mindset: Looking beyond the visible content of documents
Comprehensive PDF Analysis Methodology
For future reference, here's a complete methodology for analyzing suspicious PDFs:
Phase 1: Quick Reconnaissance
# Basic strings extraction
strings file.pdf
# Metadata examination
strings file.pdf | grep -i "author\|title\|subject\|creator\|producer"
exiftool file.pdf 2>/dev/null
# Common flag patterns
strings file.pdf | grep -E "flag{.*}|ctf{.*}|picoCTF{.*}"
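The flag-pattern search in Phase 1 can also be sketched in Python, which makes it easy to run over raw bytes rather than line-by-line strings output. The name find_flags and the sample data are assumptions; the pattern is a slightly stricter variant of the grep expression, matching case-insensitively and stopping at the first closing brace.

```python
import re

# flag{...}, ctf{...} or picoCTF{...}, case-insensitive, non-greedy body.
FLAG_RE = re.compile(rb"(?:picoCTF|flag|ctf)\{[^}]*\}", re.IGNORECASE)


def find_flags(data):
    """Return every flag-shaped token found in the raw bytes."""
    return [m.group().decode("ascii", errors="replace") for m in FLAG_RE.finditer(data)]


# Hypothetical sample data, not taken from the challenge file.
sample = b"junk picoCTF{example_flag} more junk FLAG{another_one}"
print(find_flags(sample))
```

This only finds flags stored in plaintext. In the challenge above the flag was base64-encoded, so this search alone would have come back empty.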
Phase 2: Structural Analysis
# PDF object analysis
pdf-parser.py file.pdf 2>/dev/null
# Embedded file detection
pdfdetach -list file.pdf 2>/dev/null
# JavaScript content
strings file.pdf | grep -i "javascript"
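As a quick triage step before reaching for the tools in Phase 2, one can count a few PDF name tokens that commonly mark active or embedded content. This is a rough sketch: the list of names and the function count_names are my own choices, and the sample bytes are hypothetical.

```python
# Name tokens that often indicate scripts, attachments, or auto-run actions.
SUSPICIOUS_NAMES = [b"/JavaScript", b"/JS", b"/EmbeddedFile", b"/OpenAction", b"/Launch"]


def count_names(data):
    """Count raw occurrences of each token in the file bytes."""
    return {name.decode("ascii"): data.count(name) for name in SUSPICIOUS_NAMES}


# Hypothetical fragment of a PDF containing a JavaScript action.
sample = b"<< /OpenAction 5 0 R >>\n<< /S /JavaScript /JS (app.alert(1)) >>"
for name, count in count_names(sample).items():
    print(name, count)
```

Because PDF names can be written with #xx escapes and objects can live inside compressed streams, a zero count here does not prove the feature is absent; a proper parser is more thorough.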
Phase 3: Advanced Techniques
# Metadata extraction with PyPDF2
python3 -c "import PyPDF2; pdf=PyPDF2.PdfReader('file.pdf'); print(pdf.metadata)"
# Hex analysis for hidden data
xxd file.pdf | head -100
# Check for appended files
binwalk file.pdf
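One simple form of the "appended files" check is to look for bytes after the final %%EOF marker, where a well-formed PDF normally ends. The sketch below shows that idea only; trailing_data is a hypothetical helper and the sample bytes are made up.

```python
def trailing_data(data):
    """Return any non-whitespace bytes after the last %%EOF marker."""
    marker = b"%%EOF"
    index = data.rfind(marker)
    if index == -1:
        return b""  # No marker at all; nothing to measure against.
    return data[index + len(marker):].strip()


# Hypothetical PDF with a ZIP signature appended after the end marker.
sample = b"%PDF-1.4\n...\n%%EOF\nPK\x03\x04hidden"
print(trailing_data(sample))
```

This only inspects the tail of the file. binwalk scans the whole file for known signatures, so it can also find data embedded elsewhere.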
Key Takeaways
- Start Simple: Always begin with basic tools like strings and grep
- Check Metadata: Document properties are prime real estate for hidden data
- Recognize Encoding: Learn to identify common encodings (base64, hex, rot13, etc.)
- Context Matters: The "Producer: PyPDF2" was a hint about the file's creation method
- Systematic Approach: Follow a structured methodology to avoid missing clues
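For the "Recognize Encoding" takeaway, a small helper can try the usual suspects in one pass and show whichever decodings succeed. This is a sketch under simple assumptions: try_decodings is a hypothetical name, only base64, hex, and rot13 are attempted, and results must be valid UTF-8 to be kept.

```python
import base64
import codecs


def try_decodings(text):
    """Return each decoding of text that succeeds, keyed by encoding name."""
    results = {}
    try:
        results["base64"] = base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:
        pass  # Not valid base64, or the bytes are not UTF-8 text.
    try:
        results["hex"] = bytes.fromhex(text).decode("utf-8")
    except ValueError:
        pass  # Not valid hex, or the bytes are not UTF-8 text.
    # rot13 always produces output; whether it is meaningful is a human call.
    results["rot13"] = codecs.decode(text, "rot_13")
    return results


for name, value in try_decodings("cGljb0NURg==").items():
    print(name, value)
```

The rot13 entry is always present, so treat it as a candidate to eyeball rather than a confirmed hit.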
Tools Used
- strings: Basic text extraction
- grep: Pattern searching
- base64: Encoding/decoding
- Built-in Linux commands
This challenge perfectly illustrates how CTF competitions teach valuable digital forensics skills through practical, hands-on exercises. The techniques used here are directly applicable to real-world incident response and forensic investigations.
Flag:
picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
Happy hacking! 🔍