Thierry Mukiza - Digital Portfolio

How I solved a picoCTF challenge by extracting and decoding hidden data from PDF document properties

The Challenge

In this Capture The Flag (CTF) forensics challenge, we were given a PDF file called confidential.pdf with the simple instruction: "Find the flag." The flag format for picoCTF is typically picoCTF{...}.

Initial Approach

Understanding PDF Structure

PDF files are complex containers that can hide information in various ways:

Document metadata (author, title, subject, etc.)
Embedded files and attachments
Compressed streams
JavaScript code
Invisible layers and annotations

First Steps

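I started with the most basic forensic tool available on most Linux systems, strings:

strings confidential.pdf

The strings command prints every run of printable characters (four or more by default) that it finds in a binary file, which makes it well suited to initial reconnaissance on unknown files. It does not decompress anything, so text inside compressed PDF streams will not show up in its output.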
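To make concrete what strings is looking for, here is a rough Python approximation. It is only a sketch: the function name extract_strings and the sample bytes are my own, and the real tool has more options (other encodings, offsets) than this covers.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    # 0x20 (space) through 0x7e (~) is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up bytes for illustration: two long runs and one run that is too short.
sample = b"\x00\x01%PDF-1.7\x00\xff/Author (alice)\x02ab\x03"
print(extract_strings(sample))  # ['%PDF-1.7', '/Author (alice)']
```

The two-byte run "ab" is dropped because it falls under the minimum length, which is why very short hidden values can slip past a default strings run.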

The Breakthrough

While the initial strings output was extensive, I focused on one of the most common hiding spots in CTF challenges: document metadata.

strings confidential.pdf | grep -i "subject\|author\|title\|creator\|producer"

This command revealed:

Producer: PyPDF2 (indicating the PDF was generated with a Python library)

Author: cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=

The Author field contained a long string of characters that immediately stood out due to:

Its unusual length for a typical author name
Its character set: only mixed-case letters and digits, all within the base64 alphabet (A-Z, a-z, 0-9, +, /)
The = padding at the end, a classic indicator of base64 encoding
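Those three signals can be turned into a quick check. The sketch below is a heuristic, not a detector: the name looks_like_base64 and the 16-character minimum are my own choices, and any long alphanumeric word whose length is a multiple of four will also pass.

```python
import base64
import binascii
import re

# Standard base64 alphabet, optionally ending in one or two '=' padding characters.
_B64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def looks_like_base64(text: str, min_len: int = 16) -> bool:
    """Heuristic: plausible base64 shape that also decodes without error."""
    if len(text) < min_len or len(text) % 4 != 0:
        return False
    if not _B64_RE.fullmatch(text):
        return False
    try:
        base64.b64decode(text, validate=True)
    except binascii.Error:
        return False
    return True

author = "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="
print(looks_like_base64(author))      # True
print(looks_like_base64("Jane Doe"))  # False
```

A positive result only means the value is worth decoding; whether the decoded bytes are meaningful still needs a human look.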

Decoding the Flag

Base64 is a common encoding method used in CTF challenges to obscure text while keeping it easily reversible. The decoding process was straightforward:

echo "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0=" | base64 -d

Result:

picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
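For anyone who prefers to stay in Python, the same decoding step might look like this. It is a minimal sketch using only the standard library, with the Author value from the PDF pasted in as a string.

```python
import base64

encoded = "cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9jOGY5MWQ2OH0="
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}
```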

The attached image shows the command-line session in which I ran these commands. One side note: when working on Linux, it is always worth checking that the system clock is in sync with the real current time, which reduces fetch errors from the utilities. Lastly, remember to keep the system upgraded regularly.

Why This Worked

Technical Analysis

PDF metadata fields are part of the document information dictionary defined in the PDF specification
These fields are easily accessible but often overlooked
The PyPDF2 reference in the Producer field was a subtle hint about the file's origin
Base64 encoding makes the flag less obvious while remaining easily decodable
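To show why these fields are so easy to reach, here is a sketch that pulls them straight out of the raw bytes with a regular expression. It rests on several assumptions: the file is unencrypted, the Info dictionary is not inside a compressed object stream, and the values are plain literal strings without nested or escaped parentheses. The function name info_fields and the sample fragment are my own and not taken from confidential.pdf.

```python
import re

# Matches entries such as "/Author (some text)" in uncompressed PDF bytes.
_INFO_RE = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*\(([^()]*)\)"
)

def info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Collect simple literal-string entries from the Info dictionary."""
    fields = {}
    for key, value in _INFO_RE.findall(pdf_bytes):
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields

# Hypothetical fragment written for this example.
fragment = b"1 0 obj\n<< /Producer (PyPDF2) /Author (cGljb0NURg==) >>\nendobj\n"
print(info_fields(fragment))
# {'Producer': 'PyPDF2', 'Author': 'cGljb0NURg=='}
```

A proper PDF library or exiftool handles the cases this skips (hex strings, UTF-16 values, object streams), so treat the regex as a way to understand the structure rather than a replacement for those tools.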

Common CTF Hiding Techniques

This challenge demonstrates several common CTF concepts:

Data Obfuscation: Using base64 to hide plaintext
Unconventional Storage: Using standard document fields in unexpected ways
Forensic Mindset: Looking beyond the visible content of documents

Comprehensive PDF Analysis Methodology

For future reference, here's a three-phase methodology for analyzing suspicious PDFs:

Phase 1: Quick Reconnaissance

# Basic strings extraction
strings file.pdf

# Metadata examination
strings file.pdf | grep -i "author\|title\|subject\|creator\|producer"
exiftool file.pdf 2>/dev/null

# Common flag patterns (braces escaped so they match literally; -i ignores case, -o prints only the match)
strings file.pdf | grep -Eio "(flag|ctf|picoCTF)\{[^}]*\}"

Phase 2: Structural Analysis

# PDF object analysis
pdf-parser.py file.pdf 2>/dev/null

# Embedded file detection
pdfdetach -list file.pdf 2>/dev/null

# JavaScript content
strings file.pdf | grep -i "javascript"
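The JavaScript check can be widened a little by counting a handful of PDF name objects that tend to matter during triage. This is a rough sketch: the keyword list is my own short selection, and the count misses names hidden inside compressed streams or written with hex escapes.

```python
import re

# PDF names that often deserve a closer look (a personal, non-exhaustive list).
KEYWORDS = ["/JavaScript", "/JS", "/OpenAction", "/AA", "/EmbeddedFile", "/Launch"]

def count_keywords(pdf_bytes: bytes) -> dict[str, int]:
    """Count whole-name occurrences of each keyword in the raw bytes."""
    counts = {}
    for name in KEYWORDS:
        # The lookahead stops "/JS" from matching a longer name such as "/JSFoo".
        pattern = re.compile(re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])")
        counts[name] = len(pattern.findall(pdf_bytes))
    return counts

# Hypothetical fragment written for this example.
fragment = b"<< /OpenAction 5 0 R >>\n<< /S /JavaScript /JS (app.alert\\(1\\)) >>"
print(count_keywords(fragment))
```

A non-zero count for /OpenAction together with /JS or /JavaScript is usually a good reason to move on to pdf-parser.py and inspect the referenced objects.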

Phase 3: Advanced Techniques

# Metadata extraction with PyPDF2 (prints the document information dictionary)
python3 -c "import PyPDF2; pdf=PyPDF2.PdfReader('file.pdf'); print(pdf.metadata)"

# Hex analysis for hidden data
xxd file.pdf | head -100

# Check for appended files
binwalk file.pdf
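One narrow case of the appended-file check is easy to do by hand: look for bytes after the final %%EOF marker. The sketch below assumes the appended data does not itself contain %%EOF, and it is far less thorough than binwalk, which scans the whole file for known signatures. The function name trailing_data and the sample bytes are my own.

```python
def trailing_data(pdf_bytes: bytes) -> bytes:
    """Return any non-whitespace bytes found after the last %%EOF marker."""
    marker = b"%%EOF"
    pos = pdf_bytes.rfind(marker)
    if pos == -1:
        return b""  # no marker at all; this sketch does not handle that case
    return pdf_bytes[pos + len(marker):].strip()

# Made-up inputs: a clean file and one with archive-like bytes appended.
clean = b"%PDF-1.7\nplaceholder body\n%%EOF\n"
padded = clean + b"PK\x03\x04 hidden archive bytes"
print(trailing_data(clean))   # b''
print(trailing_data(padded))  # b'PK\x03\x04 hidden archive bytes'
```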

Key Takeaways

Start Simple: Always begin with basic tools like strings and grep
Check Metadata: Document properties are prime real estate for hidden data
Recognize Encoding: Learn to identify common encodings (base64, hex, rot13, etc.)
Context Matters: The "Producer: PyPDF2" was a hint about the file's creation method
Systematic Approach: Follow a structured methodology to avoid missing clues
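When it is not obvious which encoding a suspicious value uses, trying the usual suspects in one pass can save time. A minimal sketch, assuming the value is text and the decoded result is UTF-8; the function name try_decodings and the rot13 sample are my own.

```python
import base64
import binascii
import codecs

def try_decodings(text: str) -> dict[str, str]:
    """Apply a few common decodings and keep the ones that succeed."""
    results = {}
    try:
        results["base64"] = base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        pass
    try:
        results["hex"] = bytes.fromhex(text).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        pass
    # rot13 never fails on text, so its output always needs a human look.
    results["rot13"] = codecs.decode(text, "rot13")
    return results

for name, value in try_decodings("cvpbPGS{rknzcyr}").items():
    print(name, value)  # rot13 picoCTF{example}
```

Running it on the Author value from this challenge yields the flag under the base64 key, alongside a meaningless rot13 result that can be ignored.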

Tools Used

strings - Basic text extraction
grep - Pattern searching
base64 - Encoding/decoding

All three are standard Linux command-line utilities.

This challenge perfectly illustrates how CTF competitions teach valuable digital forensics skills through practical, hands-on exercises. The techniques used here are directly applicable to real-world incident response and forensic investigations.

Flag: picoCTF{puzzl3d_m3tadata_f0und!_c8f91d68}

Happy hacking! 🔍

CTF Forensics: Uncovering Hidden Flags in PDF Metadata