Lesson 9: Capture The Flag (CTF) - Hacker Training

Welcome to the world of Capture The Flag (CTF): security competitions where you solve challenges to find “flags” (secret strings, often in a format like CTF{...}).

What is CTF?

CTF competitions test your skills across:

Web Exploitation: Finding and exploiting vulnerabilities in web applications
Binary Exploitation (Pwn): Exploiting memory-safety bugs in compiled programs, often to get shell access
Reverse Engineering: Understanding how programs work without their source code
Cryptography: Breaking weak encryption and decoding encoded data
Forensics: Analyzing files, memory dumps, and network captures for hidden evidence
OSINT: Gathering Open Source Intelligence from public sources
Misc: Everything else (steganography, programming puzzles, etc.)

Why CTF?

Career: Cybersecurity skills are in high demand
Bug Bounties: Find real bugs in programs that explicitly authorize testing, and earn rewards
Penetration Testing: Build the skills of a professional pentester
Problem Solving: Some of the most challenging puzzles you’ll encounter
Community: Meet hackers worldwide

🚨 Legal & Ethical Notice

IMPORTANT: Only hack systems you have permission to hack!

✅ CTF challenges and practice platforms
✅ Your own systems and labs
✅ Bug bounty programs with explicit permission
❌ Other people’s systems without permission (this is illegal!)

This notebook is for educational and authorized testing only.

Part 1: Web Exploitation

Web challenges involve finding and exploiting vulnerabilities in web applications.

Challenge 1.1: SQL Injection (SQLi)

OWASP Top 10 (2021): part of A03 Injection, one of the most common and dangerous classes of web vulnerability

Concept: Injecting SQL code into queries to bypass authentication or extract data.

import sqlite3

# Vulnerable login function (DO NOT USE IN PRODUCTION!)
def vulnerable_login(username: str, password: str) -> bool:
    """
    VULNERABLE: Never build SQL queries with string concatenation!
    """
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    
    # Create users table
    cursor.execute('''
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT,
            password TEXT,
            is_admin INTEGER
        )
    ''')
    
    # Add test users
    cursor.execute("INSERT INTO users VALUES (1, 'admin', 'sup3rs3cr3t', 1)")
    cursor.execute("INSERT INTO users VALUES (2, 'user', 'password123', 0)")
    conn.commit()
    
    # VULNERABLE QUERY - concatenating user input directly!
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    print(f"Query: {query}")
    
    cursor.execute(query)
    result = cursor.fetchone()
    
    conn.close()
    return result is not None

# Normal login
print("=== Normal Login ===")
print(f"Valid login: {vulnerable_login('admin', 'sup3rs3cr3t')}")
print(f"Invalid login: {vulnerable_login('admin', 'wrongpassword')}")

# SQL Injection attack!
print("\n=== SQL Injection Attack ===")
print("Username: admin' OR '1'='1")
print("Password: (anything)")
result = vulnerable_login("admin' OR '1'='1", "anything")
print(f"Bypassed authentication: {result}")

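print("\n🚨 Explanation:")
print("The injected query becomes:")
print("SELECT * FROM users WHERE username='admin' OR '1'='1' AND password='anything'")
print("AND binds tighter than OR, so SQL evaluates:")
print("  username='admin' OR ('1'='1' AND password='anything')")
print("username='admin' alone matches the admin row, so the password check is bypassed!")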

# Secure version using parameterized queries
def secure_login(username: str, password: str) -> bool:
    """
    SECURE: Always use parameterized queries!
    """
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    
    # Setup (same as before)
    cursor.execute('''
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT,
            password TEXT
        )
    ''')
    cursor.execute("INSERT INTO users VALUES (1, 'admin', 'sup3rs3cr3t')")
    conn.commit()
    
    # SECURE QUERY - using parameterized query
    query = "SELECT * FROM users WHERE username=? AND password=?"
    cursor.execute(query, (username, password))
    result = cursor.fetchone()
    
    conn.close()
    return result is not None

# Try SQL injection on secure version
print("=== Trying Injection on Secure Version ===")
result = secure_login("admin' OR '1'='1", "anything")
print(f"Injection blocked: {not result}")
print("\n✓ The parameterized query treats input as data, not code!")
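The login example shows only the authentication bypass. The concept line also mentions extracting data. A common way to do that is a UNION-based injection: the attacker appends a second `SELECT` whose rows are merged into the result. The sketch below runs against a hypothetical vulnerable product search. The `products` table and the `search_products` function are made up for illustration.

```python
import sqlite3

def search_products(conn: sqlite3.Connection, term: str) -> list:
    """VULNERABLE (hypothetical search endpoint): user input is concatenated into the query."""
    query = f"SELECT name, price FROM products WHERE name LIKE '%{term}%'"
    print(f"Query: {query}")
    return conn.execute(query).fetchall()

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO products VALUES ('widget', 9.99)")
conn.execute("INSERT INTO users VALUES ('admin', 'sup3rs3cr3t')")

# Normal search
print(search_products(conn, "widg"))

# The payload closes the string literal, appends a second SELECT with the same
# number of columns (2), and comments out the leftover "%'" with --
payload = "zzz' UNION SELECT username, password FROM users--"
leaked = search_products(conn, payload)
print(f"Leaked rows: {leaked}")
conn.close()
```

The attacker has to match the column count of the original query (two here). In CTFs this is usually discovered by trial, for example by adding `NULL` columns until the UNION stops erroring.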

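Challenge 1.2: Cross-Site Scripting (XSS)

OWASP Top 10: listed as A7 in the 2017 edition and merged into A03 Injection in 2021. XSS injects malicious JavaScript into pages viewed by other users.

Types:

Reflected XSS: The payload is in the request (e.g., a URL parameter) and is reflected back in the response immediately
Stored XSS: The payload is stored (e.g., in a database) and served to every user who views the page
DOM-based XSS: Client-side JavaScript writes attacker-controlled data (e.g., from location.hash) into the page DOM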

import html

def vulnerable_comment_system(comment: str) -> str:
    """
    VULNERABLE: Renders user input without sanitization
    """
    html_page = f"""
    <html>
        <body>
            <h1>Comment Section</h1>
            <div class="comment">{comment}</div>
        </body>
    </html>
    """
    return html_page

def secure_comment_system(comment: str) -> str:
    """
    SECURE: Escapes HTML special characters
    """
    safe_comment = html.escape(comment)
    html_page = f"""
    <html>
        <body>
            <h1>Comment Section</h1>
            <div class="comment">{safe_comment}</div>
        </body>
    </html>
    """
    return html_page

# XSS payload
xss_payload = "<script>alert('XSS Attack! Cookie: ' + document.cookie)</script>"

print("=== Vulnerable Version ===")
print(vulnerable_comment_system(xss_payload))

print("\n=== Secure Version ===")
print(secure_comment_system(xss_payload))

print("\n💡 Notice: The secure version escapes <, >, &, and quotes (e.g. < becomes &lt; and > becomes &gt;)")
print("so the browser renders the payload as text instead of executing it!")

Common XSS Payloads

<script>alert(document.cookie)</script>
<img src=x onerror=alert('XSS')>
<svg onload=alert('XSS')>
<iframe src="javascript:alert('XSS')">
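The last payload shows that HTML escaping is not enough in every context. `html.escape` stops a payload from breaking out of a tag or attribute. A `javascript:` URL placed in an `href`, however, contains nothing that needs escaping, and the browser decodes the entities before running it. The sketch below uses a hypothetical allowlist of `http`/`https` schemes. The `render_link` helpers are illustrative, not a complete URL sanitizer.

```python
import html
from urllib.parse import urlparse

def render_link(url: str) -> str:
    # Escaping prevents attribute breakout, but javascript: URLs pass through
    return f'<a href="{html.escape(url, quote=True)}">profile</a>'

ALLOWED_SCHEMES = {'http', 'https'}  # hypothetical allowlist

def safe_render_link(url: str) -> str:
    # Replace any URL whose scheme is not explicitly allowed, then escape
    if urlparse(url).scheme.lower() not in ALLOWED_SCHEMES:
        url = '#'
    return f'<a href="{html.escape(url, quote=True)}">profile</a>'

payload = "javascript:alert('XSS')"
unsafe_html = render_link(payload)
safe_html = safe_render_link(payload)
print(unsafe_html)  # alert() still runs when the link is clicked
print(safe_html)
print(safe_render_link("https://example.com/?q=<b>"))
```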

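Defense

Escape output: Encode data for the context it lands in (HTML body, attribute, JavaScript, URL)
Content Security Policy (CSP): Restrict which script sources the browser will execute; a policy without 'unsafe-inline' blocks inline scripts
HttpOnly cookies: Cookies flagged HttpOnly cannot be read through document.cookie, which limits session theft
Input validation: Allowlist expected formats (defense in depth, not a substitute for output encoding)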
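Two of these defenses are just HTTP response headers, and input validation can be a strict pattern. The sketch below uses the standard library's `http.cookies.SimpleCookie` (the `samesite` attribute needs Python 3.8+). The header values, the `security_headers` helper, and the `USERNAME_RE` pattern are illustrative assumptions, not a recommended production policy.

```python
import re
from http.cookies import SimpleCookie

def security_headers(session_id: str) -> dict[str, str]:
    """Build example response headers (illustrative values)."""
    cookie = SimpleCookie()
    cookie['session'] = session_id
    cookie['session']['httponly'] = True   # not readable via document.cookie
    cookie['session']['secure'] = True     # only sent over HTTPS
    cookie['session']['samesite'] = 'Strict'
    return {
        # Only allow scripts from our own origin; inline <script> blocks are refused
        'Content-Security-Policy': "default-src 'self'; script-src 'self'",
        'Set-Cookie': cookie['session'].OutputString(),
    }

USERNAME_RE = re.compile(r'[A-Za-z0-9_]{3,20}')  # hypothetical allowlist

def validate_username(name: str) -> bool:
    # fullmatch: the whole string must match, not just a prefix
    return USERNAME_RE.fullmatch(name) is not None

headers = security_headers('abc123')
for name, value in headers.items():
    print(f"{name}: {value}")
print(validate_username('h4ck3r_42'))
print(validate_username('<script>'))
```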

Part 2: Binary Exploitation (Pwn)

Pwn challenges involve exploiting binary programs, often to get shell access.

Challenge 2.1: Buffer Overflow Basics

Classic: One of the most famous security vulnerabilities

Concept: Write beyond array boundaries to overwrite memory.

# Python simulation of buffer overflow concept
# (In real scenarios, this happens in C/C++ with memory corruption)

class VulnerableProgram:
    def __init__(self):
        self.is_admin = False
        self.buffer = [0] * 8  # 8-byte buffer
    
    def copy_input(self, user_input: bytes):
        """
        VULNERABLE: No bounds checking!
        In C: strcpy(buffer, user_input) without length check
        """
        print(f"Buffer size: {len(self.buffer)} bytes")
        print(f"Input size: {len(user_input)} bytes")
        
        # Simulate memory layout
        # [buffer (8 bytes)][is_admin (1 byte)]
        
        for i, byte in enumerate(user_input):
            if i < len(self.buffer):
                self.buffer[i] = byte
            elif i == len(self.buffer):  # Overflow into is_admin!
                print(f"⚠️  OVERFLOW: Writing beyond buffer!")
                if byte != 0:
                    self.is_admin = True
        
        print(f"Buffer: {self.buffer}")
        print(f"is_admin: {self.is_admin}")
        return self.is_admin

# Normal input (safe)
print("=== Normal Input ===")
prog1 = VulnerableProgram()
normal_input = b"hello"  # 5 bytes, fits in buffer
prog1.copy_input(normal_input)
print(f"\nAdmin access: {prog1.is_admin}\n")

# Buffer overflow attack!
print("=== Buffer Overflow Attack ===")
prog2 = VulnerableProgram()
overflow_input = b"A" * 8 + b"\x01"  # 8 bytes + 1 byte overflow
prog2.copy_input(overflow_input)
print(f"\nAdmin access gained: {prog2.is_admin}")

print("\n🚨 In real C programs, this could overwrite:")
print("   - Return addresses (control program flow)")
print("   - Function pointers (execute arbitrary code)")
print("   - Security flags (bypass authentication)")
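The Python simulation hard-codes which "byte" lands in `is_admin`. To see the layout with real bytes, a `ctypes` structure can stand in for the C variables: an 8-byte `char` array followed by a 1-byte flag. This sketch assumes the fields sit back to back with no padding, which holds for these two types. The `Frame` layout and the copy helpers are hypothetical. The overflowing `memmove` stays inside the 9-byte struct, so it corrupts only the neighbouring field and nothing else in the process.

```python
import ctypes

class Frame(ctypes.Structure):
    """Hypothetical memory layout: char buffer[8]; uint8_t is_admin;"""
    _fields_ = [('buffer', ctypes.c_char * 8),
                ('is_admin', ctypes.c_uint8)]

print(f"buffer at offset {Frame.buffer.offset}, size {Frame.buffer.size}")
print(f"is_admin at offset {Frame.is_admin.offset}")

def unsafe_copy(frame: Frame, data: bytes) -> None:
    # Like memcpy without a length check: copies len(data) bytes
    ctypes.memmove(ctypes.addressof(frame), data, len(data))

def safe_copy(frame: Frame, data: bytes) -> None:
    # Bounds-checked: never copy more than the buffer can hold
    n = min(len(data), Frame.buffer.size)
    ctypes.memmove(ctypes.addressof(frame), data, n)

payload = b"A" * 8 + b"\x01"   # 8 bytes fill the buffer, the 9th lands in is_admin

victim = Frame()
unsafe_copy(victim, payload)
print(f"unsafe_copy -> is_admin = {victim.is_admin}")

hardened = Frame()
safe_copy(hardened, payload)
print(f"safe_copy   -> is_admin = {hardened.is_admin}")
```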

Challenge 2.2: Format String Vulnerability

Concept: Exploit printf-style functions to read/write arbitrary memory.

# Simulating format string vulnerability

def vulnerable_printf(user_input: str):
    """
    VULNERABLE: Using user input directly as format string
    In C: printf(user_input) instead of printf("%s", user_input)
    """
    secret_value = 0xDEADBEEF
    password = "sup3rs3cr3t"
    
    print(f"Debug info:")
    print(f"  secret_value at: id={id(secret_value)}")
    print(f"  password at: id={id(password)}")
    print()
    
    # Vulnerable printf simulation
    if "%x" in user_input or "%p" in user_input:
        print("⚠️  Format string detected! Leaking memory...")
        print(f"Leaked value: {hex(secret_value)}")
        print(f"Leaked password: {password}")
    else:
        print(user_input)

# Normal use
print("=== Normal Input ===")
vulnerable_printf("Hello, World!")

# Format string attack
print("\n=== Format String Attack ===")
vulnerable_printf("%x %x %x %x %s")

print("\n💡 Real format string exploits can:")
print("   - Read stack values (%x, %p) and memory at attacker-chosen addresses (%s)")
print("   - Write to arbitrary memory (%n)")
print("   - Leak stack canaries")
print("   - Leak code and library addresses to defeat ASLR (Address Space Layout Randomization)")
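Python has its own version of this bug. If users control the template passed to `str.format`, replacement fields can walk attributes and dictionary keys and leak globals. The sketch below uses a hypothetical `SECRET_KEY` and `User` class. As one possible fix it swaps in `string.Template`, which substitutes only plain `$names`.

```python
from string import Template

SECRET_KEY = "CTF{f0rm4t_str1ng_l34k}"  # hypothetical secret stored in module globals

class User:
    def __init__(self, name: str):
        self.name = name

def greet_vulnerable(template: str, user: User) -> str:
    # VULNERABLE: the format string itself comes from the user
    return template.format(user=user)

def greet_safe(template: str, user: User) -> str:
    # string.Template only substitutes $names: no attribute or index lookups
    return Template(template).safe_substitute(name=user.name)

user = User("alice")
print(greet_vulnerable("Hello, {user.name}!", user))

# user.__init__ is a bound method; its __globals__ is the module's global dict
leak = greet_vulnerable("{user.__init__.__globals__[SECRET_KEY]}", user)
print(f"Leaked: {leak}")

print(greet_safe("Hello, $name!", user))
print(greet_safe("${user.__init__}", user))   # left unsubstituted, nothing leaked
```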

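Modern Protections

Stack Canaries: A secret value placed between local buffers and the return address; the program checks it before returning and aborts if it changed
DEP/NX: Data Execution Prevention / No-eXecute marks the stack and heap non-executable, so injected shellcode cannot run
ASLR: Randomizes the base addresses of the stack, heap, and shared libraries
PIE: Position Independent Executable, which lets ASLR randomize the program's own code as well
RELRO: Relocation Read-Only; Full RELRO makes the GOT read-only, blocking GOT-overwrite attacks

Advanced CTF pwn challenges involve bypassing these protections!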
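Before exploiting a binary, CTF players usually check which of these protections are enabled, for example with tools such as `checksec`. The sketch below reads two of them straight from the ELF headers of a 64-bit binary:

- **PIE** shows up as header type `ET_DYN`.
- **NX** shows up as a `PT_GNU_STACK` program header without the execute flag.

The `check_elf` helper is hypothetical. Canaries and RELRO are not covered. The demo runs on a synthetic header rather than a real binary.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type that records stack permissions
PF_X = 0x1                 # "executable" permission flag
ET_DYN = 3                 # e_type for PIE executables (and shared libraries)

def check_elf(data: bytes) -> dict[str, bool]:
    """Minimal checksec-style sketch for 64-bit ELF files: reports PIE and NX only."""
    if data[:4] != b'\x7fELF':
        raise ValueError('not an ELF file')
    if data[4] != 2:                           # EI_CLASS: 2 = 64-bit
        raise ValueError('this sketch only handles 64-bit ELF')
    endian = '<' if data[5] == 1 else '>'      # EI_DATA: 1 = little-endian, 2 = big-endian
    (e_type,) = struct.unpack_from(endian + 'H', data, 16)
    (e_phoff,) = struct.unpack_from(endian + 'Q', data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + 'HH', data, 54)

    nx = False  # conservative default: no GNU_STACK entry is reported as "no NX"
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from(endian + 'II', data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
    return {'pie': e_type == ET_DYN, 'nx': nx}

# Synthetic header for the demo: 64-byte ELF header + one 56-byte program header
fake = bytearray(64 + 56)
fake[0:6] = b'\x7fELF\x02\x01'                 # magic, 64-bit, little-endian
struct.pack_into('<H', fake, 16, ET_DYN)       # e_type
struct.pack_into('<Q', fake, 32, 64)           # e_phoff: program headers follow the ELF header
struct.pack_into('<HH', fake, 54, 56, 1)       # e_phentsize, e_phnum
struct.pack_into('<II', fake, 64, PT_GNU_STACK, 0x6)  # GNU_STACK with read+write, no execute
result = check_elf(bytes(fake))
print(result)
```

On Linux you could point it at a real binary, e.g. `check_elf(open('/bin/ls', 'rb').read())`. Note that `ET_DYN` alone does not distinguish a PIE executable from a shared library.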

Part 3: Reverse Engineering

Analyze programs without source code to understand their behavior.

Challenge 3.1: Decompiling Python Bytecode

Concept: Python code compiles to bytecode (.pyc files). Can we reverse it?

import dis

# Secret function
def check_password(password: str) -> bool:
    secret = "CTF{r3v3rs3_3ng1n33r1ng}"
    return password == secret

# Disassemble the bytecode
print("=== Disassembled Bytecode ===")
dis.dis(check_password)

print("\n💡 Notice: You can see the string 'CTF{r3v3rs3_3ng1n33r1ng}' in the bytecode!")
print("This is why hardcoded secrets are a bad idea.")

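# Extract the secret straight from the code object's constants
print("\n=== Constants Embedded in the Bytecode ===")
print(check_password.__code__.co_consts)

# If the source file is available, inspect can recover the original code too
import inspect
source = inspect.getsource(check_password)
print("\n=== Recovered Source ===")
print(source)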
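In a real challenge you usually get only a `.pyc` file. Since Python 3.7, a `.pyc` starts with a 16-byte header (magic number, flags, and either the source timestamp and size or a hash). A marshalled code object follows, which `marshal.load` can read back as long as the running interpreter matches the version that wrote it.

The sketch below builds a throwaway `.pyc` and pulls flag-like string constants out of it. The `challenge.py` source, the flag, and the helper names are made up.

```python
import marshal
import os
import py_compile
import re
import tempfile
import types

def strings_in_code(code: types.CodeType):
    """Recursively yield string constants from a code object and any nested functions."""
    for const in code.co_consts:
        if isinstance(const, str):
            yield const
        elif isinstance(const, types.CodeType):
            yield from strings_in_code(const)

def flags_from_pyc(path: str) -> list[str]:
    with open(path, 'rb') as f:
        f.read(16)              # skip the 16-byte header (Python 3.7+)
        code = marshal.load(f)  # only works with the Python version that wrote the file
    return [s for s in strings_in_code(code) if re.fullmatch(r'CTF\{.*\}', s)]

# Build a throwaway .pyc that stands in for a challenge file
challenge_src = 'def check(p):\n    return p == "CTF{pyc_l34k}"\n'
with tempfile.TemporaryDirectory() as tmp:
    py_path = os.path.join(tmp, 'challenge.py')
    with open(py_path, 'w') as f:
        f.write(challenge_src)
    pyc_path = py_compile.compile(py_path, cfile=os.path.join(tmp, 'challenge.pyc'), doraise=True)
    found = flags_from_pyc(pyc_path)

print(f"Flags found in .pyc: {found}")
```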

Challenge 3.2: Simple Crackme

Concept: Reverse a program to find the correct input.

def crackme(serial: str) -> bool:
    """
    Simple serial key checker.
    Can you find the correct serial?
    """
    if len(serial) != 16:
        return False
    
    # Check 1: First 4 chars must be 'CTF{'
    if serial[:4] != 'CTF{':
        return False
    
    # Check 2: Last char must be '}'
    if serial[-1] != '}':
        return False
    
    # Check 3: Middle part checksum
    middle = serial[4:-1]
    checksum = sum(ord(c) for c in middle)
    
    # Secret checksum value
    if checksum != 955:
        return False
    
    # Check 4: Specific characters
    if middle[0] != 'h' or middle[-1] != 'd':
        return False
    
    return True

# Test some serials
test_serials = [
    "CTF{wrong_key}",
    "CTF{h4ck3r0wn3d}",
]

for serial in test_serials:
    result = crackme(serial)
    print(f"Serial: {serial:20s} -> {'✓ Valid!' if result else '✗ Invalid'}")

# Solve it programmatically!
print("\n=== Solving Programmatically ===")
# The checks tell us: CTF{h?????????d} (9 unknown chars) and sum(ord(c) for c in middle) == 955

def solve_crackme():
    target_sum = 955
    # We know the first middle char is 'h' and the last is 'd'
    needed_sum = target_sum - (ord('h') + ord('d'))  # what the 9 unknown chars must sum to
    n_unknown = 16 - len('CTF{') - len('}') - 2       # 9

    # Spread the sum evenly: each char gets the average, the remainder adds 1 to a few
    base, extra = divmod(needed_sum, n_unknown)
    unknown = chr(base + 1) * extra + chr(base) * (n_unknown - extra)
    serial = f"CTF{{h{unknown}d}}"
    return serial if crackme(serial) else None

solution = solve_crackme()
if solution:
    print(f"Found valid serial: {solution}")

print("\n💡 A checksum accepts many inputs: any 9 characters between 'h' and 'd' whose codes sum to 751 pass,")
print("   so 'CTF{h4ck3r0wn3d}' is only one of many valid serials.")

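Reverse Engineering Tools

Ghidra: NSA’s free, open-source reverse engineering suite with a decompiler (supports many architectures)
IDA Pro: Industry-standard disassembler and decompiler (expensive, but a free version is available)
radare2: Open-source reverse engineering framework
Binary Ninja: Modern RE platform
gdb/pwndbg: gdb is a debugger for dynamic analysis; pwndbg extends it for exploit development
strings: Extract printable strings from binaries
objdump: Disassemble code and display object file headers and sections
strace/ltrace: Trace system calls (strace) and library calls (ltrace)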
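Running `strings` is often the first step on an unknown binary, because flags and paths frequently sit in it as plain ASCII. A rough Python approximation, assuming only printable-ASCII runs of at least 4 bytes matter (the real tool has more options), looks like this. The blob and flag are made up.

```python
import re

def strings(data: bytes, min_len: int = 4) -> list[str]:
    """Rough equivalent of the `strings` tool: printable ASCII runs of at least min_len bytes."""
    pattern = rb'[\x20-\x7e]{%d,}' % min_len
    return [m.group().decode('ascii') for m in re.finditer(pattern, data)]

# Hypothetical binary blob with a flag buried between non-printable bytes
blob = b'\x7fELF\x02\x01\x00\x00CTF{str1ngs_f1rst}\x00\xff\xfeab\x00/lib64/ld-linux.so\x00'
found = strings(blob)
print(found)
```

Short runs such as `ELF` and `ab` are filtered out by the minimum length, which is what keeps the output readable on large binaries.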

Part 4: Forensics

Digital detective work: analyze files, memory dumps, and network captures.

Challenge 4.1: Steganography - Hidden in Plain Sight

Concept: Hide data inside images, audio, or other files.

# LSB (Least Significant Bit) Steganography

def text_to_binary(text: str) -> str:
    """Convert text to a binary string (UTF-8 bytes, 8 bits each)"""
    return ''.join(format(byte, '08b') for byte in text.encode('utf-8'))

def binary_to_text(binary: str) -> str:
    """Convert a binary string (length a multiple of 8) back to text"""
    data = bytes(int(binary[i:i+8], 2) for i in range(0, len(binary), 8))
    return data.decode('utf-8', errors='replace')

def hide_message_in_pixels(pixels: list[int], message: str) -> list[int]:
    """
    Hide message in least significant bits of pixel values
    
    LSB steganography: Modify the last bit of each pixel
    Human eye can't notice the difference!
    """
    binary_message = text_to_binary(message) + '00000000'  # Add delimiter
    
    if len(binary_message) > len(pixels):
        raise ValueError("Message too long for pixel array")
    
    stego_pixels = pixels.copy()
    
    for i, bit in enumerate(binary_message):
        # Clear LSB and set to message bit
        stego_pixels[i] = (pixels[i] & 0xFE) | int(bit)
    
    return stego_pixels

def extract_message_from_pixels(pixels: list[int]) -> str:
    """
    Extract hidden message from pixels
    """
    bits = ''.join(str(pixel & 1) for pixel in pixels)
    
    # Read one byte (8 bits) at a time and stop at the 00000000 delimiter.
    # Searching for '00000000' anywhere could match across a byte boundary.
    message_bits = ''
    for i in range(0, len(bits) - 7, 8):
        byte = bits[i:i+8]
        if byte == '00000000':
            break
        message_bits += byte
    
    return binary_to_text(message_bits)

# Demo
print("=== LSB Steganography Demo ===")

# Simulate image pixel values (normally 0-255); 200 values leave room for
# 21 chars x 8 bits + an 8-bit delimiter = 176 bits
original_pixels = [154, 200, 123, 89, 234, 156, 78, 199, 145, 223,
                   100, 180, 91, 205, 134, 167, 198, 143, 211, 176] * 10

secret_message = "CTF{h1dd3n_1n_p1x3ls}"
print(f"Original message: {secret_message}")
print(f"Message length: {len(secret_message)} characters")
print(f"Binary length: {len(text_to_binary(secret_message))} bits\n")

# Hide message
stego_pixels = hide_message_in_pixels(original_pixels, secret_message)

# Show pixel changes
print("Pixel changes (first 10):")
for i in range(10):
    change = abs(original_pixels[i] - stego_pixels[i])
    print(f"  Pixel {i}: {original_pixels[i]:3d} -> {stego_pixels[i]:3d} (change: {change})")

# Extract message
extracted = extract_message_from_pixels(stego_pixels)
print(f"\nExtracted message: {extracted}")
print(f"✓ Success!" if extracted == secret_message else "✗ Failed")

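print("\n💡 In real images:")
print("   - Flipping the lowest bit changes a 0-255 channel value by at most 1, which is invisible")
print("   - Capacity is about 1 bit per color channel: a 1000x1000 RGB image holds ~375 KB")
print("   - Detection requires statistical analysis of the LSB plane")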
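Real CTF images are usually PNGs, whose pixel data is compressed; a library such as Pillow is the usual way to get at the pixels. The uncompressed binary PPM (P6) format makes the same idea visible with the standard library alone: after a short text header, every byte is one color channel.

The sketch below assumes the same convention as above (one bit per channel, most significant bit first, NUL terminator). It also assumes a header without `#` comment lines. The helper name and the 8x4 test image are hypothetical.

```python
import re

def ppm_lsb_message(data: bytes) -> bytes:
    """Extract a NUL-terminated LSB message from a binary PPM (P6) image.

    Sketch assumptions: maxval <= 255 (one byte per channel), no '#' comments
    in the header, one hidden bit per channel byte, most significant bit first.
    """
    header = re.match(rb'P6\s+(\d+)\s+(\d+)\s+(\d+)\s', data)
    if not header:
        raise ValueError('not a binary PPM (P6) file')
    width, height = int(header.group(1)), int(header.group(2))
    channels = data[header.end():header.end() + width * height * 3]

    message = bytearray()
    for i in range(0, len(channels) - 7, 8):
        byte = 0
        for value in channels[i:i + 8]:
            byte = (byte << 1) | (value & 1)   # collect the LSB of each channel
        if byte == 0:                          # NUL delimiter
            break
        message.append(byte)
    return bytes(message)

# Build an 8x4 test image whose channel LSBs carry b'CTF{ppm}' + NUL
hidden = b'CTF{ppm}\x00'
bits = ''.join(format(b, '08b') for b in hidden)
raw = bytearray([200] * (8 * 4 * 3))
for i, bit in enumerate(bits):
    raw[i] = (raw[i] & 0xFE) | int(bit)
image = b'P6\n8 4\n255\n' + bytes(raw)

recovered = ppm_lsb_message(image)
print(recovered)
```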

Challenge 4.2: File Carving

Concept: Extract files from binary data using file signatures (magic bytes).

# File signatures (magic bytes)
FILE_SIGNATURES = {
    b'\x89PNG\r\n\x1a\n': ('PNG Image', '.png'),
    b'\xff\xd8\xff': ('JPEG Image', '.jpg'),
    b'GIF89a': ('GIF Image', '.gif'),
    b'PK\x03\x04': ('ZIP Archive', '.zip'),
    b'%PDF': ('PDF Document', '.pdf'),
    b'\x1f\x8b\x08': ('GZIP Archive', '.gz'),
}

def identify_file_type(data: bytes) -> tuple[str, str]:
    """
    Identify file type by magic bytes
    """
    for signature, (file_type, extension) in FILE_SIGNATURES.items():
        if data.startswith(signature):
            return file_type, extension
    return 'Unknown', '.bin'

def carve_files(data: bytes) -> list[tuple[int, str, str, bytes]]:
    """
    Find and extract embedded files
    Returns: [(offset, file_type, extension, data), ...]
    """
    found_files = []
    
    for signature, (file_type, extension) in FILE_SIGNATURES.items():
        offset = 0
        while True:
            offset = data.find(signature, offset)
            if offset == -1:
                break
            
            # In real file carving, you'd determine file size
            # For demo, just grab next 100 bytes
            file_data = data[offset:offset+100]
            found_files.append((offset, file_type, extension, file_data))
            offset += 1
    
    return found_files

# Demo: Hide files in filler data
print("=== File Carving Demo ===")

# Create binary blob with hidden files
blob = b'\x00' * 100  # Filler bytes
blob += b'\x89PNG\r\n\x1a\n' + b'PNG_DATA_HERE' + b'\x00' * 50
blob += b'\x00' * 100
blob += b'%PDF-1.4' + b'PDF_CONTENT_HERE' + b'\x00' * 50
blob += b'\x00' * 100

print(f"Binary blob size: {len(blob)} bytes")
print(f"\nSearching for embedded files...\n")

found = carve_files(blob)
for offset, file_type, ext, data in found:
    print(f"Found {file_type} at offset {offset}")
    print(f"  Magic bytes: {data[:10]}")
    print(f"  Extension: {ext}")
    print()

print("💡 Real forensics tools:")
print("   - binwalk: Analyze firmware and binary blobs")
print("   - foremost: File carving tool")
print("   - scalpel: Another file carving tool")
print("   - volatility: Memory forensics")
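The demo grabs a fixed 100 bytes because the carver does not know where each file ends. For PNG the end can be found exactly. After the 8-byte signature, the file is a sequence of chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC), and the `IEND` chunk closes the image.

The sketch below walks those chunks to carve one PNG. It builds a tiny 1x1 grayscale PNG by hand as test data. The helper names are hypothetical.

```python
import struct
import zlib

PNG_SIG = b'\x89PNG\r\n\x1a\n'

def png_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    return (struct.pack('>I', len(payload)) + ctype + payload
            + struct.pack('>I', zlib.crc32(ctype + payload)))

def carve_png(blob: bytes, start: int) -> bytes:
    """Walk PNG chunks from the signature at `start` until IEND; return the exact file bytes."""
    pos = start + len(PNG_SIG)
    while pos + 8 <= len(blob):
        length, ctype = struct.unpack_from('>I4s', blob, pos)
        pos += 8 + length + 4  # length/type header + data + CRC
        if ctype == b'IEND':
            return blob[start:pos]
    raise ValueError('truncated PNG: no IEND chunk')

# A 1x1 grayscale PNG built by hand
ihdr = struct.pack('>IIBBBBB', 1, 1, 8, 0, 0, 0, 0)  # width, height, bit depth, color type, ...
idat = zlib.compress(b'\x00\x00')                    # filter byte 0 + one black pixel
png = PNG_SIG + png_chunk(b'IHDR', ihdr) + png_chunk(b'IDAT', idat) + png_chunk(b'IEND', b'')

blob = b'\x00' * 37 + png + b'trailing junk'
offset = blob.find(PNG_SIG)
carved = carve_png(blob, offset)
print(f"PNG at offset {offset}, {len(carved)} bytes, exact match: {carved == png}")
```

Other formats need their own end markers (for example the JPEG end-of-image marker FF D9), which is why tools like foremost and binwalk keep per-format rules.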

Part 5: OSINT (Open Source Intelligence)

Find information from public sources.

Challenge 5.1: Metadata Extraction

Concept: Files contain hidden metadata (EXIF, author, timestamps, GPS coordinates).

# Simulate EXIF data (normally extracted with libraries like Pillow or ExifRead)

sample_exif = {
    'Image': {
        'Make': 'Canon',
        'Model': 'Canon EOS 5D',
        'DateTime': '2024:01:15 10:30:45',
        'Software': 'Adobe Photoshop CS6',
    },
    'GPS': {
        'GPSLatitude': [40, 44, 54.36],
        'GPSLongitude': [73, 59, 8.40],
        'GPSLatitudeRef': 'N',
        'GPSLongitudeRef': 'W',
    },
    'Author': 'John Doe',
    'Copyright': 'Copyright 2024',
}

def extract_sensitive_info(exif_data: dict):
    """
    Extract potentially sensitive information from EXIF
    """
    print("=== Metadata Analysis ===")
    
    # Camera info
    if 'Image' in exif_data:
        print(f"Camera: {exif_data['Image'].get('Make')} {exif_data['Image'].get('Model')}")
        print(f"Date: {exif_data['Image'].get('DateTime')}")
        print(f"Software: {exif_data['Image'].get('Software')}")
    
    # GPS coordinates
    if 'GPS' in exif_data:
        gps = exif_data['GPS']
        lat = gps['GPSLatitude']
        lon = gps['GPSLongitude']
        
        # Convert to decimal degrees
        lat_decimal = lat[0] + lat[1]/60 + lat[2]/3600
        lon_decimal = lon[0] + lon[1]/60 + lon[2]/3600
        
        if gps['GPSLatitudeRef'] == 'S':
            lat_decimal *= -1
        if gps['GPSLongitudeRef'] == 'W':
            lon_decimal *= -1
        
        print(f"\n⚠️  GPS Location Found:")
        print(f"  Latitude: {lat_decimal:.6f}°")
        print(f"  Longitude: {lon_decimal:.6f}°")
        print(f"  Google Maps: https://maps.google.com/?q={lat_decimal},{lon_decimal}")
        print("  Location: near the Empire State Building, New York City (found via map lookup)")
    
    # Author info
    if 'Author' in exif_data:
        print(f"\nAuthor: {exif_data['Author']}")
    
    print("\n💡 Privacy Risk:")
    print("   - Photos reveal where you've been")
    print("   - Timestamps show when you were there")
    print("   - Can be used for stalking or doxxing")
    print("\n🛡️  Defense: Strip EXIF before posting online!")

extract_sensitive_info(sample_exif)
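EXIF data in a JPEG lives in an APP1 segment (marker FF E1) near the start of the file. One way to strip it is therefore to copy every segment except APP1, which also drops XMP data stored there.

This is a sketch under simplifying assumptions: a well-formed file and no standalone markers before the start-of-scan segment. Dedicated tools such as exiftool handle far more cases. The tiny JPEG below is synthetic and not a decodable image.

```python
def strip_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG with all APP1 (EXIF/XMP) segments removed."""
    if jpeg[:2] != b'\xff\xd8':                     # SOI marker
        raise ValueError('not a JPEG file')
    out = bytearray(b'\xff\xd8')
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f'expected a marker at offset {pos}')
        marker = jpeg[pos + 1]
        if marker == 0xDA:                           # SOS: entropy-coded image data follows
            break
        length = int.from_bytes(jpeg[pos + 2:pos + 4], 'big')  # counts its own 2 bytes
        if marker != 0xE1:                           # keep every segment except APP1
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    out += jpeg[pos:]                                # SOS, image data and EOI unchanged
    return bytes(out)

# Synthetic JPEG: SOI, APP0 (JFIF), APP1 (fake EXIF), SOS header + 2 data bytes, EOI
app0 = b'\xff\xe0' + (16).to_bytes(2, 'big') + b'JFIF\x00' + bytes(9)
exif = b'Exif\x00\x00' + b'fake GPS tags'
app1 = b'\xff\xe1' + (len(exif) + 2).to_bytes(2, 'big') + exif
scan = b'\xff\xda' + (8).to_bytes(2, 'big') + bytes(6) + b'\x12\x34' + b'\xff\xd9'
original = b'\xff\xd8' + app0 + app1 + scan

cleaned = strip_app1(original)
print(f"EXIF before: {b'Exif' in original}, after: {b'Exif' in cleaned}")
print(f"Size: {len(original)} -> {len(cleaned)} bytes")
```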

OSINT Tools & Techniques

Search Engines:

Google dorks: Advanced search operators, e.g. site:example.com filetype:pdf
Shodan: Search engine for internet-connected devices and exposed services
Wayback Machine: View historical versions of websites

Social Media:

Username search across platforms
Profile analysis
Connection mapping

Tools:

theHarvester: Email/subdomain harvesting
recon-ng: OSINT framework
Maltego: Visual link analysis
SpiderFoot: Automated OSINT

Remember: Use OSINT ethically and legally!
