Lesson 9: Capture The Flag (CTF) - Hacker Training#
Welcome to the world of Capture The Flag: hacking competitions where you solve security challenges to find “flags” (secret strings).
What is CTF?#
CTF competitions test your skills across:
Web Exploitation: Finding vulnerabilities in websites
Binary Exploitation (Pwn): Breaking programs to get shell access
Reverse Engineering: Understanding how programs work without source code
Cryptography: Breaking encryption and encoding
Forensics: Digital detective work
OSINT: Open Source Intelligence gathering
Misc: Everything else (steganography, programming, etc.)
Why CTF?#
Career: Cybersecurity skills are in high demand
Bug Bounties: Find real bugs, earn $$$ (ethical hacking)
Penetration Testing: Become a professional pentester
Problem Solving: Some of the most challenging puzzles you'll encounter
Community: Meet hackers worldwide
🚨 Legal & Ethical Notice#
IMPORTANT: Only hack systems you have permission to hack!
✅ CTF challenges and practice platforms
✅ Your own systems and labs
✅ Bug bounty programs with explicit permission
❌ Other people's systems without permission (this is illegal!)
This notebook is for educational and authorized testing only.
Part 1: Web Exploitation#
Web challenges involve finding and exploiting vulnerabilities in web applications.
Challenge 1.1: SQL Injection (SQLi)#
OWASP Top 10 (2021) A03: Injection. One of the most common and dangerous web vulnerabilities.
Concept: Injecting SQL code into queries to bypass authentication or extract data.
import sqlite3
# Vulnerable login function (DO NOT USE IN PRODUCTION!)
def vulnerable_login(username: str, password: str) -> bool:
    """
    VULNERABLE: Never build SQL queries with string concatenation!
    """
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    # Create users table
    cursor.execute('''
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT,
            password TEXT,
            is_admin INTEGER
        )
    ''')
    # Add test users
    cursor.execute("INSERT INTO users VALUES (1, 'admin', 'sup3rs3cr3t', 1)")
    cursor.execute("INSERT INTO users VALUES (2, 'user', 'password123', 0)")
    conn.commit()
    # VULNERABLE QUERY - interpolating user input directly into the SQL text!
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    print(f"Query: {query}")
    cursor.execute(query)
    result = cursor.fetchone()
    conn.close()
    return result is not None
# Normal login
print("=== Normal Login ===")
print(f"Valid login: {vulnerable_login('admin', 'sup3rs3cr3t')}")
print(f"Invalid login: {vulnerable_login('admin', 'wrongpassword')}")
# SQL Injection attack!
print("\n=== SQL Injection Attack ===")
print("Username: admin' OR '1'='1")
print("Password: (anything)")
result = vulnerable_login("admin' OR '1'='1", "anything")
print(f"Bypassed authentication: {result}")
print("\n🚨 Explanation:")
print("The injected query becomes:")
print("SELECT * FROM users WHERE username='admin' OR '1'='1' AND password='anything'")
print("AND binds tighter than OR, so this reads: username='admin' OR ('1'='1' AND password='anything')")
print("The username='admin' branch matches the admin row on its own, so the password is never checked!")
# Secure version using parameterized queries
def secure_login(username: str, password: str) -> bool:
    """
    SECURE: Always use parameterized queries!
    """
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    # Setup (same idea as before, minus the is_admin column)
    cursor.execute('''
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT,
            password TEXT
        )
    ''')
    cursor.execute("INSERT INTO users VALUES (1, 'admin', 'sup3rs3cr3t')")
    conn.commit()
    # SECURE QUERY - using parameterized query
    query = "SELECT * FROM users WHERE username=? AND password=?"
    cursor.execute(query, (username, password))
    result = cursor.fetchone()
    conn.close()
    return result is not None
# Try SQL injection on secure version
print("=== Trying Injection on Secure Version ===")
result = secure_login("admin' OR '1'='1", "anything")
print(f"Injection blocked: {not result}")
print("\n✅ The parameterized query treats input as data, not code!")
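The concept above mentions extracting data as well as bypassing authentication, but only the bypass is demonstrated. The following is a minimal sketch of UNION-based extraction against a throwaway in-memory SQLite database. The `products` table, the `search_products` helpers, and the payload are hypothetical names invented for this illustration; they are not part of the lesson's original code.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.executemany("INSERT INTO products VALUES (?)", [("laptop",), ("phone",)])
    conn.execute("INSERT INTO users VALUES ('admin', 'sup3rs3cr3t')")
    return conn


def search_products(conn: sqlite3.Connection, term: str) -> list[str]:
    """VULNERABLE on purpose: the search term is pasted into the SQL text."""
    query = f"SELECT name FROM products WHERE name LIKE '%{term}%'"
    return [row[0] for row in conn.execute(query)]


def search_products_safe(conn: sqlite3.Connection, term: str) -> list[str]:
    """Same search, but the term is bound as a parameter."""
    query = "SELECT name FROM products WHERE name LIKE ?"
    return [row[0] for row in conn.execute(query, (f"%{term}%",))]


conn = make_demo_db()
# The payload closes the string literal, appends a second SELECT with the
# same column count, and comments out the rest of the original query.
payload = "zzz' UNION SELECT username || ':' || password FROM users --"

normal = search_products(conn, "lap")
leaked = search_products(conn, payload)
blocked = search_products_safe(conn, payload)

print("Normal search:       ", normal)
print("Injected search:     ", leaked)
print("Parameterized search:", blocked)
```

The injected `UNION SELECT` has to return the same number of columns as the original query, which is why the sketch concatenates username and password into one column. With the parameterized version the whole payload is treated as a search term that matches no product.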
Challenge 1.2: Cross-Site Scripting (XSS)#
OWASP Top 10: listed as its own category in older editions, and covered by A03: Injection in the 2021 list. Inject malicious JavaScript into web pages.
Types:
Reflected XSS: Payload in URL, reflects back immediately
Stored XSS: Payload stored in database, affects all users
DOM-based XSS: Manipulation of page DOM
import html
def vulnerable_comment_system(comment: str) -> str:
    """
    VULNERABLE: Renders user input without sanitization
    """
    html_page = f"""
<html>
<body>
<h1>Comment Section</h1>
<div class="comment">{comment}</div>
</body>
</html>
"""
    return html_page
def secure_comment_system(comment: str) -> str:
    """
    SECURE: Escapes HTML special characters
    """
    safe_comment = html.escape(comment)
    html_page = f"""
<html>
<body>
<h1>Comment Section</h1>
<div class="comment">{safe_comment}</div>
</body>
</html>
"""
    return html_page
# XSS payload
xss_payload = "<script>alert('XSS Attack! Cookie: ' + document.cookie)</script>"
print("=== Vulnerable Version ===")
print(vulnerable_comment_system(xss_payload))
print("\n=== Secure Version ===")
print(secure_comment_system(xss_payload))
print("\n💡 Notice: The secure version escapes < and > to &lt; and &gt;")
print("This prevents the browser from executing the script!")
Common XSS Payloads#
<script>alert(document.cookie)</script>
<img src=x onerror=alert('XSS')>
<svg onload=alert('XSS')>
<iframe src="javascript:alert('XSS')">
Defense#
Escape output: Use proper HTML encoding
Content Security Policy (CSP): Restrict script sources
HttpOnly cookies: JavaScript can't access cookies
Input validation: Whitelist allowed characters
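The last three defenses in the list above can be sketched with the standard library alone. This is an illustrative sketch, not a complete hardening recipe: `build_security_headers` and `is_valid_username` are hypothetical helper names, and the CSP policy string and the username rule are assumptions chosen for the example.

```python
import re
from http.cookies import SimpleCookie

# Assumed policy for the example: only same-origin resources and scripts.
CSP_POLICY = "default-src 'self'; script-src 'self'"

# Assumed allow-list rule: 3 to 20 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def build_security_headers(session_id: str) -> list[tuple[str, str]]:
    """Return response headers that set a CSP and an HttpOnly session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # hidden from document.cookie
    cookie["session"]["secure"] = True     # only sent over HTTPS
    cookie["session"]["samesite"] = "Strict"
    return [
        ("Content-Security-Policy", CSP_POLICY),
        ("Set-Cookie", cookie["session"].OutputString()),
    ]


def is_valid_username(value: str) -> bool:
    """Allow-list validation: accept only the characters we expect."""
    return USERNAME_PATTERN.fullmatch(value) is not None


for name, value in build_security_headers("abc123"):
    print(f"{name}: {value}")
print(is_valid_username("alice_01"))
print(is_valid_username("<script>alert(1)</script>"))
```

Input validation narrows what reaches the page, but output escaping remains the primary defense, since validated data can still be dangerous in another context.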
Part 2: Binary Exploitation (Pwn)#
Pwn challenges involve exploiting binary programs, often to get shell access.
Challenge 2.1: Buffer Overflow Basics#
Classic: One of the most famous security vulnerabilities
Concept: Write beyond array boundaries to overwrite memory.
# Python simulation of buffer overflow concept
# (In real scenarios, this happens in C/C++ with memory corruption)
class VulnerableProgram:
    def __init__(self):
        self.is_admin = False
        self.buffer = [0] * 8  # 8-byte buffer
    def copy_input(self, user_input: bytes):
        """
        VULNERABLE: No bounds checking!
        In C: strcpy(buffer, user_input) without length check
        """
        print(f"Buffer size: {len(self.buffer)} bytes")
        print(f"Input size: {len(user_input)} bytes")
        # Simulate memory layout
        # [buffer (8 bytes)][is_admin (1 byte)]
        for i, byte in enumerate(user_input):
            if i < len(self.buffer):
                self.buffer[i] = byte
            elif i == len(self.buffer):  # Overflow into is_admin!
                print("⚠️ OVERFLOW: Writing beyond buffer!")
                if byte != 0:
                    self.is_admin = True
        print(f"Buffer: {self.buffer}")
        print(f"is_admin: {self.is_admin}")
        return self.is_admin
# Normal input (safe)
print("=== Normal Input ===")
prog1 = VulnerableProgram()
normal_input = b"hello"  # 5 bytes, fits in buffer
prog1.copy_input(normal_input)
print(f"\nAdmin access: {prog1.is_admin}\n")
# Buffer overflow attack!
print("=== Buffer Overflow Attack ===")
prog2 = VulnerableProgram()
overflow_input = b"A" * 8 + b"\x01"  # 8 bytes + 1 byte overflow
prog2.copy_input(overflow_input)
print(f"\nAdmin access gained: {prog2.is_admin}")
print("\n🚨 In real C programs, this could overwrite:")
print("  - Return addresses (control program flow)")
print("  - Function pointers (execute arbitrary code)")
print("  - Security flags (bypass authentication)")
Challenge 2.2: Format String Vulnerability#
Concept: Exploit printf-style functions to read/write arbitrary memory.
# Simulating format string vulnerability
def vulnerable_printf(user_input: str):
    """
    VULNERABLE: Using user input directly as format string
    In C: printf(user_input) instead of printf("%s", user_input)
    """
    secret_value = 0xDEADBEEF
    password = "sup3rs3cr3t"
    print("Debug info:")
    print(f"  secret_value at: id={id(secret_value)}")
    print(f"  password at: id={id(password)}")
    print()
    # Vulnerable printf simulation
    if "%x" in user_input or "%p" in user_input:
        print("⚠️ Format string detected! Leaking memory...")
        print(f"Leaked value: {hex(secret_value)}")
        print(f"Leaked password: {password}")
    else:
        print(user_input)
# Normal use
print("=== Normal Input ===")
vulnerable_printf("Hello, World!")
# Format string attack
print("\n=== Format String Attack ===")
vulnerable_printf("%x %x %x %x %s")
print("\n💡 Real format string exploits can:")
print("  - Read arbitrary memory (%x, %s)")
print("  - Write arbitrary memory (%n)")
print("  - Leak stack canaries")
print("  - Bypass ASLR (Address Space Layout Randomization)")
Modern Protections#
Stack Canaries: Detect buffer overflows
DEP/NX: Data Execution Prevention
ASLR: Randomize memory addresses
PIE: Position Independent Executable
RELRO: Read-Only Relocations
Advanced CTF pwn challenges involve bypassing these protections!
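To make the first protection in the list concrete, here is a toy simulation of a stack canary in the same spirit as the Python buffer overflow demo. It is a sketch under loud assumptions: `CanaryProtectedProgram` and `try_input` are hypothetical names, the canary is a single byte (real canaries are word-sized and set up by the compiler and runtime), and the "memory" is just a `bytearray`.

```python
import secrets


class CanaryProtectedProgram:
    """Toy layout: [buffer (8 bytes)][canary (1 byte)][is_admin (1 byte)]."""

    BUFFER_SIZE = 8

    def __init__(self) -> None:
        # Random non-zero canary chosen at startup (assumption: 1 byte wide)
        self.canary = secrets.randbelow(255) + 1
        self.memory = bytearray(self.BUFFER_SIZE + 2)
        self.memory[self.BUFFER_SIZE] = self.canary

    def copy_input(self, user_input: bytes) -> bool:
        """Unchecked copy, then a canary check before 'returning'."""
        for i, byte in enumerate(user_input[: len(self.memory)]):
            self.memory[i] = byte  # no bounds check against BUFFER_SIZE
        if self.memory[self.BUFFER_SIZE] != self.canary:
            # Real programs abort here instead of returning to the caller
            raise RuntimeError("stack smashing detected")
        return self.memory[self.BUFFER_SIZE + 1] != 0


def try_input(payload: bytes) -> str:
    """Run one payload against a fresh program and describe the outcome."""
    try:
        is_admin = CanaryProtectedProgram().copy_input(payload)
    except RuntimeError as exc:
        return f"aborted: {exc}"
    return f"returned normally, is_admin={is_admin}"


print(try_input(b"hello"))
# To reach is_admin the attacker must write over the canary first.
# A zero byte can never match, because this canary is always non-zero.
print(try_input(b"A" * 8 + b"\x00" + b"\x01"))
```

The overflow still happens; the canary only makes it detectable before the corrupted data is used. If an attacker can leak the canary value first (for example through a format string bug), they can write the correct value back and the check passes.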
Part 3: Reverse Engineering#
Analyze programs without source code to understand their behavior.
Challenge 3.1: Decompiling Python Bytecode#
Concept: Python code compiles to bytecode (.pyc files). Can we reverse it?
import dis
# Secret function
def check_password(password: str) -> bool:
    secret = "CTF{r3v3rs3_3ng1n33r1ng}"
    return password == secret
# Disassemble the bytecode
print("=== Disassembled Bytecode ===")
dis.dis(check_password)
print("\n💡 Notice: You can see the string 'CTF{r3v3rs3_3ng1n33r1ng}' in the bytecode!")
print("This is why hardcoded secrets are a bad idea.")
# Extract the secret from the compiled code object (no source needed):
# string literals are stored in the code object's constants table
print("\n=== Constants Stored in the Bytecode ===")
for const in check_password.__code__.co_consts:
    if isinstance(const, str):
        print(f"Recovered string: {const}")
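The concept line asks whether a `.pyc` file can be reversed. A minimal sketch of that, assuming CPython 3.7 or newer (where the `.pyc` header is 16 bytes) and reading the file with the same interpreter version that wrote it: compile a small module to a `.pyc`, load the code object back with `marshal`, and walk its constants. The module name `secret_module` and the helper `string_constants` are hypothetical.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path
from types import CodeType


def string_constants(code: CodeType) -> list[str]:
    """Recursively collect string constants from a code object."""
    found = []
    for const in code.co_consts:
        if isinstance(const, str):
            found.append(const)
        elif isinstance(const, CodeType):  # nested function or class body
            found.extend(string_constants(const))
    return found


SOURCE = '''
def check_password(password):
    secret = "CTF{r3v3rs3_3ng1n33r1ng}"
    return password == secret
'''

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "secret_module.py"
    pyc = Path(tmp) / "secret_module.pyc"
    src.write_text(SOURCE)
    py_compile.compile(str(src), cfile=str(pyc), doraise=True)
    raw = pyc.read_bytes()

# Skip the 16-byte header (magic number, flags, mtime or hash, source size);
# the rest is the marshalled module code object.
module_code = marshal.loads(raw[16:])
recovered = string_constants(module_code)
print(recovered)
```

The `marshal` format is specific to the Python version, so a `.pyc` from a different interpreter generally needs a matching Python or a dedicated decompiler.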
Challenge 3.2: Simple Crackme#
Concept: Reverse a program to find the correct input.
def crackme(serial: str) -> bool:
    """
    Simple serial key checker.
    Can you find the correct serial?
    """
    if len(serial) != 16:
        return False
    # Check 1: First 4 chars must be 'CTF{'
    if serial[:4] != 'CTF{':
        return False
    # Check 2: Last char must be '}'
    if serial[-1] != '}':
        return False
    # Check 3: Middle part checksum
    middle = serial[4:-1]
    checksum = sum(ord(c) for c in middle)
    # Secret checksum value (sum of the character codes of the 11 middle chars;
    # 955 is the value that the sample serial below actually produces)
    if checksum != 955:
        return False
    # Check 4: Specific characters
    if middle[0] != 'h' or middle[-1] != 'd':
        return False
    return True
# Test some serials
test_serials = [
    "CTF{wrong_key}",
    "CTF{h4ck3r0wn3d}",
]
for serial in test_serials:
    result = crackme(serial)
    print(f"Serial: {serial:20s} -> {'✓ Valid!' if result else '✗ Invalid'}")
# Solve it programmatically!
print("\n=== Solving Programmatically ===")
# We know: CTF{h?????????d} (9 unknown chars) and sum of middle = 955
# Try a short list of candidate patterns rather than a full brute force
def solve_crackme():
    target_sum = 955
    # We know first is 'h', last is 'd'
    fixed_chars = ord('h') + ord('d')
    needed_sum = target_sum - fixed_chars  # the 9 unknown chars must sum to this
    # Try common patterns
    patterns = ['4ck3r0wn3', '4x0r_1337', 'ack_the_d']
    for pattern in patterns:
        if sum(ord(c) for c in pattern) != needed_sum:
            continue  # cheap pre-filter before running the full check
        serial = f"CTF{{h{pattern}d}}"
        if crackme(serial):
            return serial
    return None
solution = solve_crackme()
if solution:
    print(f"Found valid serial: {solution}")
else:
    print("No candidate pattern matched")
Reverse Engineering Tools#
Ghidra: NSA's free decompiler (supports many architectures)
IDA Pro: Industry standard (expensive, but free version available)
radare2: Open-source reverse engineering framework
Binary Ninja: Modern RE platform
gdb/pwndbg: Debuggers for dynamic analysis
strings: Extract readable strings from binaries
objdump: Display object file info
strace/ltrace: System call tracing
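As an illustration of what the `strings` tool in the list does, here is a rough Python approximation: scan a binary for runs of printable ASCII of a minimum length. The function name `extract_strings` and the sample blob are made up for this sketch; the real tool has more options (other encodings, offsets) that are not modelled here.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_length bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Hypothetical binary blob: a fake header, padding, a flag, and a shell path
blob = (
    b"\x7fELF\x01\x00"
    + b"\x00" * 8
    + b"CTF{str1ngs_f1nds_1t}"
    + b"\x00\xff\x10"
    + b"/bin/sh"
    + b"\x00ab\x00"
)

found = extract_strings(blob)
for text in found:
    print(text)
```

Short runs such as `ELF` and `ab` are dropped by the default minimum length of 4, which mirrors the default of the command-line tool and keeps noise down.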
Part 4: Forensics#
Digital detective work - analyze files, memory dumps, network captures.
Challenge 4.2: File Carving#
Concept: Extract files from binary data using file signatures (magic bytes).
# File signatures (magic bytes)
FILE_SIGNATURES = {
    b'\x89PNG\r\n\x1a\n': ('PNG Image', '.png'),
    b'\xff\xd8\xff': ('JPEG Image', '.jpg'),
    b'GIF89a': ('GIF Image', '.gif'),
    b'PK\x03\x04': ('ZIP Archive', '.zip'),
    b'%PDF': ('PDF Document', '.pdf'),
    b'\x1f\x8b\x08': ('GZIP Archive', '.gz'),
}
def identify_file_type(data: bytes) -> tuple[str, str]:
    """
    Identify file type by magic bytes
    """
    for signature, (file_type, extension) in FILE_SIGNATURES.items():
        if data.startswith(signature):
            return file_type, extension
    return 'Unknown', '.bin'
def carve_files(data: bytes) -> list[tuple[int, str, str, bytes]]:
    """
    Find and extract embedded files
    Returns: [(offset, file_type, extension, data), ...]
    """
    found_files = []
    for signature, (file_type, extension) in FILE_SIGNATURES.items():
        offset = 0
        while True:
            offset = data.find(signature, offset)
            if offset == -1:
                break
            # In real file carving, you'd determine file size
            # For demo, just grab next 100 bytes
            file_data = data[offset:offset+100]
            found_files.append((offset, file_type, extension, file_data))
            offset += 1
    # Report hits in the order they appear in the data
    found_files.sort(key=lambda item: item[0])
    return found_files
# Demo: Hide files inside padding
print("=== File Carving Demo ===")
# Create binary blob with hidden files
blob = b'\x00' * 100  # Zero padding (stands in for unrelated data)
blob += b'\x89PNG\r\n\x1a\n' + b'PNG_DATA_HERE' + b'\x00' * 50
blob += b'\x00' * 100
blob += b'%PDF-1.4' + b'PDF_CONTENT_HERE' + b'\x00' * 50
blob += b'\x00' * 100
print(f"Binary blob size: {len(blob)} bytes")
print("\nSearching for embedded files...\n")
found = carve_files(blob)
for offset, file_type, ext, data in found:
    print(f"Found {file_type} at offset {offset}")
    print(f"  Magic bytes: {data[:10]}")
    print(f"  Extension: {ext}")
    print()
print("💡 Real forensics tools:")
print("  - binwalk: Analyze firmware and binary blobs")
print("  - foremost: File carving tool")
print("  - scalpel: Another file carving tool")
print("  - volatility: Memory forensics")
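The demo grabs a fixed 100 bytes and notes that real carving would determine the file size. For PNG this can be sketched by searching for the trailer as well as the header: every PNG ends with an empty `IEND` chunk whose CRC is always the same four bytes. The function names `carve_pngs` and `png_chunk` are hypothetical, and the 1x1 test image is built by hand purely for this example.

```python
import struct
import zlib

PNG_HEADER = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC


def carve_pngs(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, file_bytes) for each header-to-trailer PNG span."""
    carved = []
    start = data.find(PNG_HEADER)
    while start != -1:
        end = data.find(PNG_TRAILER, start)
        if end == -1:
            break  # header without trailer: truncated file or false positive
        end += len(PNG_TRAILER)
        carved.append((start, data[start:end]))
        start = data.find(PNG_HEADER, end)
    return carved


def png_chunk(kind: bytes, payload: bytes) -> bytes:
    """Encode one PNG chunk: length, type, payload, CRC over type+payload."""
    crc = zlib.crc32(kind + payload)
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


# Build a tiny 1x1 8-bit grayscale PNG to hide in the blob
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
pixel_rows = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
png = PNG_HEADER + png_chunk(b"IHDR", ihdr) + png_chunk(b"IDAT", pixel_rows) + png_chunk(b"IEND", b"")

blob = b"\x00" * 100 + png + b"\x00" * 50
carved = carve_pngs(blob)
for offset, file_bytes in carved:
    print(f"Carved PNG at offset {offset}, {len(file_bytes)} bytes")
```

Header-to-trailer matching works for formats with a distinctive end marker; formats without one need length fields from the header or heuristics, which is where tools like foremost and scalpel earn their keep.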
Part 5: OSINT (Open Source Intelligence)#
Find information from public sources.
Challenge 5.1: Metadata Extraction#
Concept: Files contain hidden metadata (EXIF, author, timestamps, GPS coordinates).
# Simulate EXIF data (normally extracted with libraries like Pillow or ExifRead)
sample_exif = {
    'Image': {
        'Make': 'Canon',
        'Model': 'Canon EOS 5D',
        'DateTime': '2024:01:15 10:30:45',
        'Software': 'Adobe Photoshop CS6',
    },
    'GPS': {
        'GPSLatitude': [40, 44, 54.36],
        'GPSLongitude': [73, 59, 8.40],
        'GPSLatitudeRef': 'N',
        'GPSLongitudeRef': 'W',
    },
    'Author': 'John Doe',
    'Copyright': 'Copyright 2024',
}
def extract_sensitive_info(exif_data: dict):
    """
    Extract potentially sensitive information from EXIF
    """
    print("=== Metadata Analysis ===")
    # Camera info
    if 'Image' in exif_data:
        print(f"Camera: {exif_data['Image'].get('Make')} {exif_data['Image'].get('Model')}")
        print(f"Date: {exif_data['Image'].get('DateTime')}")
        print(f"Software: {exif_data['Image'].get('Software')}")
    # GPS coordinates
    if 'GPS' in exif_data:
        gps = exif_data['GPS']
        lat = gps['GPSLatitude']
        lon = gps['GPSLongitude']
        # Convert degrees/minutes/seconds to decimal degrees
        lat_decimal = lat[0] + lat[1]/60 + lat[2]/3600
        lon_decimal = lon[0] + lon[1]/60 + lon[2]/3600
        # South latitudes and west longitudes are negative
        if gps['GPSLatitudeRef'] == 'S':
            lat_decimal *= -1
        if gps['GPSLongitudeRef'] == 'W':
            lon_decimal *= -1
        print("\n⚠️ GPS Location Found:")
        print(f"  Latitude: {lat_decimal:.6f}°")
        print(f"  Longitude: {lon_decimal:.6f}°")
        print(f"  Google Maps: https://maps.google.com/?q={lat_decimal},{lon_decimal}")
        # Hardcoded label for the sample coordinates above (about 40.7484, -73.9857)
        print("  Location: Empire State Building, New York City")
    # Author info
    if 'Author' in exif_data:
        print(f"\nAuthor: {exif_data['Author']}")
    print("\n💡 Privacy Risk:")
    print("  - Photos reveal where you've been")
    print("  - Timestamps show when you were there")
    print("  - Can be used for stalking or doxxing")
    print("\n🛡️ Defense: Strip EXIF before posting online!")
extract_sensitive_info(sample_exif)
OSINT Tools & Techniques#
Search Engines:
Google Dorks: site:example.com filetype:pdf
Shodan: Search for IoT devices and exposed services
Wayback Machine: View historical versions of websites
Social Media:
Username search across platforms
Profile analysis
Connection mapping
Tools:
theHarvester: Email/subdomain harvesting
recon-ng: OSINT framework
Maltego: Visual link analysis
SpiderFoot: Automated OSINT
Remember: Use OSINT ethically and legally!
Part 6: Misc Challenges#
Challenge 6.1: Encoding Madness#
Concept: Data is often encoded multiple times. Recognize and decode!
import base64
import codecs
def identify_encoding(data: str) -> str:
    """
    Try to identify encoding type
    """
    # Empty input is never "encoded" (also prevents endless recursion below)
    if not data:
        return 'unknown'
    # Hex: Only 0-9a-f. Checked first because every hex string also fits
    # the Base64 alphabet, so the stricter test has to come first.
    if all(c in '0123456789abcdefABCDEF' for c in data):
        if len(data) % 2 == 0:
            return 'hex'
    # Base64: Only a-zA-Z0-9+/= characters
    if all(c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=' for c in data):
        if len(data) % 4 == 0:  # Padded Base64 is always a multiple of 4
            return 'base64'
    # ROT13: Looks like English but garbled
    if data.isalpha():
        return 'maybe_rot13'
    return 'unknown'
def decode_layers(encoded: str) -> str:
    """
    Recursively decode multiple encoding layers
    """
    print(f"Input: {encoded[:50]}..." if len(encoded) > 50 else f"Input: {encoded}")
    encoding = identify_encoding(encoded)
    print(f"Detected encoding: {encoding}")
    try:
        if encoding == 'base64':
            decoded = base64.b64decode(encoded).decode('utf-8')
            print(f"Decoded: {decoded[:50]}..." if len(decoded) > 50 else f"Decoded: {decoded}")
            # Check if still encoded
            if identify_encoding(decoded) != 'unknown':
                print("Still encoded! Decoding again...\n")
                return decode_layers(decoded)
            return decoded
        elif encoding == 'hex':
            decoded = bytes.fromhex(encoded).decode('utf-8')
            print(f"Decoded: {decoded}")
            if identify_encoding(decoded) != 'unknown':
                print("Still encoded! Decoding again...\n")
                return decode_layers(decoded)
            return decoded
        elif encoding == 'maybe_rot13':
            decoded = codecs.decode(encoded, 'rot_13')
            print(f"Decoded: {decoded}")
            return decoded
    except Exception as e:
        print(f"Decoding failed: {e}")
        return encoded
    return encoded
# Multi-layer encoded flag
flag = "CTF{3nc0d1ng_1s_n0t_encrypt10n}"
# Encode multiple times
encoded1 = flag.encode().hex()  # To hex
encoded2 = base64.b64encode(encoded1.encode()).decode()  # To base64
encoded3 = base64.b64encode(encoded2.encode()).decode()  # To base64 again
print("=== Multi-Layer Encoding Challenge ===")
print(f"\nTriple-encoded flag:\n{encoded3}\n")
print("\nDecoding...\n" + "="*50 + "\n")
result = decode_layers(encoded3)
print("\n" + "="*50)
print(f"\n✅ Final result: {result}")
Common Encodings in CTF#
Base64: VGhpcyBpcyBiYXNlNjQ=
Hex: 5468697320697320686578
ROT13: Guvf vf EBG13
URL encoding: This%20is%20URL%20encoded
Binary: 01010100 01101000
Morse: - .... .. ... / .. ... / -- --- .-. ... .
ASCII85: <~9jqo^BlbD~>
Tool: CyberChef - The ultimate encoding/decoding tool!
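For scripting rather than clicking, most of the encodings listed above can be decoded with the Python standard library. The sketch below decodes the sample strings from the list; the `decoded` dictionary is just a convenient container invented for this example, and Morse is left out because the standard library has no decoder for it.

```python
import base64
import codecs
from urllib.parse import unquote

decoded = {
    "base64": base64.b64decode("VGhpcyBpcyBiYXNlNjQ=").decode(),
    "hex": bytes.fromhex("5468697320697320686578").decode(),
    "rot13": codecs.decode("Guvf vf EBG13", "rot_13"),
    "url": unquote("This%20is%20URL%20encoded"),
    # Each space-separated group of 8 bits is one ASCII character
    "binary": "".join(chr(int(bits, 2)) for bits in "01010100 01101000".split()),
    # adobe=True tells a85decode to expect the <~ ... ~> framing
    "ascii85": base64.a85decode("<~9jqo^BlbD~>", adobe=True).decode(),
}

for name, text in decoded.items():
    print(f"{name:8s} -> {text!r}")
```

Printing with `!r` makes trailing spaces and other invisible characters visible, which matters when a flag has to be submitted exactly.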
CTF Platforms for Practice#
Beginner-Friendly#
picoCTF: Educational CTF by Carnegie Mellon
OverTheWire: Wargames (Bandit for Linux, Natas for web)
HackTheBox: Boxes and challenges (free tier)
TryHackMe: Guided learning paths
Intermediate#
CTFtime: Calendar of upcoming CTFs
HackThisSite: Various challenge types
Root-Me: 400+ challenges
Ringzer0 CTF: All categories
Advanced#
pwnable.kr/tw: Binary exploitation
CryptoHack: Cryptography focused
Damn Vulnerable Web Application (DVWA): Web exploitation lab
DEF CON CTF: The Olympics of hacking
🛠️ Essential CTF Tools#
General#
Kali Linux: Pre-loaded with hacking tools
CyberChef: The ultimate data transformation tool
pwntools: Python CTF library
Web#
Burp Suite: Web proxy and scanner
sqlmap: Automated SQL injection
nikto: Web server scanner
Binary/Pwn#
gdb-peda/pwndbg: Enhanced debugger
ROPgadget: ROP chain builder
checksec: Binary security checker
Reversing#
Ghidra: Free decompiler
radare2/Cutter: RE framework
ltrace/strace: Dynamic analysis
Forensics#
binwalk: Firmware analysis
volatility: Memory forensics
Wireshark: Network analysis
exiftool: Metadata extraction
Crypto#
RsaCtfTool: RSA attacks
hashcat: Hash cracking
john: John the Ripper password cracker
🎯 CTF Mindset#
Try Everything: Even stupid ideas sometimes work
Google Fu: Half of CTF is knowing what to search
Read Error Messages: They often contain hints
Low-Hanging Fruit: Check for common mistakes first
Think Like a Defender: What would you protect?
Ask for Help: CTF communities are friendly
Write Writeups: Teaching solidifies learning
Stay Curious: Always be learning new techniques
Learning Resources#
Books#
The Web Application Hacker's Handbook
Hacking: The Art of Exploitation
Practical Malware Analysis
The Shellcoder's Handbook
YouTube Channels#
LiveOverflow: CTF writeups and RE
John Hammond: Beginner-friendly walkthroughs
IppSec: HackTheBox writeups
PwnFunction: Animated security explanations
Communities#
CTFtime Forums: Discuss challenges
r/securityCTF: Reddit community
Discord Servers: Join CTF team discords
Career Paths#
CTF skills lead to:
Penetration Tester: Break into systems (legally)
Security Researcher: Find 0-days, publish papers
Bug Bounty Hunter: Full-time or side hustle
Malware Analyst: Reverse engineer malware
Incident Responder: Handle security breaches
Red Team: Offensive security for companies
⚡ Quick Win Checklist#
When stuck on a CTF challenge:
Run strings on binary files
Check source code / view-source:URL
Look for hidden files (.git, robots.txt, .htaccess)
Try common credentials (admin/admin, root/toor)
Decode suspicious strings (base64, hex, rot13)
Run binwalk on unknown files
Google the exact error message
Check for SQL injection: ' OR 1=1--
Test for XSS: <script>alert(1)</script>
Try directory brute forcing
Check file extensions (change .jpg to .txt)
Look at HTTP headers and cookies
Welcome to the hacker community! Now go break things (legally) and capture those flags! 🚩