Debugging and Profiling: Finding and Fixing Bugs

Debugging is a superpower. Professional developers spend a large share of their working time debugging, so learning to do it effectively is one of the fastest ways to become more productive.

What You’ll Learn#

Debugging mindset and strategies
Using print statements effectively
Python’s built-in debugger (pdb)
IDE debugging tools
Reading stack traces
Performance profiling
Memory profiling
Best practices for bug-free code

💡 Real-World Analogy#

Debugging is like being a detective:

🔍 Gather clues (error messages, logs)
🧪 Form hypotheses (what might be wrong?)
🔬 Test theories (add print statements, use debugger)
✅ Solve the case (fix the bug!)

1. The Debugging Mindset#

Accept That Bugs Are Normal#

Everyone writes bugs - even senior engineers!
Bugs are learning opportunities
The best developers are the best debuggers

The Scientific Method#

Observe - What’s happening? What’s expected?
Hypothesize - What could cause this?
Test - Add logging, use debugger
Analyze - Were you right? If not, new hypothesis!
Fix - Apply the solution
Verify - Does it work now?

Common Debugging Mistakes#

❌ Random code changes - “Maybe if I change this…”
❌ Assuming - “This can’t be the problem”
❌ Not reading error messages - They tell you exactly what’s wrong!
❌ Debugging tired - Take breaks!

✅ Read error messages carefully
✅ Reproduce the bug consistently
✅ Isolate the problem - Binary search through code
✅ Use version control - Revert if needed
✅ Rubber duck debugging - Explain to someone/something

2. Reading Error Messages#

Error messages are your friends! They tell you exactly what’s wrong.

Anatomy of a Stack Trace#

def divide_numbers(a, b):
    return a / b

def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    return divide_numbers(total, count)

# This will error!
result = calculate_average([])

Reading the traceback:

Traceback (most recent call last):              ← Header: calls are listed oldest first
  File "script.py", line 10, in <module>        ← Where the call chain started
    result = calculate_average([])              ← The top-level call
  File "script.py", line 7, in calculate_average
    return divide_numbers(total, count)         ← The call that led to the error
  File "script.py", line 2, in divide_numbers
    return a / b                                ← The line that actually raised
ZeroDivisionError: division by zero             ← What went wrong (read this first)

How to read it:

Bottom: The actual error type and message
Work upward: See the call stack
Find YOUR code: Focus on files you wrote
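
The same three pieces of information can also be pulled out in code. The following is a minimal sketch using the standard `traceback` module and the two functions from the example above; the variable names `error_line`, `call_chain`, and `innermost` are illustrative only.

```python
import traceback

def divide_numbers(a, b):
    return a / b

def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    return divide_numbers(total, count)

try:
    calculate_average([])
except ZeroDivisionError as exc:
    # Bottom line of the traceback: exception type and message
    error_line = f"{type(exc).__name__}: {exc}"

    # Call stack, oldest call first, in the same order as the printed traceback
    frames = traceback.extract_tb(exc.__traceback__)
    call_chain = [frame.name for frame in frames]

    # The last frame is where the exception was actually raised
    innermost = frames[-1]

print(error_line)
print(" -> ".join(call_chain))
print(f"Raised in {innermost.name}, line {innermost.lineno}")
```

Reading the output bottom-up mirrors the advice above: the error line says what went wrong, and the last entry of the call chain says where.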

Common Python Exceptions#

# SyntaxError - Code won't even run
# if x > 5  # Missing colon!

# NameError - Variable doesn't exist
# print(undefined_variable)

# TypeError - Wrong type for operation
# result = "hello" + 5

# AttributeError - Object doesn't have that attribute
# numbers = [1, 2, 3]
# numbers.append_item(4)  # Should be append()

# IndexError - List index out of range
# numbers = [1, 2, 3]
# print(numbers[10])

# KeyError - Dictionary key doesn't exist
# person = {"name": "Alice"}
# print(person["age"])

# ValueError - Right type, wrong value
# number = int("hello")

# ZeroDivisionError - Division by zero
# result = 10 / 0

# FileNotFoundError - File doesn't exist
# with open("nonexistent.txt") as f:
#     content = f.read()

print("All errors are commented out to prevent crashes")
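
To see these exceptions without crashing the program, each one can be triggered inside a `try` block. This is only a sketch: the `examples` list and the path `no_such_dir/nonexistent.txt` are made up for illustration, and `compile()` is used so that the syntax error occurs at run time rather than preventing the script from starting.

```python
# Each lambda triggers one of the exceptions listed above when called.
examples = [
    lambda: compile("if x > 5", "<example>", "exec"),  # SyntaxError (missing colon)
    lambda: undefined_variable,                        # NameError
    lambda: "hello" + 5,                               # TypeError
    lambda: [1, 2, 3].append_item(4),                  # AttributeError
    lambda: [1, 2, 3][10],                             # IndexError
    lambda: {"name": "Alice"}["age"],                  # KeyError
    lambda: int("hello"),                              # ValueError
    lambda: 10 / 0,                                    # ZeroDivisionError
    lambda: open("no_such_dir/nonexistent.txt"),       # FileNotFoundError (hypothetical path)
]

caught = []
for example in examples:
    try:
        example()
    except Exception as exc:
        # Record the exception class name and show its message
        caught.append(type(exc).__name__)
        print(f"{type(exc).__name__}: {exc}")
```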

3. Print Debugging#

The most common debugging technique! Simple but effective.

Basic Print Debugging#

def buggy_function(data):
    print(f"DEBUG: Input data = {data}")  # What did we receive?
    
    result = []
    for item in data:
        print(f"DEBUG: Processing item = {item}")  # Track progress
        
        if item > 0:
            value = item * 2
            print(f"DEBUG: Calculated value = {value}")  # Check calculation
            result.append(value)
    
    print(f"DEBUG: Final result = {result}")  # What are we returning?
    return result

buggy_function([1, -2, 3, 0, 5])

Better Print Debugging#

import sys

def debug_print(*args, **kwargs):
    """
    Enhanced print for debugging.
    Prints to stderr by default (won't interfere with normal output)
    """
    kwargs.setdefault("file", sys.stderr)  # Callers may still pass file=...
    print("[DEBUG]", *args, **kwargs)

def calculate(x, y):
    debug_print(f"calculate called with x={x}, y={y}")
    result = x + y
    debug_print(f"result = {result}")
    return result

calculate(5, 3)

Using f-strings for Debugging (Python 3.8+)#

x = 42
y = "hello"
z = [1, 2, 3]

# The = sign shows variable name AND value!
print(f"{x=}")
print(f"{y=}")
print(f"{z=}")

# Works with expressions too
print(f"{len(z)=}")
print(f"{x * 2=}")

Conditional Debug Prints#

DEBUG = True  # Toggle this to enable/disable debug output

def debug_log(message):
    if DEBUG:
        print(f"[DEBUG] {message}")

def process_data(data):
    debug_log(f"Starting processing with {len(data)} items")
    
    for i, item in enumerate(data):
        debug_log(f"Item {i}: {item}")
        # ... processing ...
    
    debug_log("Processing complete")

process_data([1, 2, 3])

4. Python Debugger (pdb)#

pdb is Python’s built-in interactive debugger. It’s like pausing time and inspecting everything!

Basic pdb Usage#

import pdb

def calculate_total(prices):
    total = 0
    
    pdb.set_trace()  # ← Execution pauses here!
    
    for price in prices:
        total += price
    
    return total

# Uncomment to try (won't work well in notebook):
# calculate_total([10, 20, 30])

pdb Commands#

When pdb pauses, you get an interactive prompt:

(Pdb) ?

Navigation:

n (next) - Execute current line, move to next
s (step) - Step INTO function calls
c (continue) - Continue until next breakpoint
r (return) - Continue until current function returns
q (quit) - Exit debugger

Inspection:

p variable - Print variable value
pp variable - Pretty-print variable
l (list) - Show current code context
ll (longlist) - Show whole function
w (where) - Show stack trace
a (args) - Show function arguments

Execution:

Any Python code! - total = 100, len(prices), etc.
! prefix - Run a Python statement that pdb would otherwise read as a command: !n = 5 (sets n to 5 instead of running next)

Breakpoints:

b line_number - Set breakpoint at line
b function_name - Break when function is called
b file.py:20 - Break at line 20 in file.py
cl (clear) - Remove breakpoints (with no argument, all of them, after confirmation)

breakpoint() - Modern Way (Python 3.7+)#

def process_user_data(users):
    results = []
    
    for user in users:
        # Modern way - cleaner than pdb.set_trace()
        breakpoint()  # ← Pauses here
        
        name = user.get("name", "Unknown")
        age = user.get("age", 0)
        results.append(f"{name} is {age} years old")
    
    return results

# Uncomment to try:
# process_user_data([{"name": "Alice", "age": 30}])
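
`breakpoint()` does not call pdb directly: it calls `sys.breakpointhook()`, whose default implementation starts pdb and can be disabled by setting the environment variable `PYTHONBREAKPOINT=0`. The sketch below swaps in a hypothetical `recording_hook` so the dispatch can be observed without an interactive session; in real debugging you would leave the hook alone.

```python
import sys

calls = []

def recording_hook(*args, **kwargs):
    """Hypothetical stand-in for the debugger: record the call instead of pausing."""
    calls.append("breakpoint hit")

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    for user in [{"name": "Alice"}, {"name": "Bob"}]:
        breakpoint()  # Calls recording_hook instead of starting pdb
finally:
    sys.breakpointhook = original_hook  # Restore the previous behaviour

print(f"breakpoint() was reached {len(calls)} times")
```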

Post-Mortem Debugging#

import pdb

def buggy_code():
    x = 10
    y = 0
    return x / y  # This will crash!

try:
    buggy_code()
except Exception:
    # Enter debugger at the point of exception
    pdb.post_mortem()
    # Now you can inspect variables at crash time!

5. Logging#

Better than print statements for production code!

Basic Logging#

import logging

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)

def divide(a, b):
    logger.debug(f"divide called with a={a}, b={b}")
    
    if b == 0:
        logger.error("Attempted division by zero!")
        return None
    
    result = a / b
    logger.info(f"Result: {result}")
    return result

divide(10, 2)
divide(10, 0)

Log Levels#

logger.debug("Detailed info for debugging")      # DEBUG
logger.info("General information")               # INFO
logger.warning("Warning - something unexpected")  # WARNING
logger.error("Error occurred")                   # ERROR
logger.critical("Critical failure!")             # CRITICAL

When to use:

DEBUG: Detailed diagnostic info
INFO: Confirm things are working
WARNING: Something unexpected but handled
ERROR: Serious problem, function failed
CRITICAL: Program may crash
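
The level set on a logger acts as a threshold: messages below it are dropped. A minimal sketch of that filtering, writing to an in-memory stream so the result is easy to inspect; the logger name `level_demo` is an arbitrary choice for this example.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

demo_logger = logging.getLogger("level_demo")  # hypothetical logger name
demo_logger.addHandler(handler)
demo_logger.propagate = False          # keep the demo out of the root logger
demo_logger.setLevel(logging.WARNING)  # threshold: WARNING and above

demo_logger.debug("Detailed info for debugging")  # dropped
demo_logger.info("General information")           # dropped
demo_logger.warning("Something unexpected")       # kept
demo_logger.error("Error occurred")               # kept

output = stream.getvalue()
print(output)
```

Lowering the threshold to `logging.DEBUG` during a debugging session brings the dropped messages back without touching the call sites.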

Logging to File#

import logging

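# Log to file instead of console
# force=True (Python 3.8+) replaces handlers left by an earlier basicConfig() call;
# without it, a second basicConfig() call in the same session is silently ignored
logging.basicConfig(
    filename='app.log',
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True
)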

logging.info("Application started")
logging.debug("Processing user data")
logging.warning("Low memory")
logging.error("Failed to connect to database")

6. Assert Statements#

Catch bugs early by asserting what should be true.

def calculate_average(numbers):
    # Assertions help catch bugs during development
    assert len(numbers) > 0, "Cannot calculate average of empty list"
    assert all(isinstance(n, (int, float)) for n in numbers), "All items must be numbers"
    
    return sum(numbers) / len(numbers)

# This works
print(calculate_average([1, 2, 3]))

# These will fail with helpful messages
# calculate_average([])  # AssertionError: Cannot calculate average of empty list
# calculate_average([1, "two", 3])  # AssertionError: All items must be numbers

Best Practices:

Use for internal checks (sanity checks)
NOT for user input validation (use exceptions)
Can be disabled with python -O (optimize flag)
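
One way to apply these guidelines is to raise exceptions at the boundary where user input enters, and keep `assert` for internal assumptions. The functions `parse_age` and `years_until_adult` below are hypothetical examples, not part of any library.

```python
def parse_age(raw_text):
    """Validate user input with exceptions, which still run under python -O."""
    try:
        age = int(raw_text)
    except ValueError:
        raise ValueError(f"Age must be a whole number, got {raw_text!r}") from None
    if age < 0:
        raise ValueError(f"Age cannot be negative, got {age}")
    return age

def years_until_adult(age):
    """Internal helper: callers are expected to pass an already-validated age."""
    assert age >= 0, "age should have been validated by parse_age"
    return max(0, 18 - age)

print(years_until_adult(parse_age("12")))

try:
    parse_age("twelve")
except ValueError as exc:
    print(f"Rejected input: {exc}")
```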

7. Performance Profiling#

“Premature optimization is the root of all evil” - Donald Knuth

But when you DO need to optimize, profile first!

Timing Code - Simple Way#

import time

def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

# Time it (perf_counter is a high-resolution clock meant for measuring intervals)
start = time.perf_counter()
result = slow_function()
end = time.perf_counter()

print(f"Took {end - start:.4f} seconds")
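
When the same start/stop pattern shows up in several places, it can be wrapped in a context manager. This is a sketch rather than a standard tool: `timer` is a hypothetical helper built on `time.perf_counter()`, the standard library clock intended for measuring short durations.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Hypothetical helper: print how long the with-block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the timed block raises
        elapsed = time.perf_counter() - start
        print(f"{label} took {elapsed:.4f} seconds")

with timer("sum of range"):
    total = sum(range(1_000_000))
```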

Using timeit Module#

import timeit

# Time a snippet (runs multiple times for accuracy)
time_taken = timeit.timeit(
    stmt="sum(range(1000))",
    number=10000  # Run 10000 times
)

print(f"Average time: {time_taken / 10000:.6f} seconds")

# Compare two approaches
approach1 = timeit.timeit("[i**2 for i in range(100)]", number=10000)
approach2 = timeit.timeit("list(map(lambda x: x**2, range(100)))", number=10000)

print(f"List comprehension: {approach1:.6f}s")
print(f"Map/lambda: {approach2:.6f}s")
print(f"Winner: {'List comprehension' if approach1 < approach2 else 'Map/lambda'}")

Jupyter Magic Commands#

# Time a single line
%timeit sum(range(1000))

# Time a whole cell (put %%timeit on the first line of its own cell)
%%timeit
total = 0
for i in range(1000):
    total += i

cProfile - Detailed Profiling#

import cProfile
import pstats
from io import StringIO

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

def calculate_fibs():
    results = []
    for i in range(20):
        results.append(fibonacci(i))
    return results

# Profile the function
profiler = cProfile.Profile()
profiler.enable()

calculate_fibs()

profiler.disable()

# Print stats
s = StringIO()
ps = pstats.Stats(profiler, stream=s).sort_stats('cumulative')
ps.print_stats(10)  # Top 10 functions
print(s.getvalue())

Reading cProfile output:

ncalls - Number of times called
tottime - Total time in this function (excluding sub-calls)
cumtime - Total time including sub-calls
Focus on functions with high cumtime!
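
The difference between tottime and cumtime is easiest to see with a function that does little work itself but calls an expensive one. In this sketch, `slow_fib` and `run_all` are made-up names; sorting by `tottime` shows where the time is actually spent, while `run_all` would rank high only by `cumtime`.

```python
import cProfile
import io
import pstats

def slow_fib(n):
    return n if n <= 1 else slow_fib(n - 1) + slow_fib(n - 2)

def run_all():
    # Does almost no work itself; nearly all of its cumtime comes from slow_fib
    results = []
    for i in range(18):
        results.append(slow_fib(i))
    return results

profiler = cProfile.Profile()
profiler.enable()
run_all()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(3)  # top entries by time spent in the function itself
report = stream.getvalue()
print(report)
```

For recursive functions the ncalls column shows two numbers separated by a slash: the total number of calls and the number of primitive (non-recursive) calls.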

line_profiler - Line-by-Line Profiling#

# Install: pip install line_profiler

# Use with @profile decorator
# Then run: kernprof -l -v script.py

# Example:
# @profile
# def slow_function():
#     for i in range(1000):
#         x = i ** 2
#     return x

8. Memory Profiling#

Checking Memory Usage#

import sys

# Size of objects (shallow: for a list this covers its array of references, not the items)
numbers = list(range(1000))
print(f"List size: {sys.getsizeof(numbers)} bytes")

text = "Hello" * 1000
print(f"String size: {sys.getsizeof(text)} bytes")

# Memory-efficient alternatives
import array
efficient_numbers = array.array('i', range(1000))
print(f"Array size: {sys.getsizeof(efficient_numbers)} bytes")
print(f"Savings: {sys.getsizeof(numbers) - sys.getsizeof(efficient_numbers)} bytes")
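
Because `sys.getsizeof` measures only the container itself, it can understate what a data structure really costs. The standard `tracemalloc` module tracks allocations made by Python code instead. A minimal sketch, where `big_list` is simply an example allocation:

```python
import tracemalloc

tracemalloc.start()

big_list = [i * 2 for i in range(100_000)]  # allocation we want to measure

current, peak = tracemalloc.get_traced_memory()  # bytes traced since start()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"Current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")

# Top allocation sites, grouped by source line
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```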

memory_profiler#

# Install: pip install memory_profiler

# Use with @profile decorator
# Run: python -m memory_profiler script.py

# Example:
# @profile
# def memory_hog():
#     big_list = [0] * (10 ** 6)
#     return sum(big_list)

9. Debugging Strategies#

Binary Search Debugging#

Problem: Code worked yesterday, broken today. 1000 lines changed.

Solution:

Comment out (or revert) half of the changed code
If it works now, the bug is in the half you disabled; if it still fails, the bug is in the other half
Repeat on the suspect half until you find the buggy line

With Git:

git bisect start
git bisect bad           # Current commit is broken
git bisect good abc123   # This old commit worked
# Git checks out middle commit
# Test it...
git bisect good  # or bad
# Repeat until Git reports the first bad commit
git bisect reset  # Return to the commit you started from

Rubber Duck Debugging#

Get a rubber duck (or colleague, or pet)
Explain your code line by line
Often you’ll spot the bug while explaining!

Why it works: Forces you to think clearly about what code does

Wolf Fence Debugging#

“There’s a wolf in Alaska. How do you find it?”

Build a fence across middle of Alaska
Wolf is on one side
Build fence across middle of that half
Repeat until you find the wolf

Applied to debugging:

# Code is 100 lines, bug somewhere
# Line 50
print("Checkpoint A")

# Does it print? If yes, bug is after line 50
# If no, bug is before line 50
# Repeat in that half!
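
Bisecting commits and fencing in the wolf are both binary searches. The sketch below shows the idea on a made-up history of 1000 numbered changes; `find_first_bad` and the `is_good` check are hypothetical, and the search assumes that every change after the first bad one is also bad.

```python
def find_first_bad(versions, is_good):
    """Return (index of the first bad version, number of checks made).

    Assumes versions[0] is good, versions[-1] is bad, and that once a
    version is bad every later version is bad too.
    """
    low, high = 0, len(versions) - 1  # low is known good, high is known bad
    checks = 0
    while high - low > 1:
        middle = (low + high) // 2
        checks += 1
        if is_good(versions[middle]):
            low = middle   # bug was introduced after middle
        else:
            high = middle  # bug was introduced at or before middle
    return high, checks

# Hypothetical history: 1000 numbered changes, the bug arrived in change 637
versions = list(range(1000))
first_bad, checks = find_first_bad(versions, lambda v: v < 637)
print(f"First bad change: {first_bad}, found in {checks} checks")
```

Each check halves the remaining range, so 1000 candidates need about log2(1000), roughly ten, checks instead of up to a thousand.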

Minimal Reproducible Example#

Remove everything not related to the bug:

# Original: 500 lines, 10 functions
# Minimal example:
def buggy():
    x = [1, 2, 3]
    return x[5]  # Bug found!

Benefits:

Often find bug while creating minimal example
Easy to share for help (Stack Overflow)
Easy to write test case

10. Writing Debuggable Code#

Make Code Easy to Debug#

Bad:

def f(x,y,z): return x+y*z if x>0 else y-z

Good:

def calculate_result(base, multiplier, offset):
    """Calculate result based on base value."""
    if base > 0:
        result = base + (multiplier * offset)
    else:
        result = multiplier - offset
    return result

Use Type Hints#

def process_user(name: str, age: int) -> dict:
    """Process user data and return formatted dict."""
    return {
        "name": name.upper(),
        "age": age,
        "is_adult": age >= 18
    }

# Type hints help catch bugs with mypy:
# mypy script.py
# error: Argument 1 to "process_user" has incompatible type "int"; expected "str"

Defensive Programming#

import logging

def safe_divide(a, b):
    """
    Safely divide two numbers.
    
    Args:
        a: Numerator
        b: Denominator
    
    Returns:
        Result of division or None if invalid
    """
    # Validate inputs
    if not isinstance(a, (int, float)):
        raise TypeError(f"a must be a number, got {type(a)}")
    if not isinstance(b, (int, float)):
        raise TypeError(f"b must be a number, got {type(b)}")
    
    # Check for division by zero
    if b == 0:
        logging.warning("Attempted division by zero")
        return None
    
    result = a / b
    
    # Validate output
    assert isinstance(result, (int, float)), "Result must be a number"
    
    return result

print(safe_divide(10, 2))
print(safe_divide(10, 0))

Early Returns#

def process_data(data):
    """Process data with early validation."""
    
    # Validate early, return early (check the type first so that
    # non-list falsy values such as "" or 0 are rejected, not ignored)
    if not isinstance(data, list):
        raise TypeError("Data must be a list")
    
    if not data:
        return []
    
    # Now we know data is a non-empty list
    results = []
    for item in data:
        # ... process item ...
        results.append(item * 2)
    
    return results

📝 Exercises#

Exercise 1: Debug This Code#

Find and fix the bugs:

def calculate_average(numbers):
    total = 0
    for number in numbers:
        total += number
    average = total / len(numbers)
    return average

# Test cases - which ones fail?
# print(calculate_average([1, 2, 3, 4, 5]))
# print(calculate_average([]))
# print(calculate_average([1, "two", 3]))

Exercise 2: Add Logging#

Add appropriate logging to this function:

def fetch_user_data(user_id):
    # Add logging here!
    
    users = {
        1: {"name": "Alice", "email": "alice@example.com"},
        2: {"name": "Bob", "email": "bob@example.com"}
    }
    
    if user_id not in users:
        # Log error
        return None
    
    user = users[user_id]
    # Log success
    return user

Exercise 3: Profile Performance#

Which approach is faster?

def approach1(n):
    """Using list comprehension."""
    return [i ** 2 for i in range(n)]

def approach2(n):
    """Using loop."""
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

# Use timeit to compare them!
# Your code here

Exercise 4: Write Assertions#

Add assertions to validate inputs:

def calculate_grade(score, max_score):
    """Calculate percentage grade."""
    # Add assertions here!
    # - score should be >= 0
    # - max_score should be > 0
    # - score should be <= max_score
    
    percentage = (score / max_score) * 100
    return percentage

✅ Self-Check Quiz#

What’s the first thing to do when you encounter a bug?
How do you read a Python traceback?
What’s the difference between print() and logging?
What does pdb.set_trace() do?
Name 3 pdb commands and what they do
When should you use assertions?
What’s the purpose of profiling?
What is rubber duck debugging?
Why write type hints?
What makes code easy to debug?

🎯 Key Takeaways#

Read error messages - They tell you what’s wrong!
Print debugging works, but logging is better
pdb is powerful for interactive debugging
Profile before optimizing - Don’t guess!
Assertions catch bugs during development
Type hints help find bugs before runtime
Defensive programming prevents bugs
Binary search debugging saves time
Minimal examples help isolate bugs
Good code is easy to debug

🚀 Next Steps#

Debugging is a skill that improves with practice!

Practice by:

Debugging every bug you encounter
Using pdb instead of print sometimes
Reading other people’s stack traces
Profiling your slow code

Continue learning:

Security Essentials: tools/07_security_essentials.ipynb
Build Systems: tools/08_build_and_automation.ipynb

💡 Pro Tips#

Read the error - Don’t skim, read carefully
Reproduce consistently - Can’t fix what you can’t reproduce
Use version control - Know what changed
Take breaks - Fresh eyes find bugs faster
Ask for help - After you’ve tried (rubber duck first!)
Write tests - Catch bugs before users do
Keep it simple - Complex code has more bugs
Log, don’t print - Production code needs logging

📚 Resources#

Documentation:

Tools:

pytest - Testing framework
mypy - Static type checker
black - Code formatter
pylint - Code analyzer

Learning:

“Debugging is twice as hard as writing code in the first place. Therefore, if you write code as cleverly as possible, you are, by definition, not smart enough to debug it.” - Brian Kernighan

Write simple, debuggable code! 🐛✨