Debugging and Profiling: Finding and Fixing Bugs#
Debugging is a superpower. Professional developers spend a large share of their time debugging, often estimated at half or more, so learning to debug effectively pays off in everything you build.
What You’ll Learn#
Debugging mindset and strategies
Using print statements effectively
Python’s built-in debugger (pdb)
IDE debugging tools
Reading stack traces
Performance profiling
Memory profiling
Best practices for bug-free code
💡 Real-World Analogy#
Debugging is like being a detective:
🔍 Gather clues (error messages, logs)
🧪 Form hypotheses (what might be wrong?)
🔬 Test theories (add print statements, use debugger)
✅ Solve the case (fix the bug!)
1. The Debugging Mindset#
Accept That Bugs Are Normal#
Everyone writes bugs - even senior engineers!
Bugs are learning opportunities
The best developers are the best debuggers
The Scientific Method#
Observe - What’s happening? What’s expected?
Hypothesize - What could cause this?
Test - Add logging, use debugger
Analyze - Were you right? If not, new hypothesis!
Fix - Apply the solution
Verify - Does it work now?
Common Debugging Mistakes#
❌ Random code changes - “Maybe if I change this…”
❌ Assuming - “This can’t be the problem”
❌ Not reading error messages - They tell you exactly what’s wrong!
❌ Debugging tired - Take breaks!
✅ Read error messages carefully
✅ Reproduce the bug consistently
✅ Isolate the problem - Binary search through code
✅ Use version control - Revert if needed
✅ Rubber duck debugging - Explain to someone/something
2. Reading Error Messages#
Error messages are your friends! They tell you exactly what’s wrong.
Anatomy of a Stack Trace#
def divide_numbers(a, b):
    return a / b

def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    return divide_numbers(total, count)

# This will error!
result = calculate_average([])
Reading the traceback:
Traceback (most recent call last): ← Calls are listed oldest first
  File "script.py", line 10, in <module> ← Where you called it
    result = calculate_average([]) ← The call that started the chain
  File "script.py", line 7, in calculate_average
    return divide_numbers(total, count) ← The call that led to the error
  File "script.py", line 2, in divide_numbers
    return a / b ← The actual error location
ZeroDivisionError: division by zero ← What went wrong (start reading here)
How to read it:
Bottom: The actual error type and message
Work upward: See the call stack
Find YOUR code: Focus on files you wrote
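The same bottom-up reading can be done in code with the standard-library traceback module. This is a minimal sketch: describe_failure is a hypothetical helper introduced here for illustration, not part of the original example.

```python
import traceback

def divide_numbers(a, b):
    return a / b

def calculate_average(numbers):
    return divide_numbers(sum(numbers), len(numbers))

def describe_failure(func, *args):
    """Run func and summarize its traceback if it raises (hypothetical helper)."""
    try:
        func(*args)
    except Exception as exc:
        # extract_tb returns one entry per frame, oldest call first
        frames = traceback.extract_tb(exc.__traceback__)
        return {
            "error": f"{type(exc).__name__}: {exc}",           # the bottom line
            "call_chain": [frame.name for frame in frames],    # the call stack
            "failing_line_number": frames[-1].lineno,          # the error location
        }
    return None

summary = describe_failure(calculate_average, [])
print(summary["error"])
print(" -> ".join(summary["call_chain"]))
```

The last entry of call_chain is the function where the exception was actually raised, which matches the advice to start from the bottom of the traceback.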
Common Python Exceptions#
# SyntaxError - Code won't even run
# if x > 5 # Missing colon!
# NameError - Variable doesn't exist
# print(undefined_variable)
# TypeError - Wrong type for operation
# result = "hello" + 5
# AttributeError - Object doesn't have that attribute
# numbers = [1, 2, 3]
# numbers.append_item(4) # Should be append()
# IndexError - List index out of range
# numbers = [1, 2, 3]
# print(numbers[10])
# KeyError - Dictionary key doesn't exist
# person = {"name": "Alice"}
# print(person["age"])
# ValueError - Right type, wrong value
# number = int("hello")
# ZeroDivisionError - Division by zero
# result = 10 / 0
# FileNotFoundError - File doesn't exist
# with open("nonexistent.txt") as f:
#     content = f.read()
print("All errors are commented out to prevent crashes")
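To see these exception types without crashing the program, each failing operation can be wrapped in a callable and run inside try/except. A small sketch follows; classify_failure and the examples dictionary are hypothetical names used only for this illustration, and the file path is assumed not to exist.

```python
def classify_failure(operation):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        operation()
    except Exception as exc:
        return type(exc).__name__
    return "no error"

examples = {
    "missing colon": lambda: compile("if x > 5", "<example>", "exec"),
    "undefined name": lambda: undefined_variable,
    "str plus int": lambda: "hello" + 5,
    "wrong method": lambda: [1, 2, 3].append_item(4),
    "index too large": lambda: [1, 2, 3][10],
    "missing key": lambda: {"name": "Alice"}["age"],
    "bad int text": lambda: int("hello"),
    "divide by zero": lambda: 10 / 0,
    "missing file": lambda: open("definitely_nonexistent_file_for_demo.txt"),
}

results = {label: classify_failure(op) for label, op in examples.items()}
for label, error_name in results.items():
    print(f"{label:16} -> {error_name}")
```

Catching Exception broadly is fine for a demonstration like this; in real code, catch the narrowest exception type you can actually handle.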
3. Print Debugging#
The most common debugging technique! Simple but effective.
Basic Print Debugging#
def buggy_function(data):
    print(f"DEBUG: Input data = {data}")  # What did we receive?
    result = []
    for item in data:
        print(f"DEBUG: Processing item = {item}")  # Track progress
        if item > 0:
            value = item * 2
            print(f"DEBUG: Calculated value = {value}")  # Check calculation
            result.append(value)
    print(f"DEBUG: Final result = {result}")  # What are we returning?
    return result

buggy_function([1, -2, 3, 0, 5])
Better Print Debugging#
import sys
def debug_print(*args, **kwargs):
    """
    Enhanced print for debugging.
    Prints to stderr (won't interfere with normal output)
    """
    # setdefault avoids a TypeError if the caller passes file= explicitly
    kwargs.setdefault("file", sys.stderr)
    print("[DEBUG]", *args, **kwargs)

def calculate(x, y):
    debug_print(f"calculate called with x={x}, y={y}")
    result = x + y
    debug_print(f"result = {result}")
    return result

calculate(5, 3)
Using f-strings for Debugging (Python 3.8+)#
x = 42
y = "hello"
z = [1, 2, 3]
# The = sign shows variable name AND value!
print(f"{x=}")
print(f"{y=}")
print(f"{z=}")
# Works with expressions too
print(f"{len(z)=}")
print(f"{x * 2=}")
Conditional Debug Prints#
DEBUG = True # Toggle this to enable/disable debug output
def debug_log(message):
    if DEBUG:
        print(f"[DEBUG] {message}")

def process_data(data):
    debug_log(f"Starting processing with {len(data)} items")
    for i, item in enumerate(data):
        debug_log(f"Item {i}: {item}")
        # ... processing ...
    debug_log("Processing complete")

process_data([1, 2, 3])
4. Python Debugger (pdb)#
pdb is Python’s built-in interactive debugger. It’s like pausing time and inspecting everything!
Basic pdb Usage#
import pdb
def calculate_total(prices):
    total = 0
    pdb.set_trace()  # ← Execution pauses here!
    for price in prices:
        total += price
    return total

# Uncomment to try (won't work well in notebook):
# calculate_total([10, 20, 30])
pdb Commands#
When pdb pauses, you get an interactive prompt:
(Pdb) ?
Navigation:
n (next) - Execute current line, move to next
s (step) - Step INTO function calls
c (continue) - Continue until next breakpoint
r (return) - Continue until current function returns
q (quit) - Exit debugger
Inspection:
p variable - Print variable value
pp variable - Pretty-print variable
l (list) - Show current code context
ll (longlist) - Show whole function
w (where) - Show stack trace
a (args) - Show function arguments
Execution:
Any Python code! - total = 100, len(prices), etc.
! prefix for statements that clash with pdb commands: !n = 5 (set n to 5, not next)
Breakpoints:
b line_number - Set breakpoint at line
b function_name - Break when function is called
b file.py:20 - Break at line 20 in file.py
cl (clear) - Remove all breakpoints
Breakpoint() - Modern Way (Python 3.7+)#
def process_user_data(users):
    results = []
    for user in users:
        # Modern way - cleaner than pdb.set_trace()
        breakpoint()  # ← Pauses here
        name = user.get("name", "Unknown")
        age = user.get("age", 0)
        results.append(f"{name} is {age} years old")
    return results

# Uncomment to try:
# process_user_data([{"name": "Alice", "age": 30}])
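One reason breakpoint() is cleaner is that it does not hard-code pdb: it calls sys.breakpointhook(), which by default starts pdb and also honors the PYTHONBREAKPOINT environment variable (setting PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op). The sketch below swaps in a recording hook so the behavior can be seen without an interactive session; recording_hook and paused_at are hypothetical names for this illustration.

```python
import sys

paused_at = []  # hypothetical log of breakpoint hits

def recording_hook(*args, **kwargs):
    """Stand-in for the debugger: note the hit instead of pausing."""
    paused_at.append("breakpoint reached")

def process_user_data(users):
    results = []
    for user in users:
        breakpoint()  # dispatches to whatever sys.breakpointhook currently is
        results.append(user.get("name", "Unknown"))
    return results

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    names = process_user_data([{"name": "Alice"}, {"name": "Bob"}])
finally:
    sys.breakpointhook = original_hook  # always restore the real debugger hook

print(names, len(paused_at))
```

In everyday use you would leave the hook alone and simply run the script with PYTHONBREAKPOINT=0 when you want forgotten breakpoint() calls to be ignored.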
Post-Mortem Debugging#
import pdb
def buggy_code():
    x = 10
    y = 0
    return x / y  # This will crash!

try:
    buggy_code()
except Exception:
    # Enter debugger at the point of exception
    pdb.post_mortem()
    # Now you can inspect variables at crash time!
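What post-mortem debugging gives you interactively can be approximated in plain code: the exception carries its traceback, and the innermost traceback entry still holds the local variables of the frame that crashed. A minimal, non-interactive sketch, where locals_at_crash is a hypothetical helper and not a pdb API:

```python
def buggy_code():
    x = 10
    y = 0
    return x / y  # This will crash!

def locals_at_crash(exc):
    """Return the local variables of the frame where exc was raised."""
    tb = exc.__traceback__
    while tb.tb_next is not None:  # walk to the innermost (most recent) frame
        tb = tb.tb_next
    return dict(tb.tb_frame.f_locals)

try:
    buggy_code()
except ZeroDivisionError as exc:
    crash_locals = locals_at_crash(exc)

print(crash_locals)  # {'x': 10, 'y': 0}
```

This is handy in logs or automated error reports, where nobody is present to type commands at a (Pdb) prompt.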
5. Logging#
Better than print statements for production code!
Basic Logging#
import logging
# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def divide(a, b):
    logger.debug(f"divide called with a={a}, b={b}")
    if b == 0:
        logger.error("Attempted division by zero!")
        return None
    result = a / b
    logger.info(f"Result: {result}")
    return result

divide(10, 2)
divide(10, 0)
Log Levels#
logger.debug("Detailed info for debugging") # DEBUG
logger.info("General information") # INFO
logger.warning("Warning - something unexpected") # WARNING
logger.error("Error occurred") # ERROR
logger.critical("Critical failure!") # CRITICAL
When to use:
DEBUG: Detailed diagnostic info
INFO: Confirm things are working
WARNING: Something unexpected but handled
ERROR: Serious problem, function failed
CRITICAL: Program may crash
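Levels also act as a filter: a logger set to WARNING drops DEBUG and INFO messages. The sketch below captures output in memory so the effect is visible; the logger name "levels_demo" and the messages are made up for illustration.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)           # anything below WARNING is discarded
logger.addHandler(handler)
logger.propagate = False                   # keep the demo output out of the root logger

logger.debug("cache lookup for key=42")
logger.info("request handled")
logger.warning("disk almost full")
logger.error("write failed")

captured = stream.getvalue().splitlines()
print(captured)  # ['WARNING:disk almost full', 'ERROR:write failed']
```

Switching the level to logging.DEBUG during an investigation brings the detailed messages back without touching any of the logging calls.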
Logging to File#
import logging
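# Log to file instead of console
# Note: basicConfig() does nothing if logging is already configured
# (for example by an earlier cell). force=True (Python 3.8+) replaces that setup.
logging.basicConfig(
    filename='app.log',
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True
)
logging.info("Application started")
logging.debug("Processing user data")
logging.warning("Low memory")
logging.error("Failed to connect to database")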
6. Assert Statements#
Catch bugs early by asserting what should be true.
def calculate_average(numbers):
    # Assertions help catch bugs during development
    assert len(numbers) > 0, "Cannot calculate average of empty list"
    assert all(isinstance(n, (int, float)) for n in numbers), "All items must be numbers"
    return sum(numbers) / len(numbers)

# This works
print(calculate_average([1, 2, 3]))
# These will fail with helpful messages
# calculate_average([])  # AssertionError: Cannot calculate average of empty list
# calculate_average([1, "two", 3])  # AssertionError: All items must be numbers
Best Practices:
Use for internal checks (sanity checks)
NOT for user input validation (use exceptions)
Can be disabled with python -O (optimize flag)
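The split between assertions and exceptions might look like this in practice. It is a sketch with hypothetical function names and an assumed valid age range of 0 to 150: user input is validated with exceptions, which always run, while the internal helper uses an assertion to document what its callers have already guaranteed.

```python
def parse_age(raw_text):
    """Validate user input with exceptions; these checks survive python -O."""
    try:
        age = int(raw_text)
    except ValueError:
        raise ValueError(f"age must be a whole number, got {raw_text!r}") from None
    if not 0 <= age <= 150:  # assumed plausible range for this example
        raise ValueError(f"age out of range: {age}")
    return age

def years_until_retirement(age, retirement_age=65):
    """Internal helper: callers are expected to pass an already validated age."""
    assert age >= 0, "internal bug: age should have been validated"
    return max(retirement_age - age, 0)

age = parse_age("30")
print(years_until_retirement(age))  # 35
```

If the assertion ever fires, that signals a bug in your own code rather than bad input, which is exactly the distinction the best practices above draw.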
7. Performance Profiling#
“Premature optimization is the root of all evil” - Donald Knuth
But when you DO need to optimize, profile first!
Timing Code - Simple Way#
import time
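def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

# Time it (perf_counter is a monotonic, high-resolution clock meant for
# measuring intervals; time.time() can jump if the system clock changes)
start = time.perf_counter()
result = slow_function()
end = time.perf_counter()
print(f"Took {end - start:.4f} seconds")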
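If you find yourself writing the start/end pattern repeatedly, it can be wrapped in a small context manager. This is one possible sketch; timed and the timings dictionary are hypothetical names, not standard-library features.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record how long the with-block took, in seconds, under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # finally ensures the time is recorded even if the block raises
        results[label] = time.perf_counter() - start

timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))

print(f"sum of squares took {timings['sum of squares']:.4f} seconds")
```

A single measurement like this is noisy, so treat it as a rough indicator rather than a benchmark.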
Using timeit Module#
import timeit
# Time a snippet (runs multiple times for accuracy)
time_taken = timeit.timeit(
    stmt="sum(range(1000))",
    number=10000  # Run 10000 times
)
print(f"Average time: {time_taken / 10000:.6f} seconds")
# Compare two approaches
approach1 = timeit.timeit("[i**2 for i in range(100)]", number=10000)
approach2 = timeit.timeit("list(map(lambda x: x**2, range(100)))", number=10000)
print(f"List comprehension: {approach1:.6f}s")
print(f"Map/lambda: {approach2:.6f}s")
print(f"Winner: {'List comprehension' if approach1 < approach2 else 'Map/lambda'}")
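A single timeit run can still be skewed by whatever else the machine is doing. A common refinement, sketched here, is to repeat the measurement with timeit.repeat and keep the fastest run, since slower runs mostly reflect outside interference. The helper best_time and the repeat/number values are illustrative choices.

```python
import timeit

def squares_comprehension():
    return [i ** 2 for i in range(100)]

def squares_map():
    return list(map(lambda x: x ** 2, range(100)))

def best_time(func, repeat=5, number=2000):
    """Return the best per-call time in seconds over several repeated runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number

results = {func.__name__: best_time(func)
           for func in (squares_comprehension, squares_map)}

for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.2f} microseconds per call")
```

Passing the functions themselves, rather than code strings, lets timeit measure code that already exists in your program.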
Jupyter Magic Commands#
# Time a single line (line magic)
%timeit sum(range(1000))
# Time a whole cell (cell magic): %%timeit must be the first line of its own cell
%%timeit
total = 0
for i in range(1000):
    total += i
cProfile - Detailed Profiling#
import cProfile
import pstats
from io import StringIO
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

def calculate_fibs():
    results = []
    for i in range(20):
        results.append(fibonacci(i))
    return results

# Profile the function
profiler = cProfile.Profile()
profiler.enable()
calculate_fibs()
profiler.disable()
# Print stats
s = StringIO()
ps = pstats.Stats(profiler, stream=s).sort_stats('cumulative')
ps.print_stats(10) # Top 10 functions
print(s.getvalue())
Reading cProfile output:
ncalls - Number of times called
tottime - Total time in this function (excluding sub-calls)
cumtime - Total time including sub-calls
Focus on functions with high cumtime!
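For the recursive fibonacci above, the telling column is ncalls: the function is called far more often than there are distinct inputs. The sketch below makes that visible with a hand-rolled counter standing in for ncalls, then applies one possible fix, memoization with functools.lru_cache. The function and counter names are hypothetical.

```python
from functools import lru_cache

call_counts = {"naive": 0, "cached": 0}  # stand-in for cProfile's ncalls column

def fibonacci_naive(n):
    call_counts["naive"] += 1
    if n <= 1:
        return n
    return fibonacci_naive(n - 1) + fibonacci_naive(n - 2)

@lru_cache(maxsize=None)
def fibonacci_cached(n):
    call_counts["cached"] += 1  # only runs on a cache miss
    if n <= 1:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

naive_result = fibonacci_naive(20)
cached_result = fibonacci_cached(20)

print(naive_result, cached_result)
print(call_counts)
```

Both versions return 6765, but the naive one executes its body 21,891 times while the cached one executes it 21 times, once per distinct input. This is the kind of hot spot a profile points you to before you change any code.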
line_profiler - Line-by-Line Profiling#
# Install: pip install line_profiler
# Use with @profile decorator
# Then run: kernprof -l -v script.py
# Example:
# @profile
# def slow_function():
#     for i in range(1000):
#         x = i ** 2
#     return x
8. Memory Profiling#
Checking Memory Usage#
import sys
# Size of objects (sys.getsizeof is shallow: a list's size excludes the objects it holds)
numbers = list(range(1000))
print(f"List size: {sys.getsizeof(numbers)} bytes")
text = "Hello" * 1000
print(f"String size: {sys.getsizeof(text)} bytes")
# Memory-efficient alternatives
import array
efficient_numbers = array.array('i', range(1000))
print(f"Array size: {sys.getsizeof(efficient_numbers)} bytes")
print(f"Savings: {sys.getsizeof(numbers) - sys.getsizeof(efficient_numbers)} bytes")
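Because sys.getsizeof only reports the size of the container itself, the standard-library tracemalloc module is often a better way to see how much memory a piece of code really allocates. A minimal sketch, with the list of 100,000 strings chosen arbitrarily for the demonstration:

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Allocate something noticeable: a list plus 100,000 separate string objects
big_list = [str(i) for i in range(100_000)]

current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

allocated_kib = (current - baseline) / 1024
print(f"Allocated since baseline: {allocated_kib:.0f} KiB")
print(f"Peak traced memory: {peak / 1024:.0f} KiB")
```

Unlike sys.getsizeof(big_list), this total includes the strings the list refers to, so it is closer to what the code actually costs.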
memory_profiler#
# Install: pip install memory_profiler
# Use with @profile decorator
# Run: python -m memory_profiler script.py
# Example:
# @profile
# def memory_hog():
#     big_list = [0] * (10 ** 6)
#     return sum(big_list)
9. Debugging Strategies#
Binary Search Debugging#
Problem: Code worked yesterday, broken today. 1000 lines changed.
Solution:
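Comment out half of the changed code
Does it work now? Then the bug is in the half you commented out. Still broken? It is in the other half.
Narrow your search to the guilty half and repeat until you find the buggy line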
With Git:
git bisect start
git bisect bad # Current commit is broken
git bisect good abc123 # This old commit worked
# Git checks out middle commit
# Test it...
git bisect good # or bad
# Repeat until Git reports the first bad commit
git bisect reset # Return to where you started
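The halving strategy behind both the manual approach and git bisect can be sketched in a few lines of Python. This is an illustration of the idea, not how Git implements it: find_first_bad, the commit names, and the is_bad test are all hypothetical, and the sketch assumes the first commit is known good and the last is known bad.

```python
def find_first_bad(commits, is_bad):
    """Binary-search for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    low, high = 0, len(commits) - 1  # low: known good, high: known bad
    tests_run = 0
    while high - low > 1:
        mid = (low + high) // 2
        tests_run += 1
        if is_bad(commits[mid]):
            high = mid   # bug was introduced at or before mid
        else:
            low = mid    # bug was introduced after mid
    return commits[high], tests_run

commits = [f"c{i:03d}" for i in range(100)]          # made-up commit names
first_bad, tests_run = find_first_bad(commits, lambda c: c >= "c042")
print(first_bad, tests_run)  # c042 7
```

Seven tests are enough to search 100 commits, which is why halving beats checking changes one at a time.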
Rubber Duck Debugging#
Get a rubber duck (or colleague, or pet)
Explain your code line by line
Often you’ll spot the bug while explaining!
Why it works: Forces you to think clearly about what code does
Wolf Fence Debugging#
“There’s a wolf in Alaska. How do you find it?”
Build a fence across middle of Alaska
Wolf is on one side
Build fence across middle of that half
Repeat until you find the wolf
Applied to debugging:
# Code is 100 lines, bug somewhere
# Line 50
print("Checkpoint A")
# Does it print? If yes, bug is after line 50
# If no, bug is before line 50
# Repeat in that half!
Minimal Reproducible Example#
Remove everything not related to the bug:
# Original: 500 lines, 10 functions
# Minimal example:
def buggy():
    x = [1, 2, 3]
    return x[5]  # Bug found!
Benefits:
Often find bug while creating minimal example
Easy to share for help (Stack Overflow)
Easy to write test case
10. Writing Debuggable Code#
Make Code Easy to Debug#
Bad:
def f(x,y,z): return x+y*z if x>0 else y-z
Good:
def calculate_result(base, multiplier, offset):
    """Calculate result based on base value."""
    if base > 0:
        result = base + (multiplier * offset)
    else:
        result = multiplier - offset
    return result
Use Type Hints#
def process_user(name: str, age: int) -> dict:
    """Process user data and return formatted dict."""
    return {
        "name": name.upper(),
        "age": age,
        "is_adult": age >= 18
    }
# Type hints help catch bugs with mypy:
# mypy script.py
# error: Argument 1 to "process_user" has incompatible type "int"; expected "str"
Defensive Programming#
import logging

def safe_divide(a, b):
    """
    Safely divide two numbers.

    Args:
        a: Numerator
        b: Denominator

    Returns:
        Result of division or None if invalid
    """
    # Validate inputs
    if not isinstance(a, (int, float)):
        raise TypeError(f"a must be a number, got {type(a)}")
    if not isinstance(b, (int, float)):
        raise TypeError(f"b must be a number, got {type(b)}")
    # Check for division by zero
    if b == 0:
        logging.warning("Attempted division by zero")
        return None
    result = a / b
    # Validate output
    assert isinstance(result, (int, float)), "Result must be a number"
    return result

print(safe_divide(10, 2))
print(safe_divide(10, 0))
Early Returns#
def process_data(data):
    """Process data with early validation."""
    # Validate early, return early
    if not data:
        return []
    if not isinstance(data, list):
        raise TypeError("Data must be a list")
    # Now we know data is a non-empty list
    results = []
    for item in data:
        # ... process item ...
        results.append(item * 2)
    return results
📝 Exercises#
Exercise 1: Debug This Code#
Find and fix the bugs:
def calculate_average(numbers):
    total = 0
    for number in numbers:
        total += number
    average = total / len(numbers)
    return average

# Test cases - which ones fail?
# print(calculate_average([1, 2, 3, 4, 5]))
# print(calculate_average([]))
# print(calculate_average([1, "two", 3]))
Exercise 2: Add Logging#
Add appropriate logging to this function:
def fetch_user_data(user_id):
    # Add logging here!
    users = {
        1: {"name": "Alice", "email": "alice@example.com"},
        2: {"name": "Bob", "email": "bob@example.com"}
    }
    if user_id not in users:
        # Log error
        return None
    user = users[user_id]
    # Log success
    return user
Exercise 3: Profile Performance#
Which approach is faster?
def approach1(n):
    """Using list comprehension."""
    return [i ** 2 for i in range(n)]

def approach2(n):
    """Using loop."""
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

# Use timeit to compare them!
# Your code here
Exercise 4: Write Assertions#
Add assertions to validate inputs:
def calculate_grade(score, max_score):
    """Calculate percentage grade."""
    # Add assertions here!
    # - score should be >= 0
    # - max_score should be > 0
    # - score should be <= max_score
    percentage = (score / max_score) * 100
    return percentage
✅ Self-Check Quiz#
What’s the first thing to do when you encounter a bug?
How do you read a Python traceback?
What’s the difference between print() and logging?
What does pdb.set_trace() do?
Name 3 pdb commands and what they do
When should you use assertions?
What’s the purpose of profiling?
What is rubber duck debugging?
Why write type hints?
What makes code easy to debug?
🎯 Key Takeaways#
Read error messages - They tell you what’s wrong!
Print debugging works, but logging is better
pdb is powerful for interactive debugging
Profile before optimizing - Don’t guess!
Assertions catch bugs during development
Type hints help find bugs before runtime
Defensive programming prevents bugs
Binary search debugging saves time
Minimal examples help isolate bugs
Good code is easy to debug
🚀 Next Steps#
Debugging is a skill that improves with practice!
Practice by:
Debugging every bug you encounter
Using pdb instead of print sometimes
Reading other people’s stack traces
Profiling your slow code
Continue learning:
Security Essentials: tools/07_security_essentials.ipynb
Build Systems: tools/08_build_and_automation.ipynb
💡 Pro Tips#
Read the error - Don’t skim, read carefully
Reproduce consistently - Can’t fix what you can’t reproduce
Use version control - Know what changed
Take breaks - Fresh eyes find bugs faster
Ask for help - After you’ve tried (rubber duck first!)
Write tests - Catch bugs before users do
Keep it simple - Complex code has more bugs
Log, don’t print - Production code needs logging
“Debugging is twice as hard as writing code in the first place. Therefore, if you write code as cleverly as possible, you are, by definition, not smart enough to debug it.” - Brian Kernighan
Write simple, debuggable code! 🐛✨