Hard Lesson 02: Generators and Iterators - Memory-Efficient Data Processing#
Master the art of memory-efficient data processing using generators, iterators, and coroutines.
Learning Objectives#
By the end of this lesson, you will be able to:
✅ Understand the iterator protocol and implement custom iterators
✅ Create generator functions using yield for lazy evaluation
✅ Build memory-efficient data pipelines with generator expressions
✅ Use advanced generator patterns: yield from, send(), throw(), close()
✅ Implement coroutines for cooperative multitasking
✅ Create infinite sequences and bounded iterators
✅ Apply generators to real-world problems (file processing, data streaming)
✅ Analyze memory and performance tradeoffs
Prerequisites#
Strong understanding of Python functions and scope
Familiarity with decorators and closures
Knowledge of list comprehensions
Understanding of memory management concepts
Why Generators and Iterators Matter#
Real-World Applications:
Big Data Processing: Stream terabytes of data without loading into memory
Web Scraping: Process paginated results efficiently
Log Analysis: Parse multi-gigabyte log files line by line
Machine Learning: Generate training batches on-the-fly
ETL Pipelines: Transform data streams in real-time
API Rate Limiting: Control request timing with generator-based delays
Part 1: The Iterator Protocol - Building Blocks of Iteration#
What is an Iterator?#
An iterator is an object that implements two methods:
__iter__(): Returns the iterator object itself
__next__(): Returns the next value or raises StopIteration
This protocol enables the for loop and other iteration contexts.
# Understanding how iteration works under the hood
numbers = [1, 2, 3, 4, 5]
# When you use 'for', Python calls iter() to get an iterator
iterator = iter(numbers)
print(f"Iterator object: {iterator}")
print(f"Type: {type(iterator)}")
# Then repeatedly calls next() until StopIteration
print(f"\nManual iteration:")
print(next(iterator)) # 1
print(next(iterator)) # 2
print(next(iterator)) # 3
print(next(iterator)) # 4
print(next(iterator)) # 5
# print(next(iterator)) # Would raise StopIteration
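For reference, a for loop is roughly equivalent to the following manual protocol; this is a sketch of the semantics, not CPython's literal implementation:
# Rough equivalent of: for item in numbers: print(item)
iterator = iter(numbers)          # calls numbers.__iter__()
while True:
    try:
        item = next(iterator)     # calls iterator.__next__()
    except StopIteration:
        break                     # the for loop swallows StopIteration
    print(item)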
Creating a Custom Iterator#
Let's build a custom iterator from scratch:
class Countdown:
"""
Custom iterator that counts down from a number.
This demonstrates the iterator protocol:
- __iter__() returns self (the iterator object)
- __next__() returns next value or raises StopIteration
"""
def __init__(self, start):
self.current = start
def __iter__(self):
"""Return the iterator object (self)."""
return self
def __next__(self):
"""Return the next value or raise StopIteration."""
if self.current <= 0:
raise StopIteration
value = self.current
self.current -= 1
return value
# Using the custom iterator
print("Countdown from 5:")
counter = Countdown(5)
for num in counter:
print(num, end=" ")
# Can't iterate again (iterator is exhausted)
print("\n\nTrying to iterate again:")
for num in counter:
print(num, end=" ")
print("(Nothing printed - iterator exhausted)")
Iterable vs Iterator#
Important distinction:
Iterable: Object that can return an iterator (has __iter__())
Iterator: Object that produces values (has __iter__() and __next__())
An iterable can be iterated multiple times; an iterator is single-use.
class CountdownIterable:
"""
An ITERABLE (not iterator) that creates new iterators.
This allows multiple iterations.
"""
def __init__(self, start):
self.start = start
def __iter__(self):
"""Return a NEW iterator each time."""
return CountdownIterator(self.start)
class CountdownIterator:
"""The actual iterator."""
def __init__(self, start):
self.current = start
def __iter__(self):
return self
def __next__(self):
if self.current <= 0:
raise StopIteration
value = self.current
self.current -= 1
return value
# Now we can iterate multiple times
countdown = CountdownIterable(3)
print("First iteration:")
for num in countdown:
print(num, end=" ")
print("\n\nSecond iteration:")
for num in countdown:
print(num, end=" ")
print("\n\nWorks because each 'for' gets a fresh iterator!")
Part 2: Generator Functions - Elegant Iterators#
Why Generators?#
Writing custom iterator classes is verbose. Generators provide a simpler syntax using the yield keyword.
Key Benefits:
Simple Syntax: No need for __iter__() and __next__()
Automatic State Management: Local variables are preserved between calls
Memory Efficient: Values are generated on-demand
Lazy Evaluation: Compute only what's needed
def countdown(n):
"""
Generator function for counting down.
Much simpler than the class-based iterator!
"""
while n > 0:
yield n # Pause here and return n
n -= 1 # Resume here on next call
# Using the generator
print("Countdown from 5:")
for num in countdown(5):
print(num, end=" ")
# Generators are single-use (like iterators)
gen = countdown(3)
print("\n\nFirst iteration:", list(gen))
print("Second iteration:", list(gen)) # Empty!
How Generators Work: Execution Flow#
When you call a generator function:
It returns a generator object (doesn't execute the body)
Calling next() executes until the first yield
yield pauses execution and returns a value
The next next() call resumes after the yield
When the function ends, it raises StopIteration
def demo_generator():
"""Demonstrate generator execution flow."""
print(" [Generator started]")
print(" [About to yield 1]")
yield 1
print(" [Resumed after yield 1]")
print(" [About to yield 2]")
yield 2
print(" [Resumed after yield 2]")
print(" [About to yield 3]")
yield 3
print(" [Generator ending]")
print("Creating generator:")
gen = demo_generator()
print(f"Type: {type(gen)}\n")
print("First next():")
value = next(gen)
print(f"Got value: {value}\n")
print("Second next():")
value = next(gen)
print(f"Got value: {value}\n")
print("Third next():")
value = next(gen)
print(f"Got value: {value}\n")
print("Fourth next() (will raise StopIteration):")
try:
next(gen)
except StopIteration:
print("StopIteration raised!")
Classic Example: Fibonacci Sequence#
Generators shine when producing sequences:
def fibonacci(n):
"""
Generate the first n Fibonacci numbers.
Memory efficient: doesn't store all numbers in a list.
"""
a, b = 0, 1
for _ in range(n):
yield a
a, b = b, a + b
print("First 15 Fibonacci numbers:")
for i, fib in enumerate(fibonacci(15), 1):
print(f"F({i}) = {fib}")
# Can convert to list if needed
print("\nAs a list:", list(fibonacci(10)))
Part 3: Memory Efficiency - The Power of Lazy Evaluation#
List vs Generator: Memory Comparison#
Let's see why generators are memory-efficient:
import sys
# List approach: stores all values in memory
def squares_list(n):
"""Return list of squares from 0 to n-1."""
return [x**2 for x in range(n)]
# Generator approach: computes on-demand
def squares_generator(n):
"""Yield squares from 0 to n-1."""
for x in range(n):
yield x**2
# Compare memory usage
n = 100000
# List version
squares_l = squares_list(n)
list_size = sys.getsizeof(squares_l)
print(f"List of {n:,} squares:")
print(f" Memory: {list_size:,} bytes ({list_size / 1024 / 1024:.2f} MB)")
print(f" First 5: {squares_l[:5]}")
# Generator version
squares_g = squares_generator(n)
gen_size = sys.getsizeof(squares_g)
print(f"\nGenerator for {n:,} squares:")
print(f" Memory: {gen_size:,} bytes ({gen_size / 1024:.2f} KB)")
print(f" First 5: {[next(squares_g) for _ in range(5)]}")
# Memory savings
savings = (list_size - gen_size) / list_size * 100
print(f"\n๐ฏ Memory savings: {savings:.2f}%")
print(f" ({list_size / gen_size:.0f}x smaller)")
Generator Expressions#
Like list comprehensions, but with () instead of []:
# List comprehension - creates entire list
list_comp = [x**2 for x in range(10)]
print(f"List comprehension: {list_comp}")
print(f"Type: {type(list_comp)}")
print(f"Size: {sys.getsizeof(list_comp)} bytes\n")
# Generator expression - creates generator
gen_exp = (x**2 for x in range(10))
print(f"Generator expression: {gen_exp}")
print(f"Type: {type(gen_exp)}")
print(f"Size: {sys.getsizeof(gen_exp)} bytes")
print(f"Values: {list(gen_exp)}")
# Perfect for operations that don't need the full list
print("\n๐ฏ Use cases for generator expressions:")
# Sum (only needs one value at a time)
total = sum(x**2 for x in range(1000000))
print(f"Sum of first million squares: {total:,}")
# Any/all (can short-circuit)
has_large = any(x > 50 for x in range(100))
print(f"Has number > 50: {has_large}")
# Max/min
largest = max(x**2 for x in range(1000))
print(f"Largest square: {largest:,}")
Real-World Example: Processing Large Files#
Generators excel at processing large files line by line:
def process_large_file(filename):
"""
Generator that processes file line by line.
Memory-efficient: doesn't load entire file into memory.
Useful for multi-gigabyte log files.
"""
with open(filename, 'r') as f:
for line in f: # File objects are iterators!
# Process each line
cleaned = line.strip()
if cleaned and not cleaned.startswith('#'):
yield cleaned
def count_errors_in_log(filename):
"""Count ERROR lines in a log file (memory-efficient)."""
return sum(1 for line in process_large_file(filename)
if 'ERROR' in line)
# Example simulation (without actual file)
def simulate_log_lines():
"""Simulate log file processing."""
logs = [
"INFO: Application started",
"DEBUG: Loading config",
"ERROR: Failed to connect to database",
"INFO: Retrying connection",
"ERROR: Connection timeout",
"INFO: Using fallback database",
"# This is a comment",
"",
"ERROR: Invalid user input",
]
for log in logs:
yield log.strip()
print("Processing log file:")
error_count = sum(1 for line in simulate_log_lines()
if line and not line.startswith('#') and 'ERROR' in line)
print(f"Found {error_count} errors")
print("\n๐ Log summary:")
log_types = {}
for line in simulate_log_lines():
if line and not line.startswith('#'):
log_type = line.split(':')[0] if ':' in line else 'UNKNOWN'
log_types[log_type] = log_types.get(log_type, 0) + 1
for log_type, count in sorted(log_types.items()):
print(f" {log_type}: {count}")
Part 4: Generator Pipelines - Composing Data Transformations#
Building Data Pipelines#
Generators can be chained to create elegant data processing pipelines:
def read_data(n=20):
"""Stage 1: Generate data source."""
print("[Stage 1: Generating data]")
for i in range(1, n + 1):
yield i
def filter_even(numbers):
"""Stage 2: Filter only even numbers."""
print("[Stage 2: Filtering even numbers]")
for num in numbers:
if num % 2 == 0:
print(f" โ {num} is even")
yield num
else:
print(f" โ {num} is odd (skipped)")
def square(numbers):
"""Stage 3: Square each number."""
print("[Stage 3: Squaring numbers]")
for num in numbers:
result = num ** 2
print(f" {num}ยฒ = {result}")
yield result
def take(n, iterable):
"""Stage 4: Take only first n items."""
print(f"[Stage 4: Taking first {n} items]")
for i, item in enumerate(iterable):
if i >= n:
break
yield item
# Build the pipeline
print("Building pipeline: data โ filter_even โ square โ take(5)\n")
pipeline = take(5, square(filter_even(read_data(20))))
print("\nExecuting pipeline (lazy evaluation):")
print("="*50)
result = list(pipeline)
print("="*50)
print(f"\nFinal result: {result}")
print(f"\n๐ฏ Notice: Each stage processes on-demand!")
Pipeline Pattern: ETL (Extract, Transform, Load)#
A common pattern in data engineering:
def extract_records():
"""Extract: Simulate reading from data source."""
records = [
{"id": 1, "name": "Alice", "age": 30, "city": "NYC"},
{"id": 2, "name": "Bob", "age": 25, "city": "LA"},
{"id": 3, "name": "Charlie", "age": 35, "city": "NYC"},
{"id": 4, "name": "David", "age": 28, "city": "SF"},
{"id": 5, "name": "Eve", "age": 32, "city": "NYC"},
]
for record in records:
yield record
def transform_filter_city(records, city):
"""Transform: Filter by city."""
for record in records:
if record['city'] == city:
yield record
def transform_add_category(records):
"""Transform: Add age category."""
for record in records:
if record['age'] < 30:
record['category'] = 'Young'
else:
record['category'] = 'Senior'
yield record
def load_to_storage(records):
"""Load: Simulate saving to database."""
results = []
for record in records:
print(f"Saving: {record}")
results.append(record)
return results
# ETL Pipeline
print("ETL Pipeline: Extract โ Filter(NYC) โ Add Category โ Load\n")
pipeline = transform_add_category(
transform_filter_city(
extract_records(),
city='NYC'
)
)
saved_records = load_to_storage(pipeline)
print(f"\nโ
Loaded {len(saved_records)} records")
Part 5: Infinite Generators - Unbounded Sequences#
Creating Infinite Sequences#
Generators can represent infinite sequences (use with caution!):
def infinite_counter(start=0):
"""Generate infinite sequence of integers."""
n = start
while True: # Infinite loop!
yield n
n += 1
# Safe: use with a limit
counter = infinite_counter(100)
print("First 10 numbers starting from 100:")
for _ in range(10):
print(next(counter), end=" ")
print("\n\n๐ Infinite Fibonacci:")
def fibonacci_infinite():
"""Generate Fibonacci numbers forever."""
a, b = 0, 1
while True:
yield a
a, b = b, a + b
# Take only what you need
fib = fibonacci_infinite()
print("First 20 Fibonacci numbers:")
for i, num in enumerate(fib):
if i >= 20:
break
print(num, end=" ")
Practical Use: Cycle and Repeat Patterns#
from itertools import chain, cycle, repeat, islice
# Cycle: repeat sequence infinitely
colors = cycle(['red', 'green', 'blue'])
print("Cycling through colors (first 10):")
for i, color in enumerate(colors):
if i >= 10:
break
print(f" {i}: {color}")
# Repeat: repeat single value
print("\nRepeat 'X' 5 times:")
for val in repeat('X', 5):
print(val, end=" ")
# Combining with zip for padding
print("\n\nZipping with infinite repeat:")
names = ['Alice', 'Bob', 'Charlie']
scores = [95, 87]  # Fewer scores than names
# Pad missing scores with 0: chain the finite scores with an infinite
# stream of zeros; zip stops when the shorter iterable (names) is exhausted
padded_scores = chain(scores, repeat(0))
for name, score in zip(names, padded_scores):
    print(f" {name}: {score}")
Part 6: Advanced Generator Features#
Generator Methods: send(), throw(), close()#
Generators can receive values and exceptions:
def running_average():
"""
Coroutine that maintains a running average.
Uses send() to receive values.
"""
total = 0
count = 0
average = None
while True:
# Receive value sent via send()
value = yield average
if value is None:
break
total += value
count += 1
average = total / count
# Create coroutine
avg = running_average()
# MUST call next() or send(None) to prime the coroutine
next(avg) # Advance to first yield
print("Running average coroutine:")
print(f" Send 10: {avg.send(10)}")
print(f" Send 20: {avg.send(20)}")
print(f" Send 30: {avg.send(30)}")
print(f" Send 40: {avg.send(40)}")
print(f"\nโ
Average of [10, 20, 30, 40] = {avg.send(50)}")
Using throw() to Send Exceptions#
def generator_with_exception_handling():
"""
Generator that can handle exceptions sent via throw().
"""
try:
while True:
value = yield
print(f" Received: {value}")
except ValueError as e:
print(f" โ ๏ธ Caught ValueError: {e}")
yield "Error handled"
finally:
print(" ๐ Generator closing")
gen = generator_with_exception_handling()
next(gen) # Prime
print("Sending values:")
gen.send(10)
gen.send(20)
print("\nThrowing exception:")
try:
result = gen.throw(ValueError, "Invalid input!")
print(f" Result after exception: {result}")
except StopIteration:
print(" Generator stopped")
Using close() to Stop a Generator#
def generator_with_cleanup():
"""
Generator with cleanup logic.
"""
try:
print(" ๐ง Setting up resources...")
for i in range(10):
yield i
finally:
print(" ๐งน Cleaning up resources...")
gen = generator_with_cleanup()
print("Getting first 3 values:")
for _ in range(3):
print(f" Value: {next(gen)}")
print("\nClosing generator early:")
gen.close()
print("\nTrying to use closed generator:")
try:
next(gen)
except StopIteration:
print(" โ Generator is closed!")
Part 7: yield from - Delegating to Subgenerators#
Generator Delegation#
The yield from syntax delegates to another generator:
def generator1():
"""First generator."""
yield 1
yield 2
yield 3
def generator2():
"""Second generator."""
yield 'a'
yield 'b'
yield 'c'
# Without yield from (manual delegation)
def combined_manual():
"""Manually combine generators."""
for value in generator1():
yield value
for value in generator2():
yield value
# With yield from (cleaner)
def combined_yield_from():
"""Use yield from for delegation."""
yield from generator1()
yield from generator2()
print("Manual delegation:")
print(list(combined_manual()))
print("\nUsing yield from:")
print(list(combined_yield_from()))
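yield from does more than chain values: it forwards send() and throw() to the subgenerator, and the yield from expression itself evaluates to the subgenerator's return value. A small sketch of the return-value behavior:
def subtask():
    yield 1
    yield 2
    return "sub done"  # becomes the value of the yield from expression

def delegator():
    result = yield from subtask()
    print(f"Subgenerator returned: {result}")
    yield 3

print(list(delegator()))  # [1, 2, 3], after printing the return value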
Flattening Nested Structures#
def flatten(nested_list):
"""
Recursively flatten a nested list.
Uses yield from for elegant recursion.
"""
for item in nested_list:
if isinstance(item, list):
# Recursively flatten sublists
yield from flatten(item)
else:
yield item
nested = [1, [2, 3, [4, 5]], 6, [7, [8, 9]]]
print(f"Nested: {nested}")
print(f"Flattened: {list(flatten(nested))}")
# More complex example
complex_nested = [
1,
[2, 3],
[[4, 5], [6]],
[[[7]], 8],
9
]
print(f"\nComplex nested: {complex_nested}")
print(f"Flattened: {list(flatten(complex_nested))}")
Part 8: Real-World Applications#
Example 1: Batch Processing for Machine Learning#
def batch_generator(data, batch_size):
"""
Generate batches for training neural networks.
Memory-efficient: doesn't load all batches at once.
"""
for i in range(0, len(data), batch_size):
yield data[i:i + batch_size]
# Simulate training data
training_data = list(range(1, 101)) # 100 samples
batch_size = 10
print(f"Training on {len(training_data)} samples in batches of {batch_size}\n")
for epoch in range(1, 3): # 2 epochs
print(f"Epoch {epoch}:")
for batch_num, batch in enumerate(batch_generator(training_data, batch_size), 1):
# Simulate training
avg = sum(batch) / len(batch)
print(f" Batch {batch_num}: size={len(batch)}, avg={avg:.1f}")
print()
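Note that batch_generator relies on len() and slicing, so it only works on sequences. For an arbitrary iterable (for example, a generator streaming from disk), a sketch using itertools.islice:
from itertools import islice

def batch_any_iterable(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Works on a generator, not just a list:
stream = (x * x for x in range(1, 26))
for batch in batch_any_iterable(stream, 10):
    print(f"batch of {len(batch)}: first={batch[0]}, last={batch[-1]}")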
Example 2: API Pagination Handler#
def fetch_paginated_api(max_pages=5):
"""
Simulate fetching paginated API results.
In real code, this would make HTTP requests.
Generator allows processing results as they arrive.
"""
page = 1
while page <= max_pages:
# Simulate API response
results = [
{"id": (page - 1) * 10 + i, "value": f"Item {(page - 1) * 10 + i}"}
for i in range(1, 11)
]
print(f" ๐ฅ Fetched page {page}")
# Yield each result
for result in results:
yield result
page += 1
# Check if there are more pages (in real code, check API response)
if page > max_pages:
print(f" โ
No more pages\n")
break
print("Fetching API results:\n")
for i, item in enumerate(fetch_paginated_api(max_pages=3), 1):
if i <= 5 or i > 25: # Show first and last few
print(f" Item {i}: {item}")
elif i == 6:
print(f" ... (processing items 6-25) ...")
Example 3: Moving Average Calculator#
from collections import deque
def moving_average(data, window_size):
"""
Calculate moving average over a sliding window.
Memory-efficient for large datasets.
"""
window = deque(maxlen=window_size)
for value in data:
window.append(value)
if len(window) == window_size:
yield sum(window) / window_size
# Stock prices simulation
prices = [100, 102, 98, 105, 110, 108, 112, 115, 111, 114]
window = 3
print(f"Stock prices: {prices}")
print(f"\nMoving average (window={window}):")
for i, avg in enumerate(moving_average(prices, window), window):
print(f" Day {i}: ${avg:.2f}")
Part 9: Performance Comparison#
Benchmark: List vs Generator#
import time
import sys
def benchmark_list_vs_generator():
"""Compare performance of list vs generator."""
n = 1000000
# List approach
start = time.time()
list_result = [x**2 for x in range(n)]
first_10_list = list_result[:10]
list_time = time.time() - start
list_memory = sys.getsizeof(list_result)
# Generator approach
start = time.time()
gen_result = (x**2 for x in range(n))
first_10_gen = [next(gen_result) for _ in range(10)]
gen_time = time.time() - start
gen_memory = sys.getsizeof(gen_result)
print(f"Computing first 10 squares from {n:,} numbers:\n")
print("List Comprehension:")
print(f" Time: {list_time*1000:.2f} ms")
print(f" Memory: {list_memory:,} bytes ({list_memory/1024/1024:.2f} MB)")
print(f" Result: {first_10_list}")
print("\nGenerator Expression:")
print(f" Time: {gen_time*1000:.4f} ms")
print(f" Memory: {gen_memory:,} bytes")
print(f" Result: {first_10_gen}")
print("\n๐ Comparison:")
print(f" Speed: Generator is {list_time/gen_time:.0f}x faster")
print(f" Memory: Generator uses {list_memory/gen_memory:.0f}x less memory")
benchmark_list_vs_generator()
Exercises#
Exercise 1: Custom Range Iterator#
Implement a custom MyRange class that mimics Python's range() behavior using the iterator protocol.
# Your code here
class MyRange:
"""
Custom range implementation.
Should support:
- MyRange(stop)
- MyRange(start, stop)
- MyRange(start, stop, step)
"""
def __init__(self, *args):
# TODO: Implement __init__
pass
def __iter__(self):
# TODO: Return iterator
pass
def __next__(self):
# TODO: Return next value or raise StopIteration
pass
# Test your implementation
# print("MyRange(5):", list(MyRange(5)))
# print("MyRange(2, 8):", list(MyRange(2, 8)))
# print("MyRange(0, 10, 2):", list(MyRange(0, 10, 2)))
# print("MyRange(10, 0, -1):", list(MyRange(10, 0, -1)))
Exercise 2: Generator Pipeline for Data Processing#
Build a generator pipeline that:
Generate random numbers between 1 and 100
Filter numbers divisible by both 3 and 5 (divisible by 15)
Transform each number by multiplying by 2
Stop after finding 10 numbers that meet the criteria
Compare memory usage to a list-based approach.
# Your code here
import random
import sys
# TODO: Implement generator pipeline
def generate_random_numbers():
"""Generate infinite stream of random numbers between 1 and 100."""
pass # TODO
def filter_divisible_by_15(numbers):
"""Filter numbers divisible by 15."""
pass # TODO
def multiply_by_2(numbers):
"""Multiply each number by 2."""
pass # TODO
def take_n(iterable, n):
"""Take first n items from iterable."""
pass # TODO
# Build pipeline
# pipeline = ...
# result = list(pipeline)
# print(f"Result: {result}")
# Compare with list-based approach
# TODO: Implement list-based version and compare memory usage
Exercise 3: File Processing with Generators#
Create a generator function that:
Reads a CSV-like string (simulate file reading)
Parses each line into a dictionary
Filters rows where a specific column meets a condition
Yields the processed records
# Your code here
# Sample CSV data
csv_data = """name,age,city,salary
Alice,30,NYC,80000
Bob,25,LA,65000
Charlie,35,NYC,95000
David,28,SF,75000
Eve,32,NYC,88000
Frank,29,LA,70000"""
def parse_csv_lines(csv_string):
"""
Generator that parses CSV string into dictionaries.
Yields one dictionary per row.
"""
pass # TODO
def filter_records(records, column, condition):
"""
Filter records based on condition.
Args:
records: Generator of dictionaries
column: Column name to check
condition: Function that returns True/False
"""
pass # TODO
# Test your implementation
# records = parse_csv_lines(csv_data)
# high_earners = filter_records(records, 'salary', lambda x: int(x) > 75000)
# for record in high_earners:
# print(record)
Exercise 4: Coroutine-Based Logger#
Create a coroutine that receives log messages via send() and:
Categorizes them by level (INFO, WARNING, ERROR)
Maintains counts for each level
Returns summary statistics when receiving None
# Your code here
def log_analyzer():
"""
Coroutine that analyzes log messages.
Send log messages like: "ERROR: Connection failed"
Send None to get summary statistics.
"""
pass # TODO
# Test your implementation
# logger = log_analyzer()
# next(logger) # Prime the coroutine
# logger.send("INFO: Application started")
# logger.send("ERROR: Connection failed")
# logger.send("WARNING: High memory usage")
# logger.send("ERROR: Timeout occurred")
# logger.send("INFO: Request completed")
# summary = logger.send(None)
# print(f"Summary: {summary}")
Pro Tips#
🎯 Best Practices#
Use generators for large datasets: When data doesn't fit in memory
Prefer generator expressions: More concise than generator functions for simple transformations
Prime coroutines: Always call next() or send(None) before using send() (see the sketch after this list)
Be careful with infinite generators: Always use limiting mechanisms (take_n, islice, break)
Use yield from for delegation: Cleaner than manual for-loop delegation
Consider itertools: A built-in module with powerful generator utilities
Document generator state: Make clear whether a generator is single-use or reusable
Use generators for pipelines: Chain operations for readable, efficient code
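One common convenience for the priming rule above is a small decorator that advances new coroutines to their first yield automatically; a minimal sketch (the name primed is our own):
from functools import wraps

def primed(func):
    """Decorator: build the generator and advance it to the first yield."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # prime: run up to the first yield
        return gen
    return wrapper

@primed
def echo():
    while True:
        value = yield
        print(f"Got: {value}")

echo().send("hello")  # works immediately, no manual next() needed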
⚠️ Common Mistakes#
Forgetting generators are single-use: Can't iterate twice without recreating
Not priming coroutines: Must call next() before send()
Converting to list unnecessarily: Defeats the purpose of lazy evaluation
Infinite generators without limits: Can cause infinite loops
Ignoring StopIteration: Should be handled in manual iteration
Mixing iteration protocols: Don't mix __iter__/__next__ with yield in the same class
Not using yield from: Manual delegation is more error-prone
Forgetting cleanup: Use try/finally or context managers for resource cleanup
📊 When to Use What#
Use Lists When:
Data fits comfortably in memory
Need random access or indexing
Need to iterate multiple times
Want to modify elements in-place
Use Generators When:
Data is very large or infinite
Only need to iterate once
Processing data in a pipeline
Want to minimize memory usage
Implementing iteration protocol
Key Takeaways#
Iterators implement __iter__() and __next__() for custom iteration logic
Generators provide elegant iterator creation using the yield keyword
Lazy evaluation means values are computed on-demand, saving memory
Generator expressions offer a memory-efficient alternative to list comprehensions
Pipelines chain generators for readable, efficient data processing
Infinite sequences are possible with generators (use carefully)
Coroutines use send(), throw(), and close() for bidirectional communication
yield from simplifies generator delegation and subgenerator handling
Performance: Generators are often faster to start and far more memory-efficient for large data
Real-world: Essential for big data, streaming, ETL, ML batching, and API handling
Next Steps#
Explore itertools: Study the itertools module (chain, product, permutations, etc.)
Async generators: Learn about async def and async for for asynchronous iteration
Context managers: Combine generators with context managers using contextlib.contextmanager (see the sketch below)
Data streaming: Build real-time data processing pipelines
Performance profiling: Use timeit and memory_profiler to measure improvements
Practice with large datasets: Process real CSV files, logs, or API data
Continue to the next lesson on Algorithms and Complexity to apply these concepts to algorithm design!