Lesson 6: Computer Systems and Theory#
Master computer architecture, operating systems, and theoretical computer science
Real-World Context#
Understanding computer systems is essential for writing efficient code, debugging performance issues, and system design. This knowledge powers: high-performance computing (Google scale), operating systems (Linux kernel), embedded systems (smartphones), and distributed systems (cloud infrastructure).
What Youโll Learn#
Computer Architecture: CPU, memory hierarchy, instruction execution
Memory Management: Virtual memory, paging, caching strategies
Concurrency: Threads, processes, synchronization primitives
Operating Systems: Scheduling, file systems, I/O management
Computational Complexity: P vs NP, complexity classes, reductions
Automata Theory: Finite state machines, regular expressions, grammars
Modern Hardware: Multi-core, GPUs, SIMD, memory models
System Design: Scalability, reliability, performance optimization
Prerequisites: Python, basic data structures, algorithms
Time: 4-5 hours
Part 1: Computer Architecture Fundamentals#
Von Neumann Architecture (1945)#
Core Components:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ CPU (Central Processing Unit) โ
โ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โ
โ โ Control Unit โ โ ALU โ โ
โ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Registers โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (Bus)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Memory (RAM) โ
โ [Instructions] [Data] [Stack] โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ I/O Devices โ
โ Keyboard, Display, Disk, Network โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
CPU Execution Cycle: Fetch-Decode-Execute#
Fetch: Load instruction from memory (PC โ Memory โ IR)
Decode: Interpret instruction (IR โ Control Unit)
Execute: Perform operation (ALU, Registers)
Write-back: Store result (Registers โ Memory)
Instruction Types#
Category |
Examples |
Purpose |
|---|---|---|
Data Transfer |
LOAD, STORE, MOVE |
Move data between registers/memory |
Arithmetic |
ADD, SUB, MUL, DIV |
Mathematical operations |
Logical |
AND, OR, XOR, NOT |
Bitwise operations |
Control Flow |
JMP, JZ, CALL, RET |
Program flow control |
I/O |
IN, OUT |
Interact with devices |
class SimpleCPU:
"""Educational CPU simulator with fetch-decode-execute cycle."""
def __init__(self):
# Registers
self.registers = {'A': 0, 'B': 0, 'C': 0, 'D': 0}
self.pc = 0 # Program Counter
self.ir = None # Instruction Register
self.flags = {'zero': False, 'carry': False} # Status flags
# Memory
self.memory = [0] * 256
# Statistics
self.cycles = 0
def fetch(self, program):
"""Fetch instruction from program memory."""
if self.pc < len(program):
self.ir = program[self.pc]
self.pc += 1
self.cycles += 1 # Fetch takes 1 cycle
return True
return False
def decode_execute(self):
"""Decode and execute instruction."""
if self.ir is None:
return
parts = self.ir.split()
opcode = parts[0]
if opcode == 'LOAD':
# LOAD reg, value
reg, value = parts[1], int(parts[2])
self.registers[reg] = value
print(f" LOAD {reg}, {value} โ {reg}={value}")
elif opcode == 'ADD':
# ADD reg1, reg2, dest
r1, r2, dest = parts[1], parts[2], parts[3]
result = self.registers[r1] + self.registers[r2]
self.registers[dest] = result
self.flags['zero'] = (result == 0)
print(f" ADD {r1}, {r2}, {dest} โ {dest}={result}")
elif opcode == 'SUB':
# SUB reg1, reg2, dest
r1, r2, dest = parts[1], parts[2], parts[3]
result = self.registers[r1] - self.registers[r2]
self.registers[dest] = result
self.flags['zero'] = (result == 0)
print(f" SUB {r1}, {r2}, {dest} โ {dest}={result}")
elif opcode == 'MUL':
# MUL reg1, reg2, dest
r1, r2, dest = parts[1], parts[2], parts[3]
result = self.registers[r1] * self.registers[r2]
self.registers[dest] = result
print(f" MUL {r1}, {r2}, {dest} โ {dest}={result}")
elif opcode == 'STORE':
# STORE reg, address
reg, addr = parts[1], int(parts[2])
self.memory[addr] = self.registers[reg]
print(f" STORE {reg}, [{addr}] โ Memory[{addr}]={self.registers[reg]}")
elif opcode == 'JMP':
# JMP address
addr = int(parts[1])
self.pc = addr
print(f" JMP {addr} โ PC={addr}")
elif opcode == 'HALT':
print(f" HALT")
return False
self.cycles += 1 # Execute takes 1 cycle (simplified)
return True
def run(self, program):
"""Execute complete program."""
print("๐ฅ๏ธ CPU Execution Start\n")
print("Cycle | Instruction")
print("โ" * 40)
while self.fetch(program):
print(f"{self.cycles:3d} |", end=" ")
if not self.decode_execute():
break
print("\n" + "โ" * 40)
print("โ
Execution Complete\n")
print(f"Final Registers: {self.registers}")
print(f"Total Cycles: {self.cycles}")
print(f"Instructions Executed: {self.pc}")
# Example program: Calculate (10 + 20) * 3
program = [
"LOAD A 10",
"LOAD B 20",
"ADD A B C", # C = A + B = 30
"LOAD D 3",
"MUL C D A", # A = C * D = 90
"STORE A 100", # Store result at memory[100]
"HALT"
]
cpu = SimpleCPU()
cpu.run(program)
Part 2: Memory Hierarchy and Caching#
The Memory Wall Problem#
CPU speed has increased 1000ร faster than memory speed over 30 years!
Memory Hierarchy (Latency & Size)#
Level |
Latency |
Size |
Cost/GB |
|---|---|---|---|
Registers |
0.3 ns (~1 cycle) |
1 KB |
N/A |
L1 Cache |
1 ns (~4 cycles) |
32-64 KB |
$10,000 |
L2 Cache |
3 ns (~12 cycles) |
256 KB-1 MB |
$1,000 |
L3 Cache |
12 ns (~40 cycles) |
8-64 MB |
$100 |
RAM |
100 ns (~300 cycles) |
8-64 GB |
$5 |
SSD |
100 ฮผs (100,000 ns) |
256 GB-2 TB |
$0.20 |
HDD |
10 ms (10,000,000 ns) |
1-10 TB |
$0.02 |
Cache Principles#
Temporal Locality: Recently accessed data likely to be accessed again
Spatial Locality: Nearby data likely to be accessed together
Cache Replacement Policies#
Policy |
Strategy |
Use Case |
|---|---|---|
LRU |
Evict least recently used |
General purpose (most common) |
LFU |
Evict least frequently used |
Long-term patterns |
FIFO |
Evict oldest |
Simple hardware |
Random |
Evict random entry |
Fastest, unpredictable |
from collections import OrderedDict
import random
class CacheSimulator:
"""Simulate different cache replacement policies."""
def __init__(self, capacity, policy='LRU'):
self.capacity = capacity
self.policy = policy
self.cache = OrderedDict() if policy == 'LRU' else {}
self.access_counts = {} # For LFU
self.hits = 0
self.misses = 0
self.evictions = 0
def access(self, address):
"""Access memory address."""
if address in self.cache:
self.hits += 1
if self.policy == 'LRU':
self.cache.move_to_end(address) # Mark as recently used
elif self.policy == 'LFU':
self.access_counts[address] += 1
return 'HIT'
else:
self.misses += 1
self._load_to_cache(address)
return 'MISS'
def _load_to_cache(self, address):
"""Load address into cache, evicting if necessary."""
# Evict if full
if len(self.cache) >= self.capacity:
self._evict()
# Add to cache
self.cache[address] = f"data_{address}"
if self.policy == 'LFU':
self.access_counts[address] = 1
def _evict(self):
"""Evict entry based on policy."""
self.evictions += 1
if self.policy == 'LRU':
# Remove least recently used (first item)
self.cache.popitem(last=False)
elif self.policy == 'LFU':
# Remove least frequently used
lfu_key = min(self.access_counts, key=self.access_counts.get)
del self.cache[lfu_key]
del self.access_counts[lfu_key]
elif self.policy == 'FIFO':
# Remove first added
first_key = next(iter(self.cache))
del self.cache[first_key]
elif self.policy == 'RANDOM':
# Remove random
random_key = random.choice(list(self.cache.keys()))
del self.cache[random_key]
def stats(self):
"""Calculate performance statistics."""
total = self.hits + self.misses
hit_rate = self.hits / total if total > 0 else 0
return {
'hits': self.hits,
'misses': self.misses,
'hit_rate': hit_rate,
'evictions': self.evictions
}
# Compare cache policies
print("๐ Cache Policy Comparison\n")
# Access pattern with locality
access_pattern = (
[1, 2, 3, 4, 5] * 3 + # Repeated accesses (temporal locality)
list(range(1, 10)) + # Sequential access (spatial locality)
[1, 2, 3] * 5 + # Hot data
list(range(10, 20)) # New data
)
policies = ['LRU', 'LFU', 'FIFO', 'RANDOM']
results = {}
for policy in policies:
cache = CacheSimulator(capacity=5, policy=policy)
for addr in access_pattern:
cache.access(addr)
results[policy] = cache.stats()
# Display results
print(f"{'Policy':<10} {'Hits':<8} {'Misses':<8} {'Hit Rate':<10} {'Evictions':<10}")
print("โ" * 50)
for policy, stats in results.items():
print(f"{policy:<10} {stats['hits']:<8} {stats['misses']:<8} "
f"{stats['hit_rate']*100:>7.1f}% {stats['evictions']:<10}")
print("\n๐ Winner: LRU typically performs best for most workloads")
Cache Performance Analysis#
Average Access Time:
AAT = Hit_Rate ร Cache_Latency + Miss_Rate ร (Cache_Latency + Memory_Latency)
Example:
Cache hit: 4 cycles
Memory access: 100 cycles
Hit rate: 95%
AAT = 0.95 ร 4 + 0.05 ร (4 + 100) = 3.8 + 5.2 = 9 cycles
Without cache: 100 cycles โ With cache: 9 cycles = 11ร speedup!
Part 3: Concurrency and Parallelism#
Key Concepts#
Concept |
Definition |
Example |
|---|---|---|
Concurrency |
Multiple tasks making progress (may be interleaved) |
Single-core multitasking |
Parallelism |
Multiple tasks executing simultaneously |
Multi-core execution |
Thread |
Lightweight execution unit (shares memory) |
Worker threads |
Process |
Heavy execution unit (isolated memory) |
Separate programs |
Amdahlโs Law (Parallel Speedup)#
Speedup = 1 / (S + (1 - S) / N)
Where:
S = Serial fraction (cannot be parallelized)
N = Number of processors
Example: If 10% is serial (S=0.1), max speedup with infinite cores = 10ร
Synchronization Problems#
Race Condition: Multiple threads access shared data without coordination
Deadlock: Circular waiting for resources
Starvation: Thread never gets CPU time
Priority Inversion: Low-priority task blocks high-priority task
import threading
import time
class ThreadSafeBankAccount:
"""Bank account with proper synchronization."""
def __init__(self, balance=0):
self.balance = balance
self.lock = threading.Lock() # Mutual exclusion
self.transactions = 0
def deposit(self, amount):
"""Thread-safe deposit."""
with self.lock: # Acquire lock automatically
current = self.balance
time.sleep(0.0001) # Simulate processing delay
self.balance = current + amount
self.transactions += 1
def withdraw(self, amount):
"""Thread-safe withdrawal."""
with self.lock:
if self.balance >= amount:
current = self.balance
time.sleep(0.0001)
self.balance = current - amount
self.transactions += 1
return True
return False
def get_balance(self):
"""Thread-safe balance check."""
with self.lock:
return self.balance
# Demonstrate race condition (without lock)
class UnsafeBankAccount:
def __init__(self, balance=0):
self.balance = balance
def deposit(self, amount):
current = self.balance
time.sleep(0.0001) # Context switch happens here!
self.balance = current + amount
def run_concurrent_transactions(account, num_threads, deposits_per_thread):
"""Run concurrent deposit operations."""
def worker():
for _ in range(deposits_per_thread):
account.deposit(10)
threads = [threading.Thread(target=worker) for _ in range(num_threads)]
start = time.time()
for t in threads:
t.start()
for t in threads:
t.join()
elapsed = time.time() - start
return elapsed
# Test unsafe vs safe
print("๐งช Testing Thread Safety\n")
num_threads = 10
deposits_per_thread = 100
expected = num_threads * deposits_per_thread * 10
print("Unsafe Account (Race Condition):")
unsafe = UnsafeBankAccount(0)
time_unsafe = run_concurrent_transactions(unsafe, num_threads, deposits_per_thread)
print(f" Expected: ${expected}")
print(f" Actual: ${unsafe.balance}")
print(f" Error: ${expected - unsafe.balance} (lost updates!)")
print(f" Time: {time_unsafe:.3f}s\n")
print("Safe Account (With Lock):")
safe = ThreadSafeBankAccount(0)
time_safe = run_concurrent_transactions(safe, num_threads, deposits_per_thread)
print(f" Expected: ${expected}")
print(f" Actual: ${safe.balance}")
print(f" Error: $0 (correct!)")
print(f" Time: {time_safe:.3f}s")
print(f"\nโ ๏ธ Synchronization overhead: {time_safe/time_unsafe:.2f}x slower")
print("But correctness is worth it!")
Producer-Consumer Problem (Classic Concurrency)#
Problem: Multiple producers create items, multiple consumers process them.
Solution: Bounded buffer with semaphores
from threading import Semaphore, Thread
from queue import Queue
import random
class BoundedBuffer:
"""Thread-safe bounded buffer for producer-consumer."""
def __init__(self, capacity):
self.buffer = Queue(maxsize=capacity)
self.empty = Semaphore(capacity) # Count empty slots
self.full = Semaphore(0) # Count full slots
self.mutex = Semaphore(1) # Mutual exclusion
def produce(self, item):
"""Add item to buffer."""
self.empty.acquire() # Wait for empty slot
self.mutex.acquire() # Lock buffer
self.buffer.put(item)
print(f" Produced: {item} (buffer size: {self.buffer.qsize()})")
self.mutex.release() # Unlock buffer
self.full.release() # Signal full slot
def consume(self):
"""Remove item from buffer."""
self.full.acquire() # Wait for full slot
self.mutex.acquire() # Lock buffer
item = self.buffer.get()
print(f" Consumed: {item} (buffer size: {self.buffer.qsize()})")
self.mutex.release() # Unlock buffer
self.empty.release() # Signal empty slot
return item
# Demo
buffer = BoundedBuffer(capacity=5)
items_produced = 0
items_consumed = 0
def producer(buffer, count):
global items_produced
for i in range(count):
item = f"item_{i}"
buffer.produce(item)
items_produced += 1
time.sleep(random.uniform(0.01, 0.05))
def consumer(buffer, count):
global items_consumed
for _ in range(count):
buffer.consume()
items_consumed += 1
time.sleep(random.uniform(0.01, 0.05))
print("๐ญ Producer-Consumer Simulation\n")
num_items = 10
producers = [Thread(target=producer, args=(buffer, num_items // 2)) for _ in range(2)]
consumers = [Thread(target=consumer, args=(buffer, num_items // 2)) for _ in range(2)]
for p in producers:
p.start()
for c in consumers:
c.start()
for p in producers:
p.join()
for c in consumers:
c.join()
print(f"\nโ
Complete: Produced {items_produced}, Consumed {items_consumed}")
Part 4: Computational Complexity Theory#
Complexity Classes#
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ EXPTIME (2^n) โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ PSPACE (polynomial) โโ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ โ NP-Complete โโโ
โ โ โ โโโโโโโโโโโโโโโโโโ โโโ
โ โ โ โ NP-Hard โ โโโ
โ โ โ โ โโโโโโโโโโโโ โ โโโ
โ โ โ โ โ P โ โ โโโ
โ โ โ โ โโโโโโโโโโโโ โ โโโ
โ โ โ โโโโโโโโโโโโโโโโโโ โโโ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
P vs NP (Million Dollar Question)#
P (Polynomial Time):
Problems solvable in O(n^k) time
Examples: Sorting, shortest path, matrix multiplication
NP (Nondeterministic Polynomial):
Solutions verifiable in polynomial time
Examples: SAT, Traveling Salesman, Graph Coloring
NP-Complete:
Hardest problems in NP
If any NP-Complete problem is in P, then P = NP
Examples: SAT, 3-SAT, Hamiltonian Path
NP-Hard:
At least as hard as NP-Complete
May not be in NP
Examples: Halting Problem, TSP optimization
Famous NP-Complete Problems#
Problem |
Description |
Verification Time |
|---|---|---|
SAT |
Boolean satisfiability |
O(n) |
3-SAT |
SAT with 3 literals per clause |
O(n) |
TSP Decision |
Is there tour < k? |
O(n) |
Vertex Cover |
Cover with k vertices? |
O(n) |
Knapsack |
Can fit with value โฅ k? |
O(n) |
import math
from itertools import permutations
import time
import numpy as np
def tsp_brute_force(distances):
"""Solve TSP with brute force - O(n!) exponential."""
n = len(distances)
cities = list(range(n))
min_distance = float('inf')
best_path = None
permutations_checked = 0
# Try all permutations (starting from city 0)
for perm in permutations(cities[1:]):
path = [0] + list(perm) + [0] # Return to start
distance = sum(distances[path[i]][path[i+1]] for i in range(len(path)-1))
permutations_checked += 1
if distance < min_distance:
min_distance = distance
best_path = path
return best_path, min_distance, permutations_checked
def tsp_greedy(distances):
"""Approximate TSP with greedy nearest neighbor - O(nยฒ) polynomial."""
n = len(distances)
unvisited = set(range(1, n))
path = [0]
total_distance = 0
current = 0
while unvisited:
# Find nearest unvisited city
nearest = min(unvisited, key=lambda city: distances[current][city])
total_distance += distances[current][nearest]
path.append(nearest)
unvisited.remove(nearest)
current = nearest
# Return to start
total_distance += distances[current][0]
path.append(0)
return path, total_distance
def verify_tsp_solution(distances, path, claimed_distance):
"""Verify TSP solution - O(n) polynomial (easy to verify!)."""
actual_distance = sum(distances[path[i]][path[i+1]] for i in range(len(path)-1))
return actual_distance == claimed_distance
# Compare algorithms
print("๐บ๏ธ Traveling Salesman Problem (TSP)\n")
# Small example (5 cities)
distances = np.array([
[0, 10, 15, 20, 25],
[10, 0, 35, 25, 30],
[15, 35, 0, 30, 20],
[20, 25, 30, 0, 15],
[25, 30, 20, 15, 0]
])
n_cities = len(distances)
print(f"Problem: {n_cities} cities\n")
# Exact solution (brute force)
start = time.time()
optimal_path, optimal_dist, perms = tsp_brute_force(distances)
brute_time = time.time() - start
print(f"Optimal Solution (Brute Force):")
print(f" Path: {optimal_path}")
print(f" Distance: {optimal_dist}")
print(f" Permutations checked: {perms:,}")
print(f" Time: {brute_time:.6f}s\n")
# Approximation (greedy)
start = time.time()
greedy_path, greedy_dist = tsp_greedy(distances)
greedy_time = time.time() - start
print(f"Greedy Approximation:")
print(f" Path: {greedy_path}")
print(f" Distance: {greedy_dist}")
print(f" Approximation ratio: {greedy_dist/optimal_dist:.2f}x")
print(f" Time: {greedy_time:.6f}s")
print(f" Speedup: {brute_time/greedy_time:.0f}x faster\n")
# Verification is fast!
start = time.time()
is_valid = verify_tsp_solution(distances, optimal_path, optimal_dist)
verify_time = time.time() - start
print(f"Solution Verification:")
print(f" Valid: {is_valid}")
print(f" Time: {verify_time:.6f}s (instant!)\n")
# Scaling analysis
print("๐ Complexity Analysis:")
print(f" Brute force: O({n_cities}!) = {math.factorial(n_cities-1):,} permutations")
print(f" Greedy: O({n_cities}ยฒ) = {n_cities**2} operations")
print(f" Verification: O({n_cities}) = {n_cities} operations")
print(f"\n For 10 cities: {math.factorial(9):,} permutations! (intractable)")
print(f" For 20 cities: {math.factorial(19):,} permutations! (universe ends first)")
Reductions (Proving NP-Completeness)#
Idea: To prove problem B is NP-Complete:
Show B is in NP (solution verifiable in polynomial time)
Reduce known NP-Complete problem A to B
If we can solve B quickly, we can solve A quickly
Cook-Levin Theorem (1971): SAT is NP-Complete (first proven NP-Complete problem)
Part 5: Automata Theory and Formal Languages#
Chomsky Hierarchy#
Type |
Language Class |
Automaton |
Example |
|---|---|---|---|
Type 0 |
Recursively Enumerable |
Turing Machine |
Any computable |
Type 1 |
Context-Sensitive |
Linear-bounded |
Natural language |
Type 2 |
Context-Free |
Pushdown |
Programming languages |
Type 3 |
Regular |
Finite State |
Pattern matching |
Finite State Machine (FSM)#
Components:
States (Q): Finite set
Alphabet (ฮฃ): Input symbols
Transition function (ฮด): Q ร ฮฃ โ Q
Start state (qโ)
Accept states (F)
Applications:
Regular expressions
Lexical analysis (tokenization)
Protocol verification
Game AI
class FiniteStateMachine:
"""General-purpose finite state machine."""
def __init__(self, states, alphabet, transitions, start_state, accept_states):
self.states = states
self.alphabet = alphabet
self.transitions = transitions # {(state, symbol): next_state}
self.start_state = start_state
self.accept_states = accept_states
self.current_state = start_state
def reset(self):
"""Reset to start state."""
self.current_state = self.start_state
def transition(self, symbol):
"""Process single symbol."""
key = (self.current_state, symbol)
if key in self.transitions:
self.current_state = self.transitions[key]
return True
return False # Invalid transition
def accepts(self, input_string):
"""Check if FSM accepts the input string."""
self.reset()
for i, symbol in enumerate(input_string):
if not self.transition(symbol):
print(f" Rejected at position {i}: '{symbol}'")
return False
print(f" '{symbol}' โ {self.current_state}")
accepted = self.current_state in self.accept_states
return accepted
# Example: FSM that accepts binary strings with even number of 1s
print("๐ค FSM: Accept binary strings with even number of 1s\n")
fsm_even_ones = FiniteStateMachine(
states={'EVEN', 'ODD'},
alphabet={'0', '1'},
transitions={
('EVEN', '0'): 'EVEN', # Stay in EVEN on 0
('EVEN', '1'): 'ODD', # Go to ODD on 1
('ODD', '0'): 'ODD', # Stay in ODD on 0
('ODD', '1'): 'EVEN', # Go to EVEN on 1
},
start_state='EVEN',
accept_states={'EVEN'}
)
test_strings = ['0011', '101', '1100', '111']
for test in test_strings:
print(f"Testing: '{test}'")
result = fsm_even_ones.accepts(test)
ones_count = test.count('1')
print(f" Result: {'โ
ACCEPT' if result else 'โ REJECT'}")
print(f" (Has {ones_count} ones)\n")
Regular Expressions โก Finite Automata#
Theorem: Regular languages can be recognized by both:
Regular expressions
Finite state machines
Example: (0|1)*1(0|1)* matches strings containing at least one โ1โ
Part 6: Virtual Memory and Paging#
Why Virtual Memory?#
Benefits:
Isolation: Processes canโt access each otherโs memory
Larger address space: Programs think they have more RAM
Sharing: Multiple processes can share code (libraries)
Protection: Read-only, read-write, execute permissions
Address Translation#
Virtual Address = [Page Number | Offset]
โ
Page Table Lookup
โ
Physical Address = [Frame Number | Offset]
Page Replacement Algorithms#
Algorithm |
Strategy |
Complexity |
|---|---|---|
FIFO |
Remove oldest page |
O(1) |
LRU |
Remove least recently used |
O(1) with hardware |
LFU |
Remove least frequently used |
O(log n) |
Clock |
Approximation of LRU |
O(1) |
class VirtualMemorySimulator:
"""Simulate virtual memory with paging."""
def __init__(self, num_frames, page_size, replacement='LRU'):
self.page_size = page_size
self.num_frames = num_frames
self.replacement = replacement
# Page table: virtual page โ physical frame
self.page_table = {}
# Physical memory
self.frames = [None] * num_frames
self.frame_age = {} # For LRU
# Statistics
self.page_faults = 0
self.page_hits = 0
self.access_count = 0
def translate(self, virtual_address):
"""Translate virtual address to physical address."""
page_num = virtual_address // self.page_size
offset = virtual_address % self.page_size
self.access_count += 1
if page_num in self.page_table:
# Page hit!
self.page_hits += 1
frame = self.page_table[page_num]
self.frame_age[frame] = self.access_count # Update access time
physical_addr = frame * self.page_size + offset
return physical_addr, 'HIT'
else:
# Page fault!
self.page_faults += 1
frame = self._load_page(page_num)
physical_addr = frame * self.page_size + offset
return physical_addr, 'FAULT'
def _load_page(self, page_num):
"""Load page into physical memory."""
# Find free frame or evict
if None in self.frames:
frame = self.frames.index(None)
else:
frame = self._evict_page()
# Load page
self.frames[frame] = page_num
self.page_table[page_num] = frame
self.frame_age[frame] = self.access_count
return frame
def _evict_page(self):
"""Evict page based on replacement policy."""
if self.replacement == 'LRU':
# Remove least recently used
lru_frame = min(self.frame_age, key=self.frame_age.get)
elif self.replacement == 'FIFO':
# Remove oldest (first added)
lru_frame = 0 # Simplified: always evict first
else:
lru_frame = 0
# Remove from page table
old_page = self.frames[lru_frame]
if old_page in self.page_table:
del self.page_table[old_page]
return lru_frame
def stats(self):
"""Return performance statistics."""
total = self.page_hits + self.page_faults
hit_rate = self.page_hits / total if total > 0 else 0
return {
'hits': self.page_hits,
'faults': self.page_faults,
'hit_rate': hit_rate,
'fault_rate': 1 - hit_rate
}
# Simulate memory access pattern
print("๐พ Virtual Memory Simulation\n")
vm = VirtualMemorySimulator(num_frames=4, page_size=1000, replacement='LRU')
# Access pattern (address โ page number)
# 0-999 โ page 0, 1000-1999 โ page 1, etc.
access_pattern = [
100, # Page 0 (fault)
1200, # Page 1 (fault)
250, # Page 0 (hit)
2500, # Page 2 (fault)
3100, # Page 3 (fault)
4000, # Page 4 (fault, eviction!)
1250, # Page 1 (hit)
2750, # Page 2 (hit)
100, # Page 0 (fault, was evicted)
]
print(f"{'Virtual Addr':<15} {'Page':<8} {'Physical Addr':<15} {'Result'}")
print("โ" * 55)
for vaddr in access_pattern:
paddr, result = vm.translate(vaddr)
page = vaddr // vm.page_size
status = "๐ข HIT" if result == 'HIT' else "๐ด FAULT"
print(f"{vaddr:<15} {page:<8} {paddr:<15} {status}")
print("\n" + "โ" * 55)
stats = vm.stats()
print(f"Page Hits: {stats['hits']}")
print(f"Page Faults: {stats['faults']}")
print(f"Hit Rate: {stats['hit_rate']*100:.1f}%")
print(f"Fault Rate: {stats['fault_rate']*100:.1f}%")
Part 7: Modern Hardware and Parallelism#
Multi-Core vs GPU#
Feature |
CPU (Multi-core) |
GPU |
|---|---|---|
Cores |
4-64 |
1000s-10000s |
Clock Speed |
3-5 GHz |
1-2 GHz |
Use Case |
General purpose, complex control flow |
Data parallel, simple operations |
Memory |
Large cache, low latency |
High bandwidth, high latency |
Best For |
Databases, web servers, compilers |
Deep learning, graphics, crypto |
SIMD (Single Instruction, Multiple Data)#
Idea: Apply same operation to multiple data elements simultaneously
Example: Vector addition
Scalar: for i in range(n): c[i] = a[i] + b[i] # n operations
SIMD: c = a + b # 1 operation (hardware vectorized)
Applications:
Image/video processing
Scientific computing
Machine learning
Cryptography
from multiprocessing import Pool, cpu_count
import numpy as np
def parallel_computation_demo():
"""Compare serial vs parallel execution."""
def compute_intensive_task(n):
"""Simulate CPU-intensive task."""
result = 0
for i in range(n):
result += i ** 2
return result
tasks = [1000000] * 8 # 8 tasks
# Serial execution
start = time.time()
serial_results = [compute_intensive_task(t) for t in tasks]
serial_time = time.time() - start
# Parallel execution
start = time.time()
with Pool(processes=cpu_count()) as pool:
parallel_results = pool.map(compute_intensive_task, tasks)
parallel_time = time.time() - start
print("โก Parallel Computation Comparison\n")
print(f"Number of tasks: {len(tasks)}")
print(f"CPU cores: {cpu_count()}\n")
print(f"Serial time: {serial_time:.3f}s")
print(f"Parallel time: {parallel_time:.3f}s")
print(f"Speedup: {serial_time/parallel_time:.2f}x")
print(f"Efficiency: {(serial_time/parallel_time)/cpu_count()*100:.1f}%")
# Amdahl's Law prediction
print(f"\n๐ Amdahl's Law Analysis:")
print(f" Assuming 100% parallelizable (S=0):")
print(f" Theoretical max speedup: {cpu_count()}x")
print(f" Actual speedup: {serial_time/parallel_time:.2f}x")
print(f" Overhead: ~{(1 - parallel_time*cpu_count()/serial_time)*100:.1f}%")
parallel_computation_demo()
Part 8: Exercises#
Exercise 1: Process Scheduler (โญโญ)#
Implement CPU scheduling algorithms:
Round Robin: Each process gets fixed time slice
Shortest Job First: Execute shortest process first
Priority Scheduling: Execute highest priority first
Compare average waiting time and turnaround time
Visualize Gantt chart
# Exercise 1: Your code here
# Hint: Create Process class with arrival_time, burst_time, priority
# Hint: Implement scheduler that maintains ready queue
Exercise 2: Deadlock Detection (โญโญโญ)#
Implement deadlock detection:
Create resource allocation graph
Implement cycle detection algorithm
Simulate Bankerโs Algorithm for deadlock avoidance
Test with various resource request patterns
# Exercise 2: Your code here
# Hint: Use graph representation (adjacency list)
# Hint: DFS for cycle detection
Exercise 3: Memory Allocator (โญโญโญ)#
Implement dynamic memory allocation:
First-fit: Allocate first block large enough
Best-fit: Allocate smallest block large enough
Worst-fit: Allocate largest available block
Handle fragmentation (external and internal)
Compare allocation success rate and fragmentation
# Exercise 3: Your code here
# Hint: Maintain list of free blocks
# Hint: Merge adjacent free blocks
Exercise 4: NP-Completeness Proof (โญโญโญโญ)#
Prove a problem is NP-Complete:
Choose problem (e.g., Vertex Cover, Subset Sum)
Show itโs in NP (create verifier)
Reduce from known NP-Complete problem (e.g., 3-SAT)
Implement reduction algorithm
Test with examples
# Exercise 4: Your code here
# Hint: Start with verifier (polynomial time check)
# Hint: For reduction, show if we can solve B, we can solve A
Exercise 5: Build a Virtual Machine (โญโญโญโญ)#
Create a simple stack-based VM:
Design bytecode instruction set (PUSH, POP, ADD, etc.)
Implement stack operations
Support function calls (CALL, RET)
Add conditional jumps (JZ, JNZ)
Write โcompilerโ to generate bytecode from simple expressions
# Exercise 5: Your code here
# Hint: Maintain operand stack and call stack
# Hint: VM state: {pc, stack, memory, callstack}
Self-Check Quiz#
Test your understanding:
What is the fetch-decode-execute cycle?
A) How CPU processes instructions
B) How memory is accessed
C) How cache works
D) How I/O happens
Which cache replacement policy is most common?
A) FIFO
B) LRU
C) Random
D) LFU
What is the difference between concurrency and parallelism?
A) No difference
B) Concurrency is interleaved, parallelism is simultaneous
C) Parallelism is slower
D) Concurrency requires multiple CPUs
What does P = NP mean?
A) All problems are easy
B) Problems with fast verification also have fast solutions
C) Computers are perfect
D) No hard problems exist
What is NP-Complete?
A) Easiest problems
B) Hardest problems in NP
C) Impossible problems
D) Polynomial problems
What can a finite state machine recognize?
A) Any language
B) Regular languages only
C) Context-free languages
D) Natural languages
What is a page fault?
A) Hardware error
B) Accessing page not in physical memory
C) Software bug
D) Network error
What is Amdahlโs Law about?
A) Cache performance
B) Parallel speedup limits
C) Memory bandwidth
D) Disk throughput
Which synchronization primitive prevents race conditions?
A) Barrier
B) Mutex/Lock
C) Semaphore
D) All of the above
What is SIMD?
A) Sequential processing
B) Same operation on multiple data
C) Multiple operations on same data
D) Random processing
Answers: 1-A, 2-B, 3-B, 4-B, 5-B, 6-B, 7-B, 8-B, 9-D, 10-B
Key Takeaways#
Computer Architecture#
โ CPU executes fetch-decode-execute cycle
โ Memory hierarchy: Registers โ Cache โ RAM โ Disk
โ Cache exploits temporal and spatial locality
โ LRU is most common cache replacement policy
Concurrency#
โ Concurrency โ Parallelism (interleaved vs simultaneous)
โ Race conditions require synchronization (locks, semaphores)
โ Amdahlโs Law limits parallel speedup
โ Always test concurrent code thoroughly
Complexity Theory#
โ P = problems solvable in polynomial time
โ NP = problems verifiable in polynomial time
โ NP-Complete = hardest problems in NP
โ If P = NP, crypto breaks (major unsolved problem)
Automata Theory#
โ FSMs recognize regular languages
โ Regular expressions โก Finite automata
โ Pushdown automata โก Context-free grammars
โ Turing machines โก General computation
Virtual Memory#
โ Provides isolation and protection
โ Page faults load pages from disk
โ TLB caches address translations
โ Page replacement crucial for performance
Pro Tips#
Profile before optimizing: Measure where time is spent
Cache-friendly code: Access data sequentially when possible
Avoid false sharing: Pad shared data to cache line boundaries
Lock granularity: Fine-grained locks = more parallelism, more overhead
Use lock-free data structures: When possible (hard to get right!)
P vs NP awareness: Donโt try exact solutions for NP-Hard problems at scale
Approximation algorithms: Often good enough in practice
Vectorize code: Use NumPy/SIMD for data parallel operations
Understand your hardware: CPU cores, cache sizes, memory bandwidth
Test concurrency: Race conditions are non-deterministic bugs
Common Mistakes#
โ Forgetting to synchronize shared data
โ Holding locks too long (reduces parallelism)
โ Ignoring cache effects in tight loops
โ Trying exact solutions for large NP-Complete problems
โ Not considering false sharing in parallel code
โ Over-parallelizing (overhead exceeds benefit)
โ Assuming atomic operations when theyโre not
โ Creating too many threads (context switch overhead)
Debug Checklist#
โ ๏ธ Race condition โ Add synchronization, test with ThreadSanitizer
โ ๏ธ Deadlock โ Check lock ordering, use timeout
โ ๏ธ Poor cache performance โ Profile cache misses, improve locality
โ ๏ธ Slow parallel code โ Check for serial bottlenecks (Amdahlโs Law)
โ ๏ธ Page faults โ Increase memory, improve access patterns
Whatโs Next?#
Continue in Hard Track:#
Lesson 7: Project Ideas (apply all knowledge!)
Lesson 8: Classic Problems (interview preparation)
Lesson 9: CTF Challenges (security skills)
Deepen Your Systems Knowledge:#
MIT 6.004: Computation Structures
CMU 15-213: Computer Systems (CSAPP book)
Operating Systems: Three Easy Pieces (free online)
Coursera: Parallel Programming
Practice Projects:#
Build a shell (bash clone)
Implement malloc/free
Write a thread pool
Create a key-value store
Build a simple OS kernel
Resources:#
Books: CSAPP, Modern Operating Systems (Tanenbaum)
Online: OSDev wiki, Linux kernel source
Tools: perf, valgrind, gdb, ThreadSanitizer
Congratulations! You now understand how computers work at the systems level and can reason about complexity! ๐ฅ๏ธ