Lesson 6: Computer Systems and Theory

Lesson 6: Computer Systems and Theory#

Master computer architecture, operating systems, and theoretical computer science

Real-World Context#

Understanding computer systems is essential for writing efficient code, debugging performance issues, and system design. This knowledge powers: high-performance computing (Google scale), operating systems (Linux kernel), embedded systems (smartphones), and distributed systems (cloud infrastructure).

What You’ll Learn#

Computer Architecture: CPU, memory hierarchy, instruction execution
Memory Management: Virtual memory, paging, caching strategies
Concurrency: Threads, processes, synchronization primitives
Operating Systems: Scheduling, file systems, I/O management
Computational Complexity: P vs NP, complexity classes, reductions
Automata Theory: Finite state machines, regular expressions, grammars
Modern Hardware: Multi-core, GPUs, SIMD, memory models
System Design: Scalability, reliability, performance optimization

Prerequisites: Python, basic data structures, algorithms

Time: 4-5 hours

Part 1: Computer Architecture Fundamentals#

Von Neumann Architecture (1945)#

Core Components:

┌─────────────────────────────────────┐
│  CPU (Central Processing Unit)      │
│  ┌──────────────┐  ┌─────────────┐ │
│  │ Control Unit │  │     ALU     │ │
│  └──────────────┘  └─────────────┘ │
│  ┌──────────────────────────────┐  │
│  │       Registers              │  │
│  └──────────────────────────────┘  │
└─────────────────────────────────────┘
         ↕ (Bus)
┌─────────────────────────────────────┐
│          Memory (RAM)                │
│  [Instructions] [Data] [Stack]      │
└─────────────────────────────────────┘
         ↕
┌─────────────────────────────────────┐
│        I/O Devices                   │
│  Keyboard, Display, Disk, Network   │
└─────────────────────────────────────┘

CPU Execution Cycle: Fetch-Decode-Execute#

Fetch: Load instruction from memory (PC → Memory → IR)
Decode: Interpret instruction (IR → Control Unit)
Execute: Perform operation (ALU, Registers)
Write-back: Store result (Registers → Memory)

Instruction Types#

Category	Examples	Purpose
Data Transfer	LOAD, STORE, MOVE	Move data between registers/memory
Arithmetic	ADD, SUB, MUL, DIV	Mathematical operations
Logical	AND, OR, XOR, NOT	Bitwise operations
Control Flow	JMP, JZ, CALL, RET	Program flow control
I/O	IN, OUT	Interact with devices

class SimpleCPU:
    """Educational CPU simulator with fetch-decode-execute cycle."""
    
    def __init__(self):
        # Registers
        self.registers = {'A': 0, 'B': 0, 'C': 0, 'D': 0}
        self.pc = 0  # Program Counter
        self.ir = None  # Instruction Register
        self.flags = {'zero': False, 'carry': False}  # Status flags
        
        # Memory
        self.memory = [0] * 256
        
        # Statistics
        self.cycles = 0
        
    def fetch(self, program):
        """Fetch instruction from program memory."""
        if self.pc < len(program):
            self.ir = program[self.pc]
            self.pc += 1
            self.cycles += 1  # Fetch takes 1 cycle
            return True
        return False
    
    def decode_execute(self):
        """Decode and execute instruction."""
        if self.ir is None:
            return
        
        parts = self.ir.split()
        opcode = parts[0]
        
        if opcode == 'LOAD':
            # LOAD reg, value
            reg, value = parts[1], int(parts[2])
            self.registers[reg] = value
            print(f"  LOAD {reg}, {value}  → {reg}={value}")
            
        elif opcode == 'ADD':
            # ADD reg1, reg2, dest
            r1, r2, dest = parts[1], parts[2], parts[3]
            result = self.registers[r1] + self.registers[r2]
            self.registers[dest] = result
            self.flags['zero'] = (result == 0)
            print(f"  ADD {r1}, {r2}, {dest}  → {dest}={result}")
            
        elif opcode == 'SUB':
            # SUB reg1, reg2, dest
            r1, r2, dest = parts[1], parts[2], parts[3]
            result = self.registers[r1] - self.registers[r2]
            self.registers[dest] = result
            self.flags['zero'] = (result == 0)
            print(f"  SUB {r1}, {r2}, {dest}  → {dest}={result}")
            
        elif opcode == 'MUL':
            # MUL reg1, reg2, dest
            r1, r2, dest = parts[1], parts[2], parts[3]
            result = self.registers[r1] * self.registers[r2]
            self.registers[dest] = result
            print(f"  MUL {r1}, {r2}, {dest}  → {dest}={result}")
            
        elif opcode == 'STORE':
            # STORE reg, address
            reg, addr = parts[1], int(parts[2])
            self.memory[addr] = self.registers[reg]
            print(f"  STORE {reg}, [{addr}]  → Memory[{addr}]={self.registers[reg]}")
            
        elif opcode == 'JMP':
            # JMP address
            addr = int(parts[1])
            self.pc = addr
            print(f"  JMP {addr}  → PC={addr}")
            
        elif opcode == 'HALT':
            print(f"  HALT")
            return False
            
        self.cycles += 1  # Execute takes 1 cycle (simplified)
        return True
    
    def run(self, program):
        """Execute complete program."""
        print("🖥️  CPU Execution Start\n")
        print("Cycle | Instruction")
        print("─" * 40)
        
        while self.fetch(program):
            print(f"{self.cycles:3d}   |", end=" ")
            if not self.decode_execute():
                break
        
        print("\n" + "─" * 40)
        print("✅ Execution Complete\n")
        print(f"Final Registers: {self.registers}")
        print(f"Total Cycles: {self.cycles}")
        print(f"Instructions Executed: {self.pc}")

# Example program: Calculate (10 + 20) * 3
program = [
    "LOAD A 10",
    "LOAD B 20",
    "ADD A B C",    # C = A + B = 30
    "LOAD D 3",
    "MUL C D A",    # A = C * D = 90
    "STORE A 100",  # Store result at memory[100]
    "HALT"
]

cpu = SimpleCPU()
cpu.run(program)

Part 2: Memory Hierarchy and Caching#

The Memory Wall Problem#

CPU speed has increased 1000× faster than memory speed over 30 years!

Memory Hierarchy (Latency & Size)#

Level	Latency	Size	Cost/GB
Registers	0.3 ns (~1 cycle)	1 KB	N/A
L1 Cache	1 ns (~4 cycles)	32-64 KB	$10,000
L2 Cache	3 ns (~12 cycles)	256 KB-1 MB	$1,000
L3 Cache	12 ns (~40 cycles)	8-64 MB	$100
RAM	100 ns (~300 cycles)	8-64 GB	$5
SSD	100 μs (100,000 ns)	256 GB-2 TB	$0.20
HDD	10 ms (10,000,000 ns)	1-10 TB	$0.02

Cache Principles#

Temporal Locality: Recently accessed data likely to be accessed again
Spatial Locality: Nearby data likely to be accessed together

Cache Replacement Policies#

Policy	Strategy	Use Case
LRU	Evict least recently used	General purpose (most common)
LFU	Evict least frequently used	Long-term patterns
FIFO	Evict oldest	Simple hardware
Random	Evict random entry	Fastest, unpredictable

from collections import OrderedDict
import random

class CacheSimulator:
    """Simulate different cache replacement policies."""
    
    def __init__(self, capacity, policy='LRU'):
        self.capacity = capacity
        self.policy = policy
        self.cache = OrderedDict() if policy == 'LRU' else {}
        self.access_counts = {}  # For LFU
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        
    def access(self, address):
        """Access memory address."""
        if address in self.cache:
            self.hits += 1
            
            if self.policy == 'LRU':
                self.cache.move_to_end(address)  # Mark as recently used
            elif self.policy == 'LFU':
                self.access_counts[address] += 1
                
            return 'HIT'
        else:
            self.misses += 1
            self._load_to_cache(address)
            return 'MISS'
    
    def _load_to_cache(self, address):
        """Load address into cache, evicting if necessary."""
        # Evict if full
        if len(self.cache) >= self.capacity:
            self._evict()
            
        # Add to cache
        self.cache[address] = f"data_{address}"
        if self.policy == 'LFU':
            self.access_counts[address] = 1
    
    def _evict(self):
        """Evict entry based on policy."""
        self.evictions += 1
        
        if self.policy == 'LRU':
            # Remove least recently used (first item)
            self.cache.popitem(last=False)
            
        elif self.policy == 'LFU':
            # Remove least frequently used
            lfu_key = min(self.access_counts, key=self.access_counts.get)
            del self.cache[lfu_key]
            del self.access_counts[lfu_key]
            
        elif self.policy == 'FIFO':
            # Remove first added
            first_key = next(iter(self.cache))
            del self.cache[first_key]
            
        elif self.policy == 'RANDOM':
            # Remove random
            random_key = random.choice(list(self.cache.keys()))
            del self.cache[random_key]
    
    def stats(self):
        """Calculate performance statistics."""
        total = self.hits + self.misses
        hit_rate = self.hits / total if total > 0 else 0
        return {
            'hits': self.hits,
            'misses': self.misses,
            'hit_rate': hit_rate,
            'evictions': self.evictions
        }

# Compare cache policies
print("📊 Cache Policy Comparison\n")

# Access pattern with locality
access_pattern = (
    [1, 2, 3, 4, 5] * 3 +       # Repeated accesses (temporal locality)
    list(range(1, 10)) +         # Sequential access (spatial locality)
    [1, 2, 3] * 5 +              # Hot data
    list(range(10, 20))          # New data
)

policies = ['LRU', 'LFU', 'FIFO', 'RANDOM']
results = {}

for policy in policies:
    cache = CacheSimulator(capacity=5, policy=policy)
    
    for addr in access_pattern:
        cache.access(addr)
    
    results[policy] = cache.stats()

# Display results
print(f"{'Policy':<10} {'Hits':<8} {'Misses':<8} {'Hit Rate':<10} {'Evictions':<10}")
print("─" * 50)
for policy, stats in results.items():
    print(f"{policy:<10} {stats['hits']:<8} {stats['misses']:<8} "
          f"{stats['hit_rate']*100:>7.1f}%  {stats['evictions']:<10}")

print("\n🏆 Winner: LRU typically performs best for most workloads")

Cache Performance Analysis#

Average Access Time:

AAT = Hit_Rate × Cache_Latency + Miss_Rate × (Cache_Latency + Memory_Latency)

Example:

Cache hit: 4 cycles
Memory access: 100 cycles
Hit rate: 95%

AAT = 0.95 × 4 + 0.05 × (4 + 100) = 3.8 + 5.2 = 9 cycles

Without cache: 100 cycles → With cache: 9 cycles = 11× speedup!

Part 3: Concurrency and Parallelism#

Key Concepts#

Concept	Definition	Example
Concurrency	Multiple tasks making progress (may be interleaved)	Single-core multitasking
Parallelism	Multiple tasks executing simultaneously	Multi-core execution
Thread	Lightweight execution unit (shares memory)	Worker threads
Process	Heavy execution unit (isolated memory)	Separate programs

Amdahl’s Law (Parallel Speedup)#

Speedup = 1 / (S + (1 - S) / N)

Where:

S = Serial fraction (cannot be parallelized)
N = Number of processors

Example: If 10% is serial (S=0.1), max speedup with infinite cores = 10×

Synchronization Problems#

Race Condition: Multiple threads access shared data without coordination
Deadlock: Circular waiting for resources
Starvation: Thread never gets CPU time
Priority Inversion: Low-priority task blocks high-priority task

import threading
import time

class ThreadSafeBankAccount:
    """Bank account with proper synchronization."""
    
    def __init__(self, balance=0):
        self.balance = balance
        self.lock = threading.Lock()  # Mutual exclusion
        self.transactions = 0
        
    def deposit(self, amount):
        """Thread-safe deposit."""
        with self.lock:  # Acquire lock automatically
            current = self.balance
            time.sleep(0.0001)  # Simulate processing delay
            self.balance = current + amount
            self.transactions += 1
            
    def withdraw(self, amount):
        """Thread-safe withdrawal."""
        with self.lock:
            if self.balance >= amount:
                current = self.balance
                time.sleep(0.0001)
                self.balance = current - amount
                self.transactions += 1
                return True
            return False
    
    def get_balance(self):
        """Thread-safe balance check."""
        with self.lock:
            return self.balance

# Demonstrate race condition (without lock)
class UnsafeBankAccount:
    def __init__(self, balance=0):
        self.balance = balance
    
    def deposit(self, amount):
        current = self.balance
        time.sleep(0.0001)  # Context switch happens here!
        self.balance = current + amount

def run_concurrent_transactions(account, num_threads, deposits_per_thread):
    """Run concurrent deposit operations."""
    def worker():
        for _ in range(deposits_per_thread):
            account.deposit(10)
    
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    
    return elapsed

# Test unsafe vs safe
print("🧪 Testing Thread Safety\n")

num_threads = 10
deposits_per_thread = 100
expected = num_threads * deposits_per_thread * 10

print("Unsafe Account (Race Condition):")
unsafe = UnsafeBankAccount(0)
time_unsafe = run_concurrent_transactions(unsafe, num_threads, deposits_per_thread)
print(f"  Expected: ${expected}")
print(f"  Actual: ${unsafe.balance}")
print(f"  Error: ${expected - unsafe.balance} (lost updates!)")
print(f"  Time: {time_unsafe:.3f}s\n")

print("Safe Account (With Lock):")
safe = ThreadSafeBankAccount(0)
time_safe = run_concurrent_transactions(safe, num_threads, deposits_per_thread)
print(f"  Expected: ${expected}")
print(f"  Actual: ${safe.balance}")
print(f"  Error: $0 (correct!)")
print(f"  Time: {time_safe:.3f}s")

print(f"\n⚠️  Synchronization overhead: {time_safe/time_unsafe:.2f}x slower")
print("But correctness is worth it!")

Producer-Consumer Problem (Classic Concurrency)#

Problem: Multiple producers create items, multiple consumers process them.

Solution: Bounded buffer with semaphores

from threading import Semaphore, Thread
from queue import Queue
import random

class BoundedBuffer:
    """Thread-safe bounded buffer for producer-consumer."""
    
    def __init__(self, capacity):
        self.buffer = Queue(maxsize=capacity)
        self.empty = Semaphore(capacity)  # Count empty slots
        self.full = Semaphore(0)  # Count full slots
        self.mutex = Semaphore(1)  # Mutual exclusion
        
    def produce(self, item):
        """Add item to buffer."""
        self.empty.acquire()  # Wait for empty slot
        self.mutex.acquire()  # Lock buffer
        
        self.buffer.put(item)
        print(f"  Produced: {item} (buffer size: {self.buffer.qsize()})")
        
        self.mutex.release()  # Unlock buffer
        self.full.release()  # Signal full slot
        
    def consume(self):
        """Remove item from buffer."""
        self.full.acquire()  # Wait for full slot
        self.mutex.acquire()  # Lock buffer
        
        item = self.buffer.get()
        print(f"  Consumed: {item} (buffer size: {self.buffer.qsize()})")
        
        self.mutex.release()  # Unlock buffer
        self.empty.release()  # Signal empty slot
        return item

# Demo
buffer = BoundedBuffer(capacity=5)
items_produced = 0
items_consumed = 0

def producer(buffer, count):
    global items_produced
    for i in range(count):
        item = f"item_{i}"
        buffer.produce(item)
        items_produced += 1
        time.sleep(random.uniform(0.01, 0.05))

def consumer(buffer, count):
    global items_consumed
    for _ in range(count):
        buffer.consume()
        items_consumed += 1
        time.sleep(random.uniform(0.01, 0.05))

print("🏭 Producer-Consumer Simulation\n")

num_items = 10
producers = [Thread(target=producer, args=(buffer, num_items // 2)) for _ in range(2)]
consumers = [Thread(target=consumer, args=(buffer, num_items // 2)) for _ in range(2)]

for p in producers:
    p.start()
for c in consumers:
    c.start()

for p in producers:
    p.join()
for c in consumers:
    c.join()

print(f"\n✅ Complete: Produced {items_produced}, Consumed {items_consumed}")

Part 4: Computational Complexity Theory#

Complexity Classes#

        ┌──────────────────────────────┐
        │     EXPTIME (2^n)           │
        │  ┌─────────────────────────┐│
        │  │  PSPACE (polynomial)    ││
        │  │ ┌──────────────────────┐││
        │  │ │   NP-Complete        │││
        │  │ │  ┌────────────────┐  │││
        │  │ │  │  NP-Hard       │  │││
        │  │ │  │  ┌──────────┐  │  │││
        │  │ │  │  │    P     │  │  │││
        │  │ │  │  └──────────┘  │  │││
        │  │ │  └────────────────┘  │││
        │  │ └──────────────────────┘││
        │  └─────────────────────────┘│
        └──────────────────────────────┘

P vs NP (Million Dollar Question)#

P (Polynomial Time):

Problems solvable in O(n^k) time
Examples: Sorting, shortest path, matrix multiplication

NP (Nondeterministic Polynomial):

Solutions verifiable in polynomial time
Examples: SAT, Traveling Salesman, Graph Coloring

NP-Complete:

Hardest problems in NP
If any NP-Complete problem is in P, then P = NP
Examples: SAT, 3-SAT, Hamiltonian Path

NP-Hard:

At least as hard as NP-Complete
May not be in NP
Examples: Halting Problem, TSP optimization

Famous NP-Complete Problems#

Problem	Description	Verification Time
SAT	Boolean satisfiability	O(n)
3-SAT	SAT with 3 literals per clause	O(n)
TSP Decision	Is there tour < k?	O(n)
Vertex Cover	Cover with k vertices?	O(n)
Knapsack	Can fit with value ≥ k?	O(n)

import math
from itertools import permutations
import time
import numpy as np

def tsp_brute_force(distances):
    """Solve TSP with brute force - O(n!) exponential."""
    n = len(distances)
    cities = list(range(n))
    min_distance = float('inf')
    best_path = None
    permutations_checked = 0
    
    # Try all permutations (starting from city 0)
    for perm in permutations(cities[1:]):
        path = [0] + list(perm) + [0]  # Return to start
        distance = sum(distances[path[i]][path[i+1]] for i in range(len(path)-1))
        permutations_checked += 1
        
        if distance < min_distance:
            min_distance = distance
            best_path = path
    
    return best_path, min_distance, permutations_checked

def tsp_greedy(distances):
    """Approximate TSP with greedy nearest neighbor - O(n²) polynomial."""
    n = len(distances)
    unvisited = set(range(1, n))
    path = [0]
    total_distance = 0
    
    current = 0
    while unvisited:
        # Find nearest unvisited city
        nearest = min(unvisited, key=lambda city: distances[current][city])
        total_distance += distances[current][nearest]
        path.append(nearest)
        unvisited.remove(nearest)
        current = nearest
    
    # Return to start
    total_distance += distances[current][0]
    path.append(0)
    
    return path, total_distance

def verify_tsp_solution(distances, path, claimed_distance):
    """Verify TSP solution - O(n) polynomial (easy to verify!)."""
    actual_distance = sum(distances[path[i]][path[i+1]] for i in range(len(path)-1))
    return actual_distance == claimed_distance

# Compare algorithms
print("🗺️  Traveling Salesman Problem (TSP)\n")

# Small example (5 cities)
distances = np.array([
    [0, 10, 15, 20, 25],
    [10, 0, 35, 25, 30],
    [15, 35, 0, 30, 20],
    [20, 25, 30, 0, 15],
    [25, 30, 20, 15, 0]
])

n_cities = len(distances)

print(f"Problem: {n_cities} cities\n")

# Exact solution (brute force)
start = time.time()
optimal_path, optimal_dist, perms = tsp_brute_force(distances)
brute_time = time.time() - start

print(f"Optimal Solution (Brute Force):")
print(f"  Path: {optimal_path}")
print(f"  Distance: {optimal_dist}")
print(f"  Permutations checked: {perms:,}")
print(f"  Time: {brute_time:.6f}s\n")

# Approximation (greedy)
start = time.time()
greedy_path, greedy_dist = tsp_greedy(distances)
greedy_time = time.time() - start

print(f"Greedy Approximation:")
print(f"  Path: {greedy_path}")
print(f"  Distance: {greedy_dist}")
print(f"  Approximation ratio: {greedy_dist/optimal_dist:.2f}x")
print(f"  Time: {greedy_time:.6f}s")
print(f"  Speedup: {brute_time/greedy_time:.0f}x faster\n")

# Verification is fast!
start = time.time()
is_valid = verify_tsp_solution(distances, optimal_path, optimal_dist)
verify_time = time.time() - start

print(f"Solution Verification:")
print(f"  Valid: {is_valid}")
print(f"  Time: {verify_time:.6f}s (instant!)\n")

# Scaling analysis
print("📊 Complexity Analysis:")
print(f"  Brute force: O({n_cities}!) = {math.factorial(n_cities-1):,} permutations")
print(f"  Greedy: O({n_cities}²) = {n_cities**2} operations")
print(f"  Verification: O({n_cities}) = {n_cities} operations")
print(f"\n  For 10 cities: {math.factorial(9):,} permutations! (intractable)")
print(f"  For 20 cities: {math.factorial(19):,} permutations! (universe ends first)")

Reductions (Proving NP-Completeness)#

Idea: To prove problem B is NP-Complete:

Show B is in NP (solution verifiable in polynomial time)
Reduce known NP-Complete problem A to B
If we can solve B quickly, we can solve A quickly

Cook-Levin Theorem (1971): SAT is NP-Complete (first proven NP-Complete problem)

Part 5: Automata Theory and Formal Languages#

Chomsky Hierarchy#

Type	Language Class	Automaton	Example
Type 0	Recursively Enumerable	Turing Machine	Any computable
Type 1	Context-Sensitive	Linear-bounded	Natural language
Type 2	Context-Free	Pushdown	Programming languages
Type 3	Regular	Finite State	Pattern matching

Finite State Machine (FSM)#

Components:

States (Q): Finite set
Alphabet (Σ): Input symbols
Transition function (δ): Q × Σ → Q
Start state (q₀)
Accept states (F)

Applications:

Regular expressions
Lexical analysis (tokenization)
Protocol verification
Game AI

class FiniteStateMachine:
    """General-purpose finite state machine."""
    
    def __init__(self, states, alphabet, transitions, start_state, accept_states):
        self.states = states
        self.alphabet = alphabet
        self.transitions = transitions  # {(state, symbol): next_state}
        self.start_state = start_state
        self.accept_states = accept_states
        self.current_state = start_state
        
    def reset(self):
        """Reset to start state."""
        self.current_state = self.start_state
        
    def transition(self, symbol):
        """Process single symbol."""
        key = (self.current_state, symbol)
        if key in self.transitions:
            self.current_state = self.transitions[key]
            return True
        return False  # Invalid transition
        
    def accepts(self, input_string):
        """Check if FSM accepts the input string."""
        self.reset()
        
        for i, symbol in enumerate(input_string):
            if not self.transition(symbol):
                print(f"  Rejected at position {i}: '{symbol}'")
                return False
            print(f"  '{symbol}' → {self.current_state}")
        
        accepted = self.current_state in self.accept_states
        return accepted

# Example: FSM that accepts binary strings with even number of 1s
print("🤖 FSM: Accept binary strings with even number of 1s\n")

fsm_even_ones = FiniteStateMachine(
    states={'EVEN', 'ODD'},
    alphabet={'0', '1'},
    transitions={
        ('EVEN', '0'): 'EVEN',  # Stay in EVEN on 0
        ('EVEN', '1'): 'ODD',   # Go to ODD on 1
        ('ODD', '0'): 'ODD',    # Stay in ODD on 0
        ('ODD', '1'): 'EVEN',   # Go to EVEN on 1
    },
    start_state='EVEN',
    accept_states={'EVEN'}
)

test_strings = ['0011', '101', '1100', '111']

for test in test_strings:
    print(f"Testing: '{test}'")
    result = fsm_even_ones.accepts(test)
    ones_count = test.count('1')
    print(f"  Result: {'✅ ACCEPT' if result else '❌ REJECT'}")
    print(f"  (Has {ones_count} ones)\n")

Regular Expressions ≡ Finite Automata#

Theorem: Regular languages can be recognized by both:

Regular expressions
Finite state machines

Example: (0|1)*1(0|1)* matches strings containing at least one ‘1’

Part 6: Virtual Memory and Paging#

Why Virtual Memory?#

Benefits:

Isolation: Processes can’t access each other’s memory
Larger address space: Programs think they have more RAM
Sharing: Multiple processes can share code (libraries)
Protection: Read-only, read-write, execute permissions

Address Translation#

Virtual Address = [Page Number | Offset]
                        ↓
                  Page Table Lookup
                        ↓
Physical Address = [Frame Number | Offset]

Page Replacement Algorithms#

Algorithm	Strategy	Complexity
FIFO	Remove oldest page	O(1)
LRU	Remove least recently used	O(1) with hardware
LFU	Remove least frequently used	O(log n)
Clock	Approximation of LRU	O(1)

class VirtualMemorySimulator:
    """Simulate virtual memory with paging."""
    
    def __init__(self, num_frames, page_size, replacement='LRU'):
        self.page_size = page_size
        self.num_frames = num_frames
        self.replacement = replacement
        
        # Page table: virtual page → physical frame
        self.page_table = {}
        
        # Physical memory
        self.frames = [None] * num_frames
        self.frame_age = {}  # For LRU
        
        # Statistics
        self.page_faults = 0
        self.page_hits = 0
        self.access_count = 0
        
    def translate(self, virtual_address):
        """Translate virtual address to physical address."""
        page_num = virtual_address // self.page_size
        offset = virtual_address % self.page_size
        
        self.access_count += 1
        
        if page_num in self.page_table:
            # Page hit!
            self.page_hits += 1
            frame = self.page_table[page_num]
            self.frame_age[frame] = self.access_count  # Update access time
            physical_addr = frame * self.page_size + offset
            return physical_addr, 'HIT'
        else:
            # Page fault!
            self.page_faults += 1
            frame = self._load_page(page_num)
            physical_addr = frame * self.page_size + offset
            return physical_addr, 'FAULT'
    
    def _load_page(self, page_num):
        """Load page into physical memory."""
        # Find free frame or evict
        if None in self.frames:
            frame = self.frames.index(None)
        else:
            frame = self._evict_page()
        
        # Load page
        self.frames[frame] = page_num
        self.page_table[page_num] = frame
        self.frame_age[frame] = self.access_count
        
        return frame
    
    def _evict_page(self):
        """Evict page based on replacement policy."""
        if self.replacement == 'LRU':
            # Remove least recently used
            lru_frame = min(self.frame_age, key=self.frame_age.get)
        elif self.replacement == 'FIFO':
            # Remove oldest (first added)
            lru_frame = 0  # Simplified: always evict first
        else:
            lru_frame = 0
        
        # Remove from page table
        old_page = self.frames[lru_frame]
        if old_page in self.page_table:
            del self.page_table[old_page]
        
        return lru_frame
    
    def stats(self):
        """Return performance statistics."""
        total = self.page_hits + self.page_faults
        hit_rate = self.page_hits / total if total > 0 else 0
        return {
            'hits': self.page_hits,
            'faults': self.page_faults,
            'hit_rate': hit_rate,
            'fault_rate': 1 - hit_rate
        }

# Simulate memory access pattern
print("💾 Virtual Memory Simulation\n")

vm = VirtualMemorySimulator(num_frames=4, page_size=1000, replacement='LRU')

# Access pattern (address → page number)
# 0-999 → page 0, 1000-1999 → page 1, etc.
access_pattern = [
    100,    # Page 0 (fault)
    1200,   # Page 1 (fault)
    250,    # Page 0 (hit)
    2500,   # Page 2 (fault)
    3100,   # Page 3 (fault)
    4000,   # Page 4 (fault, eviction!)
    1250,   # Page 1 (hit)
    2750,   # Page 2 (hit)
    100,    # Page 0 (fault, was evicted)
]

print(f"{'Virtual Addr':<15} {'Page':<8} {'Physical Addr':<15} {'Result'}")
print("─" * 55)

for vaddr in access_pattern:
    paddr, result = vm.translate(vaddr)
    page = vaddr // vm.page_size
    status = "🟢 HIT" if result == 'HIT' else "🔴 FAULT"
    print(f"{vaddr:<15} {page:<8} {paddr:<15} {status}")

print("\n" + "─" * 55)
stats = vm.stats()
print(f"Page Hits: {stats['hits']}")
print(f"Page Faults: {stats['faults']}")
print(f"Hit Rate: {stats['hit_rate']*100:.1f}%")
print(f"Fault Rate: {stats['fault_rate']*100:.1f}%")

Part 7: Modern Hardware and Parallelism#

Multi-Core vs GPU#

Feature	CPU (Multi-core)	GPU
Cores	4-64	1000s-10000s
Clock Speed	3-5 GHz	1-2 GHz
Use Case	General purpose, complex control flow	Data parallel, simple operations
Memory	Large cache, low latency	High bandwidth, high latency
Best For	Databases, web servers, compilers	Deep learning, graphics, crypto

SIMD (Single Instruction, Multiple Data)#

Idea: Apply same operation to multiple data elements simultaneously

Example: Vector addition

Scalar: for i in range(n): c[i] = a[i] + b[i]  # n operations
SIMD:   c = a + b                              # 1 operation (hardware vectorized)

Applications:

Image/video processing
Scientific computing
Machine learning
Cryptography

from multiprocessing import Pool, cpu_count
import numpy as np

def parallel_computation_demo():
    """Compare serial vs parallel execution."""
    
    def compute_intensive_task(n):
        """Simulate CPU-intensive task."""
        result = 0
        for i in range(n):
            result += i ** 2
        return result
    
    tasks = [1000000] * 8  # 8 tasks
    
    # Serial execution
    start = time.time()
    serial_results = [compute_intensive_task(t) for t in tasks]
    serial_time = time.time() - start
    
    # Parallel execution
    start = time.time()
    with Pool(processes=cpu_count()) as pool:
        parallel_results = pool.map(compute_intensive_task, tasks)
    parallel_time = time.time() - start
    
    print("⚡ Parallel Computation Comparison\n")
    print(f"Number of tasks: {len(tasks)}")
    print(f"CPU cores: {cpu_count()}\n")
    print(f"Serial time: {serial_time:.3f}s")
    print(f"Parallel time: {parallel_time:.3f}s")
    print(f"Speedup: {serial_time/parallel_time:.2f}x")
    print(f"Efficiency: {(serial_time/parallel_time)/cpu_count()*100:.1f}%")
    
    # Amdahl's Law prediction
    print(f"\n📊 Amdahl's Law Analysis:")
    print(f"  Assuming 100% parallelizable (S=0):")
    print(f"  Theoretical max speedup: {cpu_count()}x")
    print(f"  Actual speedup: {serial_time/parallel_time:.2f}x")
    print(f"  Overhead: ~{(1 - parallel_time*cpu_count()/serial_time)*100:.1f}%")

parallel_computation_demo()

Part 8: Exercises#

Exercise 1: Process Scheduler (⭐⭐)#

Implement CPU scheduling algorithms:

Round Robin: Each process gets fixed time slice
Shortest Job First: Execute shortest process first
Priority Scheduling: Execute highest priority first
Compare average waiting time and turnaround time
Visualize Gantt chart

# Exercise 1: Your code here
# Hint: Create Process class with arrival_time, burst_time, priority
# Hint: Implement scheduler that maintains ready queue

Exercise 2: Deadlock Detection (⭐⭐⭐)#

Implement deadlock detection:

Create resource allocation graph
Implement cycle detection algorithm
Simulate Banker’s Algorithm for deadlock avoidance
Test with various resource request patterns

# Exercise 2: Your code here
# Hint: Use graph representation (adjacency list)
# Hint: DFS for cycle detection

Exercise 3: Memory Allocator (⭐⭐⭐)#

Implement dynamic memory allocation:

First-fit: Allocate first block large enough
Best-fit: Allocate smallest block large enough
Worst-fit: Allocate largest available block
Handle fragmentation (external and internal)
Compare allocation success rate and fragmentation

# Exercise 3: Your code here
# Hint: Maintain list of free blocks
# Hint: Merge adjacent free blocks

Exercise 4: NP-Completeness Proof (⭐⭐⭐⭐)#

Prove a problem is NP-Complete:

Choose problem (e.g., Vertex Cover, Subset Sum)
Show it’s in NP (create verifier)
Reduce from known NP-Complete problem (e.g., 3-SAT)
Implement reduction algorithm
Test with examples

# Exercise 4: Your code here
# Hint: Start with verifier (polynomial time check)
# Hint: For reduction, show if we can solve B, we can solve A

Exercise 5: Build a Virtual Machine (⭐⭐⭐⭐)#

Create a simple stack-based VM:

Design bytecode instruction set (PUSH, POP, ADD, etc.)
Implement stack operations
Support function calls (CALL, RET)
Add conditional jumps (JZ, JNZ)
Write “compiler” to generate bytecode from simple expressions

# Exercise 5: Your code here
# Hint: Maintain operand stack and call stack
# Hint: VM state: {pc, stack, memory, callstack}

Self-Check Quiz#

Test your understanding:

What is the fetch-decode-execute cycle?
- A) How CPU processes instructions
- B) How memory is accessed
- C) How cache works
- D) How I/O happens
Which cache replacement policy is most common?
- A) FIFO
- B) LRU
- C) Random
- D) LFU
What is the difference between concurrency and parallelism?
- A) No difference
- B) Concurrency is interleaved, parallelism is simultaneous
- C) Parallelism is slower
- D) Concurrency requires multiple CPUs
What does P = NP mean?
- A) All problems are easy
- B) Problems with fast verification also have fast solutions
- C) Computers are perfect
- D) No hard problems exist
What is NP-Complete?
- A) Easiest problems
- B) Hardest problems in NP
- C) Impossible problems
- D) Polynomial problems
What can a finite state machine recognize?
- A) Any language
- B) Regular languages only
- C) Context-free languages
- D) Natural languages
What is a page fault?
- A) Hardware error
- B) Accessing page not in physical memory
- C) Software bug
- D) Network error
What is Amdahl’s Law about?
- A) Cache performance
- B) Parallel speedup limits
- C) Memory bandwidth
- D) Disk throughput
Which synchronization primitive prevents race conditions?
- A) Barrier
- B) Mutex/Lock
- C) Semaphore
- D) All of the above
What is SIMD?
- A) Sequential processing
- B) Same operation on multiple data
- C) Multiple operations on same data
- D) Random processing

Answers: 1-A, 2-B, 3-B, 4-B, 5-B, 6-B, 7-B, 8-B, 9-D, 10-B

Key Takeaways#

Computer Architecture#

✅ CPU executes fetch-decode-execute cycle
✅ Memory hierarchy: Registers → Cache → RAM → Disk
✅ Cache exploits temporal and spatial locality
✅ LRU is most common cache replacement policy

Concurrency#

✅ Concurrency ≠ Parallelism (interleaved vs simultaneous)
✅ Race conditions require synchronization (locks, semaphores)
✅ Amdahl’s Law limits parallel speedup
✅ Always test concurrent code thoroughly

Complexity Theory#

✅ P = problems solvable in polynomial time
✅ NP = problems verifiable in polynomial time
✅ NP-Complete = hardest problems in NP
✅ If P = NP, crypto breaks (major unsolved problem)

Automata Theory#

✅ FSMs recognize regular languages
✅ Regular expressions ≡ Finite automata
✅ Pushdown automata ≡ Context-free grammars
✅ Turing machines ≡ General computation

Virtual Memory#

✅ Provides isolation and protection
✅ Page faults load pages from disk
✅ TLB caches address translations
✅ Page replacement crucial for performance

Pro Tips#

Profile before optimizing: Measure where time is spent
Cache-friendly code: Access data sequentially when possible
Avoid false sharing: Pad shared data to cache line boundaries
Lock granularity: Fine-grained locks = more parallelism, more overhead
Use lock-free data structures: When possible (hard to get right!)
P vs NP awareness: Don’t try exact solutions for NP-Hard problems at scale
Approximation algorithms: Often good enough in practice
Vectorize code: Use NumPy/SIMD for data parallel operations
Understand your hardware: CPU cores, cache sizes, memory bandwidth
Test concurrency: Race conditions are non-deterministic bugs

Common Mistakes#

❌ Forgetting to synchronize shared data
❌ Holding locks too long (reduces parallelism)
❌ Ignoring cache effects in tight loops
❌ Trying exact solutions for large NP-Complete problems
❌ Not considering false sharing in parallel code
❌ Over-parallelizing (overhead exceeds benefit)
❌ Assuming atomic operations when they’re not
❌ Creating too many threads (context switch overhead)

Debug Checklist#

⚠️ Race condition → Add synchronization, test with ThreadSanitizer
⚠️ Deadlock → Check lock ordering, use timeout
⚠️ Poor cache performance → Profile cache misses, improve locality
⚠️ Slow parallel code → Check for serial bottlenecks (Amdahl’s Law)
⚠️ Page faults → Increase memory, improve access patterns

What’s Next?#

Continue in Hard Track:#

Lesson 7: Project Ideas (apply all knowledge!)
Lesson 8: Classic Problems (interview preparation)
Lesson 9: CTF Challenges (security skills)

Deepen Your Systems Knowledge:#

MIT 6.004: Computation Structures
CMU 15-213: Computer Systems (CSAPP book)
Operating Systems: Three Easy Pieces (free online)
Coursera: Parallel Programming

Practice Projects:#

Build a shell (bash clone)
Implement malloc/free
Write a thread pool
Create a key-value store
Build a simple OS kernel

Resources:#

Books: CSAPP, Modern Operating Systems (Tanenbaum)
Online: OSDev wiki, Linux kernel source
Tools: perf, valgrind, gdb, ThreadSanitizer

Congratulations! You now understand how computers work at the systems level and can reason about complexity! 🖥️