Lesson 6: Computer Systems and Theory#

Master computer architecture, operating systems, and theoretical computer science

Real-World Context#

Understanding computer systems is essential for writing efficient code, debugging performance issues, and system design. This knowledge powers: high-performance computing (Google scale), operating systems (Linux kernel), embedded systems (smartphones), and distributed systems (cloud infrastructure).

What You'll Learn#

  1. Computer Architecture: CPU, memory hierarchy, instruction execution

  2. Memory Management: Virtual memory, paging, caching strategies

  3. Concurrency: Threads, processes, synchronization primitives

  4. Operating Systems: Scheduling, file systems, I/O management

  5. Computational Complexity: P vs NP, complexity classes, reductions

  6. Automata Theory: Finite state machines, regular expressions, grammars

  7. Modern Hardware: Multi-core, GPUs, SIMD, memory models

  8. System Design: Scalability, reliability, performance optimization

Prerequisites: Python, basic data structures, algorithms

Time: 4-5 hours

Part 1: Computer Architecture Fundamentals#

Von Neumann Architecture (1945)#

Core Components:

┌─────────────────────────────────────┐
│  CPU (Central Processing Unit)      │
│  ┌──────────────┐  ┌─────────────┐  │
│  │ Control Unit │  │     ALU     │  │
│  └──────────────┘  └─────────────┘  │
│  ┌───────────────────────────────┐  │
│  │           Registers           │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
          ↕ (Bus)
┌─────────────────────────────────────┐
│            Memory (RAM)             │
│   [Instructions]  [Data]  [Stack]   │
└─────────────────────────────────────┘
          ↕
┌─────────────────────────────────────┐
│             I/O Devices             │
│  Keyboard, Display, Disk, Network   │
└─────────────────────────────────────┘

CPU Execution Cycle: Fetch-Decode-Execute#

  1. Fetch: Load instruction from memory (PC → Memory → IR)

  2. Decode: Interpret instruction (IR → Control Unit)

  3. Execute: Perform operation (ALU, Registers)

  4. Write-back: Store result (Registers → Memory)

Instruction Types#

Category        Examples              Purpose
────────────────────────────────────────────────────────────────────
Data Transfer   LOAD, STORE, MOVE     Move data between registers/memory
Arithmetic      ADD, SUB, MUL, DIV    Mathematical operations
Logical         AND, OR, XOR, NOT     Bitwise operations
Control Flow    JMP, JZ, CALL, RET    Program flow control
I/O             IN, OUT               Interact with devices

class SimpleCPU:
    """Educational CPU simulator with fetch-decode-execute cycle."""
    
    def __init__(self):
        # Registers
        self.registers = {'A': 0, 'B': 0, 'C': 0, 'D': 0}
        self.pc = 0  # Program Counter
        self.ir = None  # Instruction Register
        self.flags = {'zero': False, 'carry': False}  # Status flags
        
        # Memory
        self.memory = [0] * 256
        
        # Statistics
        self.cycles = 0
        
    def fetch(self, program):
        """Fetch instruction from program memory."""
        if self.pc < len(program):
            self.ir = program[self.pc]
            self.pc += 1
            self.cycles += 1  # Fetch takes 1 cycle
            return True
        return False
    
    def decode_execute(self):
        """Decode and execute instruction."""
        if self.ir is None:
            return
        
        parts = self.ir.split()
        opcode = parts[0]
        
        if opcode == 'LOAD':
            # LOAD reg, value
            reg, value = parts[1], int(parts[2])
            self.registers[reg] = value
            print(f"  LOAD {reg}, {value}  โ†’ {reg}={value}")
            
        elif opcode == 'ADD':
            # ADD reg1, reg2, dest
            r1, r2, dest = parts[1], parts[2], parts[3]
            result = self.registers[r1] + self.registers[r2]
            self.registers[dest] = result
            self.flags['zero'] = (result == 0)
            print(f"  ADD {r1}, {r2}, {dest}  โ†’ {dest}={result}")
            
        elif opcode == 'SUB':
            # SUB reg1, reg2, dest
            r1, r2, dest = parts[1], parts[2], parts[3]
            result = self.registers[r1] - self.registers[r2]
            self.registers[dest] = result
            self.flags['zero'] = (result == 0)
            print(f"  SUB {r1}, {r2}, {dest}  โ†’ {dest}={result}")
            
        elif opcode == 'MUL':
            # MUL reg1, reg2, dest
            r1, r2, dest = parts[1], parts[2], parts[3]
            result = self.registers[r1] * self.registers[r2]
            self.registers[dest] = result
            print(f"  MUL {r1}, {r2}, {dest}  โ†’ {dest}={result}")
            
        elif opcode == 'STORE':
            # STORE reg, address
            reg, addr = parts[1], int(parts[2])
            self.memory[addr] = self.registers[reg]
            print(f"  STORE {reg}, [{addr}]  โ†’ Memory[{addr}]={self.registers[reg]}")
            
        elif opcode == 'JMP':
            # JMP address
            addr = int(parts[1])
            self.pc = addr
            print(f"  JMP {addr}  โ†’ PC={addr}")
            
        elif opcode == 'HALT':
            print(f"  HALT")
            return False
            
        self.cycles += 1  # Execute takes 1 cycle (simplified)
        return True
    
    def run(self, program):
        """Execute complete program."""
        print("๐Ÿ–ฅ๏ธ  CPU Execution Start\n")
        print("Cycle | Instruction")
        print("โ”€" * 40)
        
        while self.fetch(program):
            print(f"{self.cycles:3d}   |", end=" ")
            if not self.decode_execute():
                break
        
        print("\n" + "โ”€" * 40)
        print("โœ… Execution Complete\n")
        print(f"Final Registers: {self.registers}")
        print(f"Total Cycles: {self.cycles}")
        print(f"Instructions Executed: {self.pc}")

# Example program: Calculate (10 + 20) * 3
program = [
    "LOAD A 10",
    "LOAD B 20",
    "ADD A B C",    # C = A + B = 30
    "LOAD D 3",
    "MUL C D A",    # A = C * D = 90
    "STORE A 100",  # Store result at memory[100]
    "HALT"
]

cpu = SimpleCPU()
cpu.run(program)

Part 2: Memory Hierarchy and Caching#

The Memory Wall Problem#

Over the last 30 years, CPU speed has improved roughly 1000× more than memory speed, so the processor frequently stalls waiting for data.

Memory Hierarchy (Latency & Size)#

Level       Latency                   Size           Cost/GB
──────────────────────────────────────────────────────────────
Registers   0.3 ns (~1 cycle)         1 KB           N/A
L1 Cache    1 ns (~4 cycles)          32-64 KB       $10,000
L2 Cache    3 ns (~12 cycles)         256 KB-1 MB    $1,000
L3 Cache    12 ns (~40 cycles)        8-64 MB        $100
RAM         100 ns (~300 cycles)      8-64 GB        $5
SSD         100 μs (100,000 ns)       256 GB-2 TB    $0.20
HDD         10 ms (10,000,000 ns)     1-10 TB        $0.02

Cache Principles#

  1. Temporal Locality: Recently accessed data likely to be accessed again

  2. Spatial Locality: Nearby data likely to be accessed together (the sketch below shows the effect)
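
A rough way to see spatial locality in practice (a minimal sketch, assuming NumPy is available; exact timings vary by machine): summing a row-major array row by row touches consecutive addresses, while column-by-column access strides across memory and typically runs noticeably slower on the same data.

import time
import numpy as np

matrix = np.random.rand(3000, 3000)   # row-major (C-order) layout

start = time.time()
row_total = sum(matrix[i, :].sum() for i in range(matrix.shape[0]))   # sequential, cache-friendly
row_time = time.time() - start

start = time.time()
col_total = sum(matrix[:, j].sum() for j in range(matrix.shape[1]))   # strided, cache-unfriendly
col_time = time.time() - start

print(f"Row-wise sum:    {row_time:.3f}s")
print(f"Column-wise sum: {col_time:.3f}s (same result, poorer spatial locality)")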

Cache Replacement Policies#

Policy    Strategy                       Use Case
──────────────────────────────────────────────────────────
LRU       Evict least recently used      General purpose (most common)
LFU       Evict least frequently used    Long-term patterns
FIFO      Evict oldest                   Simple hardware
Random    Evict random entry             Fastest, unpredictable

from collections import OrderedDict
import random

class CacheSimulator:
    """Simulate different cache replacement policies."""
    
    def __init__(self, capacity, policy='LRU'):
        self.capacity = capacity
        self.policy = policy
        self.cache = OrderedDict() if policy == 'LRU' else {}
        self.access_counts = {}  # For LFU
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        
    def access(self, address):
        """Access memory address."""
        if address in self.cache:
            self.hits += 1
            
            if self.policy == 'LRU':
                self.cache.move_to_end(address)  # Mark as recently used
            elif self.policy == 'LFU':
                self.access_counts[address] += 1
                
            return 'HIT'
        else:
            self.misses += 1
            self._load_to_cache(address)
            return 'MISS'
    
    def _load_to_cache(self, address):
        """Load address into cache, evicting if necessary."""
        # Evict if full
        if len(self.cache) >= self.capacity:
            self._evict()
            
        # Add to cache
        self.cache[address] = f"data_{address}"
        if self.policy == 'LFU':
            self.access_counts[address] = 1
    
    def _evict(self):
        """Evict entry based on policy."""
        self.evictions += 1
        
        if self.policy == 'LRU':
            # Remove least recently used (first item)
            self.cache.popitem(last=False)
            
        elif self.policy == 'LFU':
            # Remove least frequently used
            lfu_key = min(self.access_counts, key=self.access_counts.get)
            del self.cache[lfu_key]
            del self.access_counts[lfu_key]
            
        elif self.policy == 'FIFO':
            # Remove first added
            first_key = next(iter(self.cache))
            del self.cache[first_key]
            
        elif self.policy == 'RANDOM':
            # Remove random
            random_key = random.choice(list(self.cache.keys()))
            del self.cache[random_key]
    
    def stats(self):
        """Calculate performance statistics."""
        total = self.hits + self.misses
        hit_rate = self.hits / total if total > 0 else 0
        return {
            'hits': self.hits,
            'misses': self.misses,
            'hit_rate': hit_rate,
            'evictions': self.evictions
        }

# Compare cache policies
print("๐Ÿ“Š Cache Policy Comparison\n")

# Access pattern with locality
access_pattern = (
    [1, 2, 3, 4, 5] * 3 +       # Repeated accesses (temporal locality)
    list(range(1, 10)) +         # Sequential access (spatial locality)
    [1, 2, 3] * 5 +              # Hot data
    list(range(10, 20))          # New data
)

policies = ['LRU', 'LFU', 'FIFO', 'RANDOM']
results = {}

for policy in policies:
    cache = CacheSimulator(capacity=5, policy=policy)
    
    for addr in access_pattern:
        cache.access(addr)
    
    results[policy] = cache.stats()

# Display results
print(f"{'Policy':<10} {'Hits':<8} {'Misses':<8} {'Hit Rate':<10} {'Evictions':<10}")
print("โ”€" * 50)
for policy, stats in results.items():
    print(f"{policy:<10} {stats['hits']:<8} {stats['misses']:<8} "
          f"{stats['hit_rate']*100:>7.1f}%  {stats['evictions']:<10}")

print("\n๐Ÿ† Winner: LRU typically performs best for most workloads")

Cache Performance Analysis#

Average Access Time:

AAT = Hit_Rate × Cache_Latency + Miss_Rate × (Cache_Latency + Memory_Latency)

Example:

  • Cache hit: 4 cycles

  • Memory access: 100 cycles

  • Hit rate: 95%

AAT = 0.95 × 4 + 0.05 × (4 + 100) = 3.8 + 5.2 = 9 cycles

Without cache: 100 cycles → With cache: 9 cycles = 11× speedup!
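
The same arithmetic as a tiny helper (a sketch; the 4-cycle hit and 100-cycle memory penalty are simply the example numbers above):

def average_access_time(hit_rate, cache_latency, memory_latency):
    """Average memory access time in cycles, per the formula above."""
    miss_rate = 1 - hit_rate
    return hit_rate * cache_latency + miss_rate * (cache_latency + memory_latency)

aat = average_access_time(hit_rate=0.95, cache_latency=4, memory_latency=100)
print(f"AAT = {aat:.1f} cycles")                    # 9.0 cycles
print(f"Speedup vs. no cache: {100 / aat:.1f}x")    # ~11x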

Part 3: Concurrency and Parallelism#

Key Concepts#

Concept

Definition

Example

Concurrency

Multiple tasks making progress (may be interleaved)

Single-core multitasking

Parallelism

Multiple tasks executing simultaneously

Multi-core execution

Thread

Lightweight execution unit (shares memory)

Worker threads

Process

Heavy execution unit (isolated memory)

Separate programs

Amdahl's Law (Parallel Speedup)#

Speedup = 1 / (S + (1 - S) / N)

Where:

  • S = Serial fraction (cannot be parallelized)

  • N = Number of processors

Example: If 10% is serial (S=0.1), max speedup with infinite cores = 10×
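
As a quick sketch of the formula, here is that example worked in code (S = 0.1, i.e. 10% serial):

def amdahl_speedup(serial_fraction, num_processors):
    """Maximum speedup predicted by Amdahl's Law."""
    return 1.0 / (serial_fraction + (1 - serial_fraction) / num_processors)

for n in (2, 4, 8, 64, 1024):
    print(f"{n:5d} processors -> speedup {amdahl_speedup(0.1, n):5.2f}x")
# As n grows, the speedup approaches 1 / S = 10x and never exceeds it.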

Synchronization Problems#

  1. Race Condition: Multiple threads access shared data without coordination

  2. Deadlock: Circular waiting for resources (a short sketch follows the bank-account example below)

  3. Starvation: Thread never gets CPU time

  4. Priority Inversion: A low-priority task holding a resource delays a high-priority task that needs it

import threading
import time

class ThreadSafeBankAccount:
    """Bank account with proper synchronization."""
    
    def __init__(self, balance=0):
        self.balance = balance
        self.lock = threading.Lock()  # Mutual exclusion
        self.transactions = 0
        
    def deposit(self, amount):
        """Thread-safe deposit."""
        with self.lock:  # Acquire lock automatically
            current = self.balance
            time.sleep(0.0001)  # Simulate processing delay
            self.balance = current + amount
            self.transactions += 1
            
    def withdraw(self, amount):
        """Thread-safe withdrawal."""
        with self.lock:
            if self.balance >= amount:
                current = self.balance
                time.sleep(0.0001)
                self.balance = current - amount
                self.transactions += 1
                return True
            return False
    
    def get_balance(self):
        """Thread-safe balance check."""
        with self.lock:
            return self.balance

# Demonstrate race condition (without lock)
class UnsafeBankAccount:
    def __init__(self, balance=0):
        self.balance = balance
    
    def deposit(self, amount):
        current = self.balance
        time.sleep(0.0001)  # Context switch happens here!
        self.balance = current + amount

def run_concurrent_transactions(account, num_threads, deposits_per_thread):
    """Run concurrent deposit operations."""
    def worker():
        for _ in range(deposits_per_thread):
            account.deposit(10)
    
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    
    return elapsed

# Test unsafe vs safe
print("๐Ÿงช Testing Thread Safety\n")

num_threads = 10
deposits_per_thread = 100
expected = num_threads * deposits_per_thread * 10

print("Unsafe Account (Race Condition):")
unsafe = UnsafeBankAccount(0)
time_unsafe = run_concurrent_transactions(unsafe, num_threads, deposits_per_thread)
print(f"  Expected: ${expected}")
print(f"  Actual: ${unsafe.balance}")
print(f"  Error: ${expected - unsafe.balance} (lost updates!)")
print(f"  Time: {time_unsafe:.3f}s\n")

print("Safe Account (With Lock):")
safe = ThreadSafeBankAccount(0)
time_safe = run_concurrent_transactions(safe, num_threads, deposits_per_thread)
print(f"  Expected: ${expected}")
print(f"  Actual: ${safe.balance}")
print(f"  Error: $0 (correct!)")
print(f"  Time: {time_safe:.3f}s")

print(f"\nโš ๏ธ  Synchronization overhead: {time_safe/time_unsafe:.2f}x slower")
print("But correctness is worth it!")

Producer-Consumer Problem (Classic Concurrency)#

Problem: Multiple producers create items, multiple consumers process them.

Solution: Bounded buffer with semaphores

from threading import Semaphore, Thread
from queue import Queue
import random
import time

class BoundedBuffer:
    """Thread-safe bounded buffer for producer-consumer."""
    
    def __init__(self, capacity):
        self.buffer = Queue(maxsize=capacity)
        self.empty = Semaphore(capacity)  # Count empty slots
        self.full = Semaphore(0)  # Count full slots
        self.mutex = Semaphore(1)  # Mutual exclusion
        
    def produce(self, item):
        """Add item to buffer."""
        self.empty.acquire()  # Wait for empty slot
        self.mutex.acquire()  # Lock buffer
        
        self.buffer.put(item)
        print(f"  Produced: {item} (buffer size: {self.buffer.qsize()})")
        
        self.mutex.release()  # Unlock buffer
        self.full.release()  # Signal full slot
        
    def consume(self):
        """Remove item from buffer."""
        self.full.acquire()  # Wait for full slot
        self.mutex.acquire()  # Lock buffer
        
        item = self.buffer.get()
        print(f"  Consumed: {item} (buffer size: {self.buffer.qsize()})")
        
        self.mutex.release()  # Unlock buffer
        self.empty.release()  # Signal empty slot
        return item

# Demo
buffer = BoundedBuffer(capacity=5)
items_produced = 0
items_consumed = 0

def producer(buffer, count):
    global items_produced
    for i in range(count):
        item = f"item_{i}"
        buffer.produce(item)
        items_produced += 1
        time.sleep(random.uniform(0.01, 0.05))

def consumer(buffer, count):
    global items_consumed
    for _ in range(count):
        buffer.consume()
        items_consumed += 1
        time.sleep(random.uniform(0.01, 0.05))

print("๐Ÿญ Producer-Consumer Simulation\n")

num_items = 10
producers = [Thread(target=producer, args=(buffer, num_items // 2)) for _ in range(2)]
consumers = [Thread(target=consumer, args=(buffer, num_items // 2)) for _ in range(2)]

for p in producers:
    p.start()
for c in consumers:
    c.start()

for p in producers:
    p.join()
for c in consumers:
    c.join()

print(f"\nโœ… Complete: Produced {items_produced}, Consumed {items_consumed}")

Part 4: Computational Complexity Theory#

Complexity Classes#

        ┌─────────────────────────────────────┐
        │  EXPTIME (exponential time)         │
        │  ┌───────────────────────────────┐  │
        │  │  PSPACE (polynomial space)    │  │
        │  │  ┌─────────────────────────┐  │  │
        │  │  │  NP                     │  │  │
        │  │  │  (NP-Complete = the     │  │  │
        │  │  │   hardest problems      │  │  │
        │  │  │   in NP)                │  │  │
        │  │  │  ┌───────────────────┐  │  │  │
        │  │  │  │         P         │  │  │  │
        │  │  │  └───────────────────┘  │  │  │
        │  │  └─────────────────────────┘  │  │
        │  └───────────────────────────────┘  │
        └─────────────────────────────────────┘

These containments are believed to be strict (assuming P ≠ NP). NP-Hard problems are at least as hard as everything in NP and may lie outside NP entirely.

P vs NP (Million Dollar Question)#

P (Polynomial Time):

  • Problems solvable in O(n^k) time

  • Examples: Sorting, shortest path, matrix multiplication

NP (Nondeterministic Polynomial):

  • Solutions verifiable in polynomial time

  • Examples: SAT, Traveling Salesman, Graph Coloring

NP-Complete:

  • Hardest problems in NP

  • If any NP-Complete problem is in P, then P = NP

  • Examples: SAT, 3-SAT, Hamiltonian Path

NP-Hard:

  • At least as hard as NP-Complete

  • May not be in NP

  • Examples: Halting Problem, TSP optimization

Famous NP-Complete Problems#

Problem        Description                      Verification Time
─────────────────────────────────────────────────────────────────
SAT            Boolean satisfiability           O(n)
3-SAT          SAT with 3 literals per clause   O(n)
TSP Decision   Is there tour < k?               O(n)
Vertex Cover   Cover with k vertices?           O(n)
Knapsack       Can fit with value ≥ k?          O(n)

import math
from itertools import permutations
import time
import numpy as np

def tsp_brute_force(distances):
    """Solve TSP with brute force - O(n!) exponential."""
    n = len(distances)
    cities = list(range(n))
    min_distance = float('inf')
    best_path = None
    permutations_checked = 0
    
    # Try all permutations (starting from city 0)
    for perm in permutations(cities[1:]):
        path = [0] + list(perm) + [0]  # Return to start
        distance = sum(distances[path[i]][path[i+1]] for i in range(len(path)-1))
        permutations_checked += 1
        
        if distance < min_distance:
            min_distance = distance
            best_path = path
    
    return best_path, min_distance, permutations_checked

def tsp_greedy(distances):
    """Approximate TSP with greedy nearest neighbor - O(nยฒ) polynomial."""
    n = len(distances)
    unvisited = set(range(1, n))
    path = [0]
    total_distance = 0
    
    current = 0
    while unvisited:
        # Find nearest unvisited city
        nearest = min(unvisited, key=lambda city: distances[current][city])
        total_distance += distances[current][nearest]
        path.append(nearest)
        unvisited.remove(nearest)
        current = nearest
    
    # Return to start
    total_distance += distances[current][0]
    path.append(0)
    
    return path, total_distance

def verify_tsp_solution(distances, path, claimed_distance):
    """Verify TSP solution - O(n) polynomial (easy to verify!)."""
    actual_distance = sum(distances[path[i]][path[i+1]] for i in range(len(path)-1))
    return actual_distance == claimed_distance

# Compare algorithms
print("๐Ÿ—บ๏ธ  Traveling Salesman Problem (TSP)\n")

# Small example (5 cities)
distances = np.array([
    [0, 10, 15, 20, 25],
    [10, 0, 35, 25, 30],
    [15, 35, 0, 30, 20],
    [20, 25, 30, 0, 15],
    [25, 30, 20, 15, 0]
])

n_cities = len(distances)

print(f"Problem: {n_cities} cities\n")

# Exact solution (brute force)
start = time.time()
optimal_path, optimal_dist, perms = tsp_brute_force(distances)
brute_time = time.time() - start

print(f"Optimal Solution (Brute Force):")
print(f"  Path: {optimal_path}")
print(f"  Distance: {optimal_dist}")
print(f"  Permutations checked: {perms:,}")
print(f"  Time: {brute_time:.6f}s\n")

# Approximation (greedy)
start = time.time()
greedy_path, greedy_dist = tsp_greedy(distances)
greedy_time = time.time() - start

print(f"Greedy Approximation:")
print(f"  Path: {greedy_path}")
print(f"  Distance: {greedy_dist}")
print(f"  Approximation ratio: {greedy_dist/optimal_dist:.2f}x")
print(f"  Time: {greedy_time:.6f}s")
print(f"  Speedup: {brute_time/greedy_time:.0f}x faster\n")

# Verification is fast!
start = time.time()
is_valid = verify_tsp_solution(distances, optimal_path, optimal_dist)
verify_time = time.time() - start

print(f"Solution Verification:")
print(f"  Valid: {is_valid}")
print(f"  Time: {verify_time:.6f}s (instant!)\n")

# Scaling analysis
print("๐Ÿ“Š Complexity Analysis:")
print(f"  Brute force: O({n_cities}!) = {math.factorial(n_cities-1):,} permutations")
print(f"  Greedy: O({n_cities}ยฒ) = {n_cities**2} operations")
print(f"  Verification: O({n_cities}) = {n_cities} operations")
print(f"\n  For 10 cities: {math.factorial(9):,} permutations! (intractable)")
print(f"  For 20 cities: {math.factorial(19):,} permutations! (universe ends first)")

Reductions (Proving NP-Completeness)#

Idea: To prove problem B is NP-Complete:

  1. Show B is in NP (solution verifiable in polynomial time)

  2. Reduce known NP-Complete problem A to B

  3. If we can solve B quickly, we can solve A quickly

Cook-Levin Theorem (1971): SAT is NP-Complete (first proven NP-Complete problem)
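
Step 1 above (membership in NP) boils down to writing a polynomial-time verifier. A minimal sketch for SAT, using an illustrative clause encoding (a clause is a list of integers, +i for variable i and -i for its negation):

def verify_sat(clauses, assignment):
    """Check a proposed truth assignment against a CNF formula in O(formula size)."""
    for clause in clauses:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False   # no literal in this clause is true
    return True

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(verify_sat(formula, {1: True, 2: True, 3: False}))    # True  (certificate accepted)
print(verify_sat(formula, {1: True, 2: False, 3: True}))    # False (last clause unsatisfied)
# Verifying a certificate is fast; finding a satisfying assignment is the hard (NP-Complete) part.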

Part 5: Automata Theory and Formal Languages#

Chomsky Hierarchy#

Type     Language Class           Automaton        Example
────────────────────────────────────────────────────────────────────
Type 0   Recursively Enumerable   Turing Machine   Any computable
Type 1   Context-Sensitive        Linear-bounded   Natural language
Type 2   Context-Free             Pushdown         Programming languages
Type 3   Regular                  Finite State     Pattern matching
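
To illustrate the jump from Type 3 to Type 2: no finite automaton can recognize balanced parentheses, because that requires unbounded counting, but a single stack (the essence of a pushdown automaton) is enough. A minimal sketch, using a counter in place of an explicit stack:

def balanced_parentheses(s):
    depth = 0                      # plays the role of the pushdown stack
    for ch in s:
        if ch == '(':
            depth += 1             # push
        elif ch == ')':
            depth -= 1             # pop
            if depth < 0:
                return False       # closed a parenthesis that was never opened
    return depth == 0              # everything opened was closed

for example in ["(())()", "(()", ")(", ""]:
    print(f"{example!r:>10}: {balanced_parentheses(example)}")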

Finite State Machine (FSM)#

Components:

  • States (Q): Finite set

  • Alphabet (Σ): Input symbols

  • Transition function (δ): Q × Σ → Q

  • Start state (q₀)

  • Accept states (F)

Applications:

  • Regular expressions

  • Lexical analysis (tokenization)

  • Protocol verification

  • Game AI

class FiniteStateMachine:
    """General-purpose finite state machine."""
    
    def __init__(self, states, alphabet, transitions, start_state, accept_states):
        self.states = states
        self.alphabet = alphabet
        self.transitions = transitions  # {(state, symbol): next_state}
        self.start_state = start_state
        self.accept_states = accept_states
        self.current_state = start_state
        
    def reset(self):
        """Reset to start state."""
        self.current_state = self.start_state
        
    def transition(self, symbol):
        """Process single symbol."""
        key = (self.current_state, symbol)
        if key in self.transitions:
            self.current_state = self.transitions[key]
            return True
        return False  # Invalid transition
        
    def accepts(self, input_string):
        """Check if FSM accepts the input string."""
        self.reset()
        
        for i, symbol in enumerate(input_string):
            if not self.transition(symbol):
                print(f"  Rejected at position {i}: '{symbol}'")
                return False
            print(f"  '{symbol}' โ†’ {self.current_state}")
        
        accepted = self.current_state in self.accept_states
        return accepted

# Example: FSM that accepts binary strings with even number of 1s
print("๐Ÿค– FSM: Accept binary strings with even number of 1s\n")

fsm_even_ones = FiniteStateMachine(
    states={'EVEN', 'ODD'},
    alphabet={'0', '1'},
    transitions={
        ('EVEN', '0'): 'EVEN',  # Stay in EVEN on 0
        ('EVEN', '1'): 'ODD',   # Go to ODD on 1
        ('ODD', '0'): 'ODD',    # Stay in ODD on 0
        ('ODD', '1'): 'EVEN',   # Go to EVEN on 1
    },
    start_state='EVEN',
    accept_states={'EVEN'}
)

test_strings = ['0011', '101', '1100', '111']

for test in test_strings:
    print(f"Testing: '{test}'")
    result = fsm_even_ones.accepts(test)
    ones_count = test.count('1')
    print(f"  Result: {'โœ… ACCEPT' if result else 'โŒ REJECT'}")
    print(f"  (Has {ones_count} ones)\n")

Regular Expressions ≡ Finite Automata#

Theorem: Regular languages can be recognized by both:

  1. Regular expressions

  2. Finite state machines

Example: (0|1)*1(0|1)* matches strings containing at least one '1'
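
A quick check of that equivalence with Python's re module (a sketch; re.fullmatch is used so the pattern must describe the whole string):

import re

pattern = re.compile(r"(0|1)*1(0|1)*")   # the regular expression above

for s in ["0000", "0100", "111", ""]:
    ok = pattern.fullmatch(s) is not None
    print(f"{s!r:>7}: {'match' if ok else 'no match'}  ({s.count('1')} ones)")
# Exactly the strings containing at least one '1' match, mirroring an FSM for the same language.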

Part 6: Virtual Memory and Paging#

Why Virtual Memory?#

Benefits:

  1. Isolation: Processes can't access each other's memory

  2. Larger address space: Programs think they have more RAM

  3. Sharing: Multiple processes can share code (libraries)

  4. Protection: Read-only, read-write, execute permissions

Address Translation#

Virtual Address = [Page Number | Offset]
                        ↓
                  Page Table Lookup
                        ↓
Physical Address = [Frame Number | Offset]
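
In code, the split is simple arithmetic (a sketch; the 4 KB page size is assumed here purely for illustration):

PAGE_SIZE = 4096                        # 2**12, so the offset is the low 12 bits (assumed size)

def split_virtual_address(vaddr):
    page_number = vaddr // PAGE_SIZE    # equivalently vaddr >> 12
    offset = vaddr % PAGE_SIZE          # equivalently vaddr & 0xFFF
    return page_number, offset

page, offset = split_virtual_address(0x12345)
print(f"vaddr 0x12345 -> page {page:#x}, offset {offset:#x}")
# If the page table maps page 0x12 to frame 7, the physical address is 7 * PAGE_SIZE + offset.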

Page Replacement Algorithms#

Algorithm   Strategy                       Complexity
──────────────────────────────────────────────────────
FIFO        Remove oldest page             O(1)
LRU         Remove least recently used     O(1) with hardware
LFU         Remove least frequently used   O(log n)
Clock       Approximation of LRU           O(1)

class VirtualMemorySimulator:
    """Simulate virtual memory with paging."""
    
    def __init__(self, num_frames, page_size, replacement='LRU'):
        self.page_size = page_size
        self.num_frames = num_frames
        self.replacement = replacement
        
        # Page table: virtual page → physical frame
        self.page_table = {}
        
        # Physical memory
        self.frames = [None] * num_frames
        self.frame_age = {}  # For LRU
        
        # Statistics
        self.page_faults = 0
        self.page_hits = 0
        self.access_count = 0
        
    def translate(self, virtual_address):
        """Translate virtual address to physical address."""
        page_num = virtual_address // self.page_size
        offset = virtual_address % self.page_size
        
        self.access_count += 1
        
        if page_num in self.page_table:
            # Page hit!
            self.page_hits += 1
            frame = self.page_table[page_num]
            self.frame_age[frame] = self.access_count  # Update access time
            physical_addr = frame * self.page_size + offset
            return physical_addr, 'HIT'
        else:
            # Page fault!
            self.page_faults += 1
            frame = self._load_page(page_num)
            physical_addr = frame * self.page_size + offset
            return physical_addr, 'FAULT'
    
    def _load_page(self, page_num):
        """Load page into physical memory."""
        # Find free frame or evict
        if None in self.frames:
            frame = self.frames.index(None)
        else:
            frame = self._evict_page()
        
        # Load page
        self.frames[frame] = page_num
        self.page_table[page_num] = frame
        self.frame_age[frame] = self.access_count
        
        return frame
    
    def _evict_page(self):
        """Evict page based on replacement policy."""
        if self.replacement == 'LRU':
            # Remove least recently used
            lru_frame = min(self.frame_age, key=self.frame_age.get)
        elif self.replacement == 'FIFO':
            # Remove oldest (first added)
            lru_frame = 0  # Simplified: always evict first
        else:
            lru_frame = 0
        
        # Remove from page table
        old_page = self.frames[lru_frame]
        if old_page in self.page_table:
            del self.page_table[old_page]
        
        return lru_frame
    
    def stats(self):
        """Return performance statistics."""
        total = self.page_hits + self.page_faults
        hit_rate = self.page_hits / total if total > 0 else 0
        return {
            'hits': self.page_hits,
            'faults': self.page_faults,
            'hit_rate': hit_rate,
            'fault_rate': 1 - hit_rate
        }

# Simulate memory access pattern
print("๐Ÿ’พ Virtual Memory Simulation\n")

vm = VirtualMemorySimulator(num_frames=4, page_size=1000, replacement='LRU')

# Access pattern (address → page number)
# 0-999 → page 0, 1000-1999 → page 1, etc.
access_pattern = [
    100,    # Page 0 (fault)
    1200,   # Page 1 (fault)
    250,    # Page 0 (hit)
    2500,   # Page 2 (fault)
    3100,   # Page 3 (fault)
    4000,   # Page 4 (fault, evicts page 1, the least recently used)
    1250,   # Page 1 (fault, was just evicted; evicts page 0)
    2750,   # Page 2 (hit)
    100,    # Page 0 (fault, was evicted)
]

print(f"{'Virtual Addr':<15} {'Page':<8} {'Physical Addr':<15} {'Result'}")
print("โ”€" * 55)

for vaddr in access_pattern:
    paddr, result = vm.translate(vaddr)
    page = vaddr // vm.page_size
    status = "๐ŸŸข HIT" if result == 'HIT' else "๐Ÿ”ด FAULT"
    print(f"{vaddr:<15} {page:<8} {paddr:<15} {status}")

print("\n" + "โ”€" * 55)
stats = vm.stats()
print(f"Page Hits: {stats['hits']}")
print(f"Page Faults: {stats['faults']}")
print(f"Hit Rate: {stats['hit_rate']*100:.1f}%")
print(f"Fault Rate: {stats['fault_rate']*100:.1f}%")

Part 7: Modern Hardware and Parallelism#

Multi-Core vs GPU#

Feature       CPU (Multi-core)                        GPU
──────────────────────────────────────────────────────────────────────────────
Cores         4-64                                    1000s-10000s
Clock Speed   3-5 GHz                                 1-2 GHz
Use Case      General purpose, complex control flow   Data parallel, simple operations
Memory        Large cache, low latency                High bandwidth, high latency
Best For      Databases, web servers, compilers       Deep learning, graphics, crypto

SIMD (Single Instruction, Multiple Data)#

Idea: Apply same operation to multiple data elements simultaneously

Example: Vector addition

Scalar: for i in range(n): c[i] = a[i] + b[i]  # n operations
SIMD:   c = a + b                              # 1 operation (hardware vectorized)

Applications:

  • Image/video processing

  • Scientific computing

  • Machine learning

  • Cryptography
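
The multiprocessing demo below targets multi-core parallelism. Data parallelism in the SIMD spirit is easiest to reach from Python through NumPy, whose array operations run in compiled loops that can use vector instructions. A hedged sketch (timings vary by machine):

import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

start = time.time()
c_loop = [a[i] + b[i] for i in range(n)]   # one element at a time, interpreted Python
loop_time = time.time() - start

start = time.time()
c_vec = a + b                              # whole arrays at once, vectorized in C
vec_time = time.time() - start

print(f"Python loop: {loop_time:.4f}s")
print(f"NumPy a + b: {vec_time:.4f}s  (~{loop_time / vec_time:.0f}x faster)")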

from multiprocessing import Pool, cpu_count
import time

def compute_intensive_task(n):
    """Simulate a CPU-intensive task (must be module-level so Pool can pickle it)."""
    result = 0
    for i in range(n):
        result += i ** 2
    return result

def parallel_computation_demo():
    """Compare serial vs parallel execution."""
    tasks = [1000000] * 8  # 8 tasks

    # Serial execution
    start = time.time()
    serial_results = [compute_intensive_task(t) for t in tasks]
    serial_time = time.time() - start

    # Parallel execution
    start = time.time()
    with Pool(processes=cpu_count()) as pool:
        parallel_results = pool.map(compute_intensive_task, tasks)
    parallel_time = time.time() - start

    print("⚡ Parallel Computation Comparison\n")
    print(f"Number of tasks: {len(tasks)}")
    print(f"CPU cores: {cpu_count()}\n")
    print(f"Serial time: {serial_time:.3f}s")
    print(f"Parallel time: {parallel_time:.3f}s")
    print(f"Speedup: {serial_time/parallel_time:.2f}x")
    efficiency = (serial_time / parallel_time) / cpu_count()
    print(f"Efficiency: {efficiency*100:.1f}%")

    # Amdahl's Law prediction
    print(f"\n📊 Amdahl's Law Analysis:")
    print(f"  Assuming 100% parallelizable (S=0):")
    print(f"  Theoretical max speedup: {cpu_count()}x")
    print(f"  Actual speedup: {serial_time/parallel_time:.2f}x")
    print(f"  Overhead (1 - efficiency): ~{(1 - efficiency)*100:.1f}%")

if __name__ == "__main__":  # required so worker processes don't re-run the demo on import
    parallel_computation_demo()

Part 8: Exercises#

Exercise 1: Process Scheduler (⭐⭐)#

Implement CPU scheduling algorithms:

  1. Round Robin: Each process gets fixed time slice

  2. Shortest Job First: Execute shortest process first

  3. Priority Scheduling: Execute highest priority first

  4. Compare average waiting time and turnaround time

  5. Visualize Gantt chart

# Exercise 1: Your code here
# Hint: Create Process class with arrival_time, burst_time, priority
# Hint: Implement scheduler that maintains ready queue

Exercise 2: Deadlock Detection (⭐⭐⭐)#

Implement deadlock detection:

  1. Create resource allocation graph

  2. Implement cycle detection algorithm

  3. Simulate Banker's Algorithm for deadlock avoidance

  4. Test with various resource request patterns

# Exercise 2: Your code here
# Hint: Use graph representation (adjacency list)
# Hint: DFS for cycle detection

Exercise 3: Memory Allocator (⭐⭐⭐)#

Implement dynamic memory allocation:

  1. First-fit: Allocate first block large enough

  2. Best-fit: Allocate smallest block large enough

  3. Worst-fit: Allocate largest available block

  4. Handle fragmentation (external and internal)

  5. Compare allocation success rate and fragmentation

# Exercise 3: Your code here
# Hint: Maintain list of free blocks
# Hint: Merge adjacent free blocks

Exercise 4: NP-Completeness Proof (⭐⭐⭐⭐)#

Prove a problem is NP-Complete:

  1. Choose problem (e.g., Vertex Cover, Subset Sum)

  2. Show it's in NP (create verifier)

  3. Reduce from known NP-Complete problem (e.g., 3-SAT)

  4. Implement reduction algorithm

  5. Test with examples

# Exercise 4: Your code here
# Hint: Start with verifier (polynomial time check)
# Hint: For reduction, show if we can solve B, we can solve A

Exercise 5: Build a Virtual Machine (⭐⭐⭐⭐)#

Create a simple stack-based VM:

  1. Design bytecode instruction set (PUSH, POP, ADD, etc.)

  2. Implement stack operations

  3. Support function calls (CALL, RET)

  4. Add conditional jumps (JZ, JNZ)

  5. Write a "compiler" to generate bytecode from simple expressions

# Exercise 5: Your code here
# Hint: Maintain operand stack and call stack
# Hint: VM state: {pc, stack, memory, callstack}

Self-Check Quiz#

Test your understanding:

  1. What is the fetch-decode-execute cycle?

    • A) How CPU processes instructions

    • B) How memory is accessed

    • C) How cache works

    • D) How I/O happens

  2. Which cache replacement policy is most common?

    • A) FIFO

    • B) LRU

    • C) Random

    • D) LFU

  3. What is the difference between concurrency and parallelism?

    • A) No difference

    • B) Concurrency is interleaved, parallelism is simultaneous

    • C) Parallelism is slower

    • D) Concurrency requires multiple CPUs

  4. What does P = NP mean?

    • A) All problems are easy

    • B) Problems with fast verification also have fast solutions

    • C) Computers are perfect

    • D) No hard problems exist

  5. What is NP-Complete?

    • A) Easiest problems

    • B) Hardest problems in NP

    • C) Impossible problems

    • D) Polynomial problems

  6. What can a finite state machine recognize?

    • A) Any language

    • B) Regular languages only

    • C) Context-free languages

    • D) Natural languages

  7. What is a page fault?

    • A) Hardware error

    • B) Accessing page not in physical memory

    • C) Software bug

    • D) Network error

  8. What is Amdahl's Law about?

    • A) Cache performance

    • B) Parallel speedup limits

    • C) Memory bandwidth

    • D) Disk throughput

  9. Which synchronization primitive prevents race conditions?

    • A) Barrier

    • B) Mutex/Lock

    • C) Semaphore

    • D) All of the above

  10. What is SIMD?

    • A) Sequential processing

    • B) Same operation on multiple data

    • C) Multiple operations on same data

    • D) Random processing

Answers: 1-A, 2-B, 3-B, 4-B, 5-B, 6-B, 7-B, 8-B, 9-D, 10-B

Key Takeaways#

Computer Architecture#

  • ✅ CPU executes fetch-decode-execute cycle

  • ✅ Memory hierarchy: Registers → Cache → RAM → Disk

  • ✅ Cache exploits temporal and spatial locality

  • ✅ LRU is most common cache replacement policy

Concurrency#

  • ✅ Concurrency ≠ Parallelism (interleaved vs simultaneous)

  • ✅ Race conditions require synchronization (locks, semaphores)

  • ✅ Amdahl's Law limits parallel speedup

  • ✅ Always test concurrent code thoroughly

Complexity Theory#

  • ✅ P = problems solvable in polynomial time

  • ✅ NP = problems verifiable in polynomial time

  • ✅ NP-Complete = hardest problems in NP

  • ✅ If P = NP, crypto breaks (major unsolved problem)

Automata Theory#

  • ✅ FSMs recognize regular languages

  • ✅ Regular expressions ≡ Finite automata

  • ✅ Pushdown automata ≡ Context-free grammars

  • ✅ Turing machines ≡ General computation

Virtual Memory#

  • ✅ Provides isolation and protection

  • ✅ Page faults load pages from disk

  • ✅ TLB caches address translations

  • ✅ Page replacement crucial for performance

Pro Tips#

  1. Profile before optimizing: Measure where time is spent

  2. Cache-friendly code: Access data sequentially when possible

  3. Avoid false sharing: Pad shared data to cache line boundaries

  4. Lock granularity: Fine-grained locks = more parallelism, more overhead

  5. Use lock-free data structures: When possible (hard to get right!)

  6. P vs NP awareness: Don't try exact solutions for NP-Hard problems at scale

  7. Approximation algorithms: Often good enough in practice

  8. Vectorize code: Use NumPy/SIMD for data parallel operations

  9. Understand your hardware: CPU cores, cache sizes, memory bandwidth

  10. Test concurrency: Race conditions are non-deterministic bugs

Common Mistakes#

  • ❌ Forgetting to synchronize shared data

  • ❌ Holding locks too long (reduces parallelism)

  • ❌ Ignoring cache effects in tight loops

  • ❌ Trying exact solutions for large NP-Complete problems

  • ❌ Not considering false sharing in parallel code

  • ❌ Over-parallelizing (overhead exceeds benefit)

  • ❌ Assuming atomic operations when they're not

  • ❌ Creating too many threads (context switch overhead)

Debug Checklist#

  • ⚠️ Race condition → Add synchronization, test with ThreadSanitizer

  • ⚠️ Deadlock → Check lock ordering, use timeout

  • ⚠️ Poor cache performance → Profile cache misses, improve locality

  • ⚠️ Slow parallel code → Check for serial bottlenecks (Amdahl's Law)

  • ⚠️ Page faults → Increase memory, improve access patterns

What's Next?#

Continue in Hard Track:#

  • Lesson 7: Project Ideas (apply all knowledge!)

  • Lesson 8: Classic Problems (interview preparation)

  • Lesson 9: CTF Challenges (security skills)

Deepen Your Systems Knowledge:#

  • MIT 6.004: Computation Structures

  • CMU 15-213: Computer Systems (CSAPP book)

  • Operating Systems: Three Easy Pieces (free online)

  • Coursera: Parallel Programming

Practice Projects:#

  1. Build a shell (bash clone)

  2. Implement malloc/free

  3. Write a thread pool

  4. Create a key-value store

  5. Build a simple OS kernel

Resources:#

  • Books: CSAPP, Modern Operating Systems (Tanenbaum)

  • Online: OSDev wiki, Linux kernel source

  • Tools: perf, valgrind, gdb, ThreadSanitizer


Congratulations! You now understand how computers work at the systems level and can reason about complexity! 🖥️