Command Line Tools#

The Power of Unix Tools#

Real-world analogy: Think of command-line tools as a Swiss Army knife for working with files and data. Each tool does one thing extremely well, and when combined together, they become incredibly powerful.

Modern developers spend significant time working with text files: code, logs, configuration files, data exports. Mastering command-line tools makes these routine tasks dramatically faster.

What you'll learn:

  • Essential text processing tools (grep, sed, awk)

  • File finding and manipulation

  • Data transformation and analysis

  • Pipeline composition

  • Real-world automation workflows

Time investment: 60-90 minutes
Difficulty: Intermediate


1. Finding Files: find#

The find command searches for files and directories based on various criteria.

Basic Syntax#

find [path] [options] [expression]

Common Use Cases#

# Create test files for examples
!mkdir -p test_files/subdir
!touch test_files/file1.txt test_files/file2.py test_files/file3.js
!touch test_files/subdir/nested.txt test_files/README.md

# Find all files in current directory and subdirectories
!find test_files -type f
# Find files by name pattern
!find test_files -name "*.txt"

# Case-insensitive name search
!find test_files -iname "readme*"
# Find files modified in last 7 days
!find test_files -type f -mtime -7

# Find files larger than 1MB
!find test_files -type f -size +1M

# Find and execute command on each file
!find test_files -name "*.txt" -exec wc -l {} \;

🎯 Pro Tips for find#

# Delete all .pyc files (Python cache)
find . -name "*.pyc" -delete

# Find empty directories
find . -type d -empty

# Find files with specific permissions
find . -type f -perm 644

# Find and count files by extension
find . -type f -name "*.py" | wc -l

2. Searching Text: grep#

grep (Global Regular Expression Print) searches for patterns in text files.

Basic Usage#

# Create sample file
sample_text = """Python is awesome
python is easy to learn
Java is verbose
JavaScript is everywhere
Error: File not found
Warning: Deprecated function
INFO: Server started on port 8080
"""

with open('test_files/sample.log', 'w') as f:
    f.write(sample_text)

# Basic search
!grep "Python" test_files/sample.log
# Case-insensitive search
!grep -i "python" test_files/sample.log

# Show line numbers
!grep -n "is" test_files/sample.log

# Count matches
!grep -c "is" test_files/sample.log
# Invert match (lines NOT containing pattern)
!grep -v "Python" test_files/sample.log

# Search recursively in directories
!grep -r "Python" test_files/

# Show context (lines before and after)
!grep -A 1 -B 1 "Error" test_files/sample.log  # 1 line after, 1 before

Regular Expressions with grep#

# Match lines starting with specific text
!grep "^Error" test_files/sample.log

# Match lines ending with specific text
!grep "8080$" test_files/sample.log

# Match either pattern (OR)
!grep -E "Error|Warning" test_files/sample.log

🎯 Real-World grep Examples#

# Find all TODO comments in Python files
grep -rn "TODO" --include="*.py" .

# Find IP addresses in logs
grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log

# Find error messages (case insensitive)
grep -i "error\|exception\|failed" app.log

# Find functions in Python code
grep -n "^def " *.py

3. Stream Editing: sed#

sed (Stream EDitor) performs text transformations on input streams.

Basic Substitution#

# Create test file
!echo -e "Hello World\nHello Universe\nHello Everyone" > test_files/greetings.txt

# Replace first occurrence on each line
!sed 's/Hello/Hi/' test_files/greetings.txt
# Replace ALL occurrences (global flag)
!echo "foo bar foo baz foo" | sed 's/foo/FOO/g'

# Case-insensitive replacement
!echo "Hello hello HELLO" | sed 's/hello/hi/gI'
# Delete lines matching pattern
!sed '/World/d' test_files/greetings.txt

# Delete empty lines
!echo -e "line1\n\nline2\n\nline3" | sed '/^$/d'

# Print only lines matching pattern
!sed -n '/Universe/p' test_files/greetings.txt

Advanced sed Usage#

# In-place editing (modify file)
!cp test_files/greetings.txt test_files/greetings_backup.txt
!sed -i 's/Hello/Greetings/' test_files/greetings_backup.txt
!cat test_files/greetings_backup.txt
# Multiple commands
!echo "foo bar" | sed 's/foo/FOO/; s/bar/BAR/'

# Line number ranges
!sed -n '1,2p' test_files/greetings.txt  # Print lines 1-2

🎯 Real-World sed Examples#

# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt

# Add line numbers
sed = file.txt | sed 'N;s/\n/\t/'

# Comment out lines containing pattern
sed '/DEBUG/s/^/# /' config.py

# Replace date format: 2024-01-15 → 01/15/2024
echo "2024-01-15" | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\2\/\3\/\1/'

4. Pattern Scanning: awk#

awk is a powerful text processing language, perfect for structured data.

Basic awk Structure#

awk 'pattern { action }' file
# Create sample data
data = """Alice 25 Engineer
Bob 30 Designer
Charlie 35 Manager
Diana 28 Developer
"""

with open('test_files/employees.txt', 'w') as f:
    f.write(data)

# Print entire file
!awk '{print}' test_files/employees.txt
# Print specific columns (fields)
!awk '{print $1, $3}' test_files/employees.txt  # Name and job title

# Print with custom separator
!awk '{print $1 " is a " $3}' test_files/employees.txt
# Pattern matching
!awk '/Developer/ {print}' test_files/employees.txt

# Conditional logic
!awk '$2 > 28 {print $1, "is over 28"}' test_files/employees.txt

Built-in Variables#

# NR = Record Number (line number)
!awk '{print NR, $0}' test_files/employees.txt

# NF = Number of Fields
!awk '{print $1, "has", NF, "fields"}' test_files/employees.txt
# BEGIN and END blocks
!awk 'BEGIN {print "Employee Report"} {print $1} END {print "Total:", NR}' test_files/employees.txt

Advanced awk#

# Create CSV data
csv_data = """name,age,salary
Alice,25,75000
Bob,30,85000
Charlie,35,95000
"""

with open('test_files/salaries.csv', 'w') as f:
    f.write(csv_data)

# Process CSV (custom field separator)
!awk -F',' 'NR>1 {print $1, "earns", $3}' test_files/salaries.csv
# Calculate sum
!awk -F',' 'NR>1 {sum+=$3} END {print "Total salary:", sum}' test_files/salaries.csv

# Calculate average
!awk -F',' 'NR>1 {sum+=$3; count++} END {print "Average:", sum/count}' test_files/salaries.csv

🎯 Real-World awk Examples#

# Print unique values from column
awk '{print $3}' employees.txt | sort | uniq

# Format output as table
awk '{printf "%-10s %-5s %-15s\n", $1, $2, $3}' employees.txt

# Parse Apache access logs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

# Extract JSON-like values
awk -F'"' '{print $4}' data.txt

5. Text Manipulation: cut, paste, sort, uniq#

cut - Extract Columns#

# Extract specific characters
!echo "Hello World" | cut -c 1-5

# Extract fields (default delimiter: tab)
!echo -e "A\tB\tC\tD" | cut -f 1,3
# Custom delimiter
!echo "Alice,25,Engineer" | cut -d',' -f1,3

sort - Sort Lines#

# Create unsorted data
!echo -e "banana\napple\ncherry\napricot" > test_files/fruits.txt

# Alphabetical sort
!sort test_files/fruits.txt

# Reverse sort
!sort -r test_files/fruits.txt
# Numeric sort
!echo -e "10\n2\n100\n20" | sort -n

# Sort by specific column
!sort -k2 -n test_files/employees.txt  # Sort by age (column 2)

uniq - Remove Duplicates#

# Create file with duplicates
!echo -e "apple\napple\nbanana\napple\ncherry\nbanana" > test_files/duplicates.txt

# Remove adjacent duplicates (must sort first!)
!sort test_files/duplicates.txt | uniq
# Count occurrences
!sort test_files/duplicates.txt | uniq -c

# Show only duplicates
!sort test_files/duplicates.txt | uniq -d

6. Pipelines: Combining Tools#

The true power comes from combining tools with pipes (|).

# Create log file
log_data = """2024-01-15 ERROR Database connection failed
2024-01-15 INFO Server started
2024-01-15 ERROR File not found: config.json
2024-01-15 WARN Deprecated function used
2024-01-15 ERROR Connection timeout
2024-01-15 INFO Request processed
"""

with open('test_files/app.log', 'w') as f:
    f.write(log_data)

# Find and count errors
!grep "ERROR" test_files/app.log | wc -l
# Extract unique error messages (date is field 1, level is field 2, message starts at field 3)
!grep "ERROR" test_files/app.log | cut -d' ' -f3- | sort | uniq
# Complex pipeline: Find most common log levels
!cat test_files/app.log | awk '{print $2}' | sort | uniq -c | sort -rn

Real-World Pipeline Examples#

# Analyze which Python files have the most lines
!find . -name "*.py" -type f -exec wc -l {} \; 2>/dev/null | sort -rn | head -5

🎯 Pro Pipeline Patterns#

# Find largest files in directory
du -ah . | sort -rh | head -20

# Find most frequently used commands in bash history
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10

# Count TODO comments per file
find . -name "*.py" -exec grep -c "TODO" {} \; -print | paste - -

# Extract email addresses from files
grep -rEoh "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" . | sort | uniq

7. Data Processing: jq and Modern Tools#

jq - JSON Processor#

jq is like sed for JSON data.

# Create JSON test data
import json

data = {
    "users": [
        {"name": "Alice", "age": 25, "role": "Engineer"},
        {"name": "Bob", "age": 30, "role": "Designer"},
        {"name": "Charlie", "age": 35, "role": "Manager"}
    ],
    "company": "Tech Corp"
}

with open('test_files/data.json', 'w') as f:
    json.dump(data, f, indent=2)

!cat test_files/data.json
# Pretty print JSON
!jq '.' test_files/data.json 2>/dev/null || echo "jq not installed (optional tool)"
# Extract specific field
!jq '.company' test_files/data.json 2>/dev/null || echo "jq not installed"

# Extract array elements
!jq '.users[0].name' test_files/data.json 2>/dev/null || echo "jq not installed"

# Map over array
!jq '.users[] | .name' test_files/data.json 2>/dev/null || echo "jq not installed"

Python Alternative for JSON#

# Using Python for JSON processing (always available)
import json

with open('test_files/data.json') as f:
    data = json.load(f)

# Extract names
names = [user['name'] for user in data['users']]
print("Names:", names)

# Filter users over 25
senior = [u for u in data['users'] if u['age'] > 25]
print("\nSenior users:", senior)

8. Text Statistics: wc, head, tail#

wc - Word Count#

# Count lines, words, characters
!wc test_files/sample.log

# Count only lines
!wc -l test_files/sample.log

# Count only words
!wc -w test_files/sample.log

head and tail - First/Last Lines#

# First 3 lines
!head -n 3 test_files/sample.log

# Last 3 lines
!tail -n 3 test_files/sample.log
# Follow file (live updates) - useful for logs
# tail -f /var/log/syslog  # Would continuously show new lines

# Skip first N lines
!tail -n +2 test_files/salaries.csv  # Skip header

9. Process Substitution and Advanced Techniques#

Compare Outputs#

# Create two files
!echo -e "apple\nbanana\ncherry" > test_files/list1.txt
!echo -e "banana\ncherry\ndate" > test_files/list2.txt

# Show lines unique to first file
!comm -23 <(sort test_files/list1.txt) <(sort test_files/list2.txt)
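
Note: process substitution (<(...)) is a bash/zsh feature. If the notebook's ! commands run under a plain POSIX sh, the line above may fail; a portable sketch is to sort into temporary files first:

# Portable alternative: sort into temporary files, then compare
!sort test_files/list1.txt > test_files/list1.sorted
!sort test_files/list2.txt > test_files/list2.sorted
!comm -23 test_files/list1.sorted test_files/list2.sorted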

xargs - Build Command Lines#

# Find and process files
!find test_files -name "*.txt" | head -3 | xargs wc -l
# Run one command per input item (add -P N to run N jobs in parallel)
!echo -e "test1\ntest2\ntest3" | xargs -I {} echo "Processing: {}"

10. Modern Alternatives#

Fast Modern Tools#

| Classic | Modern Alternative | Why Better                          |
|---------|--------------------|-------------------------------------|
| grep    | ripgrep (rg)       | 10-100x faster, respects .gitignore |
| find    | fd                 | Simpler syntax, faster              |
| cat     | bat                | Syntax highlighting, line numbers   |
| ls      | exa                | Better formatting, git aware        |
| du      | dust               | Visual tree, faster                 |

Example: ripgrep#

# ripgrep syntax (if installed)
# Much faster than grep, respects .gitignore automatically

# rg "TODO"                    # Search current directory
# rg "TODO" -t py             # Only Python files
# rg "TODO" -g '!tests'       # Exclude tests directory
# rg "function" -A 3          # Show 3 lines after match

print("Install ripgrep: cargo install ripgrep")
print("Or: apt install ripgrep (Ubuntu)")
print("Or: brew install ripgrep (macOS)")

🎯 Hands-On Exercises#

Exercise 1: Log Analysis#

You have a web server log file. Extract useful statistics:

  • Count total requests

  • Find most common status codes

  • List top 5 most accessed URLs

# Create sample access log
access_log = """192.168.1.1 - - [15/Jan/2024:10:15:30] "GET /home 200
192.168.1.2 - - [15/Jan/2024:10:15:31] "GET /about 200
192.168.1.1 - - [15/Jan/2024:10:15:32] "GET /home 200
192.168.1.3 - - [15/Jan/2024:10:15:33] "GET /contact 404
192.168.1.2 - - [15/Jan/2024:10:15:34] "POST /api/login 200
192.168.1.4 - - [15/Jan/2024:10:15:35] "GET /admin 403
192.168.1.1 - - [15/Jan/2024:10:15:36] "GET /home 200
"""

with open('test_files/access.log', 'w') as f:
    f.write(access_log)

# TODO: Your solution here
# Count total requests:

# Most common status codes:

# Top URLs:
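
One possible approach, as a sketch using the tools above (field numbers assume the simplified log format created in this cell):

# Count total requests
!wc -l < test_files/access.log

# Most common status codes (the status code is the last field)
!awk '{print $NF}' test_files/access.log | sort | uniq -c | sort -rn

# Top 5 most accessed URLs (the URL is field 6 in this simplified format)
!awk '{print $6}' test_files/access.log | sort | uniq -c | sort -rn | head -5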

Exercise 2: CSV Processing#

Process a sales CSV file to find:

  • Total sales amount

  • Average sale price

  • Top 3 products by quantity

# Create sales data
sales_csv = """product,quantity,price
Laptop,5,1200
Mouse,15,25
Keyboard,10,75
Monitor,8,300
Laptop,3,1200
Mouse,20,25
"""

with open('test_files/sales.csv', 'w') as f:
    f.write(sales_csv)

# TODO: Your solution here
# Total sales:

# Average price:

# Top products:
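
One possible approach, sketched with awk (quantity is column 2, price is column 3; NR>1 skips the header row):

# Total sales amount (quantity * price summed over all rows)
!awk -F',' 'NR>1 {total += $2 * $3} END {print "Total sales:", total}' test_files/sales.csv

# Average sale price
!awk -F',' 'NR>1 {sum += $3; n++} END {print "Average price:", sum/n}' test_files/sales.csv

# Top 3 products by total quantity
!awk -F',' 'NR>1 {qty[$1] += $2} END {for (p in qty) print qty[p], p}' test_files/sales.csv | sort -rn | head -3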

Exercise 3: Code Analysis#

Analyze Python source files:

  • Count total lines of code (excluding comments and blanks)

  • Find all function definitions

  • List files with TODO comments

# Create sample Python file
py_code = '''# Calculate factorial
def factorial(n):
    """Calculate factorial recursively"""
    # TODO: Add input validation
    if n <= 1:
        return 1
    return n * factorial(n-1)

def fibonacci(n):
    """Calculate nth Fibonacci number"""
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
'''

with open('test_files/math_utils.py', 'w') as f:
    f.write(py_code)

# TODO: Your solution here
# Count LOC:

# Find functions:

# Find TODOs:
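
One possible approach (a sketch that treats "lines of code" as lines that are neither blank nor comments starting with #, which is only an approximation):

# Count lines of code (excluding comments and blank lines)
!grep -v '^[[:space:]]*#' test_files/math_utils.py | grep -v '^[[:space:]]*$' | wc -l

# Find all function definitions
!grep -n "^def " test_files/math_utils.py

# List files containing TODO comments
!grep -rl "TODO" --include="*.py" test_files/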

Exercise 4: Data Transformation#

Transform a space-separated file to CSV format with headers.

# Input data (space-separated)
!echo -e "Alice 25 Engineer\nBob 30 Designer\nCharlie 35 Manager" > test_files/input.txt

# TODO: Convert to CSV with headers: name,age,role
# Your solution here:
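
One possible approach with awk: print a header in a BEGIN block, then rewrite each record with commas (the output filename below is just an illustration):

# Convert space-separated fields to CSV with a header row
!awk 'BEGIN {print "name,age,role"} {print $1 "," $2 "," $3}' test_files/input.txt > test_files/output.csv
!cat test_files/output.csv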

Exercise 5: Pipeline Challenge#

Create a one-liner pipeline to:

  1. Find all .txt files

  2. Count words in each

  3. Show only files with more than 10 words

  4. Sort by word count (descending)

# TODO: Your one-liner here
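
One possible one-liner, as a sketch: wc -w prints "count filename" for each file, awk keeps rows where the count exceeds 10, and sort -rn orders the result descending:

!find test_files -name "*.txt" -exec wc -w {} \; | awk '$1 > 10' | sort -rn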

📝 Self-Check Quiz#

Test your understanding:

  1. What's the difference between grep -E and regular grep?

    • Extended regex support vs basic regex

  2. Why must you sort before uniq?

    • uniq only removes adjacent duplicates

  3. What does awk '{print $NF}' do?

    • Prints the last field of each line

  4. How to replace text in a file in-place with sed?

    • sed -i 's/old/new/g' file.txt

  5. What's the difference between > and >>?

    • > overwrites, >> appends (see the short demo after this quiz)

  6. How to search case-insensitively with grep?

    • grep -i pattern file

  7. What does 2>&1 do?

    • Redirects stderr to stdout

  8. How to process CSV with awk?

    • awk -F',' '{print $1}' file.csv

  9. What's a pipeline?

    • Chaining commands with | to pass output as input

  10. How to follow a log file in real-time?

    • tail -f logfile
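
A short demo of the redirection answers above (the scratch file lives in test_files/, which the cleanup cell at the end removes):

# > overwrites the target file; >> appends to it
!echo "first line" > test_files/redirect_demo.txt
!echo "second line" >> test_files/redirect_demo.txt
!cat test_files/redirect_demo.txt

# 2>&1 merges stderr into stdout, so error output can flow through a pipe
!ls test_files/does_not_exist 2>&1 | grep -i "no such"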


🎓 Key Takeaways#

  1. Unix Philosophy: Do one thing well, compose tools

  2. find: Search for files by name, type, size, time

  3. grep: Search text patterns, supports regex

  4. sed: Stream editor for text transformation

  5. awk: Pattern scanning and processing language

  6. Pipelines: Chain tools with | for powerful workflows

  7. Modern tools: ripgrep, fd, bat are faster alternatives

  8. Practice: Master these tools through daily use

Remember: These tools are 40+ years old and still dominant because they're incredibly powerful and composable!


🚀 Next Steps#

  1. Practice daily: Use these tools in your workflow

  2. Read man pages: man grep, man awk for deep dives

  3. Install modern tools: Try ripgrep, fd, bat

  4. Next lesson: 03_git_essentials.ipynb - Version control mastery


📚 Additional Resources#

# Cleanup test files
!rm -rf test_files/
print("โœ… Test files cleaned up!")
print("\n๐ŸŽ‰ Congratulations on completing Command Line Tools!")
print("You now have powerful text processing skills.")