Command Line Tools#

The Power of Unix Tools#

Real-world analogy: Think of command-line tools as a Swiss Army knife for working with files and data. Each tool does one thing extremely well, and when combined together, they become incredibly powerful.

Modern developers spend significant time working with text files: code, logs, configuration files, data exports. Mastering command-line tools makes these routine tasks dramatically faster.

What you'll learn:

  • Essential text processing tools (grep, sed, awk)

  • File finding and manipulation

  • Data transformation and analysis

  • Pipeline composition

  • Real-world automation workflows

Time investment: 60-90 minutes
Difficulty: Intermediate


1. Finding Files: find#

The find command searches for files and directories based on various criteria.

Basic Syntax#

find [path] [options] [expression]

Common Use Cases#

# Create test files for examples
!mkdir -p test_files/subdir
!touch test_files/file1.txt test_files/file2.py test_files/file3.js
!touch test_files/subdir/nested.txt test_files/README.md

# Find all files in current directory and subdirectories
!find test_files -type f
# Find files by name pattern
!find test_files -name "*.txt"

# Case-insensitive name search
!find test_files -iname "readme*"
# Find files modified in last 7 days
!find test_files -type f -mtime -7

# Find files larger than 1MB
!find test_files -type f -size +1M

# Find and execute command on each file
!find test_files -name "*.txt" -exec wc -l {} \;

🎯 Pro Tips for find#

# Delete all .pyc files (Python cache)
find . -name "*.pyc" -delete

# Find empty directories
find . -type d -empty

# Find files with specific permissions
find . -type f -perm 644

# Find and count files by extension
find . -type f -name "*.py" | wc -l

2. Searching Text: grep#

grep (Global Regular Expression Print) searches for patterns in text files.

Basic Usage#

# Create sample file
sample_text = """Python is awesome
python is easy to learn
Java is verbose
JavaScript is everywhere
Error: File not found
Warning: Deprecated function
INFO: Server started on port 8080
"""

with open('test_files/sample.log', 'w') as f:
    f.write(sample_text)

# Basic search
!grep "Python" test_files/sample.log
# Case-insensitive search
!grep -i "python" test_files/sample.log

# Show line numbers
!grep -n "is" test_files/sample.log

# Count matches
!grep -c "is" test_files/sample.log
# Invert match (lines NOT containing pattern)
!grep -v "Python" test_files/sample.log

# Search recursively in directories
!grep -r "Python" test_files/

# Show context (lines before and after)
!grep -A 1 -B 1 "Error" test_files/sample.log  # 1 line after, 1 before

Regular Expressions with grep#

# Match lines starting with specific text
!grep "^Error" test_files/sample.log

# Match lines ending with specific text
!grep "8080$" test_files/sample.log

# Match either pattern (OR)
!grep -E "Error|Warning" test_files/sample.log

🎯 Real-World grep Examples#

# Find all TODO comments in Python files
grep -rn "TODO" --include="*.py" .

# Find IP addresses in logs
grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log

# Find error messages (case insensitive)
grep -i "error\|exception\|failed" app.log

# Find functions in Python code
grep -n "^def " *.py

3. Stream Editing: sed#

sed (Stream EDitor) performs text transformations on input streams.

Basic Substitution#

# Create test file
!echo -e "Hello World\nHello Universe\nHello Everyone" > test_files/greetings.txt

# Replace first occurrence on each line
!sed 's/Hello/Hi/' test_files/greetings.txt
# Replace ALL occurrences (global flag)
!echo "foo bar foo baz foo" | sed 's/foo/FOO/g'

# Case-insensitive replacement
!echo "Hello hello HELLO" | sed 's/hello/hi/gI'
# Delete lines matching pattern
!sed '/World/d' test_files/greetings.txt

# Delete empty lines
!echo -e "line1\n\nline2\n\nline3" | sed '/^$/d'

# Print only lines matching pattern
!sed -n '/Universe/p' test_files/greetings.txt

Advanced sed Usage#

# In-place editing (modify file)
!cp test_files/greetings.txt test_files/greetings_backup.txt
!sed -i 's/Hello/Greetings/' test_files/greetings_backup.txt
!cat test_files/greetings_backup.txt
# Multiple commands
!echo "foo bar" | sed 's/foo/FOO/; s/bar/BAR/'

# Line number ranges
!sed -n '1,2p' test_files/greetings.txt  # Print lines 1-2

🎯 Real-World sed Examples#

# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt

# Add line numbers
sed = file.txt | sed 'N;s/\n/\t/'

# Comment out lines containing pattern
sed '/DEBUG/s/^/# /' config.py

# Replace date format: 2024-01-15 → 01/15/2024
echo "2024-01-15" | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\2\/\3\/\1/'

4. Pattern Scanning: awk#

awk is a powerful text processing language, perfect for structured data.

Basic awk Structure#

awk 'pattern { action }' file
# Create sample data
data = """Alice 25 Engineer
Bob 30 Designer
Charlie 35 Manager
Diana 28 Developer
"""

with open('test_files/employees.txt', 'w') as f:
    f.write(data)

# Print entire file
!awk '{print}' test_files/employees.txt
# Print specific columns (fields)
!awk '{print $1, $3}' test_files/employees.txt  # Name and job title

# Print with custom separator
!awk '{print $1 " is a " $3}' test_files/employees.txt
# Pattern matching
!awk '/Developer/ {print}' test_files/employees.txt

# Conditional logic
!awk '$2 > 28 {print $1, "is over 28"}' test_files/employees.txt

Built-in Variables#

# NR = Record Number (line number)
!awk '{print NR, $0}' test_files/employees.txt

# NF = Number of Fields
!awk '{print $1, "has", NF, "fields"}' test_files/employees.txt
# BEGIN and END blocks
!awk 'BEGIN {print "Employee Report"} {print $1} END {print "Total:", NR}' test_files/employees.txt

Advanced awk#

# Create CSV data
csv_data = """name,age,salary
Alice,25,75000
Bob,30,85000
Charlie,35,95000
"""

with open('test_files/salaries.csv', 'w') as f:
    f.write(csv_data)

# Process CSV (custom field separator)
!awk -F',' 'NR>1 {print $1, "earns", $3}' test_files/salaries.csv
# Calculate sum
!awk -F',' 'NR>1 {sum+=$3} END {print "Total salary:", sum}' test_files/salaries.csv

# Calculate average
!awk -F',' 'NR>1 {sum+=$3; count++} END {print "Average:", sum/count}' test_files/salaries.csv

🎯 Real-World awk Examples#

# Print unique values from column
awk '{print $3}' employees.txt | sort | uniq

# Format output as table
awk '{printf "%-10s %-5s %-15s\n", $1, $2, $3}' employees.txt

# Parse Apache access logs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

# Extract JSON-like values
awk -F'"' '{print $4}' data.txt

5. Text Manipulation: cut, paste, sort, uniq#

cut - Extract Columns#

# Extract specific characters
!echo "Hello World" | cut -c 1-5

# Extract fields (default delimiter: tab)
!echo -e "A\tB\tC\tD" | cut -f 1,3
# Custom delimiter
!echo "Alice,25,Engineer" | cut -d',' -f1,3

sort - Sort Lines#

# Create unsorted data
!echo -e "banana\napple\ncherry\napricot" > test_files/fruits.txt

# Alphabetical sort
!sort test_files/fruits.txt

# Reverse sort
!sort -r test_files/fruits.txt
# Numeric sort
!echo -e "10\n2\n100\n20" | sort -n

# Sort by specific column
!sort -k2 -n test_files/employees.txt  # Sort by age (column 2)

uniq - Remove Duplicates#

# Create file with duplicates
!echo -e "apple\napple\nbanana\napple\ncherry\nbanana" > test_files/duplicates.txt

# Remove adjacent duplicates (must sort first!)
!sort test_files/duplicates.txt | uniq
# Count occurrences
!sort test_files/duplicates.txt | uniq -c

# Show only duplicates
!sort test_files/duplicates.txt | uniq -d

6. Pipelines: Combining Tools#

The true power comes from combining tools with pipes (|).

# Create log file
log_data = """2024-01-15 ERROR Database connection failed
2024-01-15 INFO Server started
2024-01-15 ERROR File not found: config.json
2024-01-15 WARN Deprecated function used
2024-01-15 ERROR Connection timeout
2024-01-15 INFO Request processed
"""

with open('test_files/app.log', 'w') as f:
    f.write(log_data)

# Find and count errors
!grep "ERROR" test_files/app.log | wc -l
# Extract unique error messages (date is field 1, level is field 2, message starts at field 3)
!grep "ERROR" test_files/app.log | cut -d' ' -f3- | sort | uniq
# Complex pipeline: Find most common log levels
!cat test_files/app.log | awk '{print $2}' | sort | uniq -c | sort -rn

Real-World Pipeline Examples#

# Analyze which Python files have the most lines
!find . -name "*.py" -type f -exec wc -l {} \; 2>/dev/null | sort -rn | head -5

🎯 Pro Pipeline Patterns#

# Find largest files in directory
du -ah . | sort -rh | head -20

# Find most frequently used commands in bash history
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10

# Count TODO comments per file
find . -name "*.py" -exec grep -c "TODO" {} \; -print | paste - -

# Extract email addresses from files
grep -rEoh "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" . | sort | uniq

7. Data Processing: jq and Modern Tools#

jq - JSON Processor#

jq is like sed for JSON data.

# Create JSON test data
import json

data = {
    "users": [
        {"name": "Alice", "age": 25, "role": "Engineer"},
        {"name": "Bob", "age": 30, "role": "Designer"},
        {"name": "Charlie", "age": 35, "role": "Manager"}
    ],
    "company": "Tech Corp"
}

with open('test_files/data.json', 'w') as f:
    json.dump(data, f, indent=2)

!cat test_files/data.json
# Pretty print JSON
!jq '.' test_files/data.json 2>/dev/null || echo "jq not installed (optional tool)"
# Extract specific field
!jq '.company' test_files/data.json 2>/dev/null || echo "jq not installed"

# Extract array elements
!jq '.users[0].name' test_files/data.json 2>/dev/null || echo "jq not installed"

# Map over array
!jq '.users[] | .name' test_files/data.json 2>/dev/null || echo "jq not installed"

Python Alternative for JSON#

# Using Python for JSON processing (always available)
import json

with open('test_files/data.json') as f:
    data = json.load(f)

# Extract names
names = [user['name'] for user in data['users']]
print("Names:", names)

# Filter users over 25
senior = [u for u in data['users'] if u['age'] > 25]
print("\nSenior users:", senior)

8. Text Statistics: wc, head, tail#

wc - Word Count#

# Count lines, words, characters
!wc test_files/sample.log

# Count only lines
!wc -l test_files/sample.log

# Count only words
!wc -w test_files/sample.log

head and tail - First/Last Lines#

# First 3 lines
!head -n 3 test_files/sample.log

# Last 3 lines
!tail -n 3 test_files/sample.log
# Follow file (live updates) - useful for logs
# tail -f /var/log/syslog  # Would continuously show new lines

# Skip first N lines
!tail -n +2 test_files/salaries.csv  # Skip header

9. Process Substitution and Advanced Techniques#

Compare Outputs#

# Create two files
!echo -e "apple\nbanana\ncherry" > test_files/list1.txt
!echo -e "banana\ncherry\ndate" > test_files/list2.txt

# Show lines unique to first file
!comm -23 <(sort test_files/list1.txt) <(sort test_files/list2.txt)
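
Note: process substitution (<(...)) is a bash/zsh feature. If the notebook's ! commands run under a plain POSIX sh, the line above may fail; a portable sketch is to sort into temporary files first:

# Portable alternative: sort into temporary files, then compare
!sort test_files/list1.txt > test_files/list1.sorted
!sort test_files/list2.txt > test_files/list2.sorted
!comm -23 test_files/list1.sorted test_files/list2.sorted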

xargs - Build Command Lines#

# Find and process files
!find test_files -name "*.txt" | head -3 | xargs wc -l
# Run one command per input item (add -P N to run N jobs in parallel)
!echo -e "test1\ntest2\ntest3" | xargs -I {} echo "Processing: {}"

10. Modern Alternatives#

Fast Modern Tools#

| Classic | Modern Alternative | Why Better                          |
|---------|--------------------|-------------------------------------|
| grep    | ripgrep (rg)       | 10-100x faster, respects .gitignore |
| find    | fd                 | Simpler syntax, faster              |
| cat     | bat                | Syntax highlighting, line numbers   |
| ls      | exa                | Better formatting, git aware        |
| du      | dust               | Visual tree, faster                 |

Example: ripgrep#

# ripgrep syntax (if installed)
# Much faster than grep, respects .gitignore automatically

# rg "TODO"                    # Search current directory
# rg "TODO" -t py             # Only Python files
# rg "TODO" -g '!tests'       # Exclude tests directory
# rg "function" -A 3          # Show 3 lines after match

print("Install ripgrep: cargo install ripgrep")
print("Or: apt install ripgrep (Ubuntu)")
print("Or: brew install ripgrep (macOS)")

🎯 Hands-On Exercises#

Exercise 1: Log Analysis#

You have a web server log file. Extract useful statistics:

  • Count total requests

  • Find most common status codes

  • List top 5 most accessed URLs

# Create sample access log
access_log = """192.168.1.1 - - [15/Jan/2024:10:15:30] "GET /home 200
192.168.1.2 - - [15/Jan/2024:10:15:31] "GET /about 200
192.168.1.1 - - [15/Jan/2024:10:15:32] "GET /home 200
192.168.1.3 - - [15/Jan/2024:10:15:33] "GET /contact 404
192.168.1.2 - - [15/Jan/2024:10:15:34] "POST /api/login 200
192.168.1.4 - - [15/Jan/2024:10:15:35] "GET /admin 403
192.168.1.1 - - [15/Jan/2024:10:15:36] "GET /home 200
"""

with open('test_files/access.log', 'w') as f:
    f.write(access_log)

# TODO: Your solution here
# Count total requests:

# Most common status codes:

# Top URLs:
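
One possible approach, as a sketch using the tools above (field numbers assume the simplified log format created in this cell):

# Count total requests
!wc -l < test_files/access.log

# Most common status codes (the status code is the last field)
!awk '{print $NF}' test_files/access.log | sort | uniq -c | sort -rn

# Top 5 most accessed URLs (the URL is field 6 in this simplified format)
!awk '{print $6}' test_files/access.log | sort | uniq -c | sort -rn | head -5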

Exercise 2: CSV Processing#

Process a sales CSV file to find:

  • Total sales amount

  • Average sale price

  • Top 3 products by quantity

# Create sales data
sales_csv = """product,quantity,price
Laptop,5,1200
Mouse,15,25
Keyboard,10,75
Monitor,8,300
Laptop,3,1200
Mouse,20,25
"""

with open('test_files/sales.csv', 'w') as f:
    f.write(sales_csv)

# TODO: Your solution here
# Total sales:

# Average price:

# Top products:
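
One possible approach, sketched with awk (quantity is column 2, price is column 3; NR>1 skips the header row):

# Total sales amount (quantity * price summed over all rows)
!awk -F',' 'NR>1 {total += $2 * $3} END {print "Total sales:", total}' test_files/sales.csv

# Average sale price
!awk -F',' 'NR>1 {sum += $3; n++} END {print "Average price:", sum/n}' test_files/sales.csv

# Top 3 products by total quantity
!awk -F',' 'NR>1 {qty[$1] += $2} END {for (p in qty) print qty[p], p}' test_files/sales.csv | sort -rn | head -3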

Exercise 3: Code Analysis#

Analyze Python source files:

  • Count total lines of code (excluding comments and blanks)

  • Find all function definitions

  • List files with TODO comments

# Create sample Python file
py_code = '''# Calculate factorial
def factorial(n):
    """Calculate factorial recursively"""
    # TODO: Add input validation
    if n <= 1:
        return 1
    return n * factorial(n-1)

def fibonacci(n):
    """Calculate nth Fibonacci number"""
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
'''

with open('test_files/math_utils.py', 'w') as f:
    f.write(py_code)

# TODO: Your solution here
# Count LOC:

# Find functions:

# Find TODOs:
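
One possible approach (a sketch that treats "lines of code" as lines that are neither blank nor comments starting with #, which is only an approximation):

# Count lines of code (excluding comments and blank lines)
!grep -v '^[[:space:]]*#' test_files/math_utils.py | grep -v '^[[:space:]]*$' | wc -l

# Find all function definitions
!grep -n "^def " test_files/math_utils.py

# List files containing TODO comments
!grep -rl "TODO" --include="*.py" test_files/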

Exercise 4: Data Transformation#

Transform a space-separated file to CSV format with headers.

# Input data (space-separated)
!echo -e "Alice 25 Engineer\nBob 30 Designer\nCharlie 35 Manager" > test_files/input.txt

# TODO: Convert to CSV with headers: name,age,role
# Your solution here:
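
One possible approach with awk: print a header in a BEGIN block, then rewrite each record with commas (the output filename below is just an illustration):

# Convert space-separated fields to CSV with a header row
!awk 'BEGIN {print "name,age,role"} {print $1 "," $2 "," $3}' test_files/input.txt > test_files/output.csv
!cat test_files/output.csv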

Exercise 5: Pipeline Challenge#

Create a one-liner pipeline to:

  1. Find all .txt files

  2. Count words in each

  3. Show only files with more than 10 words

  4. Sort by word count (descending)

# TODO: Your one-liner here
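
One possible one-liner, as a sketch: wc -w prints "count filename" for each file, awk keeps rows where the count exceeds 10, and sort -rn orders the result descending:

!find test_files -name "*.txt" -exec wc -w {} \; | awk '$1 > 10' | sort -rn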

📝 Self-Check Quiz#

Test your understanding:

  1. What's the difference between grep -E and regular grep?

    • Extended regex support vs basic regex

  2. Why must you sort before uniq?

    • uniq only removes adjacent duplicates

  3. What does awk '{print $NF}' do?

    • Prints the last field of each line

  4. How to replace text in a file in-place with sed?

    • sed -i 's/old/new/g' file.txt

  5. What's the difference between > and >>?

    • > overwrites, >> appends (see the short demo after this quiz)

  6. How to search case-insensitively with grep?

    • grep -i pattern file

  7. What does 2>&1 do?

    • Redirects stderr to stdout

  8. How to process CSV with awk?

    • awk -F',' '{print $1}' file.csv

  9. What's a pipeline?

    • Chaining commands with | to pass output as input

  10. How to follow a log file in real-time?

    • tail -f logfile
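
A short demo of the redirection answers above (the scratch file lives in test_files/, which the cleanup cell at the end removes):

# > overwrites the target file; >> appends to it
!echo "first line" > test_files/redirect_demo.txt
!echo "second line" >> test_files/redirect_demo.txt
!cat test_files/redirect_demo.txt

# 2>&1 merges stderr into stdout, so error output can flow through a pipe
!ls test_files/does_not_exist 2>&1 | grep -i "no such"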


🎓 Key Takeaways#

  1. Unix Philosophy: Do one thing well, compose tools

  2. find: Search for files by name, type, size, time

  3. grep: Search text patterns, supports regex

  4. sed: Stream editor for text transformation

  5. awk: Pattern scanning and processing language

  6. Pipelines: Chain tools with | for powerful workflows

  7. Modern tools: ripgrep, fd, bat are faster alternatives

  8. Practice: Master these tools through daily use

Remember: These tools are 40+ years old and still dominant because they're incredibly powerful and composable!


🚀 Next Steps#

  1. Practice daily: Use these tools in your workflow

  2. Read man pages: man grep, man awk for deep dives

  3. Install modern tools: Try ripgrep, fd, bat

  4. Next lesson: 03_git_essentials.ipynb - Version control mastery


📚 Additional Resources#

# Cleanup test files
!rm -rf test_files/
print("โœ… Test files cleaned up!")
print("\n๐ŸŽ‰ Congratulations on completing Command Line Tools!")
print("You now have powerful text processing skills.")