This note summarizes common command-line text tools: grep, wc, cut, sort, uniq, tr, diff, join, xargs, and awk.
A shell pipeline passes stdout from one command into stdin of the next command:
command1 | command2 | command3
Each stage of a good pipeline usually does one of these:
- filter rows
- select fields
- normalize text
- sort records
- count or summarize
- format a small report
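Here is a minimal sketch of one pipeline that passes through several of these stages. It runs on a few made-up log lines supplied inline, and the function name `count_levels` is illustrative, not part of any tool:

```bash
#!/usr/bin/env bash
# Sketch: each stage reads the previous stage's stdout on its stdin.
# The four input lines are made-up sample data.
count_levels() {
  printf '%s\n' \
    '10:01 error disk full' \
    '10:02 INFO started' \
    '10:03 Error disk full' \
    '10:04 warn slow' |
    grep -Ei 'error|warn' |       # filter rows
    cut -d ' ' -f 2 |             # select the level field
    tr '[:lower:]' '[:upper:]' |  # normalize case
    sort |                        # sort so equal values are adjacent
    uniq -c |                     # count each run of equal values
    sort -nr                      # most frequent first
}

count_levels   # a count of 2 for ERROR, then 1 for WARN
```

cut can pick field 2 here only because every line shares the same space-separated layout.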
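grep prints the lines that match a pattern:
grep "ERROR" application.log
grep -n "ERROR" application.log
grep -i "timeout" application.log
grep -E "ERROR|WARN" application.log
Useful options:
- -n prefixes each match with its line number.
- -i matches case-insensitively.
- -E enables extended regular expressions, so | means alternation.
- -C N shows N lines of context before and after each match.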
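The sketch below applies these options to a small made-up log fed through stdin instead of application.log. `sample_log` is a hypothetical helper, and the trailing comments show what each line should print:

```bash
#!/usr/bin/env bash
# Sketch: grep options applied to a made-up five-line log on stdin.
sample_log() {
  printf '%s\n' \
    'INFO start' \
    'ERROR disk full' \
    'info Timeout reached' \
    'WARN slow response' \
    'INFO done'
}

sample_log | grep -n 'ERROR'        # 2:ERROR disk full
sample_log | grep -i 'timeout'      # info Timeout reached
sample_log | grep -E 'ERROR|WARN'   # ERROR disk full, then WARN slow response
sample_log | grep -C 1 'WARN'       # the WARN line plus one line before and after
```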
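wc counts lines (-l), words (-w), or bytes (-c):
wc -l application.log
wc -w notes.md
wc -c payload.bin
Use input redirection when you only want the number, without the file name:
wc -l < application.log
cut selects parts of each line. For delimited text, pick fields by number:
cut -d ',' -f 1,3 data.csv
For fixed-width text, pick character positions:
cut -c 1-10 records.txt
cut is fast and simple, but it does not understand quoted CSV: a comma inside a quoted field still counts as a delimiter. Use a CSV-aware tool when quoting and escaping matter.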
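A small demonstration of the quoting limitation, using one hypothetical CSV line whose second field contains a comma:

```bash
#!/usr/bin/env bash
# Sketch: cut treats every comma as a delimiter, even inside double quotes.
line='42,"Smith, Jane",admin'   # made-up record with three logical fields

printf '%s\n' "$line" | cut -d ',' -f 2   # prints "Smith   (only part of the quoted name)
printf '%s\n' "$line" | cut -d ',' -f 3   # prints  Jane"   (not admin)
printf '%s\n' "$line" | cut -d ',' -f 4   # prints admin, which cut sees as field 4
```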
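sort orders lines as text by default. -n sorts numerically, -t and -k sort on a delimited field (here only field 2), and -u drops duplicate lines from the output:
sort names.txt
sort -n numbers.txt
sort -t ',' -k2,2 data.csv
sort -u names.txt
uniq removes duplicates only when they are on adjacent lines, so sort first when global deduplication is needed: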
sort names.txt | uniq
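To see why sorting matters, this sketch feeds three made-up names to uniq with and without sorting; `names` is an illustrative helper:

```bash
#!/usr/bin/env bash
# Sketch: uniq collapses only runs of identical adjacent lines.
names() { printf '%s\n' bob alice bob; }   # hypothetical sample input

names | uniq          # bob, alice, bob: the two bob lines are not adjacent
names | sort | uniq   # alice, bob: sorting puts duplicates next to each other
names | sort -u       # alice, bob: the same result from sort alone
```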
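To count how often each line occurs, most frequent first:
sort names.txt | uniq -c | sort -nr
tr translates, deletes, or squeezes characters. It reads only stdin, hence the < redirection. Uppercase:
tr '[:lower:]' '[:upper:]' < input.txt
Delete punctuation:
tr -d '[:punct:]' < input.txt
Squeeze runs of spaces into a single space:
tr -s ' ' < input.txt
tr works at the character level. It does not understand fields, columns, or structured records.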
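A sketch of the three tr operations on a single made-up string:

```bash
#!/usr/bin/env bash
# Sketch: tr reads stdin and works one character at a time.
text='Hello,   world!'   # made-up input with punctuation and repeated spaces

printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]'   # HELLO,   WORLD!
printf '%s\n' "$text" | tr -d '[:punct:]'            # Hello   world
printf '%s\n' "$text" | tr -s ' '                    # Hello, world!
```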
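diff compares two files, or two directory trees with -r:
diff -u old.txt new.txt
diff -ru old-dir new-dir
Use unified diffs (-u) for reviewable output: each change appears with a few lines of surrounding context, in a format that patch accepts.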
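A minimal sketch that builds two tiny temporary files and prints only the hunk of their unified diff. `show_hunk` is a hypothetical helper, and the header lines are dropped because they contain paths and timestamps:

```bash
#!/usr/bin/env bash
# Sketch: show only the hunk of a unified diff between two temporary files.
show_hunk() {
  local dir
  dir="$(mktemp -d)"
  printf '%s\n' a b c > "$dir/old.txt"
  printf '%s\n' a B c > "$dir/new.txt"
  # tail drops the --- and +++ header lines, which hold paths and timestamps.
  diff -u "$dir/old.txt" "$dir/new.txt" | tail -n +3
  rm -r "$dir"
}

show_hunk
# Expected output:
# @@ -1,3 +1,3 @@
#  a
# -b
# +B
#  c
```

Lines starting with a space are unchanged context; - marks the old line and + the new one.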
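join merges lines from two files that share a key, by default the first whitespace-separated field. Both inputs must be sorted on the join field.
Example input, users.txt:
1 alice
2 bob
Example input, roles.txt:
1 admin
2 analyst
Command:
join users.txt roles.txt
Output: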
1 alice admin
2 bob analyst
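The same example can be reproduced without creating files. This sketch relies on bash process substitution (`<(...)`, not available in POSIX sh); the input lines are deliberately out of order and are sorted before join sees them. `join_demo` is an illustrative name:

```bash
#!/usr/bin/env bash
# Sketch: join two inputs on field 1, sorting each one first.
# <(...) is bash process substitution; it is not available in POSIX sh.
join_demo() {
  join <(printf '%s\n' '2 bob' '1 alice' | sort -k1,1) \
       <(printf '%s\n' '2 analyst' '1 admin' | sort -k1,1)
}

join_demo
# Expected output:
# 1 alice admin
# 2 bob analyst
```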
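If the files are not already sorted on the key, sort them first:
sort -k1,1 users.txt > users.sorted
sort -k1,1 roles.txt > roles.sorted
join users.sorted roles.sorted
xargs turns items read from stdin into command-line arguments. Run a command for each input item: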
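printf '%s\n' *.log | xargs -n 1 wc -l
By default xargs splits its input on blanks and newlines, so a file name containing a space becomes two arguments. Handle such names safely with null delimiters:
find logs -type f -name '*.log' -print0 | xargs -0 grep -n "ERROR"
Parallelize independent work with -P, here running up to 4 gzip processes at a time, one file each:
find data -type f -print0 | xargs -0 -n 1 -P 4 gzip
Use parallelism only when tasks are independent and system load is acceptable.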
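A sketch of the space problem: it creates a hypothetical file named `a b.log` in a temporary directory and passes its name through xargs both ways. `xargs_demo` is an illustrative helper:

```bash
#!/usr/bin/env bash
# Sketch: the same file name passed through xargs with and without null delimiters.
xargs_demo() {
  local dir
  dir="$(mktemp -d)"
  touch "$dir/a b.log"   # hypothetical file whose name contains a space
  (
    cd "$dir" || exit 1
    echo 'newline-delimited:'
    printf '%s\n' *.log | xargs -n 1 echo      # splits into a, then b.log
    echo 'null-delimited:'
    printf '%s\0' *.log | xargs -0 -n 1 echo   # keeps a b.log as one item
  )
  rm -r "$dir"
}

xargs_demo
```

find's -print0 produces the same null-separated stream that printf '%s\0' builds here.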
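awk splits each input line into fields $1, $2, and so on, separated by whitespace by default; $0 is the whole line. Print selected fields:
awk '{print $1, $3}' records.txt
Filter rows:
awk '$3 > 100 {print $0}' metrics.txt
Sum a field:
awk '{sum += $2} END {print sum}' sizes.txt
Use a different field separator with -F:
awk -F ',' '{print $1, $3}' data.csv
These tools combine into short reports. Top repeated error lines (uniq -c groups only identical lines, so if each line carries a timestamp, strip it first):
grep "ERROR" application.log | sort | uniq -c | sort -nr | head
Largest direct children of a directory (--max-depth is a GNU du option):
du -h --max-depth=1 /path/to/target | sort -hr
Extract a column, remove the header, count values:
cut -d ',' -f 3 data.csv | tail -n +2 | sort | uniq -c | sort -nr
Move a sort key to the front, sort, then restore the original field order:
awk '{print $3, $1, $2}' records.txt | sort | awk '{print $2, $3, $1}'
The transferable lesson is that the shape of each line controls what the next command can easily do.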
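To try the awk patterns above on concrete input, here is a sketch over three made-up `name team size` records (`records` is a hypothetical helper). It uses sort -n for the decorate-sort-undecorate step because the key is numeric; plain sort compares text, which would put 120 before 30:

```bash
#!/usr/bin/env bash
# Sketch: awk on made-up records with fields name, team, size.
records() { printf '%s\n' 'carol ops 30' 'alice dev 120' 'bob dev 75'; }

# Sum field 3.
records | awk '{sum += $3} END {print sum}'   # 225

# Keep rows whose field 3 is greater than 100.
records | awk '$3 > 100 {print $0}'           # alice dev 120

# Decorate-sort-undecorate: move field 3 to the front, sort numerically,
# then restore the original field order.
records | awk '{print $3, $1, $2}' | sort -n | awk '{print $2, $3, $1}'
# carol ops 30
# bob dev 75
# alice dev 120
```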