Grep (Global Regular Expression Print)



Date: 28.04.2018

grep (Global Regular Expression Print)

  • Operation
    • Search a group of files
    • Find all lines that contain a particular regular expression pattern
    • Write the matching lines to standard output (redirect with > to save them in a file)
    • grep returns to the prompt with no extra output when it is done
  • Syntax: grep [-cilLnrsvwx] pattern [list of files]
  • Examples
    • Find information about the user harley: >grep harley /etc/passwd
    • Find all lines containing the string xxx in the files under the current directory (.): >grep -r xxx .

grep Flags

  • -c Count the matching lines (print only the count)
  • -i Ignore case when searching for matches
  • -l List the names of files containing matches
  • -L List the names of files that do not contain a match
  • -n Write the line number in front of each matching line
  • -r Perform a recursive directory search
  • -s Suppress error messages about nonexistent or unreadable files
  • -v Select lines that do not match the pattern
  • -w Match only complete words
  • -x Match only lines that match the pattern exactly (the whole line)
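A quick way to compare several of these flags is to run them on a small throwaway file. This is a sketch assuming a POSIX shell with GNU or BSD grep (-w and -r are not in POSIX) and the common mktemp utility; the four sample lines are made up for illustration.

```shell
#!/bin/sh
# Build a small sample file (illustrative contents only).
sample=$(mktemp)
printf '%s\n' 'the cat sat' 'The Cat' 'concatenate' 'dog' > "$sample"

grep -c cat "$sample"     # count matching lines: 2 ("the cat sat", "concatenate")
grep -ci cat "$sample"    # ignore case as well: 3
grep -n dog "$sample"     # prefix the line number: 4:dog
grep -v cat "$sample"     # lines WITHOUT "cat": "The Cat" and "dog"
grep -w cat "$sample"     # whole word only: "the cat sat"
grep -x dog "$sample"     # the whole line must match: "dog"

rm -f "$sample"
```

Note that -w rejects "concatenate" because "cat" is surrounded by other word characters there.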

Regular Expressions

  • Industry standard way to specify patterns
    • In Java: string.matches("pattern")
    • In Java: string.replaceAll("pattern", replacement)
  • Meta-characters/operators (some need to be escaped)
    • ^ beginning of line, $ end of a line
    • * match 0 or more of the previous group
    • + match 1 or more of the previous group
    • ? match 0 or 1 of the previous group
    • {n} match exactly n of the previous group
    • {m,n} match m to n of the previous group
    • {n,} match n or more of the previous group
    • | match either the expression before it or the expression after it
    • . match any single character except newline
    • [ ] match any one of the enclosed characters, e.g. [a-z] or [0-9]
    • \ literally interpret the following meta-character or operator
  • Note: Many UNIX programs use these (vi, sed, more, grep, awk)

Regular Expression Examples

  Regular Expression      String        Match
  [a-z](12){3}[c-e]{3}    a121212cde    Yes
  a.*e+                   abc12cde      Yes
  a.*f                    abc12cde      No
  ^a.*e$                  abc12cde      Yes
  ^b*e$                   abc12cde      No
  ^a*e$                   abc12cde      No
  \^.*\$                  ^ab12cd$      Yes
  ^.*$                    ^ab12cd$      Yes
  ^*$                     ^ab12cd$      No
  • Note: To use ( ) { } + ? or | with grep, use the -E (extended) switch, or precede each with \ in a basic pattern (\+, \? and \| are GNU extensions)
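The rows of the regular-expression table can be re-checked mechanically. This sketch assumes "Match" means the pattern matches somewhere in the string, and uses grep -E for every row except the last; the helper name matches is hypothetical.

```shell
#!/bin/sh
# matches PATTERN STRING -> prints Yes or No (hypothetical helper).
# -E: extended regex, so ( ) { } + need no backslash.
# -q: print nothing, report only through the exit status.
matches() {
    if printf '%s\n' "$2" | grep -Eq -e "$1"; then echo Yes; else echo No; fi
}

matches '[a-z](12){3}[c-e]{3}' 'a121212cde'   # Yes
matches 'a.*e+'  'abc12cde'                   # Yes
matches 'a.*f'   'abc12cde'                   # No
matches '^a.*e$' 'abc12cde'                   # Yes
matches '^b*e$'  'abc12cde'                   # No
matches '^a*e$'  'abc12cde'                   # No
matches '\^.*\$' '^ab12cd$'                   # Yes: \^ and \$ are literal characters
matches '^.*$'   '^ab12cd$'                   # Yes

# Last row, checked as a BASIC regex: a * right after the leading ^ is an
# ordinary character, so ^*$ only matches a line consisting of "*".
if printf '%s\n' '^ab12cd$' | grep -q -e '^*$'; then echo Yes; else echo No; fi   # No
```

The last row is checked without -E because POSIX defines the meaning of a leading * only for basic regular expressions.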

More grep Examples

  • Contents of a file called homework
  • Math: problems 12-10 to 12-33, due Monday
  • BasketWeaving: make a 6-inch basket, DONE
  • Psychology: essay on Animal Existentialism, due end of term
  • Surfing:catch at least 10
  • grep commands
  • >grep -v DONE homework displays all but line 2
  • >grep -c DONE homework displays 1
  • >grep -wi ".*a.*" homework displays all lines
  • >grep -w "m.*e" homework displays line 2
  • >grep -i "d.*e" homework displays lines 1, 2 and 3
  • >grep '\(Ma\|DO\).*' homework displays lines 1 and 2
  • Note: the last example escapes the parentheses and the vertical bar (\| is a GNU extension; with -E write '(Ma|DO).*' instead)
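To try these commands, the homework file can be recreated in a temporary file. A sketch assuming GNU grep, since \| in a basic regular expression is a GNU extension; the last command is an added -E variant for comparison.

```shell
#!/bin/sh
# Recreate the homework file from the slide in a temporary file.
hw=$(mktemp)
cat > "$hw" <<'EOF'
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing:catch at least 10
EOF

grep -v DONE "$hw"            # lines 1, 3 and 4
grep -c DONE "$hw"            # 1
grep -wi '.*a.*' "$hw"        # all four lines
grep -w 'm.*e' "$hw"          # line 2 only: "make" is a complete word
grep -i 'd.*e' "$hw"          # lines 1, 2 and 3
grep '\(Ma\|DO\).*' "$hw"     # lines 1 and 2
grep -n -E '(Ma|DO)' "$hw"    # same lines with -E, prefixed 1: and 2:

rm -f "$hw"
```

In line 3, "Animal" contains "ma" in lowercase, so the case-sensitive (Ma|DO) pattern skips it.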

Sorting Data

  • Background
    • Each line in a file is a record
    • Each line is a series of fields separated by spaces and/or tabs
  • Commands
    • >sort fileName sorts the lines of fileName (comparing whole lines, starting with the 1st field)
    • >sort -k 6 fileName sorts on the key starting at the 6th field (through the end of the line)
    • >sort -n -k 5 fileName sorts numerically on the 5th field
    • >sort -t ':' -k4,4r -k3,3 fileName sorts descending on the 4th field, then ascending on the 3rd, with ':' as the delimiter
    • >sort -t ':' fileName sorts using ':' as the field separator character
    • >sort -u -k2r fileName sorts in reverse on the 2nd field and outputs only one line for each distinct key
    • command | sort -k 3,4 sorts the piped input by the key running from field 3 through field 4
    • >sort -k5n -k8 fileName sorts numerically on the 5th field, then alphabetically from the 8th field on
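The key options are easiest to see on a tiny colon-separated file. A sketch with made-up name:score records; it sets LC_ALL=C so the comparison is byte-wise and does not depend on the locale, and it uses -k2,2 so the key is field 2 alone rather than field 2 through the end of the line.

```shell
#!/bin/sh
# Byte-wise comparison, independent of the current locale.
LC_ALL=C
export LC_ALL

data=$(mktemp)
printf '%s\n' 'smith:42' 'jones:7' 'adams:42' > "$data"

sort "$data"                        # whole line: adams:42 jones:7 smith:42
sort -t ':' -k2,2 "$data"           # field 2 as text: "42" sorts before "7"
sort -t ':' -k2,2n "$data"          # field 2 as a number: jones:7 comes first
sort -t ':' -k2,2nr -k1,1 "$data"   # descending score, ties broken by name:
                                    # adams:42 smith:42 jones:7
sort -t ':' -u -k2,2n "$data"       # one line per distinct score (2 lines)

rm -f "$data"
```

With -u the uniqueness test uses only the key, so the two 42 records count as duplicates even though the names differ.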

SED (Stream Editor)

  • SED is a filter
    • Input from stdin or a file
    • Output to stdout or a file
    • Modifies the input to produce the output
    • Non-interactive
  • Processing
    • Read from an input stream
    • Perform line oriented commands
    • Write to an output stream
  • Syntax: >sed [-i] command [file ...] or >sed [-i] -e command [-e command] ... [file ...]

Search and Replace

  • Search, change and redirect to newFile: >sed 's/cat/dog/g' file > newFile
  • Search, change, and edit the file in place: >sed -i 's/cat/dog/g' file
  • Only lines 5 through 10: >sed '5,10s/cat/dog/g' file
  • Only lines containing OK: >sed '/OK/s/cat/dog/g' names
  • Only lines containing two consecutive digits: >sed '/[0-9]\{2\}/s/cat/dog/g' names
  • Delete a range of lines: >sed '5,10d' file
  • Note: single quotes suppress the shell's interpretation of special characters
  • Note: vi uses the same substitution syntax (:s/cat/dog/g), and /pattern/ searches also work in more and awk
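The address forms above can be compared on three made-up lines. A sketch in which every command prints to standard output, so the input file is never modified.

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' 'cat OK cat' 'cat 12' 'cat 3' > "$f"

sed 's/cat/dog/g' "$f"               # dog OK dog / dog 12 / dog 3
sed '2,3s/cat/dog/g' "$f"            # lines 2-3 only: cat OK cat / dog 12 / dog 3
sed '/OK/s/cat/dog/g' "$f"           # OK line only: dog OK dog / cat 12 / cat 3
sed '/[0-9]\{2\}/s/cat/dog/g' "$f"   # two digits in a row: only "cat 12" changes
sed '2,3d' "$f"                      # deletes lines 2-3, leaving: cat OK cat

# Keep the result in a new file instead of printing it.
sed 's/cat/dog/g' "$f" > "$f.new"
cat "$f.new"

rm -f "$f" "$f.new"
```

To edit in place, GNU sed accepts -i on its own, while BSD and macOS sed expect a backup-suffix argument, e.g. -i ''.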

Complex Commands

    >sed -i \
        -e 's/mon/Monday/g' \
        -e 's/tue/Tuesday/g' \
        -e 's/wed/Wednesday/g' \
        -e 's/thu/Thursday/g' \
        -e 's/fri/Friday/g' \
        -e 's/sat/Saturday/g' \
        -e 's/sun/Sunday/g' \
        calendar
  • A backslash at the end of a line is a continuation character
  • Each -e adds another editing command (expression); the commands are applied in order to every line
  • The g flag at the end of s/old/new/g changes every occurrence on each line, not just the first
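The effect of the g flag and of chaining -e commands can be checked on one-line inputs. A sketch using a made-up subset of the calendar example.

```shell
#!/bin/sh
# Without g only the first match on the line is replaced.
echo 'cat cat cat' | sed 's/cat/dog/'     # dog cat cat
echo 'cat cat cat' | sed 's/cat/dog/g'    # dog dog dog

# Several -e commands run one after another on each line.
echo 'mon wed fri' | sed \
    -e 's/mon/Monday/g' \
    -e 's/wed/Wednesday/g' \
    -e 's/fri/Friday/g'                   # Monday Wednesday Friday

# Caveat: the patterns also match inside longer words.
echo 'month' | sed 's/mon/Monday/g'       # Mondayth
```

The last line shows why the calendar example can damage text that contains words such as "month".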

AWK

  • AWK (Aho, Weinberger, Kernighan)
  • Special purpose programming language
    • Interpreted
    • Useful for UNIX scripts
  • Purposes
    • Filter text files based on supplied patterns
    • Produce reports
    • Callable from "vi"
    • Create simple databases
    • Simple mathematical operations
    • Create scripts
  • Not good for large complicated tasks
  • Other interpreted languages: perl, php

General Syntax

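  • Single quotes around the program keep the shell from interpreting its special characters
  • Every clause is optional
  • Much of the syntax inside clauses resembles C and Java
  • The patterns use regular expressions
  • Program layout (the names inside braces and the pattern names are placeholders):
    >awk '
        BEGIN {initial actions}
        pattern1 {actions}
        pattern2 {actions}
        pattern3 {actions}
        END {final actions}
    ' fileName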
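A minimal program with all three kinds of clause shows when each one runs. A sketch on made-up input.

```shell
#!/bin/sh
printf '%s\n' 'apple 3' 'banana 5' 'cherry 2' | awk '
BEGIN { print "start"; total = 0 }   # once, before any input is read
/an/  { print "matched:", $1 }       # only for lines containing "an"
      { total += $2 }                # no pattern: runs for every line
END   { print "total:", total }      # once, after the last line
'
# Output:
# start
# matched: banana
# total: 10
```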

AWK General Operation

  • Each file consists of a series of records
  • Each record is a series of fields
  • Defaults
    • Record separator: new line character
    • Field separator: white space characters
  • Flow of Operation
    • Read the input file line by line (record by record)
    • If a record matches a rule's pattern, run that rule's actions
    • Otherwise skip that rule
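The default record and field separators, and the effect of -F, can be seen with one-line inputs. A sketch on made-up text.

```shell
#!/bin/sh
# Default splitting: runs of blanks/tabs separate fields and leading
# blanks are ignored, so this record has 3 fields.
printf '  one   two\tthree\n' | awk '{ print NF ": " $2 }'   # 3: two

# With -F ';' each single ';' separates fields, so an empty field is kept.
printf 'a;;c\n' | awk -F ';' '{ print NF ": [" $2 "]" }'    # 3: []

# A record that does not match the pattern is skipped.
printf 'keep me\nskip\n' | awk '/keep/ { print $2 }'         # me
```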

Some AWK Simple Examples

  • Print fields of records in a file: >awk '{print $5, $6, $7, $8}' fileName
  • Print lines with a search string: >awk '/gold/ {print}' fileName
  • Print the number of records: >awk 'END {print NR, "records"}' fileName
  • Print records using a condition: >awk '{if ($3 < 1980) print $3}' fileName or >awk '$2 > max {max = $2; print $2}' fileName (prints each new maximum of field 2)
  • Compare a field to a regular expression: >awk '$2 ~ /[0-9]+/ {print $2}' fileName
  • Use variables: >awk '/gold/ {sum += $2} END {print "value = " sum}' fileName
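These one-liners can be tried on a few made-up records with three fields (metal, weight, year). A sketch; the running-maximum line keeps its own max variable, which starts out empty and so compares as 0.

```shell
#!/bin/sh
data='gold 2.5 1975
silver 1.0 1990
gold 4.0 1985'

printf '%s\n' "$data" | awk '{ print $1, $3 }'                   # metal and year
printf '%s\n' "$data" | awk '/gold/ { print }'                   # the two gold lines
printf '%s\n' "$data" | awk 'END { print NR, "records" }'        # 3 records
printf '%s\n' "$data" | awk '{ if ($3 < 1980) print $3 }'        # 1975
printf '%s\n' "$data" | awk '$2 > max { max = $2; print $2 }'    # 2.5, then 4.0
printf '%s\n' "$data" | awk '$1 ~ /^s/ { print $2 }'             # 1.0
printf '%s\n' "$data" | awk '/gold/ { sum += $2 } END { print "value = " sum }'
# value = 6.5
```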

A Longer AWK command

  • Command:
    >awk -F ';' \
        'BEGIN \
        { num_gold = 0; wt_gold = 0 } \
        \
        /[Gg]old/ { num_gold++; wt_gold += $2 } \
        \
        END \
        { printf("\n Gold Pieces: %2d %5.2f\n", \
                 num_gold, wt_gold) \
        }' \
        goldFile
  • Contents of goldFile:
        Gold;3.5
        Silver;2.25
        Bronze;5.31
        Gold;23.22
        gold;0.22
  • Output:
        Gold Pieces:  3 26.94
  • Note: A backslash at the end of a line continues the command on the next line
  • Semicolons delimit the fields in the file, hence -F ';'

Execute Program in a file

  • Program file (programFile is a placeholder name):
        # awk program summarizing a coin collection
        BEGIN { num_gold = 0; wt_gold = 0 }
        /[Gg]old/ { num_gold++; wt_gold += $2 }
        END {
            val_gold = 485 * wt_gold
            printf("\n Gold Pieces: %2d", num_gold)
            printf("\n Gold Weight: %5.2f", wt_gold)
            printf("\n Gold Value: %7.2f\n", val_gold)
        }
  • Note: the opening { of an action must be on the same line as its pattern (BEGIN, END, /[Gg]old/)
  • Command: >awk -F ';' -f programFile goldFile
  • Output
        Gold Pieces:  3
        Gold Weight: 26.94
        Gold Value: 13065.90

Invoking AWK

  • >awk [-F fs] ['program' | -f programFile] [var=value ...] [- | dataFile ...]
  • fs is a field separator (default: spaces and tabs)
  • program is an AWK program
  • programFile is a file containing an AWK program
  • var=value ... is a series of variables to initialize: >awk -f program f1=file2 f2=file1 > output
  • - means read AWK input from STDIN
  • dataFile is a file containing data to process
  • Note: AWK is often invoked repeatedly in shell scripts
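The invocation options can be exercised without any data files. A sketch; the -v option is an addition here (it is part of POSIX awk) to show how a variable can be set before BEGIN runs.

```shell
#!/bin/sh
# A var=value operand is assigned when awk reaches it in the argument list,
# i.e. before the input that follows it is read; "-" means standard input.
echo 'hello' | awk '{ print greeting, $1 }' greeting=Hi -    # Hi hello

# Such operands are not yet assigned while BEGIN runs; -v assigns earlier.
awk -v n=3 'BEGIN { print n * 2 }'                           # 6

# -f reads the program text from a file (a temporary one here).
prog=$(mktemp)
echo '{ print NR ": " $0 }' > "$prog"
printf 'a\nb\n' | awk -f "$prog"                             # 1: a, 2: b
rm -f "$prog"
```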

Search Patterns

  • An exact string: /The/
  • A string starting a line: /^The/
  • A string ending a line: /The$/
  • A string ignoring the case of its first letter: /[Tt]he/
  • Decimal number: /[0-9]+\.[0-9]+/
  • Alphanumeric string: /[a-zA-Z0-9]+/
  • Choice between two strings: /(da|De).*/
  • Numeric: /[+-]?[0-9]+/
  • Any Boolean expression, e.g. $4 > 90 or $4 > $5
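Several of these patterns can be tried on a handful of made-up lines. A sketch; a pattern with no action prints every matching line, and the decimal pattern escapes the dot so that it matches only a literal ".".

```shell
#!/bin/sh
lines='The cat
the end
pi is 3.14
count -42
Dear Sir'

printf '%s\n' "$lines" | awk '/^The/'              # The cat
printf '%s\n' "$lines" | awk '/[Tt]he/'            # The cat, the end
printf '%s\n' "$lines" | awk '/[0-9]+\.[0-9]+/'    # pi is 3.14
printf '%s\n' "$lines" | awk '/(da|De).*/'         # Dear Sir
printf '%s\n' "$lines" | awk '/[+-]?[0-9]+/'       # pi is 3.14, count -42
```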

Built in Variables

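  • NR: Number of records read so far (the total, when used in END)
  • NF: Number of fields in the current record
  • FILENAME: The current input file
  • FS: Input field separator (default: white space)
  • RS: Record separator (default: newline)
  • OFS: Output field separator, placed between print arguments (default: space)
  • ORS: Output record separator, written after each print (default: newline)
  • OFMT: Output format for non-integer numbers written by print (default: "%.6g")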
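A few of these variables in action, on made-up input. A sketch; OFMT is applied to a computed value so that print has to format a non-integer number.

```shell
#!/bin/sh
# NR counts records read so far, NF counts fields in the current record,
# OFS is written between the arguments of print.
printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $1 }'
# 1-3-a
# 2-2-d

# Setting FS in BEGIN has the same effect as -F ':'.
printf 'x:y:z\n' | awk 'BEGIN { FS = ":" } { print $3 }'      # z

# OFMT controls how print formats non-integer numbers.
awk 'BEGIN { OFMT = "%.2f"; x = 22 / 7; print x }'            # 3.14
```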

Arrays and control structures

  • Indexed and associative arrays
    • By index: months[3] = "March";
    • Associative: debts["Kim"] = 1000;
    • Note: all AWK arrays are associative, so any number or string can be an index; split() fills arrays starting at index 1, not 0
  • Counter controlled: for (i = 1; i < 100; i++) data[i] = i;
  • Iterator: for (i in names) print i, names[i]; (the visiting order is unspecified)
  • Pre-test: i = 0; while (i < 20) { data[i] = i; i++ }
  • Condition: if (i == 1) print debts["Kim"]; else print debts["Joe"];
  • Conditional expression: print ((i == 1) ? debts["Kim"] : debts["Joe"]);
  • Unconditional control statements
    • break: jump out of a loop
    • continue: start the next iteration of a loop
    • next: stop processing the current record and read the next line of input
    • exit: exit the AWK program (the END actions still run)
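The array and loop forms can be run together inside a single BEGIN block, so no input file is needed. A sketch with made-up values.

```shell
#!/bin/sh
awk 'BEGIN {
    # Associative array: strings work as indexes.
    debts["Kim"] = 1000
    debts["Joe"] = 250

    # Counter-controlled loop filling a numerically indexed array.
    for (i = 1; i <= 5; i++) squares[i] = i * i
    print squares[3]                        # 9

    # Iterator: visits every index, in an unspecified order.
    total = 0
    for (name in debts) total += debts[name]
    print total                             # 1250

    # Pre-test loop left early with break.
    i = 0
    while (i < 20) { if (i == 4) break; i++ }
    print i                                 # 4

    # continue skips the rest of one iteration (here: odd numbers).
    evens = ""
    for (i = 1; i <= 6; i++) { if (i % 2) continue; evens = evens i }
    print evens                             # 246

    # if/else and the conditional expression.
    i = 1
    if (i == 1) print debts["Kim"]; else print debts["Joe"]   # 1000
    print ((i == 1) ? debts["Kim"] : debts["Joe"])            # 1000
}'
```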

Built-in functions

  • Square root: print sqrt(3.6)
  • Integer portion: print int(3.2); prints 3
  • Substring: print substr("abcde", 3, 2); prints cd
  • Split: n = split("a;b;c;d;e", letters, ";"); fills letters[1] through letters[5] and returns 5
  • Position: print index("gorbachev", "bach"); prints 4
    • Note: if the substring doesn't exist, 0 is returned
    • Note: strings index from one, not zero
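The built-in functions can be checked from a BEGIN block. A sketch; note that split() takes the target array as its second argument and returns the number of pieces.

```shell
#!/bin/sh
awk 'BEGIN {
    print sqrt(16)                      # 4
    print int(3.7)                      # 3 (truncates, no rounding)
    print substr("abcde", 3, 2)         # cd (positions start at 1)
    n = split("a;b;c;d;e", letters, ";")
    print n, letters[1], letters[5]     # 5 a e
    print index("gorbachev", "bach")    # 4
    print index("gorbachev", "xyz")     # 0 (not found)
}'
```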

printf




