
UNIX Filters and searching

grep

The name grep comes from the ed editor command g/re/p, which stands for ''global regular expression print.'' The command grep is a UNIX filter that searches for regular expressions and fixed strings within ASCII documents.

Regular expressions are patterns, or templates, built from a combination of ASCII strings and metacharacters. Metacharacters are characters that represent something other than their literal meaning; they allow you to specify search tasks such as ''Find all strings that start at the beginning of a line that contain a character sequence with three g's in it''.
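As a minimal sketch of the search task quoted above, the pattern below takes one possible reading of it: the first word on the line must contain at least three g's. The file name words.txt and its contents are invented for illustration.

```shell
# Hypothetical sample data; file name and contents are assumptions for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 'giggling geese' 'gag order' 'big egg gag' > "$tmp/words.txt"

# ^      anchors the match at the beginning of the line
# [^ ]   is a character class: any character except a space
# *      is closure: zero or more of the preceding item
# Together: starting at the line start, a run of non-space characters
# that contains at least three g's.
three_g=$(grep '^[^ ]*g[^ ]*g[^ ]*g' "$tmp/words.txt")
printf '%s\n' "$three_g"

rm -rf "$tmp"
```

Only the first sample line is printed; the other two have fewer than three g's before the first space.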

There is actually a family of grep commands, each command designed for a different task:

  1. grep Searches for limited regular expressions.
  2. egrep (Extended grep) Searches for full regular expressions.
  3. fgrep (Fixed string grep) Searches for fixed strings.

Which grep command you need depends on the complexity of the search task. fgrep is the command to use for strings that contain no wildcards or other metacharacters, just text that must be matched exactly. grep is the command to use for general-purpose searching that requires wildcards and other metacharacters to specify string position, string size, character class, or closure. egrep is the extended version of grep. It handles the same kinds of expressions as grep, but it allows more variability by letting the search pattern express alternatives such as ''string1 or string2, followed by string3 or string4''.
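The difference between the three commands can be sketched on a small file. This example uses grep -F and grep -E, the POSIX spellings of fgrep and egrep, because some current systems warn when the older names are used. The file name sample.txt and its contents are made up for the example.

```shell
# Hypothetical sample data; file name and contents are assumptions for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 'cost: $5.00' 'cost: 5000' 'red fish' 'blue bird' 'red bird' \
    > "$tmp/sample.txt"

# Fixed string (fgrep behaviour): the '.' is an ordinary character,
# so only the line containing the literal text 5.00 matches.
fixed=$(grep -F '5.00' "$tmp/sample.txt")

# Limited regular expression (grep): '.' matches any single character,
# so both 5.00 and 5000 match. -c prints the number of matching lines.
basic_count=$(grep -c '5.00' "$tmp/sample.txt")

# Full regular expression (egrep behaviour): alternation with | and
# grouping with ( ), i.e. "string1 or string2, followed by string3 or string4".
extended_count=$(grep -E -c '(red|blue) (fish|bird)' "$tmp/sample.txt")

printf '%s\n' "$fixed" "$basic_count" "$extended_count"

rm -rf "$tmp"
```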

The general syntax for grep commands that search for regular expressions is:

grep <expression> <filename> [<filename> ...]

The fgrep command is similar, except that it searches only for fixed strings:

fgrep <string(s)> <filename> [<filename> ...]

The grep commands can be restricted to a single file, or can be told to search a series of files, either by listing the files in order or by using shell wildcard characters.
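A short sketch of both ways of naming several files follows. The log file names and their contents are hypothetical. When more than one file is searched, grep prefixes each matching line with the name of the file it came from.

```shell
# Hypothetical sample files; names and contents are assumptions for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 'alpha error' 'alpha ok' > "$tmp/a.log"
printf '%s\n' 'beta ok' 'beta error' > "$tmp/b.log"

# Files listed in order on the command line.
listed=$(cd "$tmp" && grep 'error' a.log b.log)

# The same search using a wildcard; the shell expands *.log to the
# matching file names before grep runs.
globbed=$(cd "$tmp" && grep 'error' *.log)

printf '%s\n' "$listed"

rm -rf "$tmp"
```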

For details on the metacharacters and options available for the grep commands, see the online manual page for grep.

sort

The filter for sorting alphanumeric text fields is, appropriately, called sort. sort reads from stdin by default, so it can be used in a chain of commands, or it can read its input from one or more files:

cat <file1> <file2> | sort | more

or the equivalent:

sort <file1> <file2> | more

By default, sort compares whole lines, character by character, starting from the leftmost column. When the sort is restricted to particular fields, the fields are separated by tab or space characters by default, but any other field separator can be used by defining it on the command line.
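The effect of a field separator can be sketched as follows. The example assumes a sort that accepts the POSIX -k option for selecting a field; the file name staff.txt and its colon-separated contents are invented. LC_ALL=C is set only so that the ordering does not depend on the local language settings.

```shell
# Hypothetical sample data; file name and contents are assumptions for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 'carol:30:ops' 'alice:25:dev' 'bob:27:qa' > "$tmp/staff.txt"

# No options: whole lines are compared, so the names decide the order.
by_line=$(LC_ALL=C sort "$tmp/staff.txt")

# -t: makes ':' the field separator; -k 3,3 restricts the sort key
# to the third field (dev, ops, qa).
by_dept=$(LC_ALL=C sort -t: -k 3,3 "$tmp/staff.txt")

printf '%s\n' "$by_line" "---" "$by_dept"

rm -rf "$tmp"
```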

Useful command-line options for sort are as follows:

 
  -b       Ignore leading space characters in the starting and
           ending positions of a field.

  -d       Dictionary order. Only letters, digits, space, and tab
           are significant in the sort.

  -f       Treat upper and lower case characters as equivalent.

  -n       Numeric sort. Sort by arithmetic value.

  -r       Reverse the order of the sort.

  -t<c>    Use the character <c> as the field delimiter.

  +sp.o    sp is the starting position for the sort. +0 is
           leftmost field. .o is the optional character offset
           into a field which indicates where the sort should begin.

  -ep.o    ep specifies the field number before which the sort
           is ended. .o is optional; it specifies that the sort
           will end at the character just prior to the .o offset
           into the ep field.
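A few of these options can be tried on small inputs, as in the sketch below. The file names and contents are made up, and join_lines is a helper defined only for this example so that each result fits on one line. LC_ALL=C fixes the character ordering so the results do not depend on the local language settings.

```shell
# Hypothetical sample data; file names and contents are assumptions for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 10 9 100 2 > "$tmp/nums.txt"
printf '%s\n' cherry Banana apple > "$tmp/fruit.txt"

# Helper for this sketch only: join the lines of stdin with single spaces.
join_lines() { paste -s -d ' ' -; }

# Without -n the numbers are compared as character strings.
default_order=$(LC_ALL=C sort "$tmp/nums.txt" | join_lines)

# -n compares by arithmetic value; adding -r reverses the result.
numeric_order=$(LC_ALL=C sort -n "$tmp/nums.txt" | join_lines)
reverse_order=$(LC_ALL=C sort -n -r "$tmp/nums.txt" | join_lines)

# In the C locale upper case letters sort before lower case ones;
# -f treats the two cases as equivalent.
case_sensitive=$(LC_ALL=C sort "$tmp/fruit.txt" | join_lines)
fold_case=$(LC_ALL=C sort -f "$tmp/fruit.txt" | join_lines)

printf '%s\n' "$default_order" "$numeric_order" "$reverse_order" \
    "$case_sensitive" "$fold_case"

rm -rf "$tmp"
```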

EXAMPLES:
sort -d +0 -1 file1 | more

The sort will start at the first field, and end before the second.

sort -d +0.1 -0.2 file1 | more

The sort starts at character offset 1 into the first field, that is, at its second character, and ends after that same character.
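Many current versions of sort no longer accept the +sp.o and -ep.o notation and instead treat an argument such as +0 as a file name. A hedged sketch of the same two sorts using the POSIX -k option follows; -k counts fields and characters from 1 rather than from 0. The contents of file1 are invented for the example.

```shell
# Hypothetical sample data; the contents of file1 are assumptions for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 'bz one' 'ca two' 'ab three' > "$tmp/file1"

# Counterpart of: sort -d +0 -1 file1
# The key is the first field only.
by_field1=$(LC_ALL=C sort -d -k 1,1 "$tmp/file1")

# Counterpart of: sort -d +0.1 -0.2 file1
# The key is a single character: character 2 of field 1 (offset 1).
by_char2=$(LC_ALL=C sort -d -k 1.2,1.2 "$tmp/file1")

printf '%s\n' "$by_field1" "---" "$by_char2"

rm -rf "$tmp"
```

In the second sort the keys are z, a, and b, so the line beginning with ca comes first.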



Larry Latour
Fri Sep 12 08:12:59 EDT 1997