Files are the heart of UNIX. Unlike most other operating systems, UNIX was designed with a simple, yet highly sophisticated, view of files: Everything is a file. Information stored in an area of a disk or memory is a file; a directory is a file; the keyboard is a file; the screen is a file. This single-minded view makes it easy to write tools that manipulate files, because files have no structure—UNIX sees every file merely as a simple stream of bytes. This makes life much simpler for both the UNIX programmer and the UNIX user. The user benefits from being able to send the contents of a file to a command without having to go through a complex process of opening the file. In a similar way, the user can capture the output of a command in a file without having previously created that file. And perhaps most importantly, the user can send the output of one command directly to the input of another, using memory as a temporary storage device or file. Finally, users benefit from UNIX's unstructured files because they are simply easier to use than files that must conform to one of several highly structured formats.
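The three conveniences just described correspond to the shell's <, >, and | operators. The following is a minimal sketch rather than anything taken from the original text; the file name fruit.txt and its contents are invented for illustration.

```shell
# Scratch directory so that no existing file is touched (names are hypothetical).
workdir=$(mktemp -d)

# Capture a command's output in a file that did not exist beforehand.
printf 'pear\napple\nfig\n' > "$workdir/fruit.txt"

# Send the contents of a file to a command's standard input.
sorted=$(sort < "$workdir/fruit.txt")

# Send the output of one command directly to the input of another.
count=$(sort "$workdir/fruit.txt" | wc -l | tr -d ' ')

printf '%s\n' "$sorted"
printf 'lines: %s\n' "$count"

rm -rf "$workdir"
```

In none of the three cases does the user open or create a file explicitly; the shell does that work before the command starts.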
In a similar way, sending a binary file to the screen can lock the keyboard, put the screen in a mode that changes the displayed character set to one that is clearly not English, dump core, and so on.
While it's true that many files already stored on the system—and certainly every file you create with a text editor (see Chapter 7)—are text files, many are not. UNIX provides a command, file, that attempts to determine the nature of the contents of files when you supply their file names as arguments. You can invoke the file command in one of two ways:
file [-h] [-m mfile] [-f ffile] arg(s)
file [-h] [-m mfile] -f ffile
The file command performs a series of tests on each file in the list of arg(s) or on the list of files whose names are contained in the file ffile. If the file being tested is a text file, file examines the first 512 bytes and tries to determine the language in which it is written. The identification is worded by means of the contents of a file called /etc/magic. If you don't like what's in the file, you can use the -m mfile option, replacing mfile with the name of the "magic file" you'd like to use. (Consult your local magician for suitable spells and potions!) Here are the kinds of text files that Unixware Version 1.0's file command can identify:
Don't be concerned if you're not familiar with some of these kinds of text. Many of them are peculiar to UNIX and are explained in later chapters. If the file is not text, file looks near the beginning of the file for a magic number, that is, a number or string that is associated with a file type: an arbitrary value that is coupled with a descriptive phrase. Then file uses /etc/magic, which provides a database of magic numbers and kinds of files, or the file specified as mfile to determine the file's contents. If the file being tested is a symbolic link, file follows the link and tries to determine the nature of the contents of the file to which it is linked. The -h option causes file to ignore symbolic links.
The /etc/magic file contains the table of magic numbers and their meanings. For example, here is an excerpt from Unixware Version 1.0's /etc/magic file. The number following uxcore: is the magic number, and the phrase that follows is the file type. The other columns tell file how and where to look for the magic number:
>16 short 2 uxcore:231 executable
0 string uxcore:648 expanded ASCII cpio archive
0 string uxcore:650 ASCII cpio archive
>1 byte 0235 uxcore:571 compressed data
0 string uxcore:248 current ar archive
0 short 0432 uxcore:256 Compiled Terminfo Entry
0 short 0434 uxcore:257 Curses screen image
0 short 0570 uxcore:259 vax executable
0 short 0510 uxcore:263 x86 executable
0 short 0560 uxcore:267 WE32000 executable
0 string 070701 uxcore:565 DOS executable (EXE)
0 string 070707 uxcore:566 DOS built-in
0 byte 0xe9 uxcore:567 DOS executable (COM)
0 short 0520 uxcore:277 mc68k executable
0 string uxcore:569 core file (Xenix)
0 byte 0x80 uxcore:280 8086 relocatable (Microsoft)
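To make the idea of a magic number concrete, the following sketch inspects one byte by hand with od, much as file would for the "compressed data" entry. It assumes that the entry means "the byte at offset 1 equals octal 0235"; the three sample bytes are invented stand-ins for the start of a file written by compress, and the variable names are hypothetical.

```shell
# Three sample bytes; the first two are octal 037 and 0235.
# Skip one byte (-j 1), read one byte (-N 1), and print it in octal (-t o1).
second_byte=$(printf '\037\235\220' | od -An -t o1 -j 1 -N 1 | tr -d ' \n')

# Compare against the value listed in the magic table.
if [ "$second_byte" = "235" ]; then
    kind="compressed data"
else
    kind="data"
fi

printf '%s\n' "$kind"
```

The real file command works through every entry in the table; this sketch checks only one of them.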
Both more and page have several commands, many of which take a numerical argument that controls the number of times the command is actually executed. You can issue these commands while using the more or page program (see the syntax below), and none of these commands are echoed to the screen. Table 6.1 lists the major commands.
more [-cdflrsuw] [-lines] [+linenumber] [+/pattern] [file(s)]
page [-cdflrsuw] [-lines] [+linenumber] [+/pattern] [file(s)]
You can invoke more (or page) with certain options that specify the program's behavior. For example, these programs can display explicit error messages instead of just beeping. Table 6.2 lists the most commonly used options for more and page.
$ pg [options] file
The pg program has several startup options that modify its behavior. Table 6.4 describes the most frequently used options.
grep stands for global/regular expression/print; that is, search through an entire file (do a global search) for a specified regular expression (the pattern that you specified) and display the line or lines that contain the pattern.
Before you can use grep and the other members of the grep family, you must explore regular expressions, which are what gives the grep commands (and many other UNIX commands) their power. After that, you will learn all of the details of the grep family of commands.
You can also combine two strings into a pattern. For example, to combine a search for Unix and UNIX, you can specify a word that begins with U, followed by n or N, followed by i or I, and ending with x or X.
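The combined pattern just described can be written with one bracketed list per letter. This is a small sketch with invented input lines, not an example from the original text.

```shell
# U, then n or N, then i or I, then x or X.
matches=$(printf '%s\n' 'UNIX is old' 'Unix is fun' 'unix in lowercase' 'Linux too' |
    grep 'U[nN][iI][xX]')

printf '%s\n' "$matches"
```

The last two input lines are rejected because the pattern still requires an uppercase U.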
Several UNIX commands use regular expressions to find text in files. Usually you supply a regular expression to a command to tell that command what to search for. Most regular expressions match more than one text string.
There are two kinds of regular expressions: limited and full (sometimes called extended). Limited regular expressions are a subset of full regular expressions, but UNIX commands are inconsistent in the extended operations that they permit. At the end of this discussion, you'll find a table that lists the most common commands in UNIX System V Release 4 that use regular expressions, along with the operations that they can perform.
The simplest form of a regular expression includes only ordinary characters, and is called a string. The grep family (grep, egrep, and fgrep) matches a string wherever it finds the regular expression, even if it's surrounded by other characters. For example, the is a regular expression that matches only the three-letter sequence t, h, and e. This string is found in the words the, therefore, bother, and many others.
Two of the members of the grep family use regular expressions—the third, fgrep, operates only on strings:
grep regular_expression filename
When grep finds a match of regular_expression, it displays the line of the file that contains it and then continues searching for a subsequent match. Thus, grep displays every line of a file that contains a text string that matches the regular expression.
$ cat REfile
A regular expression is a sequence of characters taken
from the set of uppercase and lowercase letters, digits,
punctuation marks, etc., plus a set of special regular
expression operators. Some of these operators may remind
you of file name matching, but be forewarned: in general,
regular expression operators are different from the
shell metacharacters we discussed in Chapter 1.
The simplest form of a regular expression is one that
includes only letters. For example, the would match only
the three-letter sequence t, h, e. This pattern is found
in the following words: the, therefore, bother. In other
words, wherever the regular expression pattern is found
" even if it is surrounded by other characters " it will
be matched.
$ grep only REfile
includes only letters. For example, the would match only
Only one line of REfile contains the string only, so grep printed that line.
Now let's look at each character in detail.
$ grep "w.r" REfile
from the set of uppercase and lowercase letters, digits,
you of file name matching, but be forewarned: in general,
in the following words: the, therefore, bother. In other
words, wherever the regular expression pattern is found
You can form a somewhat different one-character regular expression by enclosing a list of characters in a left and right pair of square brackets. The matching is limited to those characters listed between the brackets. For example, the pattern
[aei135XYZ]
matches any one of the characters a, e, i, 1, 3, 5, X, Y, or Z. Consider the following example:
$ grep "w[fhmkz]" REfile
words, wherever the regular expression pattern is found
This time, the match was satisfied only by the wh in wherever, matching the pattern "w followed by either f, h, m, k, or z." If the first character in the list is a right square bracket (]), it does not terminate the list, because that would make the list empty, which is not permitted. Instead, ] itself becomes one of the possible characters in the search pattern. For example, the pattern
[]a]
matches either ] or a. If the first character in the list is a circumflex (also called a caret), the match occurs on any character that is not in the list:
$ grep "w[^fhmkz]" REfile
from the set of uppercase and lowercase letters, digits,
you of file name matching, but be forewarned: in general,
shell metacharacters we discussed in Chapter 1.
includes only letters. For example, the would match only
in the following words: the, therefore, bother. In other
words, wherever the regular expression pattern is found
" even if it is surrounded by other characters " it will
The pattern "w followed by anything except f, h, m, k, or z" has many matches. On line 1, we in lowercase is a "w followed by anything except an f, an h, an m, a k, or a z." On line 2, wa in forewarned is a match, as is the word we on line 3. Line 4 contains wo in would, and line 5 contains wo in words. Line 6 has wo in words as its match. The other possible matches on line 6 are ignored because the match is satisfied at the beginning of the line. Finally, at the end of line 7, wi in will matches.
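The two special first characters of a bracketed list, ] and the circumflex, can be tried on a few invented lines. This is a sketch; the input words are assumptions chosen to show one match and one non-match of each kind.

```shell
# A leading ] inside the brackets is an ordinary member of the list.
bracket_or_a=$(printf '%s\n' 'x]y' 'bad' 'zzz' | grep '[]a]')

# A leading ^ negates the list: w followed by any character except f, h, m, k, or z.
# "now" is rejected too, because nothing at all follows its w.
not_in_list=$(printf '%s\n' 'when' 'word' 'now' | grep 'w[^fhmkz]')

printf '%s\n' "$bracket_or_a"
printf '%s\n' "$not_in_list"
```

The last case is worth noting: a negated list must still match some character, so a w at the end of a line does not qualify.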
You can use a minus sign (-) inside the left and right pair of square brackets to indicate a range of letters or digits. For example, the pattern
[a-z]
matches any lowercase letter.
$ grep "w[a-f]" REfile
from the set of uppercase and lowercase letters, digits,
you of file name matching, but be forewarned: in general,
shell metacharacters we discussed in Chapter 1.
The matches are we on line 1, wa on line 2, and we on line 3. Look at REfile again and note how many potential matches are omitted because the character following the w is not one of the group a through f.
Furthermore, you can include several ranges in one set of brackets. For example, the pattern
[a-zA-Z]
matches any letter, lower- or uppercase.
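A quick way to see a multiple-range list at work is to filter a handful of invented lines. The sketch below sets LC_ALL=C as a precaution, on the assumption that ranges should follow the ASCII sequence; in other locales a range such as a-z may cover a different set of characters.

```shell
# Keep only the lines that contain at least one letter of either case.
with_letters=$(printf '%s\n' '12345' 'abc' 'ABC' '4u' '---' |
    LC_ALL=C grep '[a-zA-Z]')

printf '%s\n' "$with_letters"
```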
X\{2,5\}
matches at least two but not more than five Xs. That is, it matches XX, XXX, XXXX, or XXXXX. The minimum number of matches is written immediately after the escaped left curly brace, followed by a comma (,) and then the maximum value. If you omit the maximum value (but not the comma), as in
X\{2,\}
you specify that the match should occur for at least two Xs. If you write just a single value, omitting the comma, you specify the exact number of matches, no more and no less. For example, the pattern
X\{4\}
matches only XXXX. Here are some examples of this kind of regular expression:
$ grep "p\{2\}" REfile
from the set of uppercase and lowercase letters, digits,
This is the only line that contains "pp."
$ grep "p\{1\}" REfile
A regular expression is a sequence of characters taken
from the set of uppercase and lowercase letters, digits,
punctuation marks, etc., plus a set of special regular
expression operators. Some of these operators may remind
regular expression operators are different from the
shell metacharacters we discussed in Chapter 1.
The simplest form of a regular expression is one that
includes only letters. For example, the would match only
the three-letter sequence t, h, e. This pattern is found
words, wherever the regular expression pattern is found
Notice that on the second line, the first "p" in "uppercase" satisfies the search. The grep program doesn't even see the second "p" in the word because it stops searching as soon as it finds one "p."
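Because grep reports a line as soon as the pattern is found anywhere in it, an exact count such as X\{4\} also selects lines that hold a longer run of Xs. The sketch below uses three invented lines and the ^ and $ anchors (covered later in this discussion) to show the difference.

```shell
lines='XXX
XXXX
XXXXX'

# Unanchored: any line containing four consecutive Xs is selected.
unanchored=$(printf '%s\n' "$lines" | grep -c 'X\{4\}')

# Anchored: the whole line must consist of exactly four Xs.
anchored=$(printf '%s\n' "$lines" | grep -c '^X\{4\}$')

printf 'unanchored: %s\n' "$unanchored"
printf 'anchored: %s\n' "$anchored"
```

The -c option prints a count of selected lines rather than the lines themselves.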
X*
matches zero or more Xs: nothing, X, XX, XXX, and so on. To ensure that you get at least one character in the match, use
XX*
For example, the command
$ grep "p*" REfile
displays the entire file, because every line can match "zero or more instances of the letter p." However, note the output of the following commands:
$ grep "pp*" REfile
A regular expression is a sequence of characters taken
from the set of uppercase and lowercase letters, digits,
punctuation marks, etc., plus a set of special regular
expression operators. Some of these operators may remind
regular expression operators are different from the
shell metacharacters we discussed in Chapter 1.
The simplest form of a regular expression is one that
includes only letters. For example, the would match only
the three-letter sequence t, h, e. This pattern is found
words, wherever the regular expression pattern is found
$ grep "ppp*" REfile
from the set of uppercase and lowercase letters, digits,
The regular expression ppp* matches "pp followed by zero or more instances of the letter p," or, in other words, "two or more instances of the letter p."
The extended set of regular expressions includes two additional operators that are similar to the asterisk: the plus sign (+) and the question mark (?). The plus sign is used to match one or more occurrences of the preceding character, and the question mark is used to match zero or one occurrences. For example, the command
$ egrep "p?" REfile
outputs the entire file, because p? is satisfied by zero occurrences of p and therefore matches on every line. However, note the output of the following command:
$ egrep "p+" REfile
A regular expression is a sequence of characters taken
from the set of uppercase and lowercase letters, digits,
punctuation marks, etc., plus a set of special regular
expression operators. Some of these operators may remind
regular expression operators are different from the
shell metacharacters we discussed in Chapter 1.
The simplest form of a regular expression is one that
includes only letters. For example, the would match only
the three-letter sequence t, h, e. This pattern is found
words, wherever the regular expression pattern is found
Another possibility is [a-z]+. This pattern matches one or more occurrences of any lowercase letter.
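The extended operators can be tried on a few invented lines. This sketch uses grep -E, the POSIX spelling of egrep, on the assumption that a current system is being used; the input text is hypothetical.

```shell
text='apple
PPP
stop!
1234'

# One or more lowercase p characters somewhere on the line.
plus=$(printf '%s\n' "$text" | grep -E 'p+')

# Lines made up entirely of one or more lowercase letters.
lower=$(printf '%s\n' "$text" | LC_ALL=C grep -E '^[a-z]+$')

# p? also accepts zero occurrences, so every line is counted.
question=$(printf '%s\n' "$text" | grep -E -c 'p?')

printf '%s\n' "$plus" "$lower" "$question"
```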
^[Tt]he
matches a line that begins with either The or the, but does not match a line that has a The or the at any other position on the line. Note, for example, the output of the following two commands:
$ grep "[Tt]he" REfile
from the set of uppercase and lowercase letters, digits,
expression operators. Some of these operators may remind
regular expression operators are different from the
The simplest form of a regular expression is one that
includes only letters. For example, the would match only
the three-letter sequence t, h, e. This pattern is found
in the following words: the, therefore, bother. In other
words, wherever the regular expression pattern is found
" even if it is surrounded by other characters " it will
$ grep "^[Tt]he" REfile
The simplest form of a regular expression is one that
the three-letter sequence t, h, e. This pattern is found
A dollar sign as the last character of the pattern anchors the regular expression to the end of the line, as in the following example:
$ grep "1\.$" REfile
shell metacharacters we discussed in Chapter 1.
This anchoring occurs because the line ends in a match of the regular expression. The period in the regular expression is preceded by a backslash, so the program knows that it's looking for a period and not just any character. Here's another example that uses REfile:
$ grep "[Tt]he$" REfile
regular expression operators are different from the
The regular expression .* is an idiom that is used to match zero or more occurrences of any sequence of any characters. Any multicharacter regular expression always matches the longest string of characters that fits the regular expression description. Consequently, .* used as the entire regular expression always matches an entire line of text. Therefore, the command
$ grep "^.*$" REfile
prints the entire file. Note that in this case the anchoring characters are redundant. When used as part of an "unanchored" regular expression, that idiomatic regular expression matches the longest string that fits the description, as in the following example:
$ grep "C.*1" REfile
shell metacharacters we discussed in Chapter 1.
The regular expression C.*1 matches the longest string that begins with a C and ends with a 1. Another expression, d.*d, matches the longest string that begins and ends with a d. On each line of output in the following example, the matched string is highlighted with italics:
$ grep "d.*d" REfile
from the set of uppercase and lowercase letters, digits,
shell metacharacters we discussed in Chapter 1.
includes only letters. For example, the would match only
words, wherever the regular expression pattern is found
" even if it is surrounded by other characters " it will
You've seen that a regular expression command such as grep finds a match even if the regular expression is surrounded by other characters. For example, the pattern
[Tt]he
matches the, The, there, There, other, oTher, and so on (even though the last word is unlikely to be used). Suppose that you're looking for the word The or the and do not want to match other, There, or there. In a few of the commands that use full regular expressions, you can surround the regular expression with escaped angle brackets (\<___\>). For example, the pattern
\<the\>
represents the string the, where t follows a character that is not a letter, digit, or underscore, and where e is followed by a character that is not a letter, digit, or underscore. If you need not completely isolate letters, digits, and underscores, you can use the angle brackets singly. That is, the pattern \<the matches anything that begins with the, and ter\> matches anything that ends with ter. You can tell egrep (but not grep) to search for either of two regular expressions as follows:
$ egrep "regular expression-1 | regular expression-2" filename
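One detail of the alternation syntax deserves a test: any space typed next to the vertical bar becomes part of the neighboring regular expression. The sketch below uses grep -E (the POSIX spelling of egrep) and three invented lines to compare the two forms.

```shell
lines='MS-DOS version
SCO UNIX
DOSSCO'

# No spaces: every line containing DOS or SCO is selected.
tight=$(printf '%s\n' "$lines" | grep -E -c 'DOS|SCO')

# With spaces: the alternatives are "DOS " (trailing space) and " SCO"
# (leading space), so only the first line is selected.
spaced=$(printf '%s\n' "$lines" | grep -E -c 'DOS | SCO')

printf 'tight: %s\n' "$tight"
printf 'spaced: %s\n' "$spaced"
```

Unless the spaces are meant to be matched, it is safer to write the alternatives with nothing between them and the bar.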
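[A-Z][a-z]* [0-9]\{1,2\}, [0-9]\{4\}
You can improve this pattern so that it recognizes that May, the month with the shortest name, has three letters, and that September has nine. Because the initial capital is matched separately by [A-Z], the run of lowercase letters that follows is two to eight characters long:
[A-Z][a-z]\{2,8\} [0-9]\{1,2\}, [0-9]\{4\}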
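The date pattern can be checked against a few invented strings. This sketch writes the lowercase run as \{2,8\}, on the reasoning that one capital plus two to eight lowercase letters spans the three letters of May through the nine of September; the sample dates are hypothetical.

```shell
dates='May 5, 1993
September 30, 1993
Ma 5, 1993
june 1, 1993'

# Capital letter, 2 to 8 lowercase letters, a 1- or 2-digit day, a 4-digit year.
found=$(printf '%s\n' "$dates" |
    LC_ALL=C grep '[A-Z][a-z]\{2,8\} [0-9]\{1,2\}, [0-9]\{4\}')

printf '%s\n' "$found"
```

The pattern checks only the shape of the text, so a word such as Monday followed by a day and year would also be accepted.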
[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}
1-[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}
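Digit-count patterns like these can be tried on a few invented strings. In the sketch below, the first pattern is written with \{2\} for its middle group, which assumes a three-two-four digit layout; the second is the 1-plus-ten-digit pattern shown above. All sample numbers are made up.

```shell
numbers='123-45-6789
1-800-555-1212
555-1212'

# Three digits, two digits, four digits, separated by hyphens.
three_two_four=$(printf '%s\n' "$numbers" |
    grep '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}')

# A literal 1, then groups of three, three, and four digits.
long_distance=$(printf '%s\n' "$numbers" |
    grep '1-[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}')

printf '%s\n' "$three_two_four" "$long_distance"
```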
$ grep [options] RE [file(s)]
where RE is a limited regular expression. Table 6.5 lists the regular expressions that grep recognizes. The grep command reads from the specified file on the command line or, if no files are specified, from standard input. Table 6.5 lists the command-line options that grep takes.
$ cat cron
In SCO Xenix 2.3, or SCO UNIX, you can edit a
crontab file to your heart's content, but it will
not be re-read, and your changes will not take
effect, until you come out of multi-user run
level (thus killing cron), and then re-enter
multi-user run level, when a new cron is started;
or until you do a reboot.

The proper way to install a new version of a
crontab (for root, or for any other user) is to
issue the command "crontab new.jobs", or "cat
new.jobs | crontab", or if in "vi" with a new
version of the commands, "w ! crontab". I find it
easy to type "vi /tmp/tbl", then ":0 r !crontab
-l" to read the existing crontab into the vi
buffer, then edit, then type ":w !crontab", or
"!crontab %" to replace the existing crontab with
what I see on vi's screen.
$ cat pax
This is an announcement for the MS-DOS version of
PAX version 2. See the README file and the man
pages for more information on how to run PAX,
TAR, and CPIO.

For those of you who don't know, pax is a 3 in 1
program that gives the functionality of pax, tar,
and cpio. It supports both the DOS filesystem
and the raw "tape on a disk" system used by most
micro UNIX systems. This will allow for easy
transfer of files to and from UNIX systems. It
also supports multiple volumes. Floppy density
for raw UNIX type read/writes can be specified on
the command line.

The source will eventually be posted to one of
the source groups.

Be sure to use a blocking factor of 20 with
pax-as-tar and B with pax-as-cpio for best
performance.
The following examples show how to find a string in a file:
$ grep "you" pax
For those of you who don't know, pax is a 3 in 1
$ grep "you" cron
In SCO Xenix 2.3, or SCO UNIX, you can edit a
crontab file to your heart's content, but it will
not be re-read, and your changes will not take
effect, until you come out of multi-user run
or until you do a reboot.
Note that you appears in your in the second and third lines.
You can find the same string in two or more files by using a variety of options. In this first example, case is ignored:
$ grep -i "you" pax cron
pax:For those of you who don't know, pax is a 3 in 1
cron:In SCO Xenix 2.3, or SCO UNIX, you can edit a
cron:crontab file to your heart's content, but it will
cron:not be re-read, and your changes will not take
cron:effect, until you come out of multi-user run
cron:or until you do a reboot.
Notice that each line of output begins with the name of the file that contains a match. In the following example, the output includes the name of the file and the number of the line of that file on which the match is found:
$ grep -n "you" pax cron
pax:6:For those of you who don't know, pax is a 3 in 1
cron:1:In SCO Xenix 2.3, or SCO UNIX, you can edit a
cron:2:crontab file to your heart's content, but it will
cron:3:not be re-read, and your changes will not take
cron:4:effect, until you come out of multi-user run
cron:7:or until you do a reboot.
The following example shows how to inhibit printing the lines themselves:
$ grep -c "you" pax cron
pax:1
cron:5
The following output shows the matching lines without specifying the files from which they came:
$ grep -h "you" pax cron
For those of you who don't know, pax is a 3 in 1
In SCO Xenix 2.3, or SCO UNIX, you can edit a
crontab file to your heart's content, but it will
not be re-read, and your changes will not take
effect, until you come out of multi-user run
or until you do a reboot.
The following specifies output of "every line in pax and cron that does not have [Yy][Oo][Uu] in it":
$ grep -iv "you" pax cron
pax:This is an announcement for the MS-DOS version of
pax:PAX version 2. See the README file and the man
pax:pages for more information on how to run PAX,
pax:TAR, and CPIO.
pax:
pax:program that gives the functionality of pax, tar,
pax:and cpio. It supports both the DOS filesystem
pax:and the raw "tape on a disk" system used by most
pax:micro UNIX systems. This will allow for easy
pax:transfer of files to and from UNIX systems. It
pax:also supports multiple volumes. Floppy density
pax:for raw UNIX type read/writes can be specified on
pax:the command line.
pax:
pax:The source will eventually be posted to one of
pax:the source groups.
pax:
pax:Be sure to use a blocking factor of 20 with
pax:pax-as-tar and B with pax-as-cpio for best
pax:performance.
cron:level (thus killing cron), and then re-enter
cron:multi-user run level, when a new cron is started;
cron:
cron:The proper way to install a new version of a
cron:crontab (for root, or for any other user) is to
cron:issue the command "crontab new.jobs", or "cat
cron:new.jobs | crontab", or if in "vi" with a new
cron:version of the commands, "w ! crontab". I find it
cron:easy to type "vi /tmp/tbl", then ":0 r !crontab
cron:-l" to read the existing crontab into the vi
cron:buffer, then edit, then type ":w !crontab", or
cron:"!crontab %" to replace the existing crontab with
cron:what I see on vi's screen.
Note that blank lines are considered to be lines that do not match the given regular expression.
The following example is quite interesting. It lists every line that has r.*t in it and of course it matches the longest possible string in each line. First, let's see exactly how the strings are matched. The matching strings in the listing are highlighted in italics so that you can see what grep actually matches:
$ grep "r.*t" pax cron
pax:This is an announcement for the MS-DOS version of
pax:PAX version 2. See the README file and the man
pax:pages for more information on how to run PAX,
pax:For those of you who don't know, pax is a 3 in 1
pax:program that gives the functionality of pax, tar,
pax:and cpio. It supports both the DOS filesystem
pax:and the raw "tape on a disk" system used by most
pax:micro UNIX systems. This will allow for easy
pax:transfer of files to and from UNIX systems. It
pax:also supports multiple volumes. Floppy density
pax:for raw UNIX type read/writes can be specified on
pax:The source will eventually be posted to one of
pax:Be sure to use a blocking factor of 20 with
pax:pax-as-tar and B with pax-as-cpio for best
cron:In SCO Xenix 2.3, or SCO UNIX, you can edit a
cron:crontab file to your heart's content, but it will
cron:not be re-read, and your changes will not take
cron:level (thus killing cron), and then re-enter
cron:multi-user run level, when a new cron is started;
cron:or until you do a reboot.
cron:The proper way to install a new version of a
cron:crontab (for root, or for any other user) is to
cron:issue the command "crontab new.jobs", or "cat
cron:new.jobs | crontab", or if in "vi" with a new
cron:version of the commands, "w ! crontab". I find it
cron:easy to type "vi /tmp/tbl", then ":0 r !crontab
cron:-l" to read the existing crontab into the vi
cron:buffer, then edit, then type ":w !crontab", or
cron:"!crontab %" to replace the existing crontab with
You can obtain for free a version of grep that highlights the matched string, but the standard version of grep simply shows the line that contains the match.
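One way to see exactly which part of a line r.*t matches is the -o option, which prints only the matched text. This is a sketch under an assumption: -o is an extension found in GNU and BSD versions of grep, not in the System V grep described here, and the input line is invented.

```shell
# The match starts at the first r and runs to the last t on the line.
matched=$(printf '%s\n' 'run to the start of the street now' | grep -o 'r.*t')

printf '%s\n' "$matched"
```

Although the line contains several shorter stretches that begin with r and end with t, only the longest one, which starts at the leftmost r, is reported.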
If you are thinking that grep doesn't seem to do anything with the patterns that it matches, you are correct. But in Chapter 7, "Editing Text Files," you will see how the sed command does replacements.
Now let's look for two or more occurrences of the letter l (two l's followed by zero or more l's):
$ grep "lll*" pax cron
pax:micro UNIX systems. This will allow for easy
pax:The source will eventually be posted to one of
cron:crontab file to your heart's content, but it will
cron:not be re-read, and your changes will not take
cron:level (thus killing cron), and then re-enter
cron:The proper way to install a new version of a
The following command finds lines that begin with The:
$ grep "^The" pax cron
pax:The source will eventually be posted to one of
cron:The proper way to install a new version of a
The next command finds lines that end with n:
$ grep "n$" pax cron
pax:PAX version 2. See the README file and the man
pax:for raw UNIX type read/writes can be specified on
cron:effect, until you come out of multi-user run
You can easily use the grep command to search for two or more consecutive uppercase letters:
$ grep "[A-Z]\{2,\}" pax cron
pax:This is an announcement for the MS-DOS version of
pax:PAX version 2. See the README file and the man
pax:pages for more information on how to run PAX,
pax:TAR, and CPIO.
pax:and cpio. It supports both the DOS filesystem
pax:micro UNIX systems. This will allow for easy
pax:transfer of files to and from UNIX systems. It
pax:for raw UNIX type read/writes can be specified on
cron:In SCO Xenix 2.3, or SCO UNIX, you can edit a
$ egrep [options] RE [files]
where RE is a regular expression. The egrep command uses the same regular expressions as the grep command, except for \( and \), and includes the following additional patterns:
$ egrep "[A-Z][A-Z]+" pax cron
pax:This is an announcement for the MS-DOS version of
pax:PAX version 2. See the README file and the man
pax:pages for more information on how to run PAX,
pax:TAR, and CPIO.
pax:and cpio. It supports both the DOS filesystem
pax:micro UNIX systems. This will allow for easy
pax:transfer of files to and from UNIX systems. It
pax:for raw UNIX type read/writes can be specified on
cron:In SCO Xenix 2.3, or SCO UNIX, you can edit a
The following command finds each line that contains either DOS or SCO:
$ egrep "DOS|SCO" pax cron
pax:This is an announcement for the MS-DOS version of
pax:and cpio. It supports both the DOS filesystem
cron:In SCO Xenix 2.3, or SCO UNIX, you can edit a
The next example finds all lines that contain either new or now:
$ egrep "n(e|o)w" cron
multi-user run level, when a new cron is started;
The proper way to install a new version of a
issue the command "crontab new.jobs", or "cat
new.jobs | crontab", or if in "vi" with a new
fgrep [options] string [files]
The options you use with the fgrep command are exactly the same as those that you use for egrep, with the addition of -x, which prints only the lines that are matched in their entirety. As an example of fgrep's -x option, consider the following file named sample:
$ cat sample
this is
a
file for testing egrep's x
option.
Now, invoke fgrep with the -x option and a as the pattern.
$ fgrep -x a sample
a
That matches the second line of the file, but
$ fgrep -x option sample
outputs nothing, as option doesn't match a line in the file. However,
$ fgrep -x option. sample
option.
matches the entire last line.
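The period in the last example is taken literally because fgrep treats its pattern as a plain string. The sketch below contrasts that with grep's regular expression matching, using grep -F (the POSIX spelling of fgrep) and three invented lines.

```shell
lines='a.c
abc
xa.cx'

# As a regular expression, the period matches any character: all three lines.
as_regex=$(printf '%s\n' "$lines" | grep -c 'a.c')

# As a fixed string, the period matches only a period: two lines.
as_string=$(printf '%s\n' "$lines" | grep -F -c 'a.c')

# With -x the string must account for the entire line.
whole_line=$(printf '%s\n' "$lines" | grep -F -x 'a.c')

printf '%s\n' "$as_regex" "$as_string" "$whole_line"
```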
The normal ordering for sort follows the ASCII code sequence.
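The ASCII sequence places digits before uppercase letters and uppercase letters before lowercase ones. The sketch below sorts five invented words; it sets LC_ALL=C on the assumption that a current system might otherwise apply the collating rules of another locale.

```shell
# Characters are compared by code value: digits, then uppercase, then lowercase.
ordered=$(printf '%s\n' banana Apple cherry 10 9 | LC_ALL=C sort)

printf '%s\n' "$ordered"
```

Note that 10 sorts ahead of 9, because the comparison is made character by character and 1 precedes 9.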
The syntax for sort is
$ sort [-cmu] [-ooutfile] [-ymemsize] [-zrecsize] [-dfiMnr] [-btchar] [+pos1 [-pos2]] [file(s)]
Table 6.6 describes the options of sort.
$ cat auto
ES Arther 85 Honda Prelude 49.412
BS Barker 90 Nissan 300ZX 48.209
AS Saint 88 BMW M-3 46.629
ES Straw 86 Honda Civic 49.543
DS Swazy 87 Honda CRX-Si 49.693
ES Downs 83 VW GTI 47.133
ES Smith 86 VW GTI 47.154
AS Neuman 84 Porsche 911 47.201
CS Miller 84 Mazda RX-7 47.291
CS Carlson 88 Pontiac Fiero 47.398
DS Kegler 84 Honda Civic 47.429
ES Sherman 83 VW GTI 48.489
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
ES Shorn 87 VW GTI 49.357
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
DS Lisanti 73 Porsche 914 50.609
CS McGill 83 Porsche 944 50.642
AS Lisle 72 Porsche 911 51.030
ES Peerson 86 VW Golf 54.493
If you invoke sort with no options, it sorts on the entire line:
$ sort auto
AS Lisle 72 Porsche 911 51.030
AS Neuman 84 Porsche 911 47.201
AS Saint 88 BMW M-3 46.629
BS Barker 90 Nissan 300ZX 48.209
CS Carlson 88 Pontiac Fiero 47.398
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
CS McGill 83 Porsche 944 50.642
CS Miller 84 Mazda RX-7 47.291
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
DS Kegler 84 Honda Civic 47.429
DS Lisanti 73 Porsche 914 50.609
DS Swazy 87 Honda CRX-Si 49.693
ES Arther 85 Honda Prelude 49.412
ES Downs 83 VW GTI 47.133
ES Peerson 86 VW Golf 54.493
ES Sherman 83 VW GTI 48.489
ES Shorn 87 VW GTI 49.357
ES Smith 86 VW GTI 47.154
ES Straw 86 Honda Civic 49.543
To alphabetize a list by the driver's name, you need sort to begin with the second field (+1 means skip the first field). Sort normally treats the first blank (space or tab) in a sequence of blanks as the field separator, and considers the rest of the blanks to be part of the next field. This has no effect on sorting on the second field because there is an equal number of blanks between the class letters and driver's name. However, whenever a field is "ragged" (for example, driver's name, car make, and car model), the next field will include leading blanks:
$ sort +1 auto
DS Arbiter 86 Honda CRX-Si 48.628
ES Arther 85 Honda Prelude 49.412
BS Barker 90 Nissan 300ZX 48.209
CS Carlson 88 Pontiac Fiero 47.398
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
ES Downs 83 VW GTI 47.133
DS Karle 74 Porsche 914 48.826
DS Kegler 84 Honda Civic 47.429
DS Lisanti 73 Porsche 914 50.609
AS Lisle 72 Porsche 911 51.030
CS McGill 83 Porsche 944 50.642
CS Miller 84 Mazda RX-7 47.291
AS Neuman 84 Porsche 911 47.201
ES Peerson 86 VW Golf 54.493
AS Saint 88 BMW M-3 46.629
ES Sherman 83 VW GTI 48.489
ES Shorn 87 VW GTI 49.357
ES Smith 86 VW GTI 47.154
ES Straw 86 Honda Civic 49.543
DS Swazy 87 Honda CRX-Si 49.693
Note that the key to this sort is only the driver's name. However, if two drivers had the same name, they would have been further sorted by the car year. In other words, +1 actually means skip the first field and sort on the rest of the line. Here's a list sorted by race times:
$ sort -b +5 auto
AS Saint 88 BMW M-3 46.629
ES Downs 83 VW GTI 47.133
ES Smith 86 VW GTI 47.154
AS Neuman 84 Porsche 911 47.201
CS Miller 84 Mazda RX-7 47.291
CS Carlson 88 Pontiac Fiero 47.398
DS Kegler 84 Honda Civic 47.429
BS Barker 90 Nissan 300ZX 48.209
ES Sherman 83 VW GTI 48.489
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
ES Shorn 87 VW GTI 49.357
ES Arther 85 Honda Prelude 49.412
ES Straw 86 Honda Civic 49.543
CS Chunk 85 Toyota MR2 49.558
DS Swazy 87 Honda CRX-Si 49.693
CS Cohen 91 Mazda Miata 50.046
DS Lisanti 73 Porsche 914 50.609
CS McGill 83 Porsche 944 50.642
AS Lisle 72 Porsche 911 51.030
ES Peerson 86 VW Golf 54.493

The -b option tells sort not to treat the blanks between the car model (e.g., M-3) and the race time as part of the race time.
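Many current sort implementations treat the +pos notation shown above as obsolete and may reject it. POSIX sort expresses the same keys with -k, which counts fields starting from 1 rather than counting fields to skip. The following is a minimal sketch under that assumption, not a description of the sort command documented in this chapter; the three-line auto file is a shortened stand-in for the real one, and mktemp -d is assumed to be available for creating a scratch directory.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"
# Use the C locale so that the ordering does not depend on local settings.
export LC_ALL=C

# Shortened stand-in for the auto file used in the text.
cat > auto <<'EOF'
AS Lisle 72 Porsche 911 51.030
ES Arther 85 Honda Prelude 49.412
BS Barker 90 Nissan 300ZX 48.209
EOF

# Old notation: sort +1 auto   (skip one field, sort on the rest of the line)
by_name=$(sort -k 2 auto)

# Old notation: sort -b +5 auto   (skip five fields, ignore leading blanks)
by_time=$(sort -b -k 6 auto)

printf '%s\n' "$by_name"
printf '%s\n' "$by_time"
```

In this notation, -k 2 means "the key starts at field 2 and runs to the end of the line," which is the same meaning the text gives for +1.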
Suppose that you want a list of times by class. You try the following command and discover that it fails:
$ sort +0 -b +5 auto
AS Lisle 72 Porsche 911 51.030
AS Neuman 84 Porsche 911 47.201
AS Saint 88 BMW M-3 46.629
BS Barker 90 Nissan 300ZX 48.209
CS Carlson 88 Pontiac Fiero 47.398
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
CS McGill 83 Porsche 944 50.642
CS Miller 84 Mazda RX-7 47.291
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
DS Kegler 84 Honda Civic 47.429
DS Lisanti 73 Porsche 914 50.609
DS Swazy 87 Honda CRX-Si 49.693
ES Arther 85 Honda Prelude 49.412
ES Downs 83 VW GTI 47.133
ES Peerson 86 VW Golf 54.493
ES Sherman 83 VW GTI 48.489
ES Shorn 87 VW GTI 49.357
ES Smith 86 VW GTI 47.154
ES Straw 86 Honda Civic 49.543

This command line fails because it tells sort to skip nothing and sort on the rest of the line, then sort on the sixth field. To restrict the first sort to just the class, and then sort on time as the secondary sort, use the following expression:
$ sort +0 -1 -b +5 auto
AS Saint 88 BMW M-3 46.629
AS Neuman 84 Porsche 911 47.201
AS Lisle 72 Porsche 911 51.030
BS Barker 90 Nissan 300ZX 48.209
CS Miller 84 Mazda RX-7 47.291
CS Carlson 88 Pontiac Fiero 47.398
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
CS McGill 83 Porsche 944 50.642
DS Kegler 84 Honda Civic 47.429
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
DS Swazy 87 Honda CRX-Si 49.693
DS Lisanti 73 Porsche 914 50.609
ES Downs 83 VW GTI 47.133
ES Smith 86 VW GTI 47.154
ES Sherman 83 VW GTI 48.489
ES Shorn 87 VW GTI 49.357
ES Arther 85 Honda Prelude 49.412
ES Straw 86 Honda Civic 49.543
ES Peerson 86 VW Golf 54.493

This command says skip nothing and stop after sorting on the first field, then skip to the end of the fifth field and sort on the rest of the line. In this case, the rest of the line is just the sixth field.

Here's a simple merge example. Notice that both files are already sorted by class and name.
$ cat auto.1
AS Neuman 84 Porsche 911 47.201
AS Saint 88 BMW M-3 46.629
BS Barker 90 Nissan 300ZX 48.209
CS Carlson 88 Pontiac Fiero 47.398
CS Miller 84 Mazda RX-7 47.291
DS Swazy 87 Honda CRX-Si 49.693
ES Arther 85 Honda Prelude 49.412
ES Downs 83 VW GTI 47.133
ES Smith 86 VW GTI 47.154
ES Straw 86 Honda Civic 49.543
$ cat auto.2
AS Lisle 72 Porsche 911 51.030
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
CS McGill 83 Porsche 944 50.642
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
DS Kegler 84 Honda Civic 47.429
DS Lisanti 73 Porsche 914 50.609
ES Peerson 86 VW Golf 54.493
ES Sherman 83 VW GTI 48.489
ES Shorn 87 VW GTI 49.357
$ sort -m auto.1 auto.2
AS Lisle 72 Porsche 911 51.030
AS Neuman 84 Porsche 911 47.201
AS Saint 88 BMW M-3 46.629
BS Barker 90 Nissan 300ZX 48.209
CS Carlson 88 Pontiac Fiero 47.398
CS Chunk 85 Toyota MR2 49.558
CS Cohen 91 Mazda Miata 50.046
CS McGill 83 Porsche 944 50.642
CS Miller 84 Mazda RX-7 47.291
DS Arbiter 86 Honda CRX-Si 48.628
DS Karle 74 Porsche 914 48.826
DS Kegler 84 Honda Civic 47.429
DS Lisanti 73 Porsche 914 50.609
DS Swazy 87 Honda CRX-Si 49.693
ES Arther 85 Honda Prelude 49.412
ES Downs 83 VW GTI 47.133
ES Peerson 86 VW Golf 54.493
ES Sherman 83 VW GTI 48.489
ES Shorn 87 VW GTI 49.357
ES Smith 86 VW GTI 47.154
ES Straw 86 Honda Civic 49.543

For a final example, pass1 is an excerpt from /etc/passwd. Sort it on the user ID field (field number 3). Specify the -t option so that the field separator used by sort is the colon, as used by /etc/passwd.
$ cat pass1
root:x:0:0:System Administrator:/usr/root:/bin/ksh
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:
labuucp:x:21:100:shevett's UPC:/usr/spool/uucppublic:/usr/lib/uucp/uucico
pcuucp:x:35:100:PCLAB:/usr/spool/uucppublic:/usr/lib/uucp/uucico
techuucp:x:36:100:The 6386:/usr/spool/uucppublic:/usr/lib/uucp/uucico
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
lkh:x:250:1:lkh:/usr/lkh:/bin/ksh
shevett:x:251:1:dave shevett:/usr/shevett:/bin/ksh
mccollo:x:329:1:Carol McCollough:/usr/home/mccollo:/bin/ksh
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
grice:x:273:20:grice steven a:/u1/fall91/dp270/grice:/bin/ksh
gross:x:305:20:gross james l:/u1/fall91/dp168/gross:/bin/ksh
hagerho:x:326:20:hagerhorst paul j:/u1/fall91/dp168/hagerho:/bin/ksh
hendric:x:274:20:hendrickson robbin:/u1/fall91/dp270/hendric:/bin/ksh
hinnega:x:320:20:hinnegan dianna:/u1/fall91/dp163/hinnega:/bin/ksh
innis:x:262:20:innis rafael f:/u1/fall91/dp270/innis:/bin/ksh
intorel:x:286:20:intorelli anthony:/u1/fall91/dp168/intorel:/bin/ksh

Now run sort with the delimiter set to a colon:
$ sort -t: +2 -3 pass1
root:x:0:0:System Administrator:/usr/root:/bin/ksh
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
labuucp:x:21:100:shevett's UPC:/usr/spool/uucppublic:/usr/lib/uucp/uucico
lkh:x:250:1:lkh:/usr/lkh:/bin/ksh
shevett:x:251:1:dave shevett:/usr/shevett:/bin/ksh
innis:x:262:20:innis rafael f:/u1/fall91/dp270/innis:/bin/ksh
grice:x:273:20:grice steven a:/u1/fall91/dp270/grice:/bin/ksh
hendric:x:274:20:hendrickson robbin:/u1/fall91/dp270/hendric:/bin/ksh
intorel:x:286:20:intorelli anthony:/u1/fall91/dp168/intorel:/bin/ksh
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
gross:x:305:20:gross james l:/u1/fall91/dp168/gross:/bin/ksh
hinnega:x:320:20:hinnegan dianna:/u1/fall91/dp163/hinnega:/bin/ksh
hagerho:x:326:20:hagerhorst paul j:/u1/fall91/dp168/hagerho:/bin/ksh
mccollo:x:329:1:Carol McCollough:/usr/home/mccollo:/bin/ksh
pcuucp:x:35:100:PCLAB:/usr/spool/uucppublic:/usr/lib/uucp/uucico
techuucp:x:36:100:The 6386:/usr/spool/uucppublic:/usr/lib/uucp/uucico
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:

Note that 35 comes after 329, because by default sort compares the digits as characters rather than as numbers. You want the user ID field to be sorted by numerical value, so correct the command by adding the -n option:
$ sort -t: -n +2 -3 pass1
root:x:0:0:System Administrator:/usr/root:/bin/ksh
labuucp:x:21:100:shevett's UPC:/usr/spool/uucppublic:/usr/lib/uucp/uucico
pcuucp:x:35:100:PCLAB:/usr/spool/uucppublic:/usr/lib/uucp/uucico
techuucp:x:36:100:The 6386:/usr/spool/uucppublic:/usr/lib/uucp/uucico
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
lkh:x:250:1:lkh:/usr/lkh:/bin/ksh
shevett:x:251:1:dave shevett:/usr/shevett:/bin/ksh
innis:x:262:20:innis rafael f:/u1/fall91/dp270/innis:/bin/ksh
grice:x:273:20:grice steven a:/u1/fall91/dp270/grice:/bin/ksh
hendric:x:274:20:hendrickson robbin:/u1/fall91/dp270/hendric:/bin/ksh
intorel:x:286:20:intorelli anthony:/u1/fall91/dp168/intorel:/bin/ksh
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
gross:x:305:20:gross james l:/u1/fall91/dp168/gross:/bin/ksh
hinnega:x:320:20:hinnegan dianna:/u1/fall91/dp163/hinnega:/bin/ksh
hagerho:x:326:20:hagerhorst paul j:/u1/fall91/dp168/hagerho:/bin/ksh
mccollo:x:329:1:Carol McCollough:/usr/home/mccollo:/bin/ksh
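The restricted keys used in these examples (+0 -1 and +2 -3) can also be written in the POSIX -k start,stop form, which is the spelling most current sort implementations expect. The sketch below assumes such a sort; the two small files are shortened stand-ins for auto and pass1, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"
export LC_ALL=C

# Shortened stand-ins for the auto and pass1 files used in the text.
cat > auto <<'EOF'
AS Lisle 72 Porsche 911 51.030
BS Barker 90 Nissan 300ZX 48.209
AS Saint 88 BMW M-3 46.629
EOF
cat > pass1 <<'EOF'
root:x:0:0:System Administrator:/usr/root:/bin/ksh
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
labuucp:x:21:100:shevett's UPC:/usr/spool/uucppublic:/usr/lib/uucp/uucico
pcuucp:x:35:100:PCLAB:/usr/spool/uucppublic:/usr/lib/uucp/uucico
EOF

# Old notation: sort +0 -1 -b +5 auto
# Primary key is field 1 only; secondary key is field 6 only.
by_class_time=$(sort -b -k 1,1 -k 6,6 auto)

# Old notation: sort -t: -n +2 -3 pass1
# Field 3 only, compared numerically (the n modifier on the key).
by_uid=$(sort -t: -k 3,3n pass1)

printf '%s\n' "$by_class_time"
printf '%s\n' "$by_uid"
```

The stop position after the comma plays the role of the -1 and -3 arguments in the older notation: it keeps the key from running on to the end of the line.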
uniq [-udc [+n] [-m]] [input.file [output.file]]

The following examples demonstrate the options. The sample file contains the results of a survey taken by a USENET news administrator on a local computer. He asked users what newsgroups they read (newsgroups are a part of the structure of USENET News, an international electronic bulletin board), used cat to merge the users' responses into a single file, and used sort to sort the file. ngs is a piece of that file.
$ cat ngs
alt.dcom.telecom
alt.sources
comp.archives
comp.bugs.sys5
comp.databases
comp.databases.informix
comp.dcom.telecom
comp.lang.c
comp.lang.c
comp.lang.c
comp.lang.c
comp.lang.c++
comp.lang.c++
comp.lang.postscript
comp.laserprinters
comp.mail.maps
comp.sources
comp.sources.3b
comp.sources.3b
comp.sources.3b
comp.sources.bugs
comp.sources.d
comp.sources.misc
comp.sources.reviewed
comp.sources.unix
comp.sources.unix
comp.sources.wanted
comp.std.c
comp.std.c
comp.std.c++
comp.std.c++
comp.std.unix
comp.std.unix
comp.sys.3b
comp.sys.att
comp.sys.att
comp.unix.questions
comp.unix.shell
comp.unix.sysv386
comp.unix.wizards
u3b.sources

To produce a list that contains no duplicates, simply invoke uniq:
$ uniq ngs
alt.dcom.telecom
alt.sources
comp.archives
comp.bugs.sys5
comp.databases
comp.databases.informix
comp.dcom.telecom
comp.lang.c
comp.lang.c++
comp.lang.postscript
comp.laserprinters
comp.mail.maps
comp.sources
comp.sources.3b
comp.sources.bugs
comp.sources.d
comp.sources.misc
comp.sources.reviewed
comp.sources.unix
comp.sources.wanted
comp.std.c
comp.std.c++
comp.std.unix
comp.sys.3b
comp.sys.att
comp.unix.questions
comp.unix.shell
comp.unix.sysv386
comp.unix.wizards
u3b.sources

This is the desired list. Of course, you can get the same result by using the sort command's -u option while sorting the original file.
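The equivalence between sort followed by uniq and sort -u can be checked on a small scale. The following is only a sketch: the file name raw and its five lines are made up for illustration, standing in for the unsorted survey responses, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"
export LC_ALL=C

# Hypothetical unsorted responses, with repeats.
printf '%s\n' comp.lang.c alt.sources comp.lang.c comp.std.c alt.sources > raw

# Two steps: sort first so that duplicates become adjacent, then uniq.
via_uniq=$(sort raw | uniq)

# One step: let sort discard the duplicates itself.
via_sort=$(sort -u raw)

printf '%s\n' "$via_sort"
```

Note that uniq compares only adjacent lines, which is why the file must be sorted before uniq can remove every duplicate.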
The -c option displays the so-called repetition count—the number of times each line appears in the original file:
$ uniq -c ngs
1 alt.dcom.telecom
1 alt.sources
1 comp.archives
1 comp.bugs.sys5
1 comp.databases
1 comp.databases.informix
1 comp.dcom.telecom
4 comp.lang.c
2 comp.lang.c++
1 comp.lang.postscript
1 comp.laserprinters
1 comp.mail.maps
1 comp.sources
3 comp.sources.3b
1 comp.sources.bugs
1 comp.sources.d
1 comp.sources.misc
1 comp.sources.reviewed
2 comp.sources.unix
1 comp.sources.wanted
2 comp.std.c
2 comp.std.c++
2 comp.std.unix
1 comp.sys.3b
2 comp.sys.att
1 comp.unix.questions
1 comp.unix.shell
1 comp.unix.sysv386
1 comp.unix.wizards
1 u3b.sources

The -u option tells uniq to output only the truly unique lines; that is, the lines that have a repetition count of 1:
$ uniq -u ngs
alt.dcom.telecom
alt.sources
comp.archives
comp.bugs.sys5
comp.databases
comp.databases.informix
comp.dcom.telecom
comp.lang.postscript
comp.laserprinters
comp.mail.maps
comp.sources
comp.sources.bugs
comp.sources.d
comp.sources.misc
comp.sources.reviewed
comp.sources.wanted
comp.sys.3b
comp.unix.questions
comp.unix.shell
comp.unix.sysv386
comp.unix.wizards
u3b.sources

The -d option tells uniq to output only those lines that have a repetition count of 2 or more:
$ uniq -d ngs
comp.lang.c
comp.lang.c++
comp.sources.3b
comp.sources.unix
comp.std.c
comp.std.c++
comp.std.unix
comp.sys.att

The uniq command also can handle lines that are divided into fields by a separator that consists of one or more spaces or tabs. The -m option tells uniq to skip the first m fields. The file mccc.ngs contains an abbreviated and modified newsgroup list in which every dot (.) is changed to a tab:
$ cat mccc.ngs
alt dcom telecom
alt sources
comp dcom telecom
comp sources
u3b sources

Notice that some of the lines are identical except for the first field, so sort the file on the second field:
$ sort +1 mccc.ngs > mccc.ngs-1
$ cat mccc.ngs-1
alt dcom telecom
comp dcom telecom
alt sources
comp sources
u3b sources

Now display lines that are unique except for the first field:
$ uniq -1 mccc.ngs-1
alt dcom telecom
alt sources

The uniq command also can ignore the first n columns (characters) of each line of a sorted file. The +n option tells uniq to skip the first n columns. The new file mccc.ngs-2 has four characters in the first field of each line:
$ cat mccc.ngs-2
alt .dcom.telecom
comp.dcom.telecom
alt .sources
comp.sources
u3b .sources
$ uniq +4 mccc.ngs-2
alt .dcom.telecom
alt .sources
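On current systems the -m and +n forms shown above are generally spelled -f fields and -s chars, which is how POSIX defines them. The sketch below assumes a POSIX uniq and rebuilds the two sample files with printf (using real tab characters in the first); mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"
export LC_ALL=C

# mccc.ngs-1: fields separated by tabs, already sorted on the second field.
printf 'alt\tdcom\ttelecom\ncomp\tdcom\ttelecom\nalt\tsources\ncomp\tsources\nu3b\tsources\n' > mccc.ngs-1

# mccc.ngs-2: the first four characters of each line hold the first field.
printf '%s\n' 'alt .dcom.telecom' 'comp.dcom.telecom' 'alt .sources' 'comp.sources' 'u3b .sources' > mccc.ngs-2

# Old notation: uniq -1 mccc.ngs-1   (skip one field before comparing)
by_field=$(uniq -f 1 mccc.ngs-1)

# Old notation: uniq +4 mccc.ngs-2   (skip four characters before comparing)
by_char=$(uniq -s 4 mccc.ngs-2)

printf '%s\n' "$by_field"
printf '%s\n' "$by_char"
```

In both cases uniq keeps the first line of each run of lines that match after the skipped portion, which is why the surviving lines all begin with alt.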
The result of this research is a programming technique that compresses text files to about 50 percent of their original lengths. Although not as efficient with files that include characters that use all eight bits, this technique can indeed reduce file sizes substantially. Because the files are smaller, storage and file transfer can be much more efficient.
There are three UNIX commands associated with compression: compress, uncompress, and zcat. Here is the syntax for each command:
compress [ -cfv ] [ -b bits ] file(s)
uncompress [ -cv ] [ file(s) ]
zcat [ file(s) ]

The options for these commands are listed in Table 6.7.
Incidentally, note that all three of these utilities can take their input from stdin through a pipe. For example, suppose that you retrieve a compressed tar archive (see Chapter 32, "Backing Up") from some site that archives free programs. If the compressed file were called archive.tar.Z, you could then uncompress it and separate it into its individual files with the following command:
$ zcat archive.tar.Z | tar -xf -
Incidentally, pr has nothing to do with actual printing on a printer. The name was used originally because the terminals of that time were printers—there were no screens as we know them today. You'll learn about true printing in the next section, "Printing Hard Copy Output." The syntax for the pr command is as follows:
pr -m [-N [-wM] [-a]] [-ecK] [-icK] [-drtfp] [+p] [ -ncK] [-oO] [-lL] [-sS] [-h header] [-F] [file(s)]
$ cat names
allen christopher
babinchak david
best betty
bloom dennis
boelhower joseph
bose anita
cacossa ray
chang liang
crawford patricia
crowley charles
cuddy michael
czyzewski sharon
delucia joseph

The pr command normally prints a file with a five-line header and a five-line footer. The header, by default, consists of these five lines: two blank lines; a line that shows the date, time, filename, and page number; and two more blank lines. The footer consists of five blank lines. The blank lines provide proper top and bottom margins so that you can pipe the output of the pr command to a command that sends a file to the printer. The pr command normally uses 66-line pages, but to save space the demonstrations use a page length of 17: five lines of header, five lines of footer, and seven lines of text.
Use the -l option with a 17 argument to do this:
$ pr -l17 names

Sep 19 15:05 1991 names Page 1

allen christopher
babinchak david
best betty
bloom dennis
boelhower joseph
bose anita
cacossa ray
(Seven blank lines follow.)
Sep 19 15:05 1991 names Page 2

chang liang
crawford patricia
crowley charles
cuddy michael
czyzewski sharon
delucia joseph

Notice that pr puts the name of the file in the header, just before the page number. You can specify your own header with -h:
$ pr -l17 -h "This is the NAMES file" names

Sep 19 15:05 1991 This is the NAMES file Page 1

allen christopher
babinchak david
best betty
bloom dennis
boelhower joseph
bose anita
cacossa ray
(Seven blank lines follow.)
Sep 19 15:05 1991 This is the NAMES file Page 2

chang liang
crawford patricia
crowley charles
cuddy michael
czyzewski sharon
delucia joseph

The header that you specify replaces the file name.
$ pr -l17 -2 names

Sep 19 15:05 1991 names Page 1

allen christopher     chang liang
babinchak david       crawford patricia
best betty            crowley charles
bloom dennis          cuddy michael
boelhower joseph      czyzewski sharon
bose anita            delucia joseph
cacossa ray

You can number the lines of text; the numbering always begins with 1:
$ pr -l17 -n names

Sep 19 15:05 1991 names Page 1

 1 allen christopher
 2 babinchak david
 3 best betty
 4 bloom dennis
 5 boelhower joseph
 6 bose anita
 7 cacossa ray
(Seven blank lines follow.)
Sep 19 15:05 1991 names Page 2

 8 chang liang
 9 crawford patricia
10 crowley charles
11 cuddy michael
12 czyzewski sharon
13 delucia joseph

Combining numbering and multicolumns results in the following:
$ pr -l17 -n -2 names

Sep 19 15:05 1991 names Page 1

 1 allen christopher      8 chang liang
 2 babinchak david        9 crawford patricia
 3 best betty            10 crowley charles
 4 bloom dennis          11 cuddy michael
 5 boelhower joseph      12 czyzewski sharon
 6 bose anita            13 delucia joseph
 7 cacossa ray

The pr command is also good for combining two or more files. Here are three files created from fields in /etc/passwd:
$ cat p-login
allen
babinch
best
bloom
boelhow
bose
cacossa
chang
crawfor
crowley
cuddy
czyzews
delucia
diesso
dimemmo
dintron
$ cat p-home
/u1/fall91/dp168/allen
/u1/fall91/dp270/babinch
/u1/fall91/dp163/best
/u1/fall91/dp168/bloom
/u1/fall91/dp163/boelhow
/u1/fall91/dp168/bose
/u1/fall91/dp270/cacossa
/u1/fall91/dp168/chang
/u1/fall91/dp163/crawfor
/u1/fall91/dp163/crowley
/u1/fall91/dp270/cuddy
/u1/fall91/dp168/czyzews
/u1/fall91/dp168/delucia
/u1/fall91/dp270/diesso
/u1/fall91/dp168/dimemmo
/u1/fall91/dp168/dintron
$ cat p-uid
278
271
312
279
314
298
259
280
317
318
260
299
300
261
301
281

The -m option tells pr to merge the files:
$ pr -m -l20 p-home p-uid p-login

Oct 12 14:15 1991 Page 1

/u1/fall91/dp168/allen    278    allen
/u1/fall91/dp270/babinc   271    babinch
/u1/fall91/dp163/best     312    best
/u1/fall91/dp168/bloom    279    bloom
/u1/fall91/dp163/boelho   314    boelhow
/u1/fall91/dp168/bose     298    bose
/u1/fall91/dp270/cacoss   259    cacossa
/u1/fall91/dp168/chang    280    chang
/u1/fall91/dp163/crawfo   317    crawfor
/u1/fall91/dp163/crowle   318    crowley
(Seven blank lines follow.)
Oct 12 14:15 1991 Page 2

/u1/fall91/dp270/cuddy    260    cuddy
/u1/fall91/dp168/czyzew   299    czyzews
/u1/fall91/dp168/deluci   300    delucia
/u1/fall91/dp270/diesso   261    diesso
/u1/fall91/dp168/dimemm   301    dimemmo
/u1/fall91/dp168/dintro   281    dintron

You can tell pr what to put between fields by using -s and a character. If you omit the character, pr uses a tab character.
$ pr -m -l20 -s p-home p-uid p-login

Oct 12 14:16 1991 Page 1

/u1/fall91/dp168/allen 278 allen
/u1/fall91/dp270/babinch 271 babinch
/u1/fall91/dp163/best 312 best
/u1/fall91/dp168/bloom 279 bloom
/u1/fall91/dp163/boelhow 314 boelhow
/u1/fall91/dp168/bose 298 bose
/u1/fall91/dp270/cacossa 259 cacossa
/u1/fall91/dp168/chang 280 chang
/u1/fall91/dp163/crawfor 317 crawfor
/u1/fall91/dp163/crowley 318 crowley
(Seven blank lines follow.)
Oct 12 14:16 1991 Page 2

/u1/fall91/dp270/cuddy 260 cuddy
/u1/fall91/dp168/czyzews 299 czyzews
/u1/fall91/dp168/delucia 300 delucia
/u1/fall91/dp270/diesso 261 diesso
/u1/fall91/dp168/dimemmo 301 dimemmo
/u1/fall91/dp168/dintron 281 dintron

The -t option makes pr act somewhat like cat: it tells pr not to print (or leave room for) the header and footer. The order in which you name the files determines the order of the merged columns:
$ pr -m -t -s p-uid p-login p-home
278 allen /u1/fall91/dp168/allen
271 babinch /u1/fall91/dp270/babinch
312 best /u1/fall91/dp163/best
279 bloom /u1/fall91/dp168/bloom
314 boelhow /u1/fall91/dp163/boelhow
298 bose /u1/fall91/dp168/bose
259 cacossa /u1/fall91/dp270/cacossa
280 chang /u1/fall91/dp168/chang
317 crawfor /u1/fall91/dp163/crawfor
318 crowley /u1/fall91/dp163/crowley
260 cuddy /u1/fall91/dp270/cuddy
299 czyzews /u1/fall91/dp168/czyzews
300 delucia /u1/fall91/dp168/delucia
261 diesso /u1/fall91/dp270/diesso
301 dimemmo /u1/fall91/dp168/dimemmo
281 dintron /u1/fall91/dp168/dintron
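The merge shown above can be tried with an explicit separator character, which makes the joined fields easy to see. This is a sketch only: it assumes a pr that accepts the separator attached to -s (as in -s:), uses three-line stand-ins for p-uid and p-login, and compares the result with paste, a different command that is often used for the same job. mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Shortened stand-ins for the p-uid and p-login files used in the text.
printf '%s\n' 278 271 312 > p-uid
printf '%s\n' allen babinch best > p-login

# Merge line by line, no header or footer, with a colon between the fields.
merged=$(pr -m -t -s: p-uid p-login)

# The same join done with paste, for comparison.
pasted=$(paste -d: p-uid p-login)

printf '%s\n' "$merged"
```

Where only the joined lines are wanted, paste is the simpler tool; pr earns its place when the page headers, page lengths, or column layout also matter.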
Your system administrator can tell you which printers are available on your computer, or you can use the lpstat command to find out yourself. (This command is described later in this section.)
lp [options] [files]

This command causes the named files and the designated options (if any) to become a print request. If no files are named in the command line, lp takes its input from the standard input so that it can be the last command in a pipeline. Table 6.9 contains the most frequently used options for lp.
$ lp sample
request id is lj-19 (1 file)

Note the response from the printing system. If you don't happen to remember the request id later, don't worry; lpstat will tell it to you, as long as the system has not finished printing the file. Once the system has finished printing, your request has been fulfilled and no longer exists.
Suppose your organization has a fancy, all-the-latest-bells-and-whistles-and-costing-more-than-an-arm-and-a-leg printer, code-named the_best in the Chairman's secretary's office in the next building. People are permitted to use it for the final copies of important documents so it is kept fairly busy. And you don't want to have to walk over to that building and climb 6 flights of stairs to retrieve your print job until you know it's been printed. So you type
$ lp -m -d the_best final.report.94
request id is the_best-19882 (1 file)

You have asked that the printer called the_best be used and that mail be sent to you when the printing has completed. (This assumes that this printer and your computer are connected on some kind of network that will transfer the actual file from your computer to the printer.)
cancel [request-ID(s)]

where request-ID(s) is the print job number that lp displays when you make a print request. Again, if you forget the request-ID, lpstat (see the section on lpstat) will show it to you.
$ lpstat [options] [request-ID(s)]

When you use the lp command, it puts your request in a queue and issues a request ID for that particular command. If you supply that ID to lpstat, it reports on the status of that request. If you omit all IDs and use the lpstat command with no arguments, it displays the status of all your print requests.
Some options take a parameter list as arguments, indicated by [list] below. You can supply that list as either a list separated by commas, or a list enclosed in double quotation marks and separated by spaces, as in the following examples:
-p printer1,printer2
-u "user1 user2 user3"

If you specify all as the argument to an option that takes a list or if you omit the argument entirely, lpstat provides information about all requests, devices, statuses, and so on, appropriate to that option letter. For example, the following commands both display the status of all output requests:
$ lpstat -o all
$ lpstat -o

Here are some of the more common arguments and options for lpstat:
dircmp [-d] [-s] [-wn] dir1 dir2

The options are as follows:
./phlumph:
total 24
-rw-r--r-- 1 pjh sys 8432 Mar 6 13:02 TTYMON
-rw-r--r-- 1 pjh sys 51 Mar 6 12:57 x
-rw-r--r-- 1 pjh sys 340 Mar 6 12:55 y
-rw-r--r-- 1 pjh sys 222 Mar 6 12:57 z

./xyzzy:
total 8
-rw-r--r-- 1 pjh sys 385 Mar 6 13:00 CLEANUP
-rw-r--r-- 1 pjh sys 52 Mar 6 12:55 x
-rw-r--r-- 1 pjh sys 340 Mar 6 12:55 y
-rw-r--r-- 1 pjh sys 241 Mar 6 12:55 z

Each directory includes one file that is unique to it and three files whose names also appear in the other directory. Of those three pairs, two differ in size and presumably in content. Now use dircmp to determine whether the files in the two directories are the same or different, as follows:
$ dircmp xyzzy phlumph

Mar 6 13:02 1994 xyzzy only and phlumph only Page 1

./CLEANUP ./TTYMON
(Many blank lines removed to save space.)
Mar 6 13:02 1994 Comparison of xyzzy phlumph Page 1

directory .
different ./x
same ./y
different ./z
(Many blank lines removed to save space.)
$

Note that dircmp first reports on the files unique to each directory and then comments about the common files.
$ dircmp -d xyzzy phlumph

Mar 6 13:02 1994 xyzzy only and phlumph only Page 1

./CLEANUP ./TTYMON
(Many blank lines removed to save space.)
Mar 6 13:02 1994 Comparison of xyzzy phlumph Page 1

directory .
different ./x
same ./y
different ./z
(Many blank lines removed to save space.)
Mar 6 13:02 1994 diff of ./x in xyzzy and phlumph Page 1

3c3
< echo "root has logged out..."
---
> echo "pjh has logged out..."
(Many blank lines removed to save space.)
Mar 6 13:02 1994 diff of ./z in xyzzy and phlumph Page 1

6d5
< j) site=jonlab ;;
(Many blank lines removed to save space.)
$

At this point, you may want to refer to the section "The diff Command" in this chapter.
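Not every system ships dircmp. Where it is missing, a recursive diff gives a similar summary. The sketch below is a substitute technique rather than dircmp itself: it assumes a diff that supports -r (recurse into directories) and -q (report only whether files differ), as the GNU and BSD versions do, and it builds two small directories whose file contents are invented for the demonstration. mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Two small directories modeled on xyzzy and phlumph; contents are made up.
mkdir xyzzy phlumph
echo 'cleanup script' > xyzzy/CLEANUP
echo 'ttymon notes' > phlumph/TTYMON
echo 'echo "root has logged out..."' > xyzzy/x
echo 'echo "pjh has logged out..."' > phlumph/x
echo 'identical line' > xyzzy/y
echo 'identical line' > phlumph/y

# diff exits with status 1 when it finds differences, so that status
# is deliberately not treated as a failure here.
report=$(diff -r -q xyzzy phlumph) || true

printf '%s\n' "$report"
```

The report names the files found in only one directory and the common files that differ; files that are the same, such as y here, are simply not mentioned. Dropping -q makes diff print the differing lines as well, much as -d does for dircmp.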
$ crypt [ key ] < clearfile > encryptedfile

where key is any phrase. For example
$ crypt "secret agent 007" < mydat > xyzzy

will encrypt the contents of mydat and write the result to xyzzy.
$ crypt -k < clearfile > encryptedfile

The encryption key need not be complex. In fact, the longer it is, the more time it takes to do the decryption. A key of three lowercase letters causes decryption to take as much as five minutes of machine time—and possibly much more on a multiuser machine.
Also, do not pipe the output of crypt through any program that changes the settings of your terminal. Otherwise, when crypt finishes, the terminal will be left in a strange state.
$ head names
allen christopher
babinchak david
best betty
bloom dennis
boelhower joseph
bose anita
cacossa ray
chang liang
crawford patricia
crowley charles

You can specify the number of lines that head displays, as follows:
$ head -4 names
allen christopher
babinchak david
best betty
bloom dennis

To view the last few lines of a file, use the tail command. This command is helpful when you have a large file and want to look only at the end. For example, suppose that you want to see the last few entries in the log file that records the transactions that occur when files are transferred between your machine and a neighboring machine. That log file may be large, and you surely don't want to have to read all the beginning and middle of it just to get to the end.
By default, tail prints the last 10 lines of a file to stdout (by default, the screen). Suppose that your names file consists of the following:
$ cat names
allen christopher
babinchak david
best betty
bloom dennis
boelhower joseph
bose anita
cacossa ray
chang liang
crawford patricia
crowley charles
cuddy michael
czyzewski sharon
delucia joseph

The tail command limits your view to the last 10 lines:
$ tail names
bloom dennis
boelhower joseph
bose anita
cacossa ray
chang liang
crawford patricia
crowley charles
cuddy michael
czyzewski sharon
delucia joseph

You can change this display by specifying the number of lines to print. For example, the following command prints the last five lines of names:
$ tail -5 names
crawford patricia
crowley charles
cuddy michael
czyzewski sharon
delucia joseph

The tail command can also follow a file; that is, it can continue looking at a file as a program continues to add text to the end of that file. The syntax is
tail -f logfile

where logfile is the name of the file being written to. If you're logged into a busy system, try one of the following forms:
$ tail -f /var/uucp/.Log/uucico/neighbor
$ tail -f /var/uucp/.Log/uuxqt/neighbor

where neighbor is the name of a file that contains log information about a computer that can exchange information with yours. The first is the log file that logs file-transfer activity between your computer and neighbor, and the second is the log of commands that your computer has executed as requested by neighbor. The tail command has several other useful options:
$ tee [-i] [-a] [file(s)]

The tee command can send its output to multiple files simultaneously. With the -a option specified, tee appends the output to those files instead of overwriting them. The -i option tells tee to ignore interrupts, which prevents the pipeline from being broken. To show the use of tee, type the command that follows:
$ lp /etc/passwd | tee status

This command causes the file /etc/passwd to be sent to the default printer, prints a message about the print request on the screen, and simultaneously captures that message in a file called status. The tee command sends the output of the lp command to two places: the screen and the named file.
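The behavior of tee, with and without -a, can be tried without a printer. In the sketch below, echo stands in for lp and simply prints a message of the kind lp would produce; the request IDs are illustrative only, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# echo stands in for lp here: the message appears on the screen
# and is also captured in the file status.
echo "request id is lj-19 (1 file)" | tee status

# With -a, the second message is appended rather than replacing the first.
echo "request id is lj-20 (1 file)" | tee -a status

# Count the lines that status now holds.
lines=$(wc -l < status | tr -d ' ')
echo "status holds $lines lines"
```

Without -a on the second command, status would hold only the most recent message.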
$ touch 0704202090 fireworks

changes both the access and modification times of the file fireworks to July 4, 1990, 8:20 P.M.
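Many current versions of touch no longer accept a bare date as the first argument and expect the time to follow a -t option, written with the year first. The sketch below assumes a POSIX touch and sets the same date and time as the example above; mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# -t takes [[CC]YY]MMDDhhmm[.SS]; 199007042020 is July 4, 1990, 8:20 P.M.
# The file is created if it does not already exist.
touch -t 199007042020 fireworks

# The long listing shows the modification time that was just set.
ls -l fireworks
```

Because the timestamp is more than a few months old, ls -l shows the year in place of the time of day.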
$ split [ -n ] [ in-file [ out-file ] ]

This command reads the text file in-file and splits it into several files, each consisting of n lines (except possibly the last file). If you omit -n, split creates 1,000-line files. The names of the small files depend on whether or not you specify out-file. If you do, these files are named out-fileaa, out-fileab, out-fileac, and so on. If you have more than 26 output files, the 27th is named out-fileba, the 28th out-filebb, and so forth. If you omit out-file, split uses x in its place, so that the files are named xaa, xab, xac, and so on.
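The naming scheme is easy to observe on a small file. The sketch below assumes a POSIX split, where the line count is given as -l n (the form most current systems expect in place of -n); the seven-line in-file and the part. prefix are made up for the demonstration, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# A seven-line input file.
printf '%s\n' one two three four five six seven > in-file

# Split into three-line pieces whose names start with "part."
split -l 3 in-file part.

# List the pieces and show how many lines each one holds.
pieces=$(ls part.*)
printf '%s\n' "$pieces"
wc -l part.*
```

Seven lines in three-line pieces give two full files and a final file that holds the single leftover line.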
To recreate the original file from a group of files named xaa and xab, etc., type
$ cat xa* > new-name

It may be more sensible to divide a file according to the context of its contents, rather than on a chosen number of lines. UNIX offers a context splitter, called csplit. This command's syntax is
$ csplit [ -s ] [ -k ] [ -f out-file ] in-file arg(s)

where in-file is the name of the file to be split, and out-file is the base name of the output files.
The arg(s) determine where each file is split. If you have N args, you get N+1 output files, named out-file00, out-file01, and so on, through out-fileN (with a 0 in front of N if N is less than 10). N cannot be greater than 99. If you do not specify an out-file argument, csplit names the files xx00, xx01, and so forth. See below for an example where a file is divided by context into five files. The -s option suppresses csplit's reporting of the number of characters in each output file. The -k option prevents csplit from deleting all output files if an error occurs.
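The relationship between the number of arguments and the number of output files can be seen on a tiny input. The sketch below assumes a POSIX csplit; the six-line in-file, its section markers, and the sec. prefix are invented for the demonstration, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# A small file with two labeled sections after an unlabeled opening line.
cat > in-file <<'EOF'
intro line
== Alpha
a1
a2
== Beta
b1
EOF

# Two arguments give three output files: sec.00, sec.01, and sec.02.
# -s suppresses the character counts that csplit would otherwise report.
csplit -s -f sec. in-file '/^== Alpha/' '/^== Beta/'

ls sec.*
cat sec.00
```

Each output file begins at the line that matched an argument, so sec.01 starts with the Alpha marker and sec.02 with the Beta marker, while sec.00 holds whatever came before the first match.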
Suppose that you have a password file such as the following. It is divided into sections: an unlabeled one at the beginning, followed by UUCP Logins, Special Users, DP Fall 1991, and NCR.
$ cat passwd
root:x:0:0:System Administrator:/usr/root:/bin/ksh
reboot:x:7:1:- -:/:/etc/shutdown -y -g0 -i6
listen:x:37:4:x:/usr/net/nls:
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:
lp:x:71:2:x:/usr/spool/lp:
_:- :6:6: ============================== :6:
_:- :6:6: == UUCP Logins :6:
_:- :6:6: ============================== :6:
uucp:x:5:5:0000-uucp(0000):x:
nuucp:x:10:10:0000-uucp(0000):/usr/spool/uucppublic:/usr/lib/uucp/uucico
zzuucp:x:37:100:Bob Sorenson:/usr/spool/uucppublic:/usr/lib/uucp/uucico
asyuucp:x:38:100:Robert L. Wald:/usr/spool/uucppublic:/usr/lib/uucp/uucico
knuucp:x:39:100:Kris Knigge:/usr/spool/uucppublic:/usr/lib/uucp/uucico
_:- :6:6: ============================== :6:
_:- :6:6: == Special Users :6:
_:- :6:6: ============================== :6:
msnet:x:100:99:Server Program:/usr/net/servers/msnet:/bin/false
install:x:101:1:x:/usr/install:
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
hohen:x:346:1:Michael Hohenshilt:/usr/home/hohen:/bin/ksh
reilly:x:347:1:Joan Reilly:/usr/home/reilly:/bin/ksh
_:- :6:6: ============================== :6:
_:- :6:6: == DP Fall 1991 :6:
_:- :6:6: ============================== :6:
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
lewis:x:288:20:lewis prince e:/u1/fall91/dp168/lewis:/bin/ksh
metelit:x:265:20:metelitsa natalya:/u1/fall91/dp270/metelit:/bin/ksh
nadaraj:x:307:20:nadarajah kalyani:/u1/fall91/dp168/nadaraj:/bin/ksh
nado:x:266:20:nado conan j:/u1/fall91/dp270/nado:/bin/ksh
_:- :6:6: ============================== :6:
_:- :6:6: === NCR =================== :6:
_:- :6:6: ============================== :6:
antello:x:334:20:antello ronald f:/u1/fall91/ff437/antello:/bin/ksh
cilino:x:335:20:cilino michael a:/u1/fall91/ff437/cilino:/bin/ksh
emmons:x:336:20:emmons william r:/u1/fall91/ff437/emmons:/bin/ksh
foreste:x:337:20:forester james r:/u1/fall91/ff437/foreste:/bin/ksh
hayden:x:338:20:hayden richard:/u1/fall91/ff437/hayden:/bin/ksh

You might want to split this file so that each section has its own file. To do so, you must specify the appropriate arguments to csplit. Each takes the form of a text string surrounded by slash (/) marks. The csplit command then copies from the current line up to, but not including, the first line that matches the argument. The following is the first attempt at splitting the file with csplit:
$ csplit -f PA passwd /UUCP/ /Special/ /Fall/ /NCR/
270
505
426
490
446

Note that there are four args: UUCP, Special, Fall, and NCR. Five files are created: PA00 contains everything from the beginning of passwd up to (but not including) the first line that contains UUCP. PA01 contains everything from the first line containing UUCP up to (but not including) the first line that contains Special, and so on. The numbers that csplit reports are the sizes of these files: the first has 270 characters, the second has 505 characters, and so on. Now let's see what they look like:
$ cat PA00
root:x:0:0:System Administrator:/usr/root:/bin/ksh
reboot:x:7:1:- -:/:/etc/shutdown -y -g0 -i6
listen:x:37:4:x:/usr/net/nls:
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:
lp:x:71:2:x:/usr/spool/lp:
_:- :6:6: ============================== :6:
$ cat PA01
_:- :6:6: == UUCP Logins :6:
_:- :6:6: ============================== :6:
uucp:x:5:5:0000-uucp(0000):x:
nuucp:x:10:10:0000-uucp(0000):/usr/spool/uucppublic:/usr/lib/uucp/uucico
zzuucp:x:37:100:Bob Sorenson:/usr/spool/uucppublic:/usr/lib/uucp/uucico
asyuucp:x:38:100:Robert L. Wald:/usr/spool/uucppublic:/usr/lib/uucp/uucico
knuucp:x:39:100:Kris Knigge:/usr/spool/uucppublic:/usr/lib/uucp/uucico
_:- :6:6: ============================== :6:
$ cat PA02
_:- :6:6: == Special Users :6:
_:- :6:6: ============================== :6:
msnet:x:100:99:Server Program:/usr/net/servers/msnet:/bin/false
install:x:101:1:x:/usr/install:
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
hohen:x:346:1:Michael Hohenshilt:/usr/home/hohen:/bin/ksh
reilly:x:347:1:Joan Reilly:/usr/home/reilly:/bin/ksh
_:- :6:6: ============================== :6:
$ cat PA03
_:- :6:6: == DP Fall 1991 :6:
_:- :6:6: ============================== :6:
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
lewis:x:288:20:lewis prince e:/u1/fall91/dp168/lewis:/bin/ksh
metelit:x:265:20:metelitsa natalya:/u1/fall91/dp270/metelit:/bin/ksh
nadaraj:x:307:20:nadarajah kalyani:/u1/fall91/dp168/nadaraj:/bin/ksh
nado:x:266:20:nado conan j:/u1/fall91/dp270/nado:/bin/ksh
_:- :6:6: ============================== :6:
$ cat PA04
_:- :6:6: === NCR =================== :6:
_:- :6:6: ============================== :6:
antello:x:334:20:antello ronald f:/u1/fall91/ff437/antello:/bin/ksh
cilino:x:335:20:cilino michael a:/u1/fall91/ff437/cilino:/bin/ksh
emmons:x:336:20:emmons william r:/u1/fall91/ff437/emmons:/bin/ksh
foreste:x:337:20:forester james r:/u1/fall91/ff437/foreste:/bin/ksh
hayden:x:338:20:hayden richard:/u1/fall91/ff437/hayden:/bin/ksh
This is not bad, but each file ends or begins with one or more lines that you don't want. The csplit command enables you to adjust the split point by appending an offset to the argument. For example, /UUCP/-1 means that the split point is the line before the one on which UUCP appears for the first time. Add -1 to each argument, and you should get rid of the unwanted line that ends each of the first four files:
$ csplit -f PB passwd /UUCP/-1 /Special/-1 /Fall/-1 /NCR/-1
213
505
426
490
503
You can see that the first file is smaller than the previous first file. Perhaps this is working. Let's see:
$ cat PB00
root:x:0:0:System Administrator:/usr/root:/bin/ksh
reboot:x:7:1:- -:/:/etc/shutdown -y -g0 -i6
listen:x:37:4:x:/usr/net/nls:
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:
lp:x:71:2:x:/usr/spool/lp:
$ cat PB01
_:- :6:6: ============================== :6:
_:- :6:6: == UUCP Logins :6:
_:- :6:6: ============================== :6:
uucp:x:5:5:0000-uucp(0000):x:
nuucp:x:10:10:0000-uucp(0000):/usr/spool/uucppublic:/usr/lib/uucp/uucico
zzuucp:x:37:100:Bob Sorenson:/usr/spool/uucppublic:/usr/lib/uucp/uucico
asyuucp:x:38:100:Robert L. Wald:/usr/spool/uucppublic:/usr/lib/uucp/uucico
knuucp:x:39:100:Kris Knigge:/usr/spool/uucppublic:/usr/lib/uucp/uucico
$ cat PB02
_:- :6:6: ============================== :6:
_:- :6:6: == Special Users :6:
_:- :6:6: ============================== :6:
msnet:x:100:99:Server Program:/usr/net/servers/msnet:/bin/false
install:x:101:1:x:/usr/install:
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
hohen:x:346:1:Michael Hohenshilt:/usr/home/hohen:/bin/ksh
reilly:x:347:1:Joan Reilly:/usr/home/reilly:/bin/ksh
$ cat PB03
_:- :6:6: ============================== :6:
_:- :6:6: == DP Fall 1991 :6:
_:- :6:6: ============================== :6:
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
lewis:x:288:20:lewis prince e:/u1/fall91/dp168/lewis:/bin/ksh
metelit:x:265:20:metelitsa natalya:/u1/fall91/dp270/metelit:/bin/ksh
nadaraj:x:307:20:nadarajah kalyani:/u1/fall91/dp168/nadaraj:/bin/ksh
nado:x:266:20:nado conan j:/u1/fall91/dp270/nado:/bin/ksh
$ cat PB04
_:- :6:6: ============================== :6:
_:- :6:6: === NCR =================== :6:
_:- :6:6: ============================== :6:
antello:x:334:20:antello ronald f:/u1/fall91/ff437/antello:/bin/ksh
cilino:x:335:20:cilino michael a:/u1/fall91/ff437/cilino:/bin/ksh
emmons:x:336:20:emmons william r:/u1/fall91/ff437/emmons:/bin/ksh
foreste:x:337:20:forester james r:/u1/fall91/ff437/foreste:/bin/ksh
hayden:x:338:20:hayden richard:/u1/fall91/ff437/hayden:/bin/ksh
This is very good indeed. Now, to get rid of the unwanted lines at the beginning, you have csplit advance its current line without copying anything. A pair of arguments, /UUCP/-1 and %uucp%, tells csplit to skip all the lines beginning with the one that precedes the line containing UUCP, to the one that precedes the line containing uucp. This causes csplit to skip the lines that begin with _:-. The following displays the full command:
$ csplit -f PC passwd /UUCP/-1 %uucp% /Special/-1 %msnet% \
/Fall/-1 %dp[12][67][80]% /NCR/-1 %ff437%
213
334
255
321
332
Note the backslash (\) at the end of the first line of the command. This is simply a continuation character: it tells the shell that the carriage return (or Enter) that you're about to press is not the end of the command, but that you'd like to continue typing on the next line on the screen. Also note that any argument can be a regular expression. Here are the resulting files:
$ cat PC00
root:x:0:0:System Administrator:/usr/root:/bin/ksh
reboot:x:7:1:- -:/:/etc/shutdown -y -g0 -i6
listen:x:37:4:x:/usr/net/nls:
slan:x:57:57:StarGROUP Software NPP Administration:/usr/slan:
lp:x:71:2:x:/usr/spool/lp:
$ cat PC01
uucp:x:5:5:0000-uucp(0000):x:
nuucp:x:10:10:0000-uucp(0000):/usr/spool/uucppublic:/usr/lib/uucp/uucico
zzuucp:x:37:100:Bob Sorenson:/usr/spool/uucppublic:/usr/lib/uucp/uucico
asyuucp:x:38:100:Robert L. Wald:/usr/spool/uucppublic:/usr/lib/uucp/uucico
knuucp:x:39:100:Kris Knigge:/usr/spool/uucppublic:/usr/lib/uucp/uucico
$ cat PC02
msnet:x:100:99:Server Program:/usr/net/servers/msnet:/bin/false
install:x:101:1:x:/usr/install:
pjh:x:102:0:Peter J. Holsberg:/usr/pjh:/bin/ksh
hohen:x:346:1:Michael Hohenshilt:/usr/home/hohen:/bin/ksh
reilly:x:347:1:Joan Reilly:/usr/home/reilly:/bin/ksh
$ cat PC03
gordon:x:304:20:gordon gary g:/u1/fall91/dp168/gordon:/bin/csh
lewis:x:288:20:lewis prince e:/u1/fall91/dp168/lewis:/bin/ksh
metelit:x:265:20:metelitsa natalya:/u1/fall91/dp270/metelit:/bin/ksh
nadaraj:x:307:20:nadarajah kalyani:/u1/fall91/dp168/nadaraj:/bin/ksh
nado:x:266:20:nado conan j:/u1/fall91/dp270/nado:/bin/ksh
$ cat PC04
antello:x:334:20:antello ronald f:/u1/fall91/ff437/antello:/bin/ksh
cilino:x:335:20:cilino michael a:/u1/fall91/ff437/cilino:/bin/ksh
emmons:x:336:20:emmons william r:/u1/fall91/ff437/emmons:/bin/ksh
foreste:x:337:20:forester james r:/u1/fall91/ff437/foreste:/bin/ksh
hayden:x:338:20:hayden richard:/u1/fall91/ff437/hayden:/bin/ksh
The program, therefore, has been a success.
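The offset and skip arguments can also be tried on a much smaller file. The following is a minimal sketch, assuming a csplit that accepts the /pattern/-1 and %pattern% forms used in this section; the file name groups.txt, the prefix G, and the file's contents are made up for illustration.

```shell
# Work in a scratch directory so the pieces are easy to find and remove.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample: two name sections separated by a three-line banner.
cat > groups.txt <<'EOF'
alice
bob
=====
== Guests
=====
carol
dave
EOF

# /Guests/-1 ends the first piece one line before the "Guests" line;
# %carol% then skips ahead to the "carol" line without writing a file.
# -s suppresses the character counts that csplit normally prints.
csplit -s -f G groups.txt '/Guests/-1' '%carol%'

cat G00   # the lines before the banner
cat G01   # the lines after the banner
```

Quoting each argument keeps the shell from interpreting characters such as % or [ before csplit sees them.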
In addition, an argument can be a line number (typed as an argument but without slashes) to indicate that the desired split should take place at the line before the specified number. You also can specify a repeat factor by following a pattern with {number}, which applies the pattern that many more times. For example, /login/ {8} means apply /login/ once and then eight more times, so the first nine lines that contain login become split points.
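Both forms can be seen in a short experiment. This is only a sketch: it assumes a csplit that takes the repeat count as its own argument (as the POSIX and GNU versions do), and the file names sample and logins, the prefixes S and L, and the file contents are invented for the example.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical six-line sample file.
printf '%s\n' one two three four five six > sample

# A bare line number splits just before that line:
# S00 receives lines 1-3 and S01 receives lines 4-6.
csplit -s -f S sample 4

# Hypothetical file with three lines that contain "login".
printf '%s\n' a login1 b login2 c login3 d > logins

# The pattern is applied once, and {1} applies it one more time,
# so the file is cut before "login1" and before "login2" only.
csplit -s -f L logins '/login/' '{1}'

# Show how many lines ended up in each piece.
wc -l S00 S01 L00 L01 L02
```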
The cmp command is especially useful in shell scripts (see Chapters 11, 12 and 13). The diff command is more specialized in what it does and where you can use it.
$ cmp [ -l ] [ -s ] file1 file2
The -l option gives you more information. It displays the number of each character that is different (the first character in the file is number 1), and then prints the octal value of the ASCII code of that character. (You will probably not have any use for the octal value of a character until you become a shell programming expert!) The -s option prints nothing, but returns an appropriate result code (0 if there are no differences, 1 if there are one or more differences). This option is useful when you write shell scripts (see Chapters 11, 12, and 13).
Here are two files that you can compare with cmp:
$ cat na.1
allen christopher
babinchak david
best betty
bloom dennis
boelhower joseph
bose anita
cacossa ray
delucia joseph
$ cat na.2
allen christopher
babinchak David
best betty
boelhower joseph
bose
cacossa ray
delucia joseph
Note that the first difference between the two files is on the second line. The D in David in the second file is the 29th character, counting all newline characters at the ends of lines.
$ cmp na.1 na.2
na.1 na.2 differ: char 29, line 2
$ cmp -l na.1 na.2
cmp:
29 144 104
68 141 12
69 156 143
70 151 141
71 164 143
72 141 157
73 12 163
74 143 163
76 143 40
77 157 162
78 163 141
79 163 171
80 141 12
81 40 144
82 162 145
83 141 154
84 171 165
85 12 143
86 144 151
87 145 141
88 154 40
89 165 152
90 143 157
91 151 163
92 141 145
93 40 160
94 152 150
95 157 12
This is quite a list! The 29th character is octal 144 in the first file and octal 104 in the second. If you look them up in an ASCII table, you'll see that the former is a d, and the latter is a D. Character 68 is the first a in anita in na.1 and the newline after the space after bose in na.2.
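If no ASCII table is at hand, the shell itself can translate the octal values. Here is a small sketch, assuming a printf that understands octal escapes of the form \ddd in its format string (POSIX requires this); the files a.txt and b.txt are made-up two-line samples that differ only in the case of one letter.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'allen christopher\nbabinchak david\n' > a.txt
printf 'allen christopher\nbabinchak David\n' > b.txt

# cmp exits with status 1 when the files differ, so "|| true" keeps
# the example going in shells that stop on a failing command.
cmp -l a.txt b.txt || true

# Turn the two octal codes that cmp reported back into characters.
printf 'octal 144 is %s and octal 104 is %s\n' "$(printf '\144')" "$(printf '\104')"
```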
Now let's try the -s option on the two files:
$ cmp -s na.1 na.2
$ echo $?
1
The variable ? is the shell variable that contains the result code of the last command, and $? is its value. The value 1 on the last line indicates that cmp found at least one difference between the two files. (See Chapters 11, 12, and 13.) Next, for contrast, compare a file with itself to see how cmp reports no differences:
$ cmp -s na.1 na.1
$ echo $?
0
The value 0 means that cmp found no differences.
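Because cmp -s communicates only through its result code, it fits naturally into an if statement. The sketch below shows one way to use it; same_or_different is a hypothetical helper, and old.txt and new.txt are invented sample files.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'best betty\nbose anita\n' > old.txt
printf 'best betty\nbose\n' > new.txt

# same_or_different is a hypothetical helper: it reports on two files
# using only the result code of cmp -s, never its (empty) output.
same_or_different() {
    if cmp -s "$1" "$2"; then
        echo "same"
    else
        echo "different"
    fi
}

same_or_different old.txt new.txt   # prints "different"
same_or_different old.txt old.txt   # prints "same"
```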
$ diff [-bitw] [-c | -e | -f | -h | -n] file1 file2
$ diff [-bitw] [-C number] file1 file2
$ diff [-bitw] [-D string] file1 file2
$ diff [-bitw] [-c | -e | -f | -h | -n] [-l] [-r] [-s] [-Sname] dir1 dir2
The three sets of options (-cefhn, -C number, and -D string) are mutually exclusive. The common options are -b, -i, -t, and -w.
First, let's look at the two files that show what diff does:
Let's apply diff to the files na.1 and na.2 (the files with which cmp was demonstrated):
$ diff na.1 na.2
2c2
< babinchak david
---
> babinchak David
4d3
< bloom dennis
6c5
< bose anita
---
> bose
These editor commands are quite different from those that diff printed before. The first four lines show
2c2
< babinchak david
---
> babinchak David
which means that you can change the second line of file1 (na.1) to match the second line of file2 (na.2) by executing the command 2c2, which means change line 2 of file1 to line 2 of file2. Note that both the line from file1 (prefaced with <) and the line from file2 (prefaced with >) are displayed, separated by a line consisting of three dashes. The next command, 4d3, says to delete line 4 from file1 to bring it into agreement with file2 up to (but not including) line 3 of file2. Finally, notice that there is another change command, 6c5, which says change line 6 of file1 by replacing it with line 5 of file2.
Note that in line 2, the difference that diff found was the d versus D letter in the second word.
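To experiment with diff yourself, you can recreate the two sample files and pick the change commands out of the report. This is a minimal sketch: the scratch directory and the variable name report are assumptions made for the example, and the file contents are copied from the na.1 and na.2 listings earlier in this section.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'allen christopher' 'babinchak david' 'best betty' \
    'bloom dennis' 'boelhower joseph' 'bose anita' 'cacossa ray' \
    'delucia joseph' > na.1
printf '%s\n' 'allen christopher' 'babinchak David' 'best betty' \
    'boelhower joseph' 'bose' 'cacossa ray' 'delucia joseph' > na.2

# diff exits with status 1 when it finds differences; "|| true"
# keeps that from stopping a script that runs with "set -e".
report=$(diff na.1 na.2 || true)

# Lines that start with a digit are the change commands; lines that
# start with < or > are the affected text from file1 and file2.
printf '%s\n' "$report" | grep '^[0-9]'
```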
You can use the -i option to tell diff to ignore the case of the characters, as follows:
$ diff -i na.1 na.2
4d3
< bloom dennis
6c5
< bose anita
---
> bose
The -c option causes the differences to be printed in context; that is, the output displays several of the lines above and below a line in which diff finds a difference. Each difference is marked with one of the following:
Note in the following example that the output includes a header that displays the names of the two files, and the times and dates of their last changes. The header also shows either stars (***) to designate lines from the first file, or dashes (---) to designate lines from the second file.
$ diff -c na.1 na.2
*** na.1 Sat Nov 9 12:57:55 1991
--- na.2 Sat Nov 9 12:58:27 1991
***************
*** 1,8 ****
  allen christopher
! babinchak david
  best betty
- bloom dennis
  boelhower joseph
! bose anita
  cacossa ray
  delucia joseph
--- 1,7 ---
  allen christopher
! babinchak David
  best betty
  boelhower joseph
! bose
  cacossa ray
  delucia joseph
After the header comes another asterisk-filled header that shows which lines of file1 (na.1) will be printed next (1,8), followed by the lines themselves. You see that the babinchak line differs in the two files, as does the bose line. Also, bloom dennis does not appear in file2 (na.2). Next, you see a header of dashes that indicates which lines of file2 will follow (1,7). Note that for the file2 list, the babinchak line and the bose line are marked with exclamation points. The number of lines displayed depends on how close together the differences are (the default is three lines of context). Later in this section, when you once again use diff with p1 and p2, you'll see an example that shows how to change the number of context lines.
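The amount of context can be reduced with the -C number form listed in the synopsis. The following sketch recreates na.1 and na.2 in a scratch directory (an assumption made so the example is self-contained) and asks for a single line of context; the variable name context is arbitrary.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'allen christopher' 'babinchak david' 'best betty' \
    'bloom dennis' 'boelhower joseph' 'bose anita' 'cacossa ray' \
    'delucia joseph' > na.1
printf '%s\n' 'allen christopher' 'babinchak David' 'best betty' \
    'boelhower joseph' 'bose' 'cacossa ray' 'delucia joseph' > na.2

# Ask for one line of context instead of the default three.
# diff exits with status 1 when the files differ, hence "|| true".
context=$(diff -C 1 na.1 na.2 || true)

printf '%s\n' "$context"
```

With only one line of context, the last line of each file is far enough from the nearest difference that it no longer appears in the report.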
diff can create an ed script (see Chapter 7) that you can use to change file1 into file2. First you execute a command such as the following:
$ diff -e na.1 na.2
6c
bose
.
4d
2c
babinchak David
.
Then you redirect this output to another file using a command such as the following:
$ diff -e na.1 na.2 > ed.scr
Edit the file by adding two lines, w and q (see Chapter 7), which results in the following file:
$ cat ed.scr
6c
bose
.
4d
2c
babinchak David
.
w
q
Then you execute the command:
$ ed na.1 < ed.scr
This command changes the contents of na.1 to agree with na.2.
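The same steps can be collected into a few lines of shell, without opening an editor to add the w and q lines. This is a sketch rather than the only way to do it: it recreates na.1 and na.2 in a scratch directory, applies the script to a copy named na.1.patched (a name invented here) so that the original survives, and assumes an ed that accepts the -s option to suppress its character counts.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'allen christopher' 'babinchak david' 'best betty' \
    'bloom dennis' 'boelhower joseph' 'bose anita' 'cacossa ray' \
    'delucia joseph' > na.1
printf '%s\n' 'allen christopher' 'babinchak David' 'best betty' \
    'boelhower joseph' 'bose' 'cacossa ray' 'delucia joseph' > na.2

# Keep the original and patch a copy instead.
cp na.1 na.1.patched

# Build the script: diff's ed commands followed by w (write) and q (quit).
# diff exits with status 1 when the files differ, hence "|| true".
{ diff -e na.1 na.2 || true; printf 'w\nq\n'; } > ed.scr

# Apply the script only if ed is installed on this system.
if command -v ed >/dev/null 2>&1; then
    ed -s na.1.patched < ed.scr
    cmp -s na.1.patched na.2 && echo "na.1.patched now matches na.2"
fi
```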
Perhaps this small example isn't very striking, but here's another, more impressive one. Suppose that you have a large program written in C that does something special for you; perhaps it manages your investments or keeps track of sales leads. Further, suppose that the people who provided the program discover that it has bugs (and what program doesn't?). They could either ship new disks that contain the rewritten program, or they could run diff on both the original and the corrected copy and then send you an ed script so that you can make the changes yourself. If the script were small enough (less than 50,000 characters or so), they could even distribute it through electronic mail.
The -f option creates what appears to be an ed script, with the commands listed in the opposite order (from the top of the file down). However, it is not an ed script at all, but a rather puzzling feature that is almost never used:
$ diff -f na.1 na.2
c2
babinchak David
.
d4
c6
bose
.
Also of limited value is the -h option, which causes diff to work in a "half-hearted" manner (according to the official AT&T UNIX System V Release 4 User's Reference Manual). With the -h option, diff is supposed to work best, and fast, on very large files having sections of change that encompass only a few lines at a time and that are widely separated in the files. Without -h, diff slows dramatically as the sizes of the files to which you apply it increase.
$ diff -h na.1 na.2
2c2
< babinchak david
---
> babinchak David
4d3
< bloom dennis
6c5
< bose anita
---
> bose
As you can see, diff with the -h option also works pretty well with original files that are too small to show a measurable difference in diff's speed.
The -n option, like -f, also produces something that looks like an ed script, but isn't and is also rarely used. The -D option permits C programmers (see Chapter 17) to produce a source code file based on the differences between two source code files. This is useful when writing a program that is to be compiled on two different computers.
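A short experiment shows what the -D option produces. This is a sketch under stated assumptions: prog_a.c and prog_b.c are invented two-version source files, MACHINE_B is a made-up symbol name, and your diff must support -D (some minimal implementations do not).

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two hypothetical versions of the same C source file.
printf 'int main(void)\n{\n    return 0;\n}\n' > prog_a.c
printf 'int main(void)\n{\n    return 1;\n}\n' > prog_b.c

# Merge them into one file; the differing lines are wrapped in
# preprocessor conditionals that test the (made-up) symbol MACHINE_B.
# diff exits with status 1 when the files differ, hence "|| true".
diff -D MACHINE_B prog_a.c prog_b.c > merged.c || true

cat merged.c
```

Compiling merged.c without defining MACHINE_B is equivalent to compiling prog_a.c, and compiling it with the symbol defined (for example, with the compiler's -DMACHINE_B option) is equivalent to compiling prog_b.c.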