SED -- A Non-interactive Text Editor USD:18-1
SED -- A Non-interactive Text Editor
Lee E. McMahon
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
Sed is a non-interactive context editor that runs
on the UNIX operating system. Sed is designed to be
especially useful in three cases:
1) To edit files too large for comfortable inter-
active editing;
2) To edit any size file when the sequence of
editing commands is too complicated to be
comfortably typed in interactive mode.
3) To perform multiple `global' editing functions
efficiently in one pass through the input.
This memorandum constitutes a manual for users of sed.
Introduction
Sed is a non-interactive context editor designed to be especially
useful in three cases:
1) To edit files too large for comfortable interactive edit-
ing;
2) To edit any size file when the sequence of editing com-
mands is too complicated to be comfortably typed in
interactive mode;
3) To perform multiple `global' editing functions effi-
ciently in one pass through the input.
Since only a few lines of the input reside in core at one time,
and no temporary files are used, the effective size of file that
can be edited is limited only by the requirement that the input
and output fit simultaneously into available secondary storage.
Complicated editing scripts can be created separately and given
_________________________
UNIX is a trademark of AT&T Bell Laboratories.
to sed as a command file. For complex edits, this saves consid-
erable typing, and its attendant errors. Sed running from a com-
mand file is much more efficient than any interactive editor
known to the author, even if that editor can be driven by a pre-
written script.
The principal loss of functions compared to an interactive editor
are lack of relative addressing (because of the line-at-a-time
operation), and lack of immediate verification that a command has
done what was intended.
Sed is a lineal descendant of the UNIX editor, ed. Because of
the differences between interactive and non-interactive opera-
tion, considerable changes have been made between ed and sed;
even confirmed users of ed will frequently be surprised (and
probably chagrined), if they rashly use sed without reading Sec-
tions 2 and 3 of this document. The most striking family resem-
blance between the two editors is in the class of patterns (`reg-
ular expressions') they recognize; the code for matching patterns
is copied almost verbatim from the code for ed, and the descrip-
tion of regular expressions in Section 2 is copied almost verba-
tim from the UNIX Programmer's Manual[1]. (Both code and descrip-
tion were written by Dennis M. Ritchie.)
1. Overall Operation
Sed by default copies the standard input to the standard output,
perhaps performing one or more editing commands on each line
before writing it to the output. This behavior may be modified
by flags on the command line; see Section 1.1 below.
The general format of an editing command is:
[address1,address2][function][arguments]
One or both addresses may be omitted; the format of addresses is
given in Section 2. Any number of blanks or tabs may separate
the addresses from the function. The function must be present;
the available commands are discussed in Section 3. The arguments
may be required or optional, according to which function is
given; again, they are discussed in Section 3 under each individ-
ual function.
Tab characters and spaces at the beginning of lines are ignored.
1.1. Command-line Flags
Three flags are recognized on the command line:
-n: tells sed not to copy all lines, but only those speci-
fied by p functions or p flags after s functions (see
Section 3.3);
-e: tells sed to take the next argument as an editing
command;
-f: tells sed to take the next argument as a file name; the
file should contain editing commands, one to a line.
1.2. Order of Application of Editing Commands
Before any editing is done (in fact, before any input file is
even opened), all the editing commands are compiled into a form
which will be moderately efficient during the execution phase
(when the commands are actually applied to lines of the input
file). The commands are compiled in the order in which they are
encountered; this is generally the order in which they will be
attempted at execution time. The commands are applied one at a
time; the input to each command is the output of all preceding
commands.
The default linear order of application of editing commands can
be changed by the flow-of-control commands, t and b (see Section
3). Even when the order of application is changed by these com-
mands, it is still true that the input line to any command is the
output of any previously applied command.
1.3. Pattern-space
The range of pattern matches is called the pattern space. Ordi-
narily, the pattern space is one line of the input text, but more
than one line can be read into the pattern space by using the N
command (Section 3.6.).
1.4. Examples
Examples are scattered throughout the text. Except where other-
wise noted, the examples all assume the following input text:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
(In no case is the output of the sed commands to be considered an
improvement on Coleridge.)
Example:
The command
2q
will quit after copying the first two lines of the input. The
output will be:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
2. ADDRESSES: Selecting lines for editing
Lines in the input file(s) to which editing commands are to be
applied can be selected by addresses. Addresses may be either
line numbers or context addresses.
The application of a group of commands can be controlled by one
address (or address-pair) by grouping the commands with curly
braces (`{ }')(Sec. 3.6.).
2.1. Line-number Addresses
A line number is a decimal integer. As each line is read from
the input, a line-number counter is incremented; a line-number
address matches (selects) the input line which causes the inter-
nal counter to equal the address line-number. The counter runs
cumulatively through multiple input files; it is not reset when a
new input file is opened.
As a special case, the character $ matches the last line of the
last input file.
2.2. Context Addresses
A context address is a pattern (`regular expression') enclosed in
slashes (`/'). The regular expressions recognized by sed are
constructed as follows:
1) An ordinary character (not one of those discussed below)
is a regular expression, and matches that character.
2) A circumflex `^' at the beginning of a regular expression
matches the null character at the beginning of a line.
3) A dollar-sign `$' at the end of a regular expression
matches the null character at the end of a line.
4) The characters `\n' match an imbedded newline character,
but not the newline at the end of the pattern space.
5) A period `.' matches any character except the terminal
newline of the pattern space.
6) A regular expression followed by an asterisk `*' matches
any number (including 0) of adjacent occurrences of the
regular expression it follows.
7) A string of characters in square brackets `[ ]' matches
any character in the string, and no others. If, how-
ever, the first character of the string is circumflex
`^', the regular expression matches any character
except the characters in the string and the terminal
newline of the pattern space.
8) A concatenation of regular expressions is a regular
expression which matches the concatenation of strings
matched by the components of the regular expression.
9) A regular expression between the sequences `\(' and `\)'
is identical in effect to the unadorned regular expres-
sion, but has side-effects which are described under
the s command below and specification 10) immediately
below.
10) The expression `\d' means the same string of characters
matched by an expression enclosed in `\(' and `\)' ear-
lier in the same pattern. Here d is a single digit;
the string specified is that beginning with the dth
occurrence of `\(' counting from the left. For exam-
ple, the expression `^\(.*\)\1' matches a line begin-
ning with two repeated occurrences of the same string.
11) The null regular expression standing alone (e.g., `//')
is equivalent to the last regular expression compiled.
To use one of the special characters (^ $ . * [ ] \ /) as a lit-
eral (to match an occurrence of itself in the input), precede the
special character by a backslash `\'.
For a context address to `match' the input requires that the
whole pattern within the address match some portion of the pat-
tern space.
2.3. Number of Addresses
The commands in the next section can have 0, 1, or 2 addresses.
Under each command the maximum number of allowed addresses is
given. For a command to have more addresses than the maximum
allowed is considered an error.
If a command has no addresses, it is applied to every line in the
input.
If a command has one address, it is applied to all lines which
match that address.
If a command has two addresses, it is applied to the first line
which matches the first address, and to all subsequent lines
until (and including) the first subsequent line which matches the
second address. Then an attempt is made on subsequent lines to
again match the first address, and the process is repeated.
Two addresses are separated by a comma.
Examples:
/an/ matches lines 1, 3, 4 in our sample text
/an.*an/ matches line 1
/^an/ matches no lines
/./ matches all lines
/\./ matches line 5
/r*an/ matches lines 1,3, 4 (number = zero!)
/\(an\).*\1/ matches line 1
3. FUNCTIONS
All functions are named by a single character. In the following
summary, the maximum number of allowable addresses is given
enclosed in parentheses, then the single character function name,
possible arguments enclosed in angles (< >), an expanded English
translation of the single-character name, and finally a descrip-
tion of what each function does. The angles around the arguments
are not part of the argument, and should not be typed in actual
editing commands.
3.1. Whole-line Oriented Functions
(2)d -- delete lines The d function deletes from the file
(does not write to the output) all those lines matched
by its address(es). It also has the side effect that
no further commands are attempted on the corpse of a
deleted line; as soon as the d function is executed, a
new line is read from the input, and the list of edit-
ing commands is re-started from the beginning on the
new line.
(2)n -- next line The n function reads the next line from
the input, replacing the current line. The current
line is written to the output if it should be. The
list of editing commands is continued following the n
command.
(1)a\
-- append lines
The a function causes the argument to be written
to the output after the line matched by its address.
The a command is inherently multi-line; a must appear
at the end of a line, and may contain any number
of lines. To preserve the one-command-to-a-line fic-
tion, the interior newlines must be hidden by a back-
slash character (`\') immediately preceding the new-
line. The argument is terminated by the first
unhidden newline (the first one not immediately pre-
ceded by backslash). Once an a function is success-
fully executed, will be written to the output
regardless of what later commands do to the line which
triggered it. The triggering line may be deleted
entirely; will still be written to the output.
The is not scanned for address matches, and no
editing commands are attempted on it. It does not
cause any change in the line-number counter.
(1)i\
-- insert lines
The i function behaves identically to the a function,
except that is written to the output before the
matched line. All other comments about the a function
apply to the i function as well.
(2)c\
-- change lines
The c function deletes the lines selected by its
address(es), and replaces them with the lines in
. Like a and i, c must be followed by a newline
hidden by a backslash; and interior new lines in
must be hidden by backslashes. The c command may have
two addresses, and therefore select a range of lines.
If it does, all the lines in the range are deleted, but
only one copy of is written to the output, not
one copy per line deleted. As with a and i, is
not scanned for address matches, and no editing com-
mands are attempted on it. It does not change the
line-number counter. After a line has been deleted by
a c function, no further commands are attempted on the
corpse. If text is appended after a line by a or r
functions, and the line is subsequently changed, the
text inserted by the c function will be placed before
the text of the a or r functions. (The r function is
described in Section 3.4.)
Note: Within the text put in the output by these functions, lead-
ing blanks and tabs will disappear, as always in sed commands.
To get leading blanks and tabs into the output, precede the first
desired blank or tab by a backslash; the backslash will not
appear in the output.
Example:
The list of editing commands:
n
a\
XXXX
d
applied to our standard input, produces:
In Xanadu did Kubhla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.
In this particular case, the same effect would be produced by
either of the two following command lists:
n n
i\ c\
XXXX XXXX
d
3.2. Substitute Function
One very important function changes parts of lines selected by a
context search within the line.
(2)s -- substitute The s
function replaces part of a line (selected by ) with . It can best be read:
Substitute for , The
argument contains a pattern, exactly like the
patterns in addresses (see 2.2 above). The only dif-
ference between and a context address is that
the context address must be delimited by slash (`/')
characters; may be delimited by any character
other than space or newline. By default, only the
first string matched by is replaced, but see
the g flag below. The argument begins
immediately after the second delimiting character of
, and must be followed immediately by another
instance of the delimiting character. (Thus there are
exactly three instances of the delimiting character.)
The is not a pattern, and the characters
which are special in patterns do not have special mean-
ing in . Instead, other characters are
special:
& is replaced by the string matched by
\d (where d is a single digit) is replaced by the
dth substring matched by parts of
enclosed in `\(' and `\)'. If nested sub-
strings occur in , the dth is deter-
mined by counting opening delimiters (`\(').
As in patterns, special characters may be
made literal by preceding them with backslash
(`\').
The argument may contain the following flags:
g -- substitute for all (non-
overlapping) instances of in the
line. After a successful substitution, the
scan for the next instance of
begins just after the end of the inserted
characters; characters put into the line from
are not rescanned.
p -- print the line if a successful replacement
was done. The p flag causes the line to be
written to the output if and only if a sub-
stitution was actually made by the s func-
tion. Notice that if several s functions,
each followed by a p flag, successfully sub-
stitute in the same input line, multiple
copies of the line will be written to the
output: one for each successful substitution.
w -- write the line to a file if a suc-
cessful replacement was done. The w flag
causes lines which are actually substituted
by the s function to be written to a file
named by . If exists
before sed is run, it is overwritten; if not,
it is created. A single space must separate
w and . The possibilities of
multiple, somewhat different copies of one
input line being written are the same as for
p. A maximum of 10 different file names may
be mentioned after w flags and w functions
(see below), combined.
Examples:
The following command, applied to our standard input,
s/to/by/w changes
produces, on the standard output:
In Xanadu did Kubhla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
and, on the file `changes':
Through caverns measureless by man
Down by a sunless sea.
If the nocopy option is in effect, the command:
s/[.,;?:]/*P&*/gp
produces:
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*
Finally, to illustrate the effect of the g flag, the command:
/X/s/an/AN/p
produces (assuming nocopy mode):
In XANadu did Kubhla Khan
and the command:
/X/s/an/AN/gp
produces:
In XANadu did Kubhla KhAN
3.3. Input-output Functions
(2)p -- print The print function writes the addressed lines
to the standard output file. They are written at the
time the p function is encountered, regardless of what
succeeding editing commands may do to the lines.
(2)w -- write on The write function
writes the addressed lines to the file named by . If the file previously existed, it is overwrit-
ten; if not, it is created. The lines are written
exactly as they exist when the write function is
encountered for each line, regardless of what subse-
quent editing commands may do to them. Exactly one
space must separate the w and . A maximum of
ten different files may be mentioned in write functions
and w flags after s functions, combined.
(1)r -- read the contents of a file The read
function reads the contents of , and appends
them after the line matched by the address. The file
is read and appended regardless of what subsequent
editing commands do to the line which matched its
address. If r and a functions are executed on the same
line, the text from the a functions and the r functions
is written to the output in the order that the func-
tions are executed. Exactly one space must separate
the r and . If a file mentioned by a r func-
tion cannot be opened, it is considered a null file,
not an error, and no diagnostic is given.
NOTE: Since there is a limit to the number of files that can be
opened simultaneously, care should be taken that no more than ten
files be mentioned in w functions or flags; that number is
reduced by one if any r functions are present. (Only one read
file is open at one time.)
Examples
Assume that the file `note1' has the following contents:
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent successor
of Genghiz (Chingiz) Khan, and founder of the Mongol
dynasty in China.
Then the following command:
/Kubla/r note1
produces:
In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent successor
of Genghiz (Chingiz) Khan, and founder of the Mongol
dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
3.4. Multiple Input-line Functions
Three functions, all spelled with capital letters, deal specially
with pattern spaces containing imbedded newlines; they are
intended principally to provide pattern matches across lines in
the input.
(2)N -- Next line The next input line is appended to the
current line in the pattern space; the two input lines
are separated by an imbedded newline. Pattern matches
may extend across the imbedded newline(s).
(2)D -- Delete first part of the pattern space Delete up to
and including the first newline character in the cur-
rent pattern space. If the pattern space becomes empty
(the only newline was the terminal newline), read
another line from the input. In any case, begin the
list of editing commands again from its beginning.
(2)P -- Print first part of the pattern space Print up to
and including the first newline in the pattern space.
The P and D functions are equivalent to their lower-case counter-
parts if there are no imbedded newlines in the pattern space.
3.5. Hold and Get Functions
Four functions save and retrieve part of the input for possible
later use.
(2)h -- hold pattern space The h functions copies the con-
tents of the pattern space into a hold area (destroying
the previous contents of the hold area).
(2)H -- Hold pattern space The H function appends the con-
tents of the pattern space to the contents of the hold
area; the former and new contents are separated by a
newline.
(2)g -- get contents of hold area The g function copies the
contents of the hold area into the pattern space
(destroying the previous contents of the pattern
space).
(2)G -- Get contents of hold area The G function appends the
contents of the hold area to the contents of the pat-
tern space; the former and new contents are separated
by a newline.
(2)x -- exchange The exchange command interchanges the con-
tents of the pattern space and the hold area.
Example
The commands
1h
1s/ did.*//
1x
G
s/\n/ :/
applied to our standard example, produce:
In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
3.6. Flow-of-Control Functions
These functions do no editing on the input lines, but control the
application of functions to the lines selected by the address
part.
(2)! -- Don't The Don't command causes the next command
(written on the same line), to be applied to all and
only those input lines not selected by the adress part.
(2){ -- Grouping The grouping command `{' causes the next
set of commands to be applied (or not applied) as a
block to the input lines selected by the addresses of
the grouping command. The first of the commands under
control of the grouping may appear on the same line as
the `{' or on the next line.
The group of commands is terminated by a matching `}'
standing on a line by itself.
Groups can be nested.
(0):