The rewriting rules of Sendmail help your system check and correct an
electronic mail address before sending it to its final destination.
By Bryan Costales
The Sendmail program is the mail-transfer software for many Unix systems,
but Sendmail's configuration file has a long and glorious history of being
difficult to understand, much less modify. Are Sendmail's rewriting rules
confusing to you? If they are, you're not alone. The rewriting rules, used
to rewrite mail headers, to check for errors, and to select mail programs, don't
have to be all that mysterious. Compact, yes, but relatively simple once
you begin to understand them.
The rewriting rules have been variously described as resembling modem
noise, Mr. Dithers swearing in the comic strip ``Blondie,'' and an explosion
in a punctuation factory. While these comparisons are sadly apt, they are
also misleading. What appears confusing and complex is, in
reality, just succinct.
The Sendmail program parses (reads and processes) each rule every time
it reads its configuration file, sendmail.cf. Because that process
needs to be swift, rules have been designed to be easier for Sendmail to
parse than for you to understand.
Why Rules?
The rules are used to modify mail addresses, to detect errors in addressing,
and to select an appropriate means of mail delivery. Addresses need to
be modified because they can be specified in many ways yet are required
to be in specific forms for particular means of delivery.
To illustrate, consider the address friend@uuhost. If the machine
named ``uuhost'' were connected to yours over a dial-up line, the message
would likely be sent using UUCP software. That software requires addresses
to be expressed in the UUCP form, uuhost!friend. The Sendmail rewriting
rules control the transformation.
Another role for the rules is to detect (and reject) errors locally.
This filtering prevents errors from propagating over the network. Mail
to an address without a user name, such as @neighbor, is one such
error. It is better to detect this kind of error locally rather than having
the host ``neighbor'' reject it.
Sequences of rules are grouped together into rule sets. Each set is
similar to a subroutine. A rule set is declared with the ``S'' key letter,
which must begin a line in the Sendmail configuration file. For example,
``S0'' begins the declaration of the rules that form rule set number 0.
Rule sets are numbered starting from 0, where sets 0 through 5 are internally
defined by Sendmail to have very specific purposes:
0 Resolve delivery agent
1 Process sender address
2 Process recipient address
3 Preprocess all addresses
4 Postprocess all addresses
5 Rewrite unaliased local addresses
Rule set definitions may appear in any order in the configuration file.
For example, rule set S5 may be defined first, followed by S2 and then
S7. The rule sets are gathered when the configuration file is read, and
they are sorted internally by Sendmail.
If a rule set is undefined, the result is the same as if it were defined
but had no rules associated with it. It is like a subroutine that contains
nothing but a ``return'' statement. It does nothing and produces no errors.
To observe the effect of rules that do nothing, create a three-line
configuration file named, say, x.cf (as shown in Listing
1A) and run Sendmail in rule-testing mode on that file with the command
shown in Listing 1B.
The -bt command-line switch causes Sendmail to run in address-testing
mode. In this mode, Sendmail waits for you to type a rule set and an address.
It then shows you how the rule set ``rewrites'' the address. As Listing
1B shows, you enter an address by typing a rule set number, a space, and
then a mail address. Listing 1B uses rule set 0, but you can specify any
rule set number.
The ``rewrite:'' designation that begins each line of address-testing-mode
output is simply there to highlight rewriting lines when they are mixed
with other kinds of debugging output. The ``input'' designation means that
Sendmail placed the address into the workspace (more about this later).
The ``returns'' designation shows the result after the rule set has rewritten
that address based on its rules.
The address that was fed to Sendmail (bob@here) was first split into
parts (tokens) based on the separating characters defined by the ``Do''
macro shown in Listing 1A and on 10 others defined internally by Sendmail:
( ) < > , ; \ " \r \n
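The splitting step can be sketched in Python. This is a simplified model, not Sendmail's code, and it rests on two assumptions: the Do string below is only illustrative, since Listing 1A's value is not reproduced here, and quoting and comments, which real Sendmail treats specially, are ignored. The names tokenize, DO_OPERATORS, and INTERNAL_SEPARATORS are hypothetical.

```python
# A simplified model of how Sendmail splits an address into tokens.
# ASSUMPTION: Listing 1A's Do line is not reproduced in the article, so
# DO_OPERATORS below is an illustrative guess, not the listing's value.
DO_OPERATORS = ".:%@!^=/[]"

# The 10 separators the article says Sendmail defines internally:
# ( ) < > , ; \ " plus the carriage-return and newline characters.
INTERNAL_SEPARATORS = '()<>,;\\"\r\n'

SEPARATORS = set(DO_OPERATORS) | set(INTERNAL_SEPARATORS)


def tokenize(address):
    """Split an address into a list of tokens.

    Every separator character becomes a token by itself, each run of other
    non-blank characters becomes one token, and blanks only end a token.
    Quoting and comments, which real Sendmail handles, are ignored here.
    """
    tokens = []
    word = []
    for ch in address:
        if ch in SEPARATORS or ch in " \t":
            if word:                      # close the pending word token
                tokens.append("".join(word))
                word = []
            if ch in SEPARATORS:          # a separator is a token on its own
                tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


print(tokenize("bob@here"))          # ['bob', '@', 'here']
print(tokenize("@host.domain"))      # ['@', 'host', '.', 'domain']
```

The three tokens printed for bob@here are the same three that the ``input:'' line of Listing 1B shows.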
For clarity, each token in Listing 1B was printed within double quotation
marks; however, some versions of Sendmail omit these marks. The ``input:''
line shows the three tokens passed to rule set 0. Because there is no rule
set 0, the ``returns:'' line shows that the undefined (empty) rule set
returns the tokens that make up the address unmatched and unchanged.
The example uses version 8 Sendmail. If you are running an older
version of Sendmail, two things will be different. First, the initial output
will not include the message ``(ruleset 3 NOT automatically invoked)'',
but will include two extra rewrite lines. Second, old versions of Sendmail
always apply rule set 3 before the rule set you specify, whether you want
them to or not.
Rule Sets
Each rule set may contain any number of individual rules or none at all.
Rules begin with the ``R'' key letter and take the following
general form:
S0
Rlhs rhs
Rlhs rhs comment
The first line--the S0--declares the start of rule set 0. All the lines
after the S line that begin with R belong to that rule set. A new rule
set begins when another S line with a different number appears.
Each R line is an individual rule in a series of rules that form a rule
set. If you examine the Sendmail configuration file for almost any major
mail-handling site, you'll see that any given rule set can have a huge number
of rules. But our hypothetical rule set 0 has only two rules and therefore
only two lines that begin with an R.
Each rule has two distinct parts, each divided from the other by one
or more tab characters. You can use space characters inside each part,
but you must use tabs to separate the parts.
The left-hand part of the rule is called the lhs, for left-hand
side. Likewise, the right-hand part is denoted rhs. These two
form the rule. A comment may optionally follow the right-hand side, and,
if present, must be separated from it by one or more tab characters.
The left-hand and right-hand sides form a ``do while'' pair. As long
as the left-hand side evaluates to true, the right-hand side is processed;
after each rewrite, the left-hand side is tested again against the result.
If the left-hand side evaluates false, Sendmail skips to the next rule for
that rule set.
The Workspace
Whether the left-hand side is true or false is determined by making comparisons.
When an address is processed for rewriting by a rule set, Sendmail first
separates the address into tokens and stores those tokens internally in a
buffer called the ``workspace.''
When the left-hand side of a rule is evaluated, it is divided into tokens
and those are compared to the tokens in the workspace. If both the workspace
and the left-hand side contain exactly the same tokens, a match is found,
and the result of the left-hand side comparison is true. To illustrate,
in Listing 2A we've added two lines to the
end of our minimal configuration file, x.cf. Don't forget that
the three parts of the rule are separated from each other by tab characters.
This example creates a ``demo'' rule set that illustrates a few introductory
concepts about rules.
Now run Sendmail in rule-testing mode, as shown in Listing
2B. As we did in Listing 1B, enter rule set 0 and a typical e-mail
address at the prompt. Notice that nothing was rewritten, even though there
is a rule set 0 and a rule in our sample configuration file. Remember that
the workspace is rewritten only if the workspace and the left-hand side
match exactly. For the demo rule, they do not match (see Figure 1).
Enter the exact text that appears in the left-hand side of the demo
rule at the prompt (see Listing 2C). An amazing
thing happens. The rule has actually rewritten an address. The address
``left.side'' was given to rule set 0 and was rewritten by the rule in
that rule set to become the address ``new.stuff''. This transformation
was possible because the workspace and the left-hand side exactly matched
each other, so the result of the left-hand side comparison was true.
Before leaving this demo rule set, perform one final experiment. Enter
the text ``left.side'' again, but this time change the case of the letters
to uppercase. Notice that the workspace and the left-hand side still match,
even though they now differ by case. This example illustrates that all
comparisons between the workspace and the left-hand side of rules are done
in a case-insensitive manner. This property enables rules that solve complex
problems to be written without the need to distinguish between upper- and
lower-case letters.
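The comparison the demo rule performs can be modelled in a few lines of Python. This is a simplified sketch, not Sendmail's implementation: the token lists are written out by hand (``left.side'' is the three tokens left, ., and side), the rule is applied only once, and the function names are hypothetical. It shows the two properties just described: a literal left-hand side matches only when every token agrees, and letter case is ignored.

```python
# A simplified model of comparing a purely literal left-hand side with
# the workspace. Token lists are written out by hand; the rule mirrors the
# demo rule described in the text ("left.side" rewritten to "new.stuff").

def literal_match(lhs, workspace):
    """True when lhs and workspace hold the same tokens in the same order,
    ignoring the case of letters."""
    if len(lhs) != len(workspace):
        return False
    return all(a.lower() == b.lower() for a, b in zip(lhs, workspace))


def apply_literal_rule(lhs, rhs, workspace):
    """Return the rhs tokens if the rule matches, else the workspace
    unchanged. (One application only; no do-while repetition.)"""
    return list(rhs) if literal_match(lhs, workspace) else workspace


DEMO_LHS = ["left", ".", "side"]
DEMO_RHS = ["new", ".", "stuff"]

print(apply_literal_rule(DEMO_LHS, DEMO_RHS, ["bob", "@", "here"]))   # ['bob', '@', 'here']
print(apply_literal_rule(DEMO_LHS, DEMO_RHS, ["left", ".", "side"]))  # ['new', '.', 'stuff']
print(apply_literal_rule(DEMO_LHS, DEMO_RHS, ["LEFT", ".", "SIDE"]))  # ['new', '.', 'stuff']
```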
The Flow of Addresses Through Rules
When rule sets contain many rules, the ``flow'' is from the first through
the last rule (top to bottom), in the order they are declared in the configuration
file.
To illustrate, modify the two demo lines you added to the sample configuration
file, replacing them with the three new demo rules shown in Listing
3. Each rule has only two parts; there is no comment.
Before you test these new rules, consider what they do. The first rule
rewrites a workspace consisting of the single token ``x'' into ``y''. The
second rule rewrites a workspace of ``y'' into ``z''. And the last rule
rewrites a workspace of ``z'' into ``a''.
Now run Sendmail in rule-testing mode once again, and, one at a time,
enter rule set 0 and one of the letters ``x'', ``y'', and ``z''. No matter
which of ``x'', ``y'', or ``z'' you enter, each is rewritten into ``a'',
illustrating the ``flow'' of addresses (the workspace) through rules.
Let's examine in detail what happens to the input. Follow
along with Figure 2. Suppose you enter ``x'' at rule set 0. The first rule
of that rule set tries to match its left-hand side to the workspace; the
left-hand side exactly matches the workspace, so the right-hand side
rewrites the workspace, replacing ``x'' with ``y''.
Now the next rule tries to match its left-hand side to the workspace.
But the workspace has already been rewritten by the first
rule. The key point here is that each rule compares its left-hand side
to the current contents of the workspace, even though they may have been
rewritten by earlier rules. It should now be clear why all three letters
are rewritten to ``a'' (see Figure 3).
Now feed one more letter into Sendmail in rule-testing mode. This time enter
anything other than an ``x'', ``y'', or ``z'', say the letter ``b''. Notice
that the workspace remains unchanged because ``b'' did not match the
left-hand side of any of the three rules. If the left-hand side of a rule
fails to match the workspace, that rule is skipped, and the workspace remains
unchanged.
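The same flow can be modelled as a loop over an ordered list of rules. In the Python sketch below (a simplified model with hypothetical names, not Sendmail's code), each rule keeps rewriting while its left-hand side still matches, which is the ``do while'' behaviour described earlier, and then passes the workspace to the next rule. The rules mirror the three described for Listing 3. The MAX_REPEATS guard is an assumption of this sketch, added so that a rule that always matches its own output cannot loop forever; it is not a value taken from Sendmail.

```python
# A simplified model of a rule set: an ordered list of (lhs, rhs) rules
# using literal tokens only. The rules mirror those described for
# Listing 3: x -> y, y -> z, z -> a.
# ASSUMPTION: MAX_REPEATS is this sketch's own guard, not Sendmail's limit.
MAX_REPEATS = 100


def literal_match(lhs, workspace):
    """Same tokens in the same order, ignoring letter case."""
    return len(lhs) == len(workspace) and all(
        a.lower() == b.lower() for a, b in zip(lhs, workspace))


def run_ruleset(rules, workspace):
    """Pass the workspace through each rule in order, top to bottom.
    A rule keeps rewriting while its lhs still matches ("do while"),
    then control moves on to the next rule."""
    for lhs, rhs in rules:
        repeats = 0
        while literal_match(lhs, workspace):
            workspace = list(rhs)
            repeats += 1
            if repeats >= MAX_REPEATS:
                raise RuntimeError("rule appears to loop forever")
    return workspace


LISTING_3_RULES = [(["x"], ["y"]), (["y"], ["z"]), (["z"], ["a"])]

for letter in ["x", "y", "z", "b"]:
    print(letter, "->", run_ruleset(LISTING_3_RULES, [letter]))
# x -> ['a']
# y -> ['a']
# z -> ['a']
# b -> ['b']
```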
Operators Versus the Workspace
Rules would be pretty useless if they always had to match the workspace
exactly. Fortunately, that is not the case; in addition to literal text,
you can also use operators. Operators are like wild cards in that they
allow the left-hand side of rules to match arbitrary text in the workspace.
To illustrate, look at Figure 4. The left-hand
side begins with the first character following the ``R'' key letter. The
left-hand side in Figure 4 is the operator, $+, the truth of which
is determined by a process called pattern matching. The left-hand side
$+ (a single operator) is a pattern that means ``match one or
more tokens.''
The address being evaluated is separated into tokens, placed into the
workspace (see Figure 5), and then the workspace
is compared to that pattern. When matching the workspace to a left-hand
side pattern, Sendmail scans the workspace from left to right. Each token
in the workspace is compared to the operator ($+) in the left-hand
side pattern. If the tokens all match the pattern, the left-hand side is
true.
The $+ operator simply matches any one or more tokens. As you
can see, if there are any tokens in the address at all (the workspace
is not empty), the left-hand side $+ evaluates to true.
A rule using $+ alone on the left-hand side is not sufficient to
handle all possible addresses. Because $+ matches any nonempty workspace,
it cannot tell a good address from a bad one (see Figure
6). To make matching in the left-hand side more effective, Sendmail
allows literal text to appear in the pattern. To make sure that the address
in the workspace contains a user part and a host part separated by the
@ character, the left-hand side pattern $+@$+ can be
used. Just like the address in the workspace, this pattern is separated
into tokens before it is compared for a match. Operators (like $+)
are handled individually, and the @ is a token because it is a
separator character defined by the ``Do'' macro.
The $+@$+ pattern is separated into three tokens: $+,
@, and $+. Text in the pattern must match text in the
workspace exactly, token for token, if there is to be a match. A good address
in the workspace--one containing a user part and a host part--will match
our new left-hand side ($+@$+) as shown in Figure
7. The ``flow'' of matching begins with the first $+, which
starts by matching as few tokens as it can (a single token) and takes
more only if the rest of the pattern fails. The @
matches the identical token in the workspace. At this point, the $+@
part of the pattern has been satisfied. All that remains is for the final
$+ to match all the remaining tokens in the workspace, of which there
must be at least one.
But a bad address in the workspace will not match. For example, consider
an address that lacks a user name (as shown in Figure
8). Because $+ must match at least one token, the first $+ is forced to
match the @ at the start of the workspace. There is no other @ in the
workspace to be matched by the @ in the pattern, so the first $+ keeps
extending its match in search of one until it has consumed the entire
workspace. Because there is nothing left in the workspace, the
attempt to match the @ fails.
When any part of a pattern fails to match the workspace, the entire
left-hand side fails.
One small bit of confusion may yet remain. When an
operator like $+ is used to match the workspace, Sendmail always
does a minimal match. That is, it matches only what it needs to for the
next part of the rule to work. Consider the left-hand side $+@$+,
in which the first $+ matches everything in the workspace up to
the first @ character in the workspace. For example, for a workspace
of a@b@c, the $+@ causes the $+ to match only
the token a, the one before the first @ character. That token
is the minimum that needs to be matched, and so it is the maximum that
will be matched. The final $+ then matches the remaining b@c.
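The minimal-match behaviour is easiest to see in a small matcher. The Python sketch below is a simplified model, not Sendmail's code, and handles only literal tokens and the $+ operator. Each $+ first tries a single token and takes more only if the rest of the pattern then fails, and the pattern as a whole must consume the entire workspace. Workspaces are given as hand-written token lists, and the function name match is hypothetical.

```python
# A simplified model of left-hand-side matching with literal tokens and $+.
# Literal tokens must match exactly (ignoring case); each $+ tries the
# shortest span first and grows only if the rest of the pattern fails.
# The whole workspace must be consumed for the pattern to match.

def match(pattern, workspace):
    """Return a list holding the tokens matched by each $+, or None."""
    def step(p, w, bound):
        if p == len(pattern):
            return bound if w == len(workspace) else None
        token = pattern[p]
        if token == "$+":
            # Minimal match: one token first, then two, and so on.
            for end in range(w + 1, len(workspace) + 1):
                result = step(p + 1, end, bound + [workspace[w:end]])
                if result is not None:
                    return result
            return None
        if w < len(workspace) and workspace[w].lower() == token.lower():
            return step(p + 1, w + 1, bound)
        return None
    return step(0, 0, [])


PATTERN = ["$+", "@", "$+"]
print(match(PATTERN, ["user", "@", "host", ".", "domain"]))
# [['user'], ['host', '.', 'domain']]
print(match(PATTERN, ["a", "@", "b", "@", "c"]))
# [['a'], ['b', '@', 'c']]
print(match(PATTERN, ["@", "host", ".", "domain"]))
# None
```

The last call fails for exactly the reason given above: the first $+ must take the leading @, and no second @ remains for the pattern's @ to match.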
More Play With Left-Hand Side Matching
Take a moment to revise the sample Sendmail configuration file as shown
in Listing 4. I've given each temporary right-hand
side a number so you can see which one is selected. The $@ in front of
each right-hand side returns the rewritten workspace at once, so a
successful rewrite is never carried to any subsequent rules. Now run
Sendmail in rule-testing mode again. The first address to enter is a lone
``@'', which causes the first right-hand side (number one) to be selected.
The left-hand side (the pattern to match) contains the lone @. That
pattern matched the tokenized workspace @ exactly, so the right-hand side
for that rule is returned. No other rules are called because of the $@ prefix.
Next enter an address that contains just a host and domain part, but
not a user part, something like ``@host.domain''. The first thing to notice
is what was not printed! The workspace does not match the pattern of the
first rule. But instead of returning an error, the workspace is carried
down as is to the next rule, where it does match.
Now enter an address that fails to match the first two rules but successfully
matches the third, something like ``user@host.domain''. The flow for this
address is shown in Figure 9.
The fourth rule contains the original lone $+, which is there
to catch any addresses that slip past the first three. Go ahead and test
it. Try addresses like your log-in name or UUCP addresses like ``user@host.uucp''
and ``host!user''. Can you predict what will happen with weird addresses
like ``@@'' or ``a@b@c''?
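To check your predictions without a running Sendmail, here is a Python model of a four-rule set in the spirit of Listing 4. Because the listing is not reproduced here, treat the rules as assumptions: the lone @ of rule 1 and the lone $+ of rule 4 come from the text, the patterns for rules 2 and 3 (@$+ and $+@$+) are my guesses, and the right-hand sides one through four stand in for the listing's numbered ones. The $@ prefix is modelled simply as ``return at once,'' and all names are hypothetical.

```python
# A simplified model of a rule set in the spirit of Listing 4.
# ASSUMPTION: the listing is not reproduced here. Rules 1 (a lone @) and
# 4 (a lone $+) follow the text; the patterns of rules 2 and 3 are guesses,
# and the right-hand sides "one" to "four" stand in for the numbered ones.

def match(pattern, workspace):
    """Return the tokens bound to each $+, or None if there is no match.
    Literal tokens compare case-insensitively; $+ matches minimally."""
    def step(p, w, bound):
        if p == len(pattern):
            return bound if w == len(workspace) else None
        if pattern[p] == "$+":
            for end in range(w + 1, len(workspace) + 1):
                result = step(p + 1, end, bound + [workspace[w:end]])
                if result is not None:
                    return result
            return None
        if w < len(workspace) and workspace[w].lower() == pattern[p].lower():
            return step(p + 1, w + 1, bound)
        return None
    return step(0, 0, [])


# Every right-hand side carries the $@ prefix, modelled as "return at once".
LISTING_4_RULES = [
    (["@"], ["one"]),
    (["@", "$+"], ["two"]),           # assumed pattern
    (["$+", "@", "$+"], ["three"]),   # assumed pattern
    (["$+"], ["four"]),
]


def run_returning_ruleset(rules, workspace):
    """Try each rule in order. The first lhs that matches returns its rhs
    immediately, as $@ does; a rule that fails passes the workspace on."""
    for lhs, rhs in rules:
        if match(lhs, workspace) is not None:
            return rhs
    return workspace


print(run_returning_ruleset(LISTING_4_RULES, ["@"]))                                 # ['one']
print(run_returning_ruleset(LISTING_4_RULES, ["@", "host", ".", "domain"]))          # ['two']
print(run_returning_ruleset(LISTING_4_RULES, ["user", "@", "host", ".", "domain"]))  # ['three']
print(run_returning_ruleset(LISTING_4_RULES, ["host", "!", "user"]))                 # ['four']
```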
Other Operators
A single operator, $+, was enough to build a useful rule set of
four rules. Far more complex rule sets become possible when you take advantage
of Sendmail's other left-hand side operators. Here's a list:
$@ Exactly none
$* Zero or more
$+ One or more
$- Exactly one
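The four operators differ only in how many tokens each may match. The Python sketch below (a simplified model with hypothetical names, not Sendmail's implementation) records that range for each operator and generalizes the minimal matching described earlier: every operator tries its shortest allowed span first. For example, a left-hand side of $-@$+ accepts a user part of exactly one token.

```python
# A simplified model of the four left-hand-side operators listed above.
# Each operator has a minimum and maximum token count (None = no limit);
# every operator tries its shortest allowed span first (a minimal match)
# and grows only when the rest of the pattern fails to match.
OPERATORS = {
    "$@": (0, 0),     # exactly none
    "$*": (0, None),  # zero or more
    "$+": (1, None),  # one or more
    "$-": (1, 1),     # exactly one
}


def match(pattern, workspace):
    """Return the tokens bound to each operator, or None if no match.
    Literal tokens must match exactly, ignoring case, and the whole
    workspace must be consumed."""
    def step(p, w, bound):
        if p == len(pattern):
            return bound if w == len(workspace) else None
        token = pattern[p]
        if token in OPERATORS:
            low, high = OPERATORS[token]
            remaining = len(workspace) - w
            longest = remaining if high is None else min(high, remaining)
            for size in range(low, longest + 1):    # shortest span first
                result = step(p + 1, w + size, bound + [workspace[w:w + size]])
                if result is not None:
                    return result
            return None
        if w < len(workspace) and workspace[w].lower() == token.lower():
            return step(p + 1, w + 1, bound)
        return None
    return step(0, 0, [])


# A user part of exactly one token, then @, then a nonempty host part.
print(match(["$-", "@", "$+"], ["user", "@", "host", ".", "domain"]))
# [['user'], ['host', '.', 'domain']]
print(match(["$-", "@", "$+"], ["first", ".", "last", "@", "host"]))
# None
```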
But the story doesn't end here. In this article you've been given a glimpse
of how Sendmail's rules work. In all the listings, I've shown only ordinary,
literal text in the right-hand side. The power of Sendmail lies in its
use of operators in the right-hand side to rewrite addresses in complex
and sophisticated ways. The right-hand side operators are:
$: Rewrite once (prefix)
$@ Return (prefix)
$digit Copy by position
$( Database lookup
$[ Name canonicalization
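As a small taste of the right-hand side, the sketch below models $digit copying in Python. It is a simplified, hypothetical model: the rule R$+@$+ with a right-hand side of $2!$1 is invented for illustration, and it is applied once rather than repeatedly. $1 stands for the tokens matched by the first left-hand side operator and $2 for those matched by the second, so the rule turns friend@uuhost into the UUCP form uuhost!friend from the start of this article.

```python
# A simplified model of $digit copying on the right-hand side.
# HYPOTHETICAL rule, for illustration only:  R$+@$+   $2!$1
# It turns user@host into host!user, the UUCP form mentioned earlier.

def match(pattern, workspace):
    """Return the tokens bound to each $+, or None (minimal matching)."""
    def step(p, w, bound):
        if p == len(pattern):
            return bound if w == len(workspace) else None
        if pattern[p] == "$+":
            for end in range(w + 1, len(workspace) + 1):
                result = step(p + 1, end, bound + [workspace[w:end]])
                if result is not None:
                    return result
            return None
        if w < len(workspace) and workspace[w].lower() == pattern[p].lower():
            return step(p + 1, w + 1, bound)
        return None
    return step(0, 0, [])


def rewrite(lhs, rhs, workspace):
    """Apply one rule once: if lhs matches, build the new workspace from
    rhs, replacing $1..$9 with the tokens the nth lhs operator matched."""
    bound = match(lhs, workspace)
    if bound is None:
        return workspace                  # no match: workspace unchanged
    result = []
    for token in rhs:
        if len(token) == 2 and token[0] == "$" and token[1] in "123456789":
            result.extend(bound[int(token[1]) - 1])
        else:
            result.append(token)
    return result


LHS = ["$+", "@", "$+"]
RHS = ["$2", "!", "$1"]
print("".join(rewrite(LHS, RHS, ["friend", "@", "uuhost"])))  # uuhost!friend
print("".join(rewrite(LHS, RHS, ["host", "!", "user"])))      # host!user (no match)
```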
Clearly, there is not enough room in this tutorial to go over all the possible
Sendmail rewriting rules. And the rewriting rules are only a part of Sendmail.
The Sendmail program is a very flexible tool, and its configuration file
reflects this flexibility in its complexity. Still, I hope this tutorial
has shown that you can understand Sendmail's configuration file and has
encouraged you to continue exploring.