Contents

- The include statement
- The $LINES and $SIZE environment variables
- Partial matches using the ! operator
- Short-circuit evaluation using logical operators
- The foreach statement
- The $HOME/.tmp directory
The include statement

Maildrop reads its filtering instructions from $HOME/.mailfilter, or some other file containing mail filtering instructions. Before doing anything else, maildrop reads and semi-compiles this entire file. Any grammatical or syntax error results in maildrop terminating with a temporary error code, without doing anything with the message.
The entire file must be semi-compiled. Semi-compilation consists of building a logical representation of the filtering statements in memory.
If you have a long and complicated set of instructions, it will be semi-compiled whether or not it is actually executed. For example, consider the following code:

```
if ( /^Subject: rosebud/ )
{
    ... some really long set of instructions ...
}
```

If the contents of the if construct are large, maildrop may spend a considerable amount of time and memory semi-compiling filtering instructions that may never be used. Consider putting all these instructions into a separate file, and using the include statement to execute them.
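A minimal sketch of that advice applied to the rosebud example, assuming a hypothetical file named $HOME/.mailfilter-rosebud that holds the long set of instructions. Only the short if statement is semi-compiled up front; the included file is read and compiled only when a message's Subject: header actually matches.

```maildrop
# Semi-compiled for every message: just this short test.
if (/^Subject: rosebud/)
{
    # Hypothetical file holding the long set of instructions.
    # maildrop reads and semi-compiles it only when this line runs.
    include "$HOME/.mailfilter-rosebud"
}
```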
The include statement is processed only when maildrop actually executes it, and that has certain side effects. One feature of maildrop's semi-compilation is that it weeds out errors in the filtering recipe file, holding all mail and doing nothing with it until the errors are fixed. Because the include statement is not processed until it is executed, maildrop will not know about errors in the file it references until it executes that include statement. When maildrop detects the error, it terminates with a temporary error code, holding all mail. However, any filtering instructions before the include statement will already have been executed.
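A short sketch of that side effect, assuming a hypothetical archive mailbox mail/archive and a hypothetical included file $HOME/.mailfilter-lists. If the included file contains a syntax error, the cc statement has already delivered its copy by the time maildrop stops. Since the message is held with a temporary error, the mail server will retry it later, the cc statement will run again, and the archive can collect duplicate copies until the error is fixed.

```maildrop
# This statement runs first and delivers a copy right away.
cc "mail/archive"

# Hypothetical included file. A syntax error in it is only detected
# here, after the copy above has already been delivered.
include "$HOME/.mailfilter-lists"
```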
The $LINES and $SIZE environment variables

Take advantage of the $LINES and $SIZE environment variables, which maildrop sets to the number of lines and the number of bytes in the message. Scanning headers for patterns takes the same amount of time whether the body of the message has 100 or 1,000 lines. However, the "b" option causes the message body to be searched. If someone sends you a 5 megabyte binary file, each pattern matched against the message body will take a significant amount of time to complete.
Whenever possible, the first thing you should do is check both variables to see if the message is excessively large. If so, divert the message to a separate mailbox, or reject it.
Weighted scoring is especially sensitive to the size of the message being delivered. Normally, maildrop stops the pattern scan as soon as it finds a match. When weighted scoring is used, the scan is restarted immediately after each match and continues until the end of the message is reached.
A fixed overhead is required to start each pattern scan, so weighted scoring that matches a very short pattern, like /[:lower:]/:Dbw,1,1, will take a significant amount of CPU time to complete: a pattern that matches almost every character pays that startup overhead almost once per character. Maildrop is simply not optimized for these kinds of situations.
Nevertheless, on a Pentium 200, /[:lower:]/:Dbw,1,1 performs reasonably well for messages less than 60K long.
Here's an example of using $LINES and $SIZE to divert large messages to a separate mailbox:

```
if ($LINES > 1000 || $SIZE > 60000)
{
    to "mail/IN.large"
}

# ... the rest of the filtering instructions ...
```
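Rejecting an oversized message, the other option mentioned in this section, can be sketched as follows. This assumes your mail server treats exit code 77 from maildrop as a permanent failure and bounces the message; the 5,000,000-byte limit and the message text are arbitrary illustrative values.

```maildrop
# Arbitrary example limit: about 5 megabytes.
if ($SIZE > 5000000)
{
    # Depending on the mail server, this text may appear in the bounce.
    echo "Message too large, please send a smaller one."
    EXITCODE = 77
    exit
}
```

Placing this check before any pattern that uses the b option means an oversized body is never scanned at all.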
Partial matches using the ! operator

The ! operator splits a pattern into two halves. Whenever the first pattern is successfully matched, maildrop runs the second pattern. If the second pattern does not match, maildrop resumes matching the first pattern until it matches again.
Since starting a pattern match involves some fixed overhead, minimizing the number of times the first pattern matches reduces the time it takes to match the entire pattern.
For example, suppose you're trying to extract an IP address from the first Received: header. Here's your first attempt:

```
/^Received:.*![0-9\.]+/
IP=$MATCH2
```

This will not work at all. From the manual page for maildropfilter:
When there is more than one way to match a string, maildrop favors matching as much as possible initially.
So this fragment will set IP to the last digit or period found in the first Received: header, because the first half of the pattern will match as much as possible.
Here's your second attempt, then:

```
/^Received:.*[^0-9\.]![0-9\.]+/
IP=$MATCH2
```

This will work, except that the first half of the pattern will end up matching every character in the Received: header that's not a digit or a period. Each time, maildrop will stop and try to run the second half of the pattern.
Your mail server will probably put a left bracket before the IP address of the relay. Noting that, the following code picks up the IP address as efficiently as possible:

```
/^Received:.*\[![0-9\.]+/
IP=$MATCH2
```
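If no Received: header matches, IP may not end up holding an address. One way to act on the address only when the pattern actually matched is to test the same pattern in an if statement. A minimal sketch; the mailbox name mail/IN.local and the address 192.0.2.1 (from a range reserved for documentation) are hypothetical.

```maildrop
# Extract the relay address only if the pattern matches.
if (/^Received:.*\[![0-9\.]+/)
{
    IP=$MATCH2

    # Hypothetical: file mail relayed from one particular address.
    if ($IP eq "192.0.2.1")
    {
        to "mail/IN.local"
    }
}
```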
Short-circuit evaluation using logical operators

The || and && operators work in maildrop just like they work in C. Consider the following expression:

```
if (/ ... some pattern ... / || / ... some other pattern ... /)
{
    ...
}
```

If the first pattern is found, maildrop will not execute the second pattern, since the logical expression will be true no matter what. You can use that to your advantage. If you have two patterns, and one of them takes more time to process, put the other one first. Even if both patterns are relatively quick, if you expect one of them to be found almost every time, put that one first, so that the second pattern rarely needs to run.
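As a concrete sketch of that ordering rule, assuming a hypothetical mailing list whose messages carry a List-Id: header mentioning example.com: the header test is cheap, while the :b body scan is expensive, so the header test goes first and the body is scanned only when the header test fails. The pattern text and mailbox name are illustrative.

```maildrop
# Cheap header test first; the body scan (:b) only runs when it fails.
if (/^List-Id:.*example\.com/ || /unsubscribe.*example\.com/:b)
{
    to "mail/IN.example"
}
```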
The same concept works for the && operator:

```
if (/ ... some pattern ... / && / ... some other pattern ... /)
{
    ...
}
```

If the first pattern is not found, maildrop will not execute the second pattern, since the logical expression will be false no matter what.
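The mirror image for &&, again with a hypothetical sender address and mailbox name: the body is scanned for the word "urgent" only in messages that already passed the cheaper From: header test.

```maildrop
# The expensive body scan runs only for messages from this sender.
if (/^From:.*boss@example\.com/ && /urgent/:b)
{
    to "mail/IN.urgent"
}
```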
The foreach statement

The foreach statement is implemented as follows. Maildrop executes the regular expression and compiles a list of every piece of text in the message that the regular expression matches. Afterwards, maildrop executes the foreach statement once for each match.
It is important to note that maildrop creates a list in memory containing every piece of text matched by the regular expression. If the regular expression is found many times in the message, this will require a lot of memory. For example, the following is a Very Bad Thing[tm], because every single alphabetic character becomes a separate entry in the list:

```
foreach /[:alpha:]/
{
    ...
}
```

Maildrop does not include any built-in limits on resource usage, except for the watchdog timer. Therefore, as a system administrator, it is your responsibility to limit the resources used by the maildrop process.
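By contrast, a pattern that can only match a handful of times keeps the list small. A minimal sketch, assuming the goal is to count Received: headers (for example, to spot mail loops); the variable name HOPS, the limit of 25, and the mailbox name are hypothetical.

```maildrop
HOPS=0

# One list entry per Received: header line, so the list stays short.
foreach /^Received:/
{
    HOPS=$HOPS + 1
}

if ($HOPS > 25)
{
    to "mail/IN.loops"
}
```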
This implementation of the foreach statement is the one that yields the fewest surprises. Observe that the contents of the foreach construct can modify the message using the xfilter statement. Furthermore, the regular expression may include variable substitution, and those variables could be modified by the foreach statement. If the foreach statement were executed "on the fly", these factors would yield unpredictable, and rather messy, results.
The $HOME/.tmp directory

Maildrop stores its temporary files in the $HOME/.tmp directory.
Temporary files are used to store large messages (instead of storing them in
memory). Temporary files may also be used by the xfilter command, or in other
situations.
Maildrop will automatically delete all temporary files before terminating.
However, if the maildrop process is killed or terminates abnormally, a temporary file may remain and take up disk space. To have those leftover files automatically cleaned up, put the following code in /etc/profile, or some other file that all users run after logging in:

```
# Remove anything under $HOME/.tmp not modified for more than 7 days:
# directories with rmdir (succeeds only if empty), everything else with rm -f.
find "$HOME/.tmp" -depth -mtime +7 \( \( \
    -type d -exec rmdir {} \; \) -o -exec rm -f {} \; \) 2>/dev/null
```