Installing ifile-1.1.5


$Id: using.html,v 1.5 2003/02/18 20:45:43 jrennie Exp $

   Installing and using ifile

General information page: http://people.csail.mit.edu/jrennie/ifile.


   Configuration and installation of ifile

Configure:


        me% CC=gcc CFLAGS=-O ./configure
        creating cache ./config.cache
        checking for gcc... gcc
        checking whether the C compiler (gcc -O ) works... yes
        checking whether the C compiler (gcc -O ) is a cross-compiler... no
        checking whether we are using GNU C... yes
        checking whether gcc accepts -g... yes
        checking for a BSD compatible install... /usr/bin/install -c
        checking for ranlib... ranlib
        checking for strchr... yes
        checking how to run the C preprocessor... gcc -E
        checking for alloca.h... no
        checking for perl... /usr/bin/perl
        updating cache ./config.cache
        creating ./config.status
        creating Makefile
        configuring in argp
        running /bin/sh ./configure  --cache-file=.././config.cache --srcdir=.
        loading cache .././config.cache
        checking for gcc... (cached) gcc
        checking whether the C compiler (gcc -O ) works... yes
        checking whether the C compiler (gcc -O ) is a cross-compiler... no
        checking whether we are using GNU C... (cached) yes
        checking whether gcc accepts -g... (cached) yes
        checking how to run the C preprocessor... (cached) gcc -E
        checking for a BSD compatible install... (cached) /usr/bin/install -c
        checking for ranlib... (cached) ranlib
        checking for getopt.h... no
        checking for getopt_long... no
        checking for strerror... yes
        checking for strndup... no
        checking for ANSI C header files... yes
        checking for ssize_t... yes
        checking for memmove... yes
        checking for vsnprintf... yes
        checking for strerror... (cached) yes
        checking for strings.h... yes
        checking if vsprintf returns int... yes
        checking program_invocation_name... no
        checking for ANSI C header files... (cached) yes
        checking for string.h... yes
        checking for memory.h... yes
        updating cache .././config.cache
        creating ./config.status
        creating Makefile

Abbreviations in make output:


        $flags = -I. -I./include -I./argp -DHAVE_STRCHR=1 -O

        $argflags = -I. -DHAVE_STRERROR=1 -DSTDC_HEADERS=1
            -DHAVE_MEMMOVE=1 -DHAVE_VSNPRINTF=1 -DHAVE_STRERROR=1
            -DHAVE_STRINGS_H=1 -DVSPRINTF_RETURNS_INT=1 -DSTDC_HEADERS=1
            -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -O

Make:


        me% make
        gcc -c $flags -o database.o database.c
        gcc -c $flags -o error.o error.c
        gcc -c $flags -o hash_table.o hash_table.c
        gcc -c $flags -o int4str.o int4str.c
        int4str.c: In function `ifile_int4str_free_contents':
        int4str.c:382: warning: passing arg 1 of `free' discards
            qualifiers from pointer target type
        gcc -c $flags -o istext.o istext.c
        gcc -c $flags -o lex-define.o lex-define.c
        gcc -c $flags -o lex-email.o lex-email.c
        gcc -c $flags -o lex-indirect.o lex-indirect.c
        gcc -c $flags -o lex-simple.o lex-simple.c
        gcc -c $flags -o opts.o opts.c
        gcc -c $flags -o primes.o primes.c
        gcc -c $flags -o scan.o scan.c
        gcc -c $flags -o stem.o stem.c
        gcc -c $flags -o stoplist.o stoplist.c
        gcc -c $flags -o stopwords.o stopwords.c
        gcc -c $flags -o util.o util.c
        ar rc libifile.a database.o error.o hash_table.o int4str.o istext.o
            lex-define.o lex-email.o lex-indirect.o lex-simple.o opts.o
            primes.o scan.o stem.o stoplist.o stopwords.o util.o
        ranlib libifile.a

        cd argp ; make libargp.a
        make[1]: Entering directory `/pub/src/mail/spam/ifile-1.1.5/argp'
        gcc -c $argflags -o argp-ba.o argp-ba.c
        gcc -c $argflags -o argp-fmtstream.o argp-fmtstream.c
        gcc -c $argflags -o argp-fs-xinl.o argp-fs-xinl.c
        gcc -c $argflags -o argp-help.o argp-help.c
        gcc -c $argflags -o argp-parse.o argp-parse.c
        gcc -c $argflags -o argp-pv.o argp-pv.c
        gcc -c $argflags -o argp-pvh.o argp-pvh.c
        gcc -c $argflags -o argp-xinl.o argp-xinl.c
        gcc -c $argflags -o argp.o argp.c
        gcc -c $argflags -o pin.o pin.c
        gcc -c $argflags -o strndup.o strndup.c
        gcc -c $argflags -o getopt.o getopt.c
        gcc -c $argflags -o getopt1.o getopt1.c
        ar rc libargp.a argp-ba.o argp-fmtstream.o argp-fs-xinl.o
            argp-help.o argp-parse.o argp-pv.o argp-pvh.o argp-xinl.o
            argp.o pin.o strndup.o getopt.o getopt1.o
        ranlib libargp.a

        make[1]: Leaving directory `/pub/src/mail/spam/ifile-1.1.5/argp'
        gcc -c -I. -I./include -I./argp -DHAVE_STRCHR=1 -O -o ifile.o ifile.c
        gcc -O ifile.o -o ifile -L. -lifile -L./argp -largp -lm
        rm -f ifilter.mh
        cat ifilter.mh.pl | sed -e 's,/usr/bin/perl,/usr/bin/perl,' > ifilter.mh
        chmod a+x ifilter.mh
        rm -f irefile.mh
        cat irefile.mh.pl | sed -e 's,/usr/bin/perl,/usr/bin/perl,' > irefile.mh
        chmod a+x irefile.mh
        rm -f knowledge_base.mh
        cat knowledge_base.mh.pl | sed -e
            's,/usr/bin/perl,/usr/bin/perl,' > knowledge_base.mh
        chmod a+x knowledge_base.mh
        rm -f news2mail
        cat news2mail.pl | sed -e 's,/usr/bin/perl,/usr/bin/perl,' > news2mail
        chmod a+x news2mail

"make install" gives the following installed files, if configured with the usual prefix /usr/local:


        /usr/local/bin
        +-----ifile
        +-----ifilter.mh
        +-----irefile.mh
        +-----knowledge_base.mh
        +-----news2mail
        |
        /usr/local/man
        +-----man1
        |      +-----ifile.1

To clean up the build and remove the generated configuration files:


        me% make clean
        me% rm -f config.cache config.log config.status Makefile


   Creating a spam database

When I create a spam database for ifile, I use these sources:

I tried using the collections at http://www.iit.demokritos.gr/~ionandr/pu1_encoded.tar.gz and http://www.iit.demokritos.gr/~ionandr/lingspam_public.tar.gz, but both were useless; they were modified to the point where I couldn't recreate the original mail messages.

I use a program called spmail to split an mbox-formatted file into one or more directories containing one message per file.  Directories are numbered starting at 1, and messages within each directory are numbered from 000 (or 001) to 999.  This speeds up the process, because ifile can accept multiple messages on one command line when creating its database.
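
spmail itself isn't shown here.  As a rough stand-in, the sketch below does the same kind of split with awk.  The function name split_mbox, the output directory argument and the 0/001 starting point are my assumptions, not spmail's actual interface.  It treats every line starting with "From " as the start of a new message, so a mailbox with unescaped "From " lines in message bodies would be split in the wrong places.

```shell
#!/bin/sh
# split_mbox MBOX OUTDIR -- hypothetical stand-in for spmail.
# Writes OUTDIR/0/001 ... OUTDIR/0/999, then OUTDIR/1/000 and so on,
# one message per file.
split_mbox () {
    mbox=$1
    out=$2
    mkdir -p "$out" || return 1
    awk -v out="$out" '
        /^From / {
            # A new message starts here: close the previous file and
            # work out which directory and file the next one goes to.
            if (file != "") close(file)
            n++
            dir = sprintf("%s/%d", out, int(n / 1000))
            if (!(dir in made)) {
                system("mkdir -p \"" dir "\"")
                made[dir] = 1
            }
            file = sprintf("%s/%03d", dir, n % 1000)
        }
        # Anything before the first "From " line is dropped.
        file != "" { print > file }
    ' "$mbox"
}
```

Something like "split_mbox inbox data/inbox" would then leave the messages under data/inbox/0.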

I delete any Chinese/Korean spam, because I don't need ifile to handle it; I can either use procmail or a much smaller program called hibits to see if a given message has lots of 8-bit characters.

Here's my directory setup:


        +-----00-PRIVATE
        |      +-----0
        |      |      +-----001
        |      |      +-----002
        |      |      +-----003
        |      |      +-----004
        |      |      +-----005
        ...
        |      |      +-----995
        |      |      +-----996
        |      |      +-----997
        |      |      +-----998
        |      |      +-----999
        |      +-----1
        |      |      +-----000
        |      |      +-----001
        ...
        |      |      +-----998
        |      |      +-----999
        |      +-----2                (1000 messages, same as 1)
        |      +-----3                ...
        ...
        |      +-----13                ...
        |      +-----14                
        |      |      +-----000
        |      |      +-----001
        ...
        |      |      +-----841
        |      |      +-----842
        |
        +-----00-SPAM-MESSAGES
        |      +-----ifile
        |      |      +-----idata            Spam plus my private messages
        |      |      +-----idata.spamonly   Spam only
        |      |      +-----mkifiledb        Script to create idata files
        |
        |      +-----misc
        |      |      +-----0                Graham's DB
        |      |      |      +-----001
        |      |      |      +-----002
        ...
        |      |      |      +-----998
        |      |      |      +-----999
        |      |      +-----1
        |      |      +-----2
        |
        |      |      +-----3                My personal spam
        |      |      |      +-----001
        |      |      |      +-----002
        ...
        |      |      |      +-----998
        |      |      |      +-----999
        |      |      +-----4
        |      |      +-----5
        |
        |      +-----net-abuse                news.admin.net-abuse.sightings
        |      |      +-----0
        |      |      |      +-----001
        |      |      |      +-----002
        ...
        |      |      |      +-----998
        |      |      |      +-----999
        |      |      +-----1
        |      |      |      +-----000
        |      |      |      +-----001
        ...
        |      |      |      +-----998
        |      |      |      +-----999
        |      |      +-----2                (1000 messages)
        |      |      +-----3                ...
        ...
        |      |      +-----26
        |      |      +-----27
        |      |      +-----28
        |      |      +-----29                ...
        |      |      +-----30                ...
        |      |      |      +-----000
        |      |      |      +-----001
        ...
        |      |      |      +-----684
        |      |      |      +-----685
        |
        |      +-----uk-corpus                UK junk email corpus
        |      |      +-----1
        |      |      |      +-----001
        |      |      |      +-----002
        ...
        |      |      |      +-----672
        |      |      |      +-----673

I use the following ifile options when making the "spamonly" database:


        -h, --strip-header    Skip all of the header lines except Subject:,
                              From: and To:

        -i, --insert=FOLDER   Add the statistics for each of FILES to the
                              category FOLDER

Here's the script, called "mkifiledb".

I've found that adding spam messages repeatedly can sometimes improve ifile's accuracy, so the script allows you to specify how often you want to add messages from a given spam group.  Non-spam messages are only added once.


        #!/bin/ksh
        #
        # Id: mkifiledb,v 1.6 2002/11/04 22:06:51 vogelke Exp
        # Source: /src/mail/spam/00-SPAM-MESSAGES/ifile/RCS/mkifiledb,v
        #
        # create ifile database from known spam and non-spam messages.

        PATH=/usr/local/bin:/bin:/usr/bin; export PATH
        umask 022

        cwd=`pwd`
        top=/pub/src/mail/spam
        dbdir=$top/00-SPAM-MESSAGES/ifile
        dbfile=idata

        test -f $dbdir/$dbfile && rm $dbdir/$dbfile

        # Accept number of passes, a classification, and a list of numeric
        # subdirectories.  Add each file within each subdirectory to our
        # ifile db with the given classification.
        #
        # Using "ifile -h -k -i" gives *huge* .idata file, no real gain.

        readmail () {
            passes=$1
            class=$2
            shift
            shift

            for f in $*
            do
                if test -d $f; then
                    k=1
                    while test $k -le $passes; do
                        echo pass $k: $class $PWD/$f
                        ifile -b $dbdir/$dbfile -h -i $class $f/*
                        k=`expr $k + 1`
                    done

                    (cd $dbdir && ls -l $dbfile)
                fi
            done
        }

        cd $top

        # local spam.
        (
            cd 00-SPAM-MESSAGES/local
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamlocal $list
        )

        # Nigerian/African fraud.
        (
            cd 00-SPAM-MESSAGES/fraud
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamfraud $list
        )

        # Credit repair.
        (
            cd 00-SPAM-MESSAGES/credit
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamcredit $list
        )

        # Diplomas.
        (
            cd 00-SPAM-MESSAGES/diploma
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamdiploma $list
        )

        # Drivers license.
        (
            cd 00-SPAM-MESSAGES/license
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamlicense $list
        )

        # gtaylor collection.
        (
            cd 00-SPAM-MESSAGES/gtaylor
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamgt $list
        )

        # UK spam.
        (
            cd 00-SPAM-MESSAGES/uk-corpus
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 spamuk $list
        )

        ## net-abuse spam.
        #(
        #    cd 00-SPAM-MESSAGES/net-abuse
        #    list=`/bin/ls -d [0-9]* | sort -n`
        #    readmail 1 spamnet $list
        #)

        # keep a copy of the junk-only database.
        # make 1 pass through valid messages.
        cp $dbdir/$dbfile $dbdir/$dbfile.spamonly

        (
            cd 00-PRIVATE
            list=`/bin/ls -d [0-9]* | sort -n`
            readmail 1 good $list
        )

        exit 0

mkifiledb takes about an hour to run on a Pentium-133. Output files are roughly this size:


        +-----00-SPAM-MESSAGES
        |      +-----ifile
        |      |      +----- 587799 Feb  3 16:41 idata
        |      |      +----- 391994 Feb  3 16:38 idata.spamonly

The "idata" file is periodically copied to $HOME/.idata.

The file containing only spam results is available here: idata.spamonly


   Multiple spam categories

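Something about the Nigerian bank-account scams isn't close enough to regular spam to trip the filter, so I set up some other spam categories.  The mkifiledb script above uses spamlocal, spamfraud, spamcredit, spamdiploma, spamlicense, spamgt and spamuk.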

I can get a nice sample of a given spam category (in this case, fraud) by creating a small idata file using known fraud messages plus a few valid messages, and then doing this:


        me% cat findfraud
        #!/bin/sh
        # findfraud -- look through net-abuse messages, find anything
        # that looks like a bank-scam.

        find ../net-abuse/? ../net-abuse/?? -type f -print |
            xargs ifile -b idata -q -c | grep fraud
        exit 0

        me% ./findfraud
        ../net-abuse/0/204 spamfraud
        ../net-abuse/0/275 spamfraud
        ../net-abuse/0/431 spamfraud
        ../net-abuse/0/450 spamfraud
        ../net-abuse/0/470 spamfraud

        me% ./findfraud | awk '{print $1}' > list

This gives me a list of files that are highly likely to be fraud-related.  I can narrow it down further by using a script that lets me quickly classify these messages by hand:


        me% cat quicklook
        #!/bin/sh
        # quicklook -- display each message, and ask if it fits the
        # description.  If so, echo "msgno" to the logfile.
        # Otherwise, echo nothing to the logfile.
        
        PATH=/bin:/usr/bin:/usr/local/bin
        export PATH
        
        logfile="fraud"
        touch $logfile
        
        case "$#" in
            1)  list="$1" ;;
            *)  echo "usage: $0 list"; exit 1 ;;
        esac
        test -f $list || exit 2
        
        for msg in `cat $list`
        do
            head -25 $msg
            ans=`grabchars -q'(f)raud, (n)ext, (q)uit: '`
        
            case "$ans" in
                f) echo "fraud" 1>&2; echo $msg >> $logfile ;;
                n) echo "next" 1>&2 ;;
                q) echo "done" 1>&2; exit 0 ;;
                *) echo "${ans}?  Please answer f, n or q." 1>&2 ;;
            esac
        done
        
        exit 0

        me% ./quicklook list

The "quicklook" script accepts a list of messages identified by "findfraud" and shows me the first 25 lines of each one.  I can use one keystroke to classify the message as fraud and move on, move to the next message, or quit.

I used this script to split up my current spam collection into smaller batches.


   Processing incoming mail

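I usually process incoming mail in batches of a few hundred messages at a time, depending on how often I collect it.  Processing takes place in four stages: mail from whitelisted addresses is saved straight to my inbox, mail matching a second whitelist is handed to procmail, ifile checks whatever is left for spam, and procmail handles the remainder.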

The stages are arranged so that the fastest processing takes place first, and the more computationally expensive stuff like ifile or procmail is only run when most of the mail has already been dealt with.

Here's a sample run for 139 messages.  I don't like keeping my mailbox locked for any length of time, and I don't like endlessly locking/unlocking it, so I generally make a copy if lots of mail has built up:


        me% ls inbox
        ls: inbox: No such file or directory
        
        me% lockfile -ml
        me% cp $MAIL inbox
        me% cp /dev/null $MAIL
        me% lockfile -mu      

Break up the inbox using spmail:


        me% ls data
        ls: data: No such file or directory
        
        me% spmail inbox
        me% rm inbox
        
        me% ls -R data
        inbox/
        
        data/inbox:
        0/
        
        data/inbox/0:
        001     017     033     049     065     081     097     113     129
        002     018     034     050     066     082     098     114     130
        003     019     035     051     067     083     099     115     131
        004     020     036     052     068     084     100     116     132
        005     021     037     053     069     085     101     117     133
        006     022     038     054     070     086     102     118     134
        007     023     039     055     071     087     103     119     135
        008     024     040     056     072     088     104     120     136
        009     025     041     057     073     089     105     121     137
        010     026     042     058     074     090     106     122     138
        011     027     043     059     075     091     107     123     139
        012     028     044     060     076     092     108     124
        013     029     045     061     077     093     109     125
        014     030     046     062     078     094     110     126
        015     031     047     063     079     095     111     127
        016     032     048     064     080     096     112     128

Filter the inbox using a script called "runmail".  This script saves its output to a tmp file and runs "tail -f" on that file, so I exit with control-c:


        me% cd data/inbox/0/
        me% runmail
        saving 3 whitelisted messages
        042: not seen
        048: not seen
        116: not seen
        filtering 4 whitelisted messages
        012
        056
        094
        125
        removing 18 spam messages
        001
        002
        003
        005
        006
        ...
        136
        137
        DONE
        ^C

        me% cd
        me% rm -rf data
        me% /usr/ucb/from | wc
             80     560    5062

Only 80 of the original 139 messages are left.  Here's the "runmail" script:


     1  #!/bin/ksh
     2  #
     3  # Id: runmail,v 1.5 2003/02/16 21:29:30 vogelke Exp
     4  # Source: /space/home/vogelke/bin/RCS/runmail,v
     5  #
     6  # filter all mail messages in current directory.
     7  
     8  PATH=/bin:/usr/bin:/usr/local/bin; export PATH
     9  tag=`basename $0`
    10  tmp=$tag.$RANDOM.tmp
    11  good=$tag.$RANDOM.good
    12  
    13  die () {
    14      echo "$*" >& 2
    15      exit 1
    16  }
    17  
    18  week="`/bin/date +%Yw%W`"
    19  
    20  # Run whitelist check before anything else.
    21  ls ??? > /dev/null 2>&1 || die "no files"
    22  
    23  fgrep -il -f $HOME/.whitelist ??? > $tmp
    24  set X `wc -l $tmp`
    25  
    26  case "$2" in
    27      0) echo no whitelisted messages ;;
    28  
    29      *) echo saving $2 whitelisted messages
    30         cp /dev/null $good || die "can't write $good"
    31  
    32         for file in `cat $tmp`
    33         do
    34             if formail -D 655360 $HOME/mail/msgid.cache < $file
    35             then
    36                 echo $file: already seen
    37             else
    38                 echo $file: not seen
    39                 formail -A 'X-Spam: whitelist' < $file >> $good
    40                 formail -X "" < $file >> $HOME/mail/HEADERS.$week
    41             fi
    42         done
    43  
    44         xargs rm < $tmp
    45         rm -f $tmp
    46  
    47         if lockfile -0 -r0 -ml
    48         then
    49             cat $good >> $MAIL || die "can't append $good to $MAIL"
    50             lockfile -mu
    51             rm $good
    52         else
    53             echo "could not lock $MAIL, see $good"
    54         fi
    55         ;;
    56  esac
    57  
    58  # These messages are not spam but need procmail handling.
    59  ls ??? > /dev/null 2>&1 || die "no more files"
    60  
    61  fgrep -il -f $HOME/.whitelist2 ??? > $tmp
    62  set X `wc -l $tmp`
    63  
    64  case "$2" in
    65      0) echo no whitelisted messages for procmail ;;
    66  
    67      *) echo filtering $2 whitelisted messages
    68         for file in `cat $tmp`
    69         do
    70             echo $file
    71             procmail < $file
    72             rm $file
    73         done
    74         ;;
    75  esac
    76  
    77  # Check for spam using ifile.
    78  ls ??? > /dev/null 2>&1 || die "no more files"
    79  
    80  echo running ifile
    81  ifile -c -q ??? | grep 'spam' | cut -f1 -d' '> $tmp
    82  set X `wc -l $tmp`
    83  
    84  case "$2" in
    85      0) echo ifile found no spam messages ;;
    86  
    87      *) echo removing $2 spam messages
    88         sf="$HOME/mail/spam-folder"
    89         xargs cat < $tmp >> $sf || die "$sf: can't append"
    90         xargs rm < $tmp
    91         cp /dev/null $tmp
    92         ;;
    93  esac
    94  
    95  # Send remaining messages through procmail.
    96  ls ??? > /dev/null 2>&1 || die "no more files"
    97  
    98  echo running procmail
    99  (
   100      for file in ???
   101      do
   102          echo $file
   103          procmail < $file
   104      done
   105      echo DONE
   106  ) >> $tmp &
   107  
   108  exec tail -f $tmp
   109  exit 0

Here's a quick description of the script:

1-11: Standard Korn-shell header.  I like ksh because I can use $RANDOM when creating tmp files.

13-16: Short suicide function which allows one-line tests, like line 21.

18: Get the current week, for storing message headers.

23-24: Look for messages that have addresses in my whitelist. My ~/.whitelist file has records like this, one per line:


        announce@freebsd.org
        apache-ssl@lists.aldigital.co.uk
        archive@securityinsight.com

26-56: Checks for any whitelisted messages and acts accordingly.

Lines 32-42 walk through each whitelisted message in turn, discard it if it's been seen before (line 34), or save a copy of the headers and add it to a temp file if it hasn't been seen before (lines 39-40).

Lines 44-45 remove the individual whitelisted messages.

Lines 47-54 safely append the temp file to my inbox.

61-75: Does roughly the same thing for whitelisted messages that need some procmail work.

80-93: Run ifile in concise query mode (-c -q) to do a spam check on the remaining files.  Any hits are stored in $HOME/mail/spam-folder.

98-106: Run procmail on anything else that's left.
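
The whitelist lookup on lines 23 and 61 can be tried on its own.  Here's a minimal sketch; the function name "whitelisted" is made up for illustration.  It uses "grep -F", the POSIX spelling of fgrep, so the dots in an address are matched literally rather than as regular-expression wildcards.

```shell
#!/bin/sh
# whitelisted LISTFILE FILE... -- print the name of every FILE that
# mentions any address in LISTFILE.
#   -F  patterns are fixed strings, not regular expressions
#   -i  ignore case
#   -l  print file names only, not the matching lines
#   -f  read one pattern per line from LISTFILE
whitelisted () {
    list=$1
    shift
    grep -Fil -f "$list" "$@"
}
```

Running "whitelisted $HOME/.whitelist ???" in a message directory should give the same list as line 23.  One thing to watch for: an empty line in the whitelist acts as an empty pattern, which matches every message.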

Believe me, it takes much longer to describe than it does to run.
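
Line 81 picks the spam out of ifile's output with grep and cut.  The sketch below is a slightly stricter variant that looks only at the category field.  It assumes the "file category" output format seen in the findfraud run earlier and that every spam category name starts with "spam"; the function name spam_files is hypothetical.  It reads from standard input, so it can be tested without ifile.

```shell
#!/bin/sh
# spam_files -- read "FILE CATEGORY" lines on stdin and print FILE
# for every line whose category begins with "spam".
spam_files () {
    awk '$2 ~ /^spam/ { print $1 }'
}
```

In runmail this could be used as "ifile -c -q ??? | spam_files > $tmp".  With three-character numeric file names the plain "grep spam" on line 81 is just as safe; matching on the second field only matters if a file name or path might itself contain the string "spam".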


   My .procmailrc file

For completeness, here's my $HOME/.procmailrc file.


      1  # Id: .procmailrc,v 1.52 2002/08/16 16:44:44 vogelke Exp
      2  # Source: /space/home/vogelke/RCS/.procmailrc,v
      3  #
      4  # NAME:
      5  #    $HOME/.procmailrc
      6  #
      7  # DESCRIPTION:
      8  #    "procmail" handles local mail delivery, and you can use this
      9  #    file to tell it to
     10  #     - store your mail in a given folder,
     11  #     - forward or discard mail depending on the contents, or
     12  #     - run your mail through a program automatically.
     13  #
     14  # TESTING CHANGES:
     15  #    If you want to mess with your setup, the safest way is:
     16  #
     17  #    1.  copy an existing mail message to /tmp/msg,
     18  #    2.  copy .procmailrc to .procmailrc.new,
     19  #    3.  only make your changes to .procmailrc.new, and
     20  #    4.  run "procmail -m .procmailrc.new < /tmp/msg" to test.
     21  #
     22  # AUTHOR:
     23  #    Karl Vogel <vogelke@dnaco.net>
     24  #    Sumaria Systems, Inc.
         
     25  # Search path.
     26  PATH=/usr/local/bin:/bin:/usr/bin:$HOME/bin
         
     27  # Default mail folder.
     28  DEFAULT=/var/mail/vogelke
         
     29  # Current directory while procmail is executing.
     30  # All pathnames are relative to this directory.
     31  MAILDIR=$HOME/mail
         
     32  # File containing error messages or diagnostics.  If this
     33  # file does not exist, then said messages will be bounced
     34  # back to the message sender.
     35  LOGFILE=$MAILDIR/MAILLOG
         
     36  # If yes, keep an abstract of the From and Subject lines of
     37  # each delivered message, the folder it was delivered to,
     38  # and the size of the message.  If no, skip this abstract.
     39  LOGABSTRACT=yes
         
     40  # If on, describe actions of procmail in detail.
     41  #VERBOSE=on
         
     42  # Number of seconds before procmail zaps a lockfile by force.
     43  LOCKTIMEOUT=1
         
     44  # Default shell and umask value.
     45  SHELL=/bin/sh
     46  UMASK=022
         
     47  # Frequently-used variables.
     48  WEEK="`/bin/date +%Yw%W`"
         
     49  # Rules section.
     50  #--------------------------------------------------------
     51  # RULE: Save incoming headers in a file called
     52  #               $HOME/mail/HEADERS.YYYYwNN
     53  #       where YYYY = year
     54  #       NN = the week number starting on Monday.
         
     55  :0 chw: $HOME/hdr.lck
     56  | /bin/cat - >> $HOME/mail/HEADERS.$WEEK;
         
     57  #--------------------------------------------------------
     58  # RULE: Check if the Message-ID: header has been seen.
     59  #       Discard the message if so, otherwise continue.
         
     60  :0 Wh: msgid.lock
     61  | formail -D 655360 msgid.cache
         
     62  #========================================================
     63  # SPAM: dump message if the message contains a few
     64  #       8-bit characters. Simple check for encoded
     65  #       characters (=[0-9A-F][0-9A-F]) will fail for
     66  #       messages containing "dd xxx=yyy" commands.
     67  #
     68  #       If you want to check the header only, use ":0 HD"
     69  #       instead.
         
     70  :0 HBD
     71  * -10^1 Subject:
     72  *   1^1 =[0-9A-F][0-9A-F]=[0-9A-F][0-9A-F]
     73  *   1^1 [ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿]
     74  *   1^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
     75  *   1^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
     76  *   1^1 =[A-F][0-9A-F]=[A-F][0-9A-F]
     77  spam-8bit
         
     78  :0 H
     79  * ^Subject: =\?.*\?=
     80  spam-8bit
         
     81  #--------------------------------------------------------
     82  # SPAM: All-numeric "email addresses" like "you@67890.com"
     83  #       Messages to "friend@host" or "you@host".
         
     84  :0 H
     85  * ^(From|To|Reply-To): .*@[0-9]+\.
     86  spam-friend
         
     87  :0 H
     88  * ^(From|To|Cc): friend[0-9a-zA-Z]*@
     89  spam-friend
         
     90  :0 H
     91  * ^(From|To|Cc): you@
     92  spam-friend
         
     93  #--------------------------------------------------------
     94  # SPAM: pass anything in the whitelist.
     95  #       http://www.mindrape.org/caffeine/squashing_spam.html
         
     96  :0:
     97  * ? formail -x"From:" -x"From" -x"To:" -x"Reply-To:" -x"Cc:" \
     98       | fgrep -is -f $HOME/.whitelist
     99  $DEFAULT
         
    100  #--------------------------------------------------------
    101  # SPAM: kill anything in the blacklist.
         
    102  :0:
    103  * ? formail -x"From:" -x"From" -x"To:" -x"Reply-To:" -x"Cc:" \
    104       | fgrep -is -f $HOME/.blacklist
    105  spam-folder
         
    106  #--------------------------------------------------------
    107  # SPAM: Same from and to: happens legitimately only when
    108  #       sending mail to oneself.  Put this *after* whitelist
    109  #       filtering; some people on whitelist send to themselves.
    110  :0 H
    111  * ^From: \/.*
    112  * $^To: $MATCH
    113  spam-tofrom
         
    114  :0 :
    115  $DEFAULT

62-77: This checks for 8-bit spam, or quoted-printable crap. It works, but it's slow; we can do better by adding another stage to the runmail script, and deleting this portion of .procmailrc.
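
Here's one way such a stage might look.  This is only a sketch: the function name hibit_files and the LIMIT argument are assumptions of mine, and it counts raw bytes with the high bit set rather than the quoted-printable escapes that lines 72 and 76 also score.

```shell
#!/bin/sh
# hibit_files LIMIT FILE... -- print each FILE holding more than LIMIT
# bytes with the high bit set (octal 200-377).  LIMIT is a made-up
# tuning knob; pick a value by testing against your own mail.
hibit_files () {
    limit=$1
    shift
    for f in "$@"
    do
        # Delete everything except high-bit bytes, then count the rest.
        n=$(LC_ALL=C tr -cd '\200-\377' < "$f" | wc -c)
        # $n is unquoted on purpose: some versions of wc pad the
        # count with leading spaces.
        test $n -gt "$limit" && echo "$f"
    done
    return 0
}
```

In runmail, something like "hibit_files 20 ??? > $tmp" could feed the same "xargs cat" and "xargs rm" steps that the ifile stage uses.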


Created by log2html.pl v1.13 Sun, 16 Feb 2003 17:56:43 -0500