r1005 - in trunk: . ATTACHMENTS

archaic at linuxfromscratch.org
Tue Nov 15 06:08:06 PST 2005


Author: archaic
Date: 2005-11-15 07:08:06 -0700 (Tue, 15 Nov 2005)
New Revision: 1005

Added:
   trunk/ATTACHMENTS/spamstuff.tar.bz2
   trunk/anti-spam-stuff.txt
Log:
Added: anti-spam-stuff.txt

Added: trunk/ATTACHMENTS/spamstuff.tar.bz2
===================================================================
(Binary files differ)


Property changes on: trunk/ATTACHMENTS/spamstuff.tar.bz2
___________________________________________________________________
Name: svn:mime-type
   + application/octet-stream

Added: trunk/anti-spam-stuff.txt
===================================================================
--- trunk/anti-spam-stuff.txt	2005-11-15 03:35:13 UTC (rev 1004)
+++ trunk/anti-spam-stuff.txt	2005-11-15 14:08:06 UTC (rev 1005)
@@ -0,0 +1,789 @@
+AUTHOR: Declan Moriarty <junk _ mail AT iol.ie>
+
+DATE: 2005-11-12
+
+LICENSE: GNU Free Documentation License Version 1.2
+
+SYNOPSIS: Setting up an Open Source Anti-Spam kit on an lfs box
+
+DESCRIPTION: With an emphasis on configuration, this provides
+Installation & Configuration Instructions for Mail-SpamAssassin-3.1.0
+and its helper tools.
+
+ATTACHMENTS:
+
+spamstuff.tar.bz2	A config file and init script.
+
+PREREQUISITES: A Basic understanding of unix, and a hatred of spam. This
+hint does _not_ apply to earlier versions of SpamAssassin, but you
+should be OK with most recent (or future) versions of other programs.
+Perl5 is required. A configurable mail server also helps. I would
+suggest postfix instead of qmail, but whatever you know well will
+probably do. If your mail is relayed to you, get procmail also, or some
+other mda, otherwise calling all these will be difficult. I also give
+instructions for formail (part of the procmail package), although any
+similar mail handling utility can do.
+
+HINT:
+
+SECTION 1: INTRODUCTION.
+
+This is long. The only consolation is that it's about all the reading
+you have to do. Some jargon first:
+
+	Spam = Unsolicited Bulk email, that is mail that the user did
+not subscribe to. People who subscribe to a mailing list agree to
+receive bulk mail. That is solicited. Spam is not. The word comes from
+the "Spam" sketch in Monty Python's Flying Circus, where a chorus of
+Vikings drowns out all conversation by chanting the word spam.
+
+	Ham = good mail
+	A 'hit' is a spam test matching something in a mail.
+	False hits are tests that hit ham.
+        False Positive  = Good mail wrongly marked as spam
+        False Negatives  = Spam wrongly let through
+	Lint = Test validity of setup
+
+	Set your goals. Set your spam policy. I don't want bulk mail, I
+don't want any spam in my mail, and I will accept false positives.
+Relying on an isp for relaying mail, I cannot reject at smtp level, so I
+silently delete spam, after checking the subjects and sender. Others
+will be different, and your policy will differ accordingly.
+
+In fighting spam, you have many tools. Collect your first one.
+
+1. From this moment on, start keeping your spam. You need every bit of
+it you can hold onto, for testing. Don't read it, just store it in a
+mailbox somewhere. About a Meg or two is enough. Collect a few
+mailboxes with 50 or so, and at least one with a hundred.
+
+http://razor.sourceforge.net/
+
+2. Razor-agents. This operates by sending checksums of mail to a central
+server. If they have been reported as spam, the mail is markable as
+spam. If not, the checksums are discarded and you are told the mail is
+OK.  It's very good, but relies on reporting. For commercial use, send
+an email (explaining your linux installation) to partners at cloudmark.com
+
+http://www.rhyolite.com/anti-spam/dcc
+
+3. DCC, The Distributed Checksum Clearinghouse. This operates as above,
+sending checksums, but the dcc counts how many times it has received
+that checksum. That is what it reports. The dcc also keeps all
+checksums, so the server database is bigger. It goes back about six
+months. The DCC is an effective check for bulk mail. I believe
+commtouch offer a commercial service.
+
+http://spamassassin.apache.org/downloads.cgi
+
+4. SpamAssassin-3.1.0 is a major revision on previous versions. It
+offers heuristic or rule-based vetting of email and employs blocklists,
+and several novel and unusual features. Very configurable - the
+workhorse, and the PITA. Unlike most Perl applications, this one is
+inclined to land 'jam side down' or in a mess, and sorting is necessary.
+
+5. Others exist. Notably, Amavisd-new and clamav. This is a sensible
+balance for a home user. You may want clamav if you are processing mail
+for windoze clients. Amavisd-new is a sort of sweeper process. The
+trouble is, all run on perl, and there's a limit to any box's workload.
+I may include them later.
+
+Ownerships:
+
+Preferred practice is not to run anything as root, and most of the mail
+programs will become user 'nobody' if they find themselves running with
+uid 0. Also, you do not want to make a 'super-luser' who has everything
+set up for him, as then if any process is breached, they have access to
+the whole box. So mail is handled by restricted users with few
+privileges until the delivery, which is done as the user to whom mail is
+delivered. The ultimate in this is qmail, which has a mexican wave of
+processes owned by users with shells like /bin/true, appearing and
+disappearing, playing pass-the-parcel while your mail goes through.
+
+Installation instructions specify a recommended user. Make your choice.
+
+		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SECTION 2. INSTALLING: (Configuration later)
+
+Spam:
+
+1. The spam seems to land naturally. If it doesn't, I can probably send
+you some. But if you really want pain, register a domain. You instantly
+go on every spammer's list. Then you get email from spammers offering
+you a mailing list to spam with _every_ address from registered domains
+:-/. If spam doesn't land, what are you doing here?
+
+Razor-agent:
+
+2. Razor agents. You need razor-agents, and razor-agents-sdk.  You also
+need to know that this service is marketed to windoze users at profit,
+and the open source community receive it free, or cheap. Free for
+individuals, cheap for business use under linux. 
+
+Read the install, and stick it in. The single hassle is permissions: It
+needs to run as a particular user, who registers with cloudmark.com, and
+has a directory in ~/.razor with the config files. I used the same user
+as fetchmail.
+
+You get 4 tools, razor-check, razor-report, razor-revoke, and
+razor-admin, each with its own man page. I keep the log in
+/var/log/razor-agent.log instead of a homedir, but it should be owned
+and writable by the configured user.
+
+After install, change to the razor user, and run 'razor-admin
+-create'.  You should now have a ~/.razor subdir. 
+
+razor-admin -register registers an identity with cloudmark, which you
+need for reporting & revoking. Follow the prompts.  Razor attaches a
+seriousness level to your reports. If you report spam that nobody else
+ever does, you're an idiot. If you report what others subsequently do,
+that's good. Your revokes are also examined; If you revoke what isn't
+spam, that's good.  If you revoke the wrong stuff, you're a twit. That's
+all in their software, and don't worry. As good netizens receiving a
+free service, however, we want to provide feedback.
+
+Tart up ~/.razor/razor-agent.conf to suit your site, copy the entire
+~/.razor subdir to /etc/razor (Sitewide) and you're done. It would be
+nice for
+other lusers to be able to read that, so make it so. The only catch can
+be the reporting failing to authenticate. You should have an 'identity'
+symlink, but you can turn debug up to 9, and then try again and check
+the log.
+
+
+DCC:
+
+3. This is a bit trickier to play with, largely because they try to have
+the same package go into every unix without messing. They fail. Open the
+INSTALL.txt in one console, and open root in another and obey it.
+Install the client software only, unless you have 100k emails per day.
+These instructions are in addition. DCC.tar.Z contains
+
+cdcc - control program
+dccifd, dccproc - The client end
+dccd, dccm - the server end.
+
+DCC by default lands in /var/dcc (make the dir) and needs a user's
+name or uid to drop down to. Everything has to be accessible to that
+luser. By default, it wants to 
+
+	* Drop its pid in /var/run/dcc/. But some smartalec script
+cleans out /var/run every boot under LFS, so you need to alter that. I
+stuck  the pid in /tmp also, but you can fix it if you like.
+
+	* dccifd uses a socket (Much like a device node to a program) to
+communicate with the box, and it also lands that in /var/dcc/ This is
+not a good place in LFS for users to be trying to write to. Stick this
+in /tmp also, and make sure /tmp is world-writable. The usual mode is
+1777; 666 will not do, as a directory needs the execute bit.
+
+	There is also dccm, a 'milter' for sendmail. If you use
+sendmail, and figure this out, please send me an appropriate chunk of
+hint on it, and I'll include it.
+
+SPAMASSASSIN:
+
+4. Cancel the day's appointments and buy yourself in some alcoholic
+tranquilizer. You may need it. Open the archive.  Become root. 
+If you had a previous version of Spamassassin, read the UPGRADE
+file.  Heavy going. Check perl5 as follows.
+
+
+perl -v   # gives the version. You need at least 5.6.1
+ls -l /usr/lib/perl5
+ls -l /usr/local/lib/perl5
+
+One of the ls commands should return no files. If it doesn't, try this
+
+du -sh /usr/lib/perl5
+du -sh /usr/local/lib/perl5
+
+In my setup, the perl binary is in /usr/bin, and the libs are in
+/usr/local/lib/perl5. With libs in two places, spamassassin, cpan and
+you all get confused. Perl truncates the path. If the libs are in
+/usr/local, so must any plugins be. Solve it this way
+
+cp -R <smaller lib/perl5>/. <larger lib/perl5>/
+rm -rf <smaller lib/perl5>
+ln -s <larger lib/perl5> <smaller lib/perl5>, i.e. one of
+
+ln -s /usr/local/lib/perl5 /usr/lib OR
+ln -s /usr/lib/perl5 /usr/local/lib
+
+I ended up with 5 megs of plugins in /usr/lib/perl5 and 31 megs of perl
+in /usr/local/lib, and I was installing stuff spamassassin couldn't
+find. I am sparing you that hassle.
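+
+A minimal sketch of that copy, remove and symlink sequence as a shell
+function. The name merge_perl_dirs and its two arguments are made up
+for illustration; pass the smaller tree first and the larger second.
+Try it on scratch directories before pointing it at a live perl.
+
```shell
# merge_perl_dirs SMALLER LARGER   (hypothetical helper, not part of perl)
# Copies the contents of SMALLER into LARGER, removes SMALLER, and leaves
# a symlink named SMALLER that points at LARGER.
merge_perl_dirs() {
    smaller=$1
    larger=$2
    # Both trees must exist before anything is touched.
    [ -d "$smaller" ] && [ -d "$larger" ] || return 1
    # A symlink here means the merge was already done; do nothing.
    [ -L "$smaller" ] && return 0
    # "dir/." copies the contents rather than nesting dir inside LARGER.
    cp -R "$smaller"/. "$larger"/ || return 1
    rm -rf "$smaller"
    ln -s "$larger" "$smaller"
}

# Example call (paths assumed from the text above):
#   merge_perl_dirs /usr/lib/perl5 /usr/local/lib/perl5
```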
+
+If you haven't got IPV6 compiled in to your kernel, do it now. Don't try
+it with modules, unless you enjoy suffering. This thing is stupid, but
+it wants IPV6. It also shuts up an annoying error on starting Xorg.
+
+Perl authors seem to be fond of Capitalising Every Little Word, and using
+:: where they mean - which leads to a lot of wear on shift keys, and
+nerves. They also seem to have a law against the same subroutine being
+written twice, so they store them all in cpan.org
+(Comprehensive::Perl::Archive::Network in perlspeak). Here they package
+them with no readme files and devilishly strict testing.
+
+Install spamassassin with
+
+	perl Makefile.PL &&
+	make &&
+	make install
+
+Open the Mail-SpamAssassin archive, log in as a luser and open
+the INSTALL in one console (1), while you raid CPAN as root in the
+second (2). I would recommend another root console (3), to sort things
+out. The commands you need in (2) are
+
+	perl -MCPAN -e shell	# open the CPAN shell
+	o conf prerequisites_policy ask # get prerequisites
+
+That sets you up. Then
+
+	i <Module::Name>  # What's the story with <Module::Name>
+
+	install <Module::Name> # guess!
+
+In the spamassassin INSTALL file (1) find the section "Optional Modules".
+They are not really optional. Paste the module names one at a time into
+the cpan console, after the 'i' command, e.g.
+
+	i Digest::SHA1 # This is Digest::SHA(one). If you take
+the digit for an 'I' you'll lose yourself nicely.
+
+	This will return something like the following:
+
+Module id = Digest::SHA1
+    DESCRIPTION   NIST SHA message digest algorithm
+    CPAN_USERID  UWEH (Uwe Hollerbach <uweh at bu.edu>)
+    CPAN_VERSION 2.10
+    CPAN_FILE    G/GA/GAAS/Digest-SHA1-2.10.tar.gz
+    DSLI_STATUS  cdch (pre-alpha,developer,C,hybrid)
+    MANPAGE      Digest::SHA1 - Perl interface to the SHA-1 algorithm
+    INST_FILE
+/usr/local/lib/perl5/site_perl/5.8.5/i686-linux/Digest/SHA1.pm
+    INST_VERSION 2.10
+
+Meaning I have it, or "Not Installed" in place of the last couple of
+lines if I don't. In that case, simply type 
+
+	install Digest::SHA1
+
+into your CPAN terminal. It will probably tell you that you need 76
+other modules first, and will it get them too? (That's what the 'ask'
+was about). Say yes. Par for the course is that one of them fails 2 of
+the 997 tests, and the install stops. Here's what your terminal (3) is
+for. Change to /root/.cpan/build/Module-name, and run the make test
+again, this time noting the errors. If you decide they are
+inconsequential, simply type 'make install' and it's in. Most of these
+are networking modules anyhow. If the error matters, it's your system.
+
+Presuming a module barfs, progress will stop. You intervene on
+terminal (3) as above, then return to cpan on (2) and  simply retype 
+
+	install <Module::Name>
+
+and it will reassess the dependencies, and carry on. BORING!
+Try to keep the name of the module it's processing in your head, as it
+may well not be on the screen as it barfs.
+
+Eventually, you'll get there. Check the /root/.cpan/build directory for
+any subdirs with (3). If you find, for example, IP-Country-Fast-<version>
+still there, ask CPAN in (2) whether it went in
+
+	i IP::Country::Fast
+
+and if it did, rm -rf the appropriate directory. Typing quit lets you out
+of cpan.
+		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+SECTION 3 CONFIGURATION:
+
+	From here on, keep a minimum of one root terminal open, and one
+user terminal. I presume you know what user arrangements you have set
+up. 
+
+Razor:
+
+With the config done up above, you should be able to save off a spam
+email as its own mailbox (save to a mailbox called 'test' or
+something). In a terminal, type 'tail -f /var/log/razor-agent.log'.
+The razor log should appear (last few lines). Then in another terminal,
+type 
+
+	'cat test | razor-check' 
+
+A few lines should be added to the log, the key ones being the verdict.
+
+Type 'cat test | razor-report' to report it.
+
+If this doesn't happen, check the firewall. Open Outgoing TCP port 2703
+(Razor2) and TCP port 7 (Echo), then try again.
+
+Failure to report (error 202) is usually a permissions issue. Set the
+debug up to 9 and try again, and check that everything is user readable.
+Then start suspecting perl. Check is enough to stop spam. Report &
+revoke need the identity. To check if it's a permissions issue,
+try as root.
+
+Vipul does not want any automatic reporting set up. One exception is if
+you have mail addresses which you know are going to be 100% spam, and you
+may indeed forward them. We will want to report manually, being good
+netizens. Be aware that the checksums are on the body, as the headers
+will differ anyhow. Further if you report spam to a mailing list, you're
+a twit, because they usually add the footer, making the mailing list
+copy different from the original.
+
+
+DCC:
+
+Further to the above setup details, here are some other thoughts. Razor
+finds its own servers - dcc wants you to specify yours. If you have
+100k messages per day or more, you should build and use the server.
+Like razor, dcc allows free usage from single users but wants money
+from businesses. As root, type
+
+cdcc  # This gives a cdcc shell
+new map map.txt 
+
+generates a map (default name of 'map') from the list of servers
+in /var/dcc/map.txt. Quit out of it. This instruction presumes you are a
+private user, covered by the license to use their servers.
+
+Open /var/dcc/dcc_conf and change anything you don't like the look of -
+notably the settings at the end. There's no need to set up the rest of
+it, as you can do it with command line options. There are 3 programs you
+will use
+
+	1. cdcc - a setup program
+	2. dccproc - executable checker
+	3. dccifd - The daemon used by spamassassin's spamd/spamc.
+
+Three other use options:
+	
+	1. There is a whitelist /var/dcc/whiteclnt. Whitelist everyone
+you can think of - linuxfromscratch.org, ebay, paypal, and any other
+list server you may be on.  
+
+	2. There is a blacklist file, which isn't a lot of use as the
+spammers have to keep hopping from one place to another anyhow.  If
+certain weirdos stay stuck in the same place, they belong in a
+blacklist.
+
+	3. Greylisting is also an option. You may theoretically lose a
+small percentage of mail with this. It works as follows. In every mail
+transaction where this is done, your mail server says "Not right now -
+I'm busy. Send it in half an hour" Proper mail servers will send it
+later. Poorly set up mail servers may lose mail, either by not sending,
+or resending immediately and then giving up. Spammers will not resend in
+99% of cases, seeing as they can't hold messages back while relaying
+illegally through other servers with ease. So you don't get spammed, and
+your name comes off their list. That's the theory. I have my mail
+relayed via the isp, so I can't use it. You will upset any isp big time
+if you turn on greylisting. If you are out in the big bad world,
+however, it is an option.
+
+Some words on querying: dccproc is like razor-check, except it reports
+as well by default. If you check & report ham repeatedly with dcc, the count 
+keeps going up. Use the -Q option for repeat tests to avoid reporting again.
+Each user is supposed only to report each mail once.
+
+I would suggest a startup script for this and spamd (The server end of
+spamassassin). Mine is available and it specifies all details of
+operation, making /var/dcc/dcc_conf fairly redundant. I presume you
+still have your 'test' spam lying around.
+
+
+The threshold figure is set by -t. The three checksums are body, fuz1
+and fuz2. All are covered by the 'cmn' setting. DCC say to set them at
+'many'. I found results disappointing, and set it to 10, where things
+worked better. The blfs list often fails dcc checks at that lower
+level. My dccifd options are
+
+
+-I luser:group  # Who it runs as. A real person, please.
+-p /tmp/dccifd	# Location of socket
+-m /var/dcc/map # Location of map
+-d -B set:debug # Debug (both options)
+-x 		# Try extra hard to connect to a server (I needed that)
+-t cmn,10	# Set all thresholds to 10
+
+Make sure to finish the 'stop' section with rm -f /tmp/dccifd to
+remove a stray socket if it exists. A socket left behind will prevent
+dccifd from restarting.
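+
+The option list above could be gathered into an init script along
+these lines. A sketch only: the function names, the DCCIFD variable and
+the use of pkill are assumptions, and luser:group stands for a real
+user and group on your box.
+
```shell
# Assumed defaults; override them in the environment to suit your box.
DCCIFD=${DCCIFD:-dccifd}             # path to the daemon (assumption)
DCC_USER=${DCC_USER:-luser:group}    # placeholder, use a real user:group
DCC_SOCKET=${DCC_SOCKET:-/tmp/dccifd}
DCC_MAP=${DCC_MAP:-/var/dcc/map}

# Print the dccifd options from the list above on one line.
dccifd_args() {
    printf '%s\n' "-I $DCC_USER -p $DCC_SOCKET -m $DCC_MAP -d -B set:debug -x -t cmn,10"
}

# Remove a socket left behind by a dead dccifd.
remove_stale_socket() {
    rm -f "$DCC_SOCKET"
}

start_dccifd() {
    remove_stale_socket
    # Unquoted on purpose: the shell splits the string into options.
    "$DCCIFD" $(dccifd_args)
}

stop_dccifd() {
    pkill -x dccifd
    remove_stale_socket
}
```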
+
+Lastly, tail -f /var/log/mail.log and type
+
+cat test |dccproc  (DO NOT type 'cat test > dccproc' or you try to
+overwrite it! Either way you're a twit.)
+
+
+SPAMASSASSIN:
+
+Here's where I hope you have pcregrep and formail. Out of the box,
+spamassassin usually works after a fashion, but in a mess. I would
+suggest surfing to
+http://www.rulesemporium.com/rules.htm
+
+and download whatever rule sets you choose. Pop them in
+/etc/mail/spamassassin. As root, mv the original local.cf (if it exists)
+aside and download mine from
+http://www.linuxfromscratch.org/hints/downloads/files/ATTACHMENTS/spamstuff.tar.bz2.
+Pop it likewise in /etc/mail/spamassassin.
+We're calling this spamassassin, but you're actually going to use spamd
+and spamc, the binaries.  Download 70_sare_sc_top200.cf also. Don't
+install it, just keep it handy.
+
+Open v310.pre in vim, and make sure it has the following lines
+
+loadplugin Mail::SpamAssassin::Plugin::SpamCop
+loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject
+loadplugin Mail::SpamAssassin::Plugin::Razor2
+loadplugin Mail::SpamAssassin::Plugin::DCC
+loadplugin Mail::SpamAssassin::Plugin::DomainKeys
+loadplugin Mail::SpamAssassin::Plugin::RelayCountry
+
+I think that's it. All these .pre files are read and the plugins loaded 
+from them. 
+
+Download my init script or write your own. You need to start dccifd
+(because spamc/spamd use that) and spamd. Spamassassin wants to be a
+user, but not a real one. I added the user spamc in the group postfix.
+I have a pause (5 seconds) in the restart option so things will let go
+before they try to take hold again. This is for spamd. My spamd options are:
+
+-d		# Daemonize = get lost in the background
+-l		# allow learning thus facilitating bayes
+-m 10		# Max processes. These are seriously memory hungry
+I only have 10 to facilitate mass tests. 5 is plenty.
+-u spamc	# run as user spamc. Otherwise it's nobody, and
+things fall over.
+
+Start them now.
+(as luser)	tail -f -n 20 /var/log/mail.log
+
+(as root)	/etc/rc.d/init.d/spamd		I see
+	
+getpwnam(genius:users): Success				[  OK  ]
+Starting..SPAMD.........				[  OK  ]
+
+The first is dccifd, and the second is spamd. Check in
+particular that dcc stayed running 
+
+	pgrep dccifd	I get 4 process ids
+	
+	pgrep spamd	I get three. More means it's working.
+
+A regular problem is dccifd quitting. In that case, rm -f /tmp/dccifd
+(the socket). Check that log. Usually, again, this is a permissions
+issue.
+
+	Now I presume you will copy in my available config file and edit
+that, rather than your own. I describe a sitewide config, but user
+configs can be created, and maintained by different users. The same process 
+applies. spamassassin -c creates a user config. You can test your setup with 
+(as anybody:)
+
+	cat test | spamc -R - you should get a report, and an extract.
+
+root is a positive disadvantage for all mail tests, as these programs
+refuse to hold onto root privileges, and drop to a specified user, or to
+nobody. They are all called by the user _receiving_ the mail, so they
+can write in his maildir, which typically has 0600 permissions. Root
+will never receive mail this way, as user nobody certainly can't write
+to root's directory! Alias root to a user. You need root for starting these 
+tools however
+
+	Sorting out the bugs in things (There will be many) is achieved
+by these commands.
+
+	1. spamassassin -D --lint > debug.txt 2>&1 Examine this file for
+negatives 
+	2. Change the -d to -D for spamd and restart from a root
+terminal. It will hold the terminal, and information.  
+
+	3. Poring over the entrails of /var/log/mail.log. As set
+up, this file covers your mail client, spamassassin, & DCC. If someone
+knows how to set up a separate syslog facility, let me know and I'll
+stuff one in for spam. I did have a go myself, but things fell over so I
+reverted.
+
+Look for the things that didn't happen, and config lines not parsed.
+Your rulesets, I presume, will be different from mine. Here's mine:
+
+[root at genius ~]# ls /etc/mail/spamassassin
+20_dec.cf                70_sare_obfu.cf        88_FVGT_headers.cf
+70_sare_adult.cf         70_sare_oem.cf		88_FVGT_rawbody.cf
+70_sare_genlsubj0.cf     70_sare_random.cf	88_FVGT_subject.cf
+70_sare_genlsubj1.cf     70_sare_spoof.cf       88_FVGT_uri.cf
+70_sare_genlsubj_eng.cf  70_sare_uri0.cf        99_FVGT_meta.cf
+70_sare_header0.cf       70_sare_uri1.cf	99_FVGT_Tripwire.cf
+70_sare_header1.cf       70_sare_uri_eng.cf	99_sare_fraud_post25x.cf
+70_sare_header_eng.cf    72_sare_bml_post25x.cf         init.pre
+70_sare_highrisk.cf      72_sare_redirect_post3.0.0.cf  local.cf
+70_sare_html0.cf         82_antidrug.cf                 spam@
+70_sare_html1.cf         88_FVGT_body.cf                v310.pre
+
+
+20_dec.cf are my own rules, and spam@ is a symlink to
+/usr/share/spamassassin. Spamassassin ignores subdirs, so you can have
+an archive. The bigger your mail setup, the fewer rules you want to
+avoid loading the system. The best ones of the above lot are the sare header,
+html, uri, drug & adult. The higher the number, the later it is read,
+and the more priority it has. Presuming you sort your bugs, you now have
+an integrated sitewide anti-spam setup.
+
+	You now need one other item of information. Are your mails being
+checked against blacklists (like spamcop, sorbs.net) upstream? To find
+out, use 70_sare_sc_top200.cf. View it in one console and cd to your
+subdir with the spam mailboxes (I am presuming they are named spam1,
+spam2, etc). The first entry in 70_sare_sc_top200.cf today is
+
+
+Received =~ /\b12\.(?:210\.176\.205|211\.4\.79|217\.81\.151)\b/
+
+Now you can check for that with pcregrep. You cannot restrict your
+search to the Received line too handy, but you can do this
+
+pcregrep '\b12\.(?:210\.176\.205|211\.4\.79|217\.81\.151)\b' spam?
+
+any instances will show. You will notice I removed the /regex/
+delimiters and replaced them with 'regex'. Just one other word of
+warning: pcregrep appears not to like the /i at the end of most regexes
+in the rules. Use pcregrep -i and remove the /i. You can also use -c to
+check the number of times. I do not get any instances of the top200
+spammers, so I presume the top 200 are not getting through directly to
+me. The ruleset is therefore unnecessary for me. If that is your case
+it means you don't have to worry about setting up tests to spamcop,
+sorbs, or any of them.
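+
+The delimiter stripping described above can be left to sed. A sketch,
+assuming one rule per line in the form shown; rule_pattern and
+count_hits are hypothetical names, and pcregrep must be installed for
+the second function.
+
```shell
# rule_pattern: read one rule line on stdin and print the bare regex,
# dropping the leading "Header =~ /" and the closing "/" with any flags.
rule_pattern() {
    sed -e 's|^[^/]*/||' -e 's|/[a-z]*[[:space:]]*$||'
}

# count_hits RULE_LINE MAILBOX...
# Prints the number of matching lines per mailbox. If the rule ended
# in /i, add -i to the pcregrep call yourself.
count_hits() {
    pattern=$(printf '%s\n' "$1" | rule_pattern)
    shift
    pcregrep -c "$pattern" "$@"
}

# Example call (mailbox names as presumed in the text):
#   count_hits 'Received =~ /\b12\.(?:210\.176\.205)\b/' spam?
```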
+
+If you haven't got pcre, egrep -e will apply posix rules which are
+close, but different. The main weakness is in unusual character types
+like \d which do not behave in egrep.
+
+INTEGRATION:
+
+Penultimately, Integration. If your mail is relayed to you, use
+procmail. If you are online 24/7 and serious.spammer.co.tw can reach
+your box directly, set up a reject configuration in your mail server.
+The amavisd-new package includes many configuration options for weird and
+wonderful mail servers with a better understanding of them than you
+will usually find in the documentation. 
+
+Think this course through. Mailing lists will get spam, and will forward
+it. If you bounce repeatedly to a mailing list, you will be
+unsubscribed, sometimes automatically.
+
+Procmail's recipe looks like this (in ~/.procmailrc)
+
+:0fw
+| /usr/bin/spamc
+
+:0
+* ^X-Spam-Level: \*\*\*\*\*
+$HOME/Mail/spam
+
+That pipes through spamd (which calls razor & dcc) and dumps it in a
+spam mailbox on 5 stars. man procmail or man procmailex help here.
+Those exact procmail lines put spam in ~/Mail/spam. Make sure it exists.
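+
+To see which saved messages that second recipe would file, the same
+test can be run by hand. A sketch: both function names are made up,
+and grep stands in for procmail's own matching here (with -i, since
+procmail ignores case by default).
+
```shell
# has_five_stars MESSAGE
# Succeeds if the header block carries five or more stars in
# X-Spam-Level, which is what the recipe's condition looks for.
has_five_stars() {
    # Headers end at the first empty line, so stop reading there.
    sed '/^$/q' "$1" | grep -qi '^X-Spam-Level: \*\*\*\*\*'
}

# ensure_mail_dir [DIR]
# Creates the directory that will hold the spam mailbox (default ~/Mail).
ensure_mail_dir() {
    mkdir -p "${1:-$HOME/Mail}"
}
```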
+
+		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SECTION 4: TUNING:
+
+The standard config is very soft, and lets some spam
+through. Mine is short on negative rules, and hard on porn particularly.
+Even if you don't want to use mine, download it and lint with it once,
+as it will show you errors on other places. Your friends are
+
+man Mail::SpamAssassin::Conf
+man Mail::SpamAssassin::Plugin::Name  (e.g. URIDNSBL)
+
+Beware of the latter manpages, as they drift between config options and
+rules pretty seamlessly without telling you. Next tune up! As root,
+
+vim /etc/mail/spamassassin/local.cf  
+
+Looking at my local.cf
+The first things are basic setup. Leave the first line there unless you
+are using nfs, in which case it must come out. The host 216.171.238.83
+is linuxfromscratch.org, which hopefully cuts dns tests from that host.
+Otherwise, it seems to dcc check everything regardless of what you tell
+dccifd.
+
+PYZOR config options are there, but commented out. I tried it, and found
+it very little use. You can run a local server in a large outfit and
+allow your users to blacklist dynamically this way. It also runs in
+python, which is another interpreter and libs to load. They recommend
+readyexec, which takes care of that some clever way. Suit yourself.
+The install is a doddle, but not worth it, imho.
+
+DCC options are clear enough - paths to everything, and much of the
+stuff on the dccifd command line. The very last option is for dccproc.
+Spamc/spamd use dccifd, the daemon. Spamassassin the perl script runs
+dccproc. The -B option sets a check on spamhaus.org, which returns
+127.0.0.2 as a positive result. Multiple -B options are allowed.
+
+RAZOR options are simple. It's neat code.
+
+BAYES options allow learning from ham/spam. Also there are uridnsbl
+(blocklist stuff). If you don't need the blocklist, comment these out
+and comment out URIDNSBL in /etc/mail/spamassassin/init.pre
+
+SPF is Sender Policy Framework. ISPs should have a policy, and the mail
+is checked against that. Weak, but it catches the occasional thing.
+
+Next come whitelist from. Include Family, friends, business contacts,
+paypal (If you're registered). The bayes_ignore entries should be all
+mailing lists, as some get spam, and their spam score will rise
+otherwise.
+
+Finally we get rules, listed under groups as one progresses through an
+email, and scored. The general policy is to assign a weight to a score,
+and arrive for spam at a score of 5 or above, and for other mail, to
+keep the score at below 5. To check any rule (This is where the 'spam'
+symlink comes in handy) cd to /etc/mail/spamassassin and type
+
+grep -r RULE_NAME *
+
+Here's an example
+lfs:/etc/mail/spamassassin$ grep -r FORGED_RCVD_HELO *
+
+local.cf:score FORGED_RCVD_HELO 1.22 
+spam/20_head_tests.cf:header FORGED_RCVD_HELO eval:check_for_forged_received_helo()
+spam/20_head_tests.cf:describe FORGED_RCVD_HELO Received: contains a forged HELO
+spam/50_scores.cf:score FORGED_RCVD_HELO 0 0 0 0.135
+
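+20_head_tests.cf is an original spamassassin ruleset. spam/50_scores.cf
+holds the default scores. The four figures are one score per scoreset:
+bayes and network tests both off, network tests only, bayes only, and
+both on. So by default this rule scores 0 unless bayes and network
+tests are both in use, when it scores 0.135.
+
+That is basically nothing, but I have lifted it to 1.22. It is an
+excellent indicator of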
+spam or the linuxfromscratch lists where half cocked mail setups abound.
+If your mailer gives out a domain that a dns check can't resolve, you're
+in trouble here. If you have a legit A and MX record where people would
+expect to find them, you're ok. All broadband modems have addresses in
+the range of the isp, so if your private network name goes out,
+something smells.
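+
+For reading a four-figure score line, a small helper can pick out the
+figure for one scoreset. A sketch, going by the scoreset order in the
+Mail::SpamAssassin::Conf man page (0 = no bayes or network tests,
+1 = network only, 2 = bayes only, 3 = both); pick_score is a made-up
+name. Feed it the bare line, without the "file:" prefix grep -r adds.
+
```shell
# pick_score SCORESET "score RULE_NAME figures..."
# Prints the score that applies to SCORESET (0 to 3). A line with a
# single figure uses that figure for every scoreset.
pick_score() {
    printf '%s\n' "$2" | awk -v n="$1" '
        $1 == "score" {
            if (NF == 3)      print $3        # one figure for all sets
            else if (NF >= 6) print $(3 + n)  # one figure per set
        }'
}

# Example calls:
#   pick_score 3 'score FORGED_RCVD_HELO 0 0 0 0.135'
#   pick_score 0 'score FORGED_RCVD_HELO 1.22'
```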
+
+Mine and html rules are very good. Mind you, I have trained most people
+to send text. If you use html a lot, back some of these off. Some are still
+excellent spam indicators, even if you want to allow for half-assed mail
+from m$ outlook etc. These ones are always good
+
+HTML_EMBEDS 3		HTML_FONT_BIG 3
+HTML_FONT_LOW_CONTRAST 	HTML_FONT_INVISIBLE 	HTML_IMAGE_ONLY_04 
+HTML_IMAGE_ONLY_08 	HTML_IMAGE_ONLY_12   	HTML_IMAGE_RATIO_(all)
+
+The high ratios are also useful. Even outlook sends text as well.
+The MIME tests are excellent also. 
+
+Tests that throw false positives are:
+
+FORGED_ anything, Example: when a (top post)reply from hotmail.com comes from  
+hotmail to a question from yahoo.com and then you get FORGED_YAHOO_RCVD. 
+
+These clever tests like backhair trip over linux program versions.
+Posted kernel configs are ALL_CAPITALS. Spamsigns are detected in
+directory names.  A subject line like VIA GRATIS (The way of thanks in
+latin) also has VIAGRA in there. You can't make a rule against 'love'
+because 'glover' is a surname. Tune accordingly. Try this
+
+cat spam1 |formail -n 2 -ds spamc -R >> spam1_reports (presuming ~50 messages)
+
+and repeat for all the others. DO NOT try that on a big mailbox, as
+spamc processes detach from formail, and it starts another before you
+finish. In 400 emails, I had 200 spamc processes looking for 10 spamd
+processes in one test. Then the modem backed up, and I lost all dns
+tests. If you don't have spare memory, drop the '-n 2' option and wait. 
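+
+One way to avoid that pile-up is to count the messages before letting
+formail loose. A sketch: the function names and the default limit of
+100 are assumptions, not anything formail or spamc provide.
+
```shell
# mbox_count FILE
# Number of messages in an mbox, counted by "From " separator lines.
mbox_count() {
    grep -c '^From ' "$1"
}

# score_mailbox FILE [LIMIT]
# Runs the report only when FILE holds no more than LIMIT messages
# (100 unless told otherwise; an arbitrary figure, tune it to your box).
score_mailbox() {
    limit=${2:-100}
    n=$(mbox_count "$1")
    if [ "$n" -gt "$limit" ]; then
        echo "$1 holds $n messages, more than $limit; split it first" >&2
        return 1
    fi
    formail -n 2 -ds spamc -R < "$1" >> "${1}_reports"
}
```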
+
+Then try it on your ham, your saved messages. You can do that version,
+or simply
+
+cat ham |formail -n 2 -ds spamc -c
+
+and the scores only will roll up the screen. The -ds splits the mailbox
+and pipes each message to the following command. If you had spamassassin
+installed already, they will all hit weird scores. Level the playing
+pitch as follows:
+
+cat mailbox |formail -ds spamassassin -d >>ham1
+
+Removing the markups and then 
+
+sed '/^>From /d' < ham1 > ham2  to remove any escaped >From lines which
+may have been inserted during internal delivery and which throw analysis
+off track. Finally,
+
+rm -f ham1 (it's intermediate processing).
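+
+The clean-up steps above can also run as one pipeline, with no
+intermediate file. A sketch with hypothetical function names; formail
+and spamassassin must be on the PATH for clean_ham.
+
```shell
# drop_escaped_from: filter stdin, deleting lines that begin ">From ".
drop_escaped_from() {
    sed '/^>From /d'
}

# clean_ham MAILBOX OUTPUT
# Strips SpamAssassin markup from every message in MAILBOX, then the
# escaped >From lines, and writes the result to OUTPUT.
clean_ham() {
    formail -ds spamassassin -d < "$1" | drop_escaped_from > "$2"
}

# Example call:
#   clean_ham mailbox ham2
```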
+
+Once you get spamd running and working, the above process is necessary
+before repeat checks. Killing dccifd before repeats is also clever. You
+can razor-check all you like. Remember to remove the socket if you kill
+dccifd.
+
+cat ham2 |formail -ds spamc -c to test your ham. If there are failures,
+analyse more closely
+
+cat ham2 |formail -ds spamc -R |less gives you the reports and an
+extract on successive lines. Open consoles as you need them. On another console,
+get any ham marked as spam onscreen and presuming gpm is working, you
+can find the problem this way.
+
+Get the rule onscreen	grep -r SOME_RULE_NAME /etc/mail/spamassassin/*
+and locate the regex
+
+Set up the test		pcregrep -i 'whatever_regex'
+
+This doesn't tell it where to search, so it looks on stdin. This command
+will hog that console. You can paste in any suspicious expressions and
+press return. There will be no output unless the regex matches. Tweak
+scores to let the mail you want through. 
+
+In the general run of play, you can probably lower my html scores, and
+adjust for your own situation. If you are a doctor, you will obviously
+have to adjust or whitelist any mail sources that send mail about drugs.
+
+Try to find negative rules that apply to your situation. Find a similar
+rule. Don't fiddle with the 'eval do something' type rules as they are
+spamassassin builtins. The various header lines are specified by this
+sort of thing: "Received =~ /regex/", and such a rule checks just those
+lines. Invent your own rules as appropriate. These headers (Received,
+From, Subject, etc.)
+are all in ram as variables when a message is checked. Invent your own
+regex, and don't forget to run
+
+spamassassin -D --lint afterwards to check it out. Never mind what the
+errors are, (some mistakes redirect) undo what you did last and lint
+again.  'man perlre' helps. Unrecognized options are a sign of missing
+plugins. I, for instance, do not use HashCash or RelayCountry plugins.
+If you decide to use them, enter the options off the man page.
+
+Keep your spam for a month at least after you set the system running.
+You ideally need reports back of false positives and false negatives. Never
+get cocky, as there will be both. Minimizing them is the aim.
+
+My current ratio is 
+	~ 99% of all spam successfully caught.
+	~ 3% of ham marked as spam (Entirely from the lfs lists) . This
+is a high figure, but I'm lazy. The real problem is that if the query
+goes to spam, the answers do also. I haven't retuned recently, and I can
+afford to lose a thread off the lfs list. I'll back a few things off one
+of these days, to cope with lfs.
+
+What gets through is mail that mimics your own mail, and genuinely sent
+spam from webmail, short stuff, that doesn't trigger enough to top the
+spam score. What gets wrongly caught usually is misinterpreted signs of spam.
+Regexes are a non thinking tool.
+
+Save off false positives and false negatives in a separate directory,
+and get them through by readjusting and restarting your spamd daemons.
+
+
+ACKNOWLEDGEMENTS:
+
+Authors of all software, and the regex Maestros of rulesemporium.com
+
+
+CHANGELOG:
+Nov. 15th 2005: Finished this 1st draft.
+



