Pruning those huge Spamassassin files (Shallow Thoughts)

Akkana's Musings on Open Source, Science, and Nature.

Thu, 07 May 2009

Pruning those huge Spamassassin files

During a server backup, Dave complained that my .spamassasin directory was taking up 87Mb. I had to agree, that seemed a bit excessive.

The only two large files were auto-whitelist at 42M and bayes_seen at 41M. Apparently these never get pruned by spamassassin.

Unfortunately, these are binary files, so you can't just edit them and remove the early stuff, and spamassassin doesn't seem to have any documentation on how to prune their data files. A thread on the Spamassassin Users list on managing Spamassassin data says it's okay to delete bayes_seen and it will be regenerated.

For pruning auto-whitelist, that same post suggests a program called check-whitelist that is only available in a spamassassin source tarball -- it's not installed as part of distro packages. Run this with --clean. But a search on the wiki turns up an entry on AutoWhitelist that says you should use tools/sa-awlUtil instead (it doesn't say how to run it or where to get it -- presumably download a source tarball and then RTFSC -- read the source code?)

Really, I'm not sure auto whitelisting is such a good idea anyway, especially auto whitelist entries from several years ago, so I opted for a simpler solution: removing the auto-whitelist file at the same time that I removed bayes_seen. Indeed, both files were immediately generated as new mail came in, but they were now much smaller.

I've run for a few weeks since doing that, and I'm not noticing any difference in either the number of false positives or false negatives. (Both are, unfortuantely, large enough to be noticable, but that was true before the change as well.)

Tags: , ,
[ 19:38 May 07, 2009    More tech/email | permalink to this entry ]