Spam increase due to SpamAssassin Bayes database not available for scanning

For approximately the past six weeks or so, I’ve noticed an uptick in the amount of spam getting through (and delivered) on my primary mail server. At first the increase in false negatives (meaning spam not getting flagged as such) wasn’t all that bad, so I didn’t think much of it. However, starting last week and especially this week, the increase was so dramatic that it prompted me to look further into the problem.

I started by looking through my SpamAssassin and amavis settings to make sure that nothing was blatantly wrong, but nothing stood out as having recently changed. I made sure that I had the required Perl modules for all of SpamAssassin’s filtering, and again, nothing had recently changed. Coming up empty-handed, I decided to take a look at some headers for an email that came through even though it was spam:

X-Spam-Status: No, score=1.502 required=3.2 tests=[
	DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.25,
	FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25,
	HTML_MESSAGE=0.001]

I thought that possibly something had changed with how SpamAssassin assigned points for various tests, and temporarily dropped the score required for spam flagging to 1.4. Delving more deeply, though, I found that the point assignments had not changed, so I reverted to 3.2 and kept investigating. After looking again, I noticed that one key test wasn’t showing in the ‘X-Spam-Status’ header for this email: Bayesian filtering. Normally, there would be some type of reference to ‘BAYES_%=#’ (where % represents the chance that the Bayesian filter thought that the message could be spam and # represents the score assigned to the email based on that chance) in the spam header. However, it was no longer showing up, which indicated to me that the Bayes filters weren’t running.

I then started with some basic Bayes troubleshooting steps, and found some clues. By analysing the output of spamassassin -D --lint, I saw that there could be problem with the Bayes database:

Apr 14 22:10:41.875 [20701] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Apr 14 22:10:42.061 [20701] dbg: config: fixed relative path: /var/lib/spamassassin/3.004004/updates_spamassassin_org/23_bayes.cf
Apr 14 22:10:42.061 [20701] dbg: config: using "/var/lib/spamassassin/3.004004/updates_spamassassin_org/23_bayes.cf" for included file
Apr 14 22:10:42.061 [20701] dbg: config: read file /var/lib/spamassassin/3.004004/updates_spamassassin_org/23_bayes.cf
Apr 14 22:10:42.748 [20701] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x55f1700749b8) implements 'learner_new', priority 0
Apr 14 22:10:42.748 [20701] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x55f1700749b8), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Apr 14 22:10:42.759 [20701] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x55f170db5ba8)
Apr 14 22:10:42.759 [20701] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x55f1700749b8) implements 'learner_is_scan_available', priority 0
Apr 14 22:10:42.759 [20701] dbg: bayes: tie-ing to DB file R/O /var/amavis/.spamassassin/bayes_toks
Apr 14 22:10:42.760 [20701] dbg: bayes: tie-ing to DB file R/O /var/amavis/.spamassassin/bayes_seen
Apr 14 22:10:42.760 [20701] dbg: bayes: found bayes db version 3
Apr 14 22:10:42.760 [20701] dbg: bayes: DB journal sync: last sync: 1586894330
Apr 14 22:10:42.760 [20701] dbg: bayes: not available for scanning, only 0 ham(s) in bayes DB < 200
Apr 14 22:10:42.760 [20701] dbg: bayes: untie-ing
Apr 14 22:10:42.764 [20701] dbg: bayes: tie-ing to DB file R/O /var/amavis/.spamassassin/bayes_toks
Apr 14 22:10:42.764 [20701] dbg: bayes: tie-ing to DB file R/O /var/amavis/.spamassassin/bayes_seen
Apr 14 22:10:42.765 [20701] dbg: bayes: found bayes db version 3
Apr 14 22:10:42.765 [20701] dbg: bayes: DB journal sync: last sync: 1586894330
Apr 14 22:10:42.765 [20701] dbg: bayes: not available for scanning, only 0 ham(s) in bayes DB < 200
Apr 14 22:10:42.765 [20701] dbg: bayes: untie-ing

In particular, the following lines indicated a problem to me:

Apr 14 22:10:42.760 [20701] dbg: bayes: not available for scanning, only 0 ham(s) in bayes DB < 200
<snip>
Apr 14 22:10:42.765 [20701] dbg: bayes: not available for scanning, only 0 ham(s) in bayes DB < 200

Thinking back, I remembered that there were some changes to the amavis implementation in Gentoo that caused me problems in late February of 2020. One of those changes was relocating the amavis user’s home/runtime directory from /var/amavis/ to /var/lib/amavishome/. That’s when I saw it in the debugging output:

Apr 14 22:10:42.759 [20701] dbg: bayes: tie-ing to DB file R/O /var/amavis/.spamassassin/bayes_toks
Apr 14 22:10:42.760 [20701] dbg: bayes: tie-ing to DB file R/O /var/amavis/.spamassassin/bayes_seen

The directory for the Bayes database shouldn’t be /var/amavis/.spamassassin/bayes* any longer, but instead should be /var/lib/amavishome/.spamassassin/bayes*. I made that change:

# grep bayes_path /etc/mail/spamassassin/local.cf 
bayes_path /var/lib/amavishome/.spamassassin/bayes

and restarted both amavis and spamd, and now I could see the Bayes filter in the ‘X-Spam-Status’ header:

X-Spam-Status: Yes, score=18.483 required=3.2 tests=[BAYES_99=5.75,
	BAYES_999=8, FROM_SUSPICIOUS_NTLD=0.499,
	FROM_SUSPICIOUS_NTLD_FP=0.514, HTML_MESSAGE=0.001,
	HTML_OFF_PAGE=0.927, PDS_OTHER_BAD_TLD=1.999, RDNS_NONE=0.793]

After implementing the change for the Bayes database location in amavis, I have seen the false negative level drop back to where it used to be. 🙂

Cheers,
Zach

Leave a Reply

Your email address will not be published.