The harder they come, the harder they fall: SpamAssassin stats

(This document is part of a larger spam filtering evaluation.)

Time stats

SpamAssassin was run with compiled rules and default settings on a FreeBSD machine. No Bayesian filtering, auto-whitelisting or blacklisting was used. The loop below is executed in each of the test message folders (pubham/, pubspam/, privham/ and privspam/). It feeds every file in the folder to spamc and appends the resulting message score to the output file.

$time for i in *; do spamc -c < "$i" >> ./spamass/pubham.sa.txt ; done

real 5m26.087s
user 0m2.247s
sys 0m4.061s

$time for i in *; do spamc -c < "$i" >> ./spamass/pubspam.sa.txt ; done

real 7m59.680s
user 0m3.436s
sys 0m5.371s

$time for i in *; do spamc -c < "$i" >> ./spamass/privspam.sa.txt ; done

real 1m37.555s
user 0m0.601s
sys 0m0.937s

$time for i in *; do spamc -c < "$i" >> ./spamass/privham.sa.txt ; done

real 1m56.154s
user 0m0.480s
sys 0m1.110s
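
The four loops above differ only in the folder and the output file, so they could be folded into one helper. This is a minimal sketch, not the script I actually ran: the function name score_folder and its argument order are made up for illustration, and the optional trailing command exists only so the scanner can be swapped for something else when testing.

```shell
# Feed every regular file in a folder to a scanner and append its
# one-line output to a results file.
# Usage: score_folder FOLDER OUTFILE [COMMAND...]
# COMMAND defaults to "spamc -c" (hypothetical convenience, see above).
score_folder() {
    folder=$1
    out=$2
    shift 2
    # Fall back to spamc in check mode when no command is given.
    [ "$#" -gt 0 ] || set -- spamc -c
    for f in "$folder"/*; do
        # Skip sub-directories such as an output folder.
        [ -f "$f" ] || continue
        # spamc -c exits with status 1 for spam; that is not an error here.
        "$@" < "$f" >> "$out" || true
    done
}
```

In bash this can be timed the same way as the loops above, for example: time score_folder pubham ./spamass/pubham.sa.txt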

Each message generated a single stat score in the output file, so we can easily check how many messages were processed:

$wc -l *
100 privham.sa.txt
100 privspam.sa.txt
405 pubham.sa.txt
561 pubspam.sa.txt

So here are the computed (real) time stats per message for each of the groups:
Pubham: 0.81 sec per message
Pubspam: 0.86 sec per message
Privspam: 0.98 sec per message
Privham: 1.16 sec per message
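
The per-message figures are just the real time divided by the line count of the matching output file. A small sketch of that arithmetic, assuming the XmY.YYYs format that bash's time prints; the function name per_message is made up for illustration.

```shell
# Average seconds per message from a bash `time` value and a message count.
# $1: real time in the form XmY.YYYs, e.g. 5m26.087s
# $2: number of messages processed
per_message() {
    awk -v t="$1" -v n="$2" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = seconds plus "s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.2f\n", (parts[1] * 60 + parts[2]) / n
    }'
}
```

For example, per_message 1m56.154s 100 prints 1.16, the Privham figure.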

This is about 3-10 times slower than running the different CRM classifiers via Python (which means that it is even slower when compared to a "pure" CRM program).

Accuracy stats

The output files I got after running the timing loops contain SpamAssassin scores like:

...
8.4/5.0
6.3/5.0
-6.0/5.0
-5.6/5.0
0.9/5.0
...

I'm not really interested in the default threshold value (the '/5.0' part). I want to see what the accuracy would be with different thresholds between 5 and 10. So, a small awk script is in order to group the scores into buckets: below 5, one bucket per integer score from 5 to 9, and 10 or more.

$cat spamass/pubham.sa.txt | awk -F '/' '{if (int($1)<5){score["subfive"]++}; if (int($1)>=10) {score["tenplus"]++} if (int($1)>=5 && int($1) < 10) {score[int($1)]++} } END {for (s in score) {print "Score " s " : " score[s]} }'
Score 5 : 2
Score subfive : 403

$cat spamass/pubspam.sa.txt | awk -F '/' '{if (int($1)<5){score["subfive"]++}; if (int($1)>=10) {score["tenplus"]++} if (int($1)>=5 && int($1) < 10) {score[int($1)]++} } END {for (s in score) {print "Score " s " : " score[s]} }'
Score 5 : 15
Score 6 : 29
Score 7 : 24
Score 8 : 8
Score 9 : 39
Score tenplus : 400
Score subfive : 46

$cat spamass/privham.sa.txt | awk -F '/' '{if (int($1)<5){score["subfive"]++}; if (int($1)>=10) {score["tenplus"]++} if (int($1)>=5 && int($1) < 10) {score[int($1)]++} } END {for (s in score) {print "Score " s " : " score[s]} }'
Score 5 : 2
Score 6 : 2
Score subfive : 96

$cat spamass/privspam.sa.txt | awk -F '/' '{if (int($1)<5){score["subfive"]++}; if (int($1)>=10) {score["tenplus"]++} if (int($1)>=5 && int($1) < 10) {score[int($1)]++} } END {for (s in score) {print "Score " s " : " score[s]} }'
Score 5 : 3
Score 6 : 10
Score 7 : 10
Score 8 : 6
Score 9 : 4
Score tenplus : 37
Score subfive : 30
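
The same one-liner is pasted four times above, so here is a more readable sketch of it wrapped in a function. The name bucket_scores is made up for illustration. It is meant to group scores exactly as the one-liner does, with one deliberate difference: the buckets are printed in a fixed order, because awk's for-in loop does not guarantee any particular order.

```shell
# Group "score/threshold" lines into buckets: below 5, each integer
# score from 5 to 9, and 10 or more. Reads the named files, or stdin.
bucket_scores() {
    awk -F '/' '
        {
            s = int($1)                       # 8.4 -> 8, -5.6 -> -5
            if (s < 5)        bucket["subfive"]++
            else if (s >= 10) bucket["tenplus"]++
            else              bucket[s]++
        }
        END {
            if ("subfive" in bucket) print "Score subfive : " bucket["subfive"]
            for (s = 5; s < 10; s++)
                if (s in bucket) print "Score " s " : " bucket[s]
            if ("tenplus" in bucket) print "Score tenplus : " bucket["tenplus"]
        }' "$@"
}
```

Usage would be something like: bucket_scores spamass/pubspam.sa.txt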

OK, by this point I got sick of scripts. Some manual calculations to relate the number of messages to the accuracy at the corresponding thresholds:

Pubham:
Score subfive : 403
Threshold 5 (messages scoring 5 or higher are blocked): 2 hams blocked (Accuracy 99.5%)
Threshold 6 and more: 0 hams blocked (Accuracy 100%)

Pubspam:
Score subfive : 46 (spams with score below 5)
Threshold 5 (spams scoring below 5 are let in): 46 spams missed (Accuracy 92%)
Threshold 6 : 61 spams missed (Accuracy 89%)
Threshold 7 : 90 spams missed (Accuracy 84%)
Threshold 8 : 114 spams missed (Accuracy 80%)
Threshold 9 : 122 spams missed (Accuracy 78%)
Threshold 10: 161 spams missed (Accuracy 71%)
Score 10 and above: 400 spams

Privham:
Score subfive : 96
Threshold 5 (messages scoring 5 or higher are blocked): 4 hams blocked (Accuracy 96%)
Threshold 6 : 2 hams blocked (Accuracy 98%)
Threshold 7 and more: 0 hams blocked (Accuracy 100%)

Privspam:
Score subfive : 30
Threshold 5 (spams scoring below 5 are let in): 30 spams missed (Accuracy 70%)
Threshold 6 : 33 spams missed (Accuracy 67%)
Threshold 7 : 43 spams missed (Accuracy 57%)
Threshold 8 : 53 spams missed (Accuracy 47%)
Threshold 9 : 59 spams missed (Accuracy 41%)
Threshold 10: 63 spams missed (Accuracy 37%)
Score 10 and above: 37 spams
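
The manual sums above could be scripted as well. This is a rough sketch, not something I ran for the numbers in this post. The function name threshold_accuracy and its ham/spam argument are made up for illustration, and it assumes a message is flagged when its score is at or above the threshold, which is how the buckets above were read.

```shell
# Print, for thresholds 5 to 10, how many messages are misclassified and
# the resulting accuracy. Scores ("8.4/5.0" style lines) come from stdin.
# $1: "ham" (a flagged message is an error) or "spam" (an unflagged one is)
threshold_accuracy() {
    awk -F '/' -v kind="$1" '
        { score[NR] = $1 + 0 }                # keep the numeric score only
        END {
            if (NR == 0) exit
            label = (kind == "ham") ? "hams blocked" : "spams missed"
            for (t = 5; t <= 10; t++) {
                wrong = 0
                for (i = 1; i <= NR; i++) {
                    flagged = (score[i] >= t)
                    if (kind == "ham" && flagged) wrong++
                    if (kind == "spam" && !flagged) wrong++
                }
                printf "Threshold %d: %d %s (Accuracy %.1f%%)\n", t, wrong, label, 100 * (NR - wrong) / NR
            }
        }'
}
```

Usage would be something like: threshold_accuracy spam < spamass/privspam.sa.txt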

Pretty awesome ham recognition and quite weak spam filtering within the tested spam score levels (5-10). According to the SpamAssassin docs, setting the threshold to 5 is pretty aggressive. However, this does not seem to be the case according to the tests above.

Wednesday, 4 February 2009
