Wednesday, 4 February 2009

CRM114 Classify stats

(This document is part of a larger spam filtering evaluation.)

Note: I had to raise crm's default window size (with the -w option) so that it could get through some large emails with attachments.

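Classifier stats on spam/ham messages taken from the Pubham/Pubspam archive. In the output, T is the total time for the run and "Avg T/s" is T divided by the number of samples, i.e. seconds per message:
$ ./mass_class.py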
Classifier: 'microgroom' Cat: ham T: 112.16 sec Samples: 405 Avg T/s: 0.28 Accuracy: 0.94 (24 misses)
Classifier: 'microgroom' Cat: spam T: 141.40 sec Samples: 561 Avg T/s: 0.25 Accuracy: 0.97 (17 misses)
Classifier: 'osb unigram' Cat: ham T: 54.26 sec Samples: 405 Avg T/s: 0.13 Accuracy: 0.98 (7 misses)
Classifier: 'osb unigram' Cat: spam T: 74.24 sec Samples: 561 Avg T/s: 0.13 Accuracy: 0.86 (76 misses)
Classifier: 'osb unique microgroom' Cat: ham T: 62.81 sec Samples: 405 Avg T/s: 0.16 Accuracy: 0.96 (16 misses)
Classifier: 'osb unique microgroom' Cat: spam T: 84.86 sec Samples: 561 Avg T/s: 0.15 Accuracy: 0.88 (70 misses)
Classifier: 'osbf unique microgroom' Cat: ham T: 55.84 sec Samples: 405 Avg T/s: 0.14 Accuracy: 1.00 (0 misses)
Classifier: 'osbf unique microgroom' Cat: spam T: 75.64 sec Samples: 561 Avg T/s: 0.13 Accuracy: 0.96 (25 misses)
Classifier: 'hyperspace' Cat: ham T: 97.14 sec Samples: 405 Avg T/s: 0.24 Accuracy: 0.64 (146 misses)
Classifier: 'hyperspace' Cat: spam T: 133.04 sec Samples: 561 Avg T/s: 0.24 Accuracy: 1.00 (0 misses)
Classifier: 'hyperspace unique' Cat: ham T: 93.08 sec Samples: 405 Avg T/s: 0.23 Accuracy: 0.64 (145 misses)
Classifier: 'hyperspace unique' Cat: spam T: 126.90 sec Samples: 561 Avg T/s: 0.23 Accuracy: 1.00 (0 misses)

Tests with 100 spam and 100 ham messages taken from my private mailbox (Privham/Privspam):
$ ./mass_class.py
Classifier: 'microgroom' Cat: ham T: 55.73 sec Samples: 100 Avg T/s: 0.56 Accuracy: 0.68 (32 misses)
Classifier: 'microgroom' Cat: spam T: 27.67 sec Samples: 100 Avg T/s: 0.28 Accuracy: 0.67 (33 misses)
Classifier: 'osb unigram' Cat: ham T: 17.00 sec Samples: 100 Avg T/s: 0.17 Accuracy: 0.88 (12 misses)
Classifier: 'osb unigram' Cat: spam T: 14.06 sec Samples: 100 Avg T/s: 0.14 Accuracy: 0.48 (52 misses)
Classifier: 'osb unique microgroom' Cat: ham T: 22.05 sec Samples: 100 Avg T/s: 0.22 Accuracy: 0.73 (27 misses)
Classifier: 'osb unique microgroom' Cat: spam T: 16.16 sec Samples: 100 Avg T/s: 0.16 Accuracy: 0.61 (39 misses)
Classifier: 'osbf unique microgroom' Cat: ham T: 17.17 sec Samples: 100 Avg T/s: 0.17 Accuracy: 0.73 (27 misses)
Classifier: 'osbf unique microgroom' Cat: spam T: 14.49 sec Samples: 100 Avg T/s: 0.14 Accuracy: 0.64 (36 misses)
Classifier: 'hyperspace' Cat: ham T: 28.90 sec Samples: 100 Avg T/s: 0.29 Accuracy: 0.18 (82 misses)
Classifier: 'hyperspace' Cat: spam T: 24.56 sec Samples: 100 Avg T/s: 0.25 Accuracy: 0.88 (12 misses)
Classifier: 'hyperspace unique' Cat: ham T: 27.81 sec Samples: 100 Avg T/s: 0.28 Accuracy: 0.25 (75 misses)
Classifier: 'hyperspace unique' Cat: spam T: 23.51 sec Samples: 100 Avg T/s: 0.24 Accuracy: 0.84 (16 misses)
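
The per-category lines are easier to compare once ham and spam are combined into one figure per classifier. Here is a minimal sketch of that, assuming the output format is exactly as printed above. The mass_class.py script itself is not shown in this post, so the `summarize` function and the regular expression are hypothetical helpers written for illustration, not part of it.

```python
import re
from collections import defaultdict

# Matches one report line as printed above; only the fields needed
# for the summary are captured.
LINE_RE = re.compile(
    r"Classifier: '(?P<name>[^']+)' Cat: (?P<cat>\w+) "
    r"T: (?P<total>[\d.]+) sec Samples: (?P<samples>\d+) "
    r"Avg T/s: [\d.]+ Accuracy: [\d.]+ \((?P<misses>\d+) misses\)"
)


def summarize(report):
    """Return {classifier: (accuracy, seconds_per_message)} over all categories."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # samples, misses, seconds
    for match in LINE_RE.finditer(report):
        entry = totals[match.group("name")]
        entry[0] += int(match.group("samples"))
        entry[1] += int(match.group("misses"))
        entry[2] += float(match.group("total"))
    return {
        name: ((n - miss) / n, secs / n)
        for name, (n, miss, secs) in totals.items()
    }


# Two lines copied from the Pubham/Pubspam run above.
sample = """\
Classifier: 'microgroom' Cat: ham T: 112.16 sec Samples: 405 Avg T/s: 0.28 Accuracy: 0.94 (24 misses)
Classifier: 'microgroom' Cat: spam T: 141.40 sec Samples: 561 Avg T/s: 0.25 Accuracy: 0.97 (17 misses)
"""

for name, (accuracy, per_message) in summarize(sample).items():
    print(f"{name}: accuracy {accuracy:.3f}, {per_message:.2f} s/message")
```

Weighting by sample count matters here because the public set is unbalanced (405 ham against 561 spam), so a plain average of the two accuracy columns would not give the same number.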
