{"id":66,"date":"2004-01-09T13:17:19","date_gmt":"2004-01-09T12:17:19","guid":{"rendered":"http:\/\/wp.devco.net\/?p=66"},"modified":"2009-10-09T17:29:09","modified_gmt":"2009-10-09T16:29:09","slug":"analysis_of_55_000_spam_mails","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2004\/01\/09\/analysis_of_55_000_spam_mails.php","title":{"rendered":"Analysis of 55 000 Spam Mails"},"content":{"rendered":"
I handle mail for about 40 domains on my servers at the moment; some are secondary, some are primary, and they all get spam.<\/p>\n
I have been keeping close track of all emails in and out of my machine. I keep lots of meta information about these emails, including to, from, sender hostname, subject, attachments, time spent processing, and whether it is spam or not. I do this partly because there are certain legal requirements for this in the EU, and partly because I like the kind of stats I can pull out of it.<\/p>\n
It has now been a year since I started keeping these stats, and my SpamAssassin<\/a> has tagged 55 000 emails as spam. I religiously check my own spam folder for false positives and do not get many, but I am aware of some HTML newsletters that get tagged as spam when they shouldn’t be. Overall, though, I believe that my tagging is fairly accurate.<\/p>\n What follows in the extended entry is a bit of analysis I did on this data to find out which ISPs, countries and so forth are to blame for this plague.<\/p>\n It is important to note that I am not setting out to take a hugely scientific approach to this, or even a highly accurate one. If there were a very accurate way to identify spam, we would not have a problem with it; these are merely interesting observations made on a small system.<\/p>\n First, some general interesting bits about my mail volumes. I am by no means a big carrier of email; in fact, it’s a very modest mail installation.<\/p>\n
\nAll of the data is kept in a SQL database for ease of querying. I put the data there using iScan<\/a> and a plugin I wrote that dumps its memory state into SQL statements once processing of an email is complete.<\/p>\n
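To give a rough idea of what a per-message log like this could look like, here is a minimal sketch using Python and an in-memory SQLite database. The table name mail_log, its column names, and the helper functions are all assumptions made for illustration, based only on the metadata fields listed above; this is not the actual schema or the iScan plugin.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions derived
# from the metadata fields mentioned in the post, not the real tables.
SCHEMA = '''
CREATE TABLE mail_log (
    id                 INTEGER PRIMARY KEY,
    received_at        TEXT    NOT NULL,
    sender             TEXT    NOT NULL,
    recipient          TEXT    NOT NULL,
    sender_host        TEXT    NOT NULL,
    subject            TEXT,
    attachment_count   INTEGER NOT NULL DEFAULT 0,
    processing_seconds REAL,
    is_spam            INTEGER NOT NULL
)
'''


def open_log():
    # Create an in-memory database holding the hypothetical mail_log table.
    conn = sqlite3.connect(':memory:')
    conn.execute(SCHEMA)
    return conn


def record_mail(conn, received_at, sender, recipient, sender_host,
                subject, attachment_count, processing_seconds, is_spam):
    # Store one row of metadata after a message has been processed.
    conn.execute(
        'INSERT INTO mail_log (received_at, sender, recipient, sender_host, '
        'subject, attachment_count, processing_seconds, is_spam) '
        'VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
        (received_at, sender, recipient, sender_host, subject,
         attachment_count, processing_seconds, 1 if is_spam else 0),
    )


def spam_by_host(conn):
    # Count spam-tagged messages per sending host, busiest hosts first.
    return conn.execute(
        'SELECT sender_host, COUNT(*) AS n FROM mail_log '
        'WHERE is_spam = 1 GROUP BY sender_host '
        'ORDER BY n DESC, sender_host'
    ).fetchall()
```

With one row per message, questions such as which hosts send the most spam become a single GROUP BY query, which is the kind of ease of querying a SQL store is meant to provide.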