{"id":450,"date":"2009-05-30T00:43:36","date_gmt":"2009-05-29T23:43:36","guid":{"rendered":"http:\/\/wp.devco.net\/?p=450"},"modified":"2009-10-09T12:22:24","modified_gmt":"2009-10-09T11:22:24","slug":"bayes_host_classification","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2009\/05\/30\/bayes_host_classification.php","title":{"rendered":"Bayes Host Classification"},"content":{"rendered":"

I run a small anti-spam service and often try out different strategies to combat spam. At present I have a custom nameserver that I wrote; it runs lots of regex checks against hostnames to determine whether a host is on a dynamic IP or a static IP. I use the server in standard RBL lookups.<\/p>\n
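To make the regex approach concrete, here is a minimal sketch of that kind of check. The patterns and the method name dynamic_hostname? are hypothetical; the nameserver's real rule set is not shown in this post.

```ruby
# Hypothetical patterns for illustration only, not the rules the nameserver uses.
DYNAMIC_PATTERNS = [
  # Access-technology labels such as adsl, dyn or pool, set off by dots, dashes or digits
  /(^|[.-])(adsl|dsl|dyn|dynamic|dhcp|pool|ppp)([.-]|[0-9])/,
  # Something that looks like an IPv4 address embedded in the name
  /[0-9]+[.-][0-9]+[.-][0-9]+[.-][0-9]+/
].freeze

# True when any pattern matches the lower-cased hostname.
def dynamic_hostname?(hostname)
  name = hostname.downcase
  DYNAMIC_PATTERNS.any? { |pattern| pattern.match?(name) }
end

puts dynamic_hostname?('3e70dcb2.adsl.enternet.hu')
puts dynamic_hostname?('mail193.messagelabs.com')
```

Every new ISP naming scheme means another pattern, which is where the fiddling comes from.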

The theory is that dynamic hosts are suspicious, so they get a greylist penalty. Running lots of regular expressions is not the best option, though, and I often have to fiddle with them to keep them effective. I thought I’d try a Bayesian approach using Ruby Classifier<\/a>.<\/p>\n

I pulled 400 known dynamic IPs and 400 good ones out of my stats and used their hostnames to train the classifier:<\/p>\n

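require 'rubygems'
require 'stemmer'
require 'classifier'<\/p>\n

# Two categories: 'bad' for dynamic hosts, 'good' for static ones
classifier = Classifier::Bayes.new('bad', 'good')<\/p>\n

# One train_bad call per known dynamic hostname
classifier.train_bad('3e70dcb2.adsl.enternet.hu')
.
.<\/p>\n

# One train_good call per known good hostname
classifier.train_good('mail193.messagelabs.com')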
.
.<\/p>\n<\/blockquote>\n

I then fed 100 known good and 100 known bad hostnames, none of them in the training set, through it. All 100 good names were classified correctly, and only 5 of the bad hosts were misclassified as good.<\/p>\n
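The train-then-test loop can be sketched without any gems. What follows is a small naive Bayes written for illustration, not the Classifier gem's implementation: the HostBayes class, its tokenizer (split on anything that is not a letter or digit) and every example.* hostname are assumptions. Only the two hostnames shown in the training snippet come from my data.

```ruby
# Illustrative naive Bayes over hostname tokens (not the Classifier gem).
class HostBayes
  def initialize(*categories)
    @counts = {}                                  # category => token => count
    categories.each { |c| @counts[c] = Hash.new(0) }
    @totals = Hash.new(0)                         # category => tokens seen
    @docs   = Hash.new(0)                         # category => hostnames trained
  end

  # Assumed tokenizer: lower-case, split on non-alphanumeric characters.
  def tokens(hostname)
    hostname.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
  end

  def train(category, hostname)
    @docs[category] += 1
    tokens(hostname).each do |t|
      @counts[category][t] += 1
      @totals[category] += 1
    end
  end

  # Pick the category with the highest log prior plus log likelihood,
  # using add-one smoothing so unseen tokens never zero out a score.
  def classify(hostname)
    vocab    = @counts.values.flat_map(&:keys).uniq.size
    all_docs = @docs.values.sum.to_f
    scores = @counts.keys.map do |cat|
      prior = Math.log((@docs[cat] + 1) / (all_docs + @counts.size))
      likelihood = tokens(hostname).sum do |t|
        Math.log((@counts[cat][t] + 1.0) / (@totals[cat] + vocab + 1))
      end
      [cat, prior + likelihood]
    end
    scores.max_by { |_, score| score }.first
  end
end

bayes = HostBayes.new(:bad, :good)

# Training set: first entry of each list is from the post, the rest are made up.
['3e70dcb2.adsl.enternet.hu',
 'host-10-0-0-1.dynamic.example.net',
 'pool-192-0-2-7.dsl.example.org'].each { |h| bayes.train(:bad, h) }
['mail193.messagelabs.com',
 'mx1.example.com',
 'smtp.example.org'].each { |h| bayes.train(:good, h) }

# Held-out hostnames (made up) that were not part of training.
held_out = {
  'adsl-0-1.dynamic.example.net' => :bad,
  'mail2.example.com'            => :good
}
hits = held_out.count { |host, expected| bayes.classify(host) == expected }
puts format('%d of %d held-out hostnames classified correctly', hits, held_out.size)
```

With a training set this small the result says little about real accuracy; the point is only the shape of the test, where the evaluation hostnames are kept separate from the ones used for training.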

This is very impressive and more than acceptable for my needs. Now if only there were a good Net::DNS<\/a> port to Ruby that also included the Nameserver classes.<\/p>\n","protected":false},"excerpt":{"rendered":"

I run a little anti spam service and often try out different strategies to combat spam.  At present I have a custom nameserver that I wrote that does lots of regex checks against hostnames and tries to determine if a host is a dynamic ip or a static ip.  I use the server in standard […]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","footnotes":""},"categories":[7],"tags":[121,30,13,29],"_links":{"self":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/450"}],"collection":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/comments?post=450"}],"version-history":[{"count":1,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/450\/revisions"}],"predecessor-version":[{"id":501,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/450\/revisions\/501"}],"wp:attachment":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/media?parent=450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/categories?post=450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/tags?post=450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}