{"id":450,"date":"2009-05-30T00:43:36","date_gmt":"2009-05-29T23:43:36","guid":{"rendered":"http:\/\/wp.devco.net\/?p=450"},"modified":"2009-10-09T12:22:24","modified_gmt":"2009-10-09T11:22:24","slug":"bayes_host_classification","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2009\/05\/30\/bayes_host_classification.php","title":{"rendered":"Bayes Host Classification"},"content":{"rendered":"
I run a little anti-spam service and often try out different strategies to combat spam. At present I have a custom nameserver, which I wrote, that runs lots of regex checks against hostnames and tries to determine whether a host is on a dynamic or a static IP; I query it through standard RBL lookups.<\/p>\n
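The regex side can be sketched roughly like this. Note these patterns are illustrative guesses at what such checks look like, not the rules my nameserver actually uses:

```ruby
# Illustrative dynamic-hostname heuristics (assumed examples, not the
# real ruleset). Consumer/dynamic reverse DNS names often embed the IP
# or words like "adsl", "dyn", "ppp", "cable".
DYNAMIC_PATTERNS = [
  /\b(?:a?dsl|dyn(?:amic)?|ppp|pool|cable|dial(?:up)?)\b/i, # provider keywords
  /\d{1,3}[.-]\d{1,3}[.-]\d{1,3}[.-]\d{1,3}/,               # embedded IP octets
  /^[0-9a-f]{8}\./i                                         # hex-encoded IP prefix
]

def dynamic_hostname?(hostname)
  DYNAMIC_PATTERNS.any? { |re| hostname =~ re }
end
```

With these patterns, `dynamic_hostname?('3e70dcb2.adsl.enternet.hu')` matches (hex prefix and the `adsl` keyword), while `mail193.messagelabs.com` matches nothing. The maintenance problem described above is that every new ISP naming scheme needs another pattern.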
The theory is that dynamic hosts are suspicious and so they get a greylist penalty. Doing lots of regular expressions is not the best option though, and I often have to fiddle with these things to keep them effective, so I thought I’d try a Bayesian approach using Ruby Classifier<\/a>.<\/p>\n
I pulled 400 known dynamic hostnames and 400 good ones out of my stats and used them to train the classifier:<\/p>\n
require 'rubygems'
require 'stemmer'
require 'classifier'<\/p>\n
classifier = Classifier::Bayes.new('bad', 'good')<\/p>\n
classifier.train_bad("3e70dcb2.adsl.enternet.hu")
.
.
classifier.train_good("mail193.messagelabs.com")
.
.<\/p>\n<\/blockquote>\n
I then fed 100 known good and 100 known bad hostnames – ones not in the initial training data – through it and got a 100% hit rate on the good names, with only 5 bad hosts classified as good.<\/p>\n
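For the curious, here is a minimal sketch of what a Bayes classifier does with hostnames. This is plain Ruby, not the Ruby Classifier gem: I assume tokenizing on dots and dashes with digit runs collapsed so that throwaway hex/IP fragments share a feature, and classification picks the category with the best Laplace-smoothed log score.

```ruby
# Toy naive Bayes hostname classifier (illustrative sketch only).
class TinyBayes
  def initialize(*categories)
    @counts = categories.map { |c| [c, Hash.new(0)] }.to_h # token counts per category
    @totals = Hash.new(0)                                  # total tokens per category
  end

  # Split on dots/dashes; collapse digit runs so "3e70dcb2" and
  # "adsl-12-34" end up sharing features.
  def tokenize(hostname)
    hostname.downcase.gsub(/\d+/, '#').split(/[.\-]/).reject(&:empty?)
  end

  def train(category, hostname)
    tokenize(hostname).each do |tok|
      @counts[category][tok] += 1
      @totals[category] += 1
    end
  end

  # Pick the category with the highest summed log-probability,
  # with add-one (Laplace) smoothing for unseen tokens.
  def classify(hostname)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    @counts.keys.max_by do |cat|
      tokenize(hostname).sum do |tok|
        Math.log((@counts[cat][tok] + 1.0) / (@totals[cat] + vocab))
      end
    end
  end
end

bayes = TinyBayes.new('bad', 'good')
bayes.train('bad', '3e70dcb2.adsl.enternet.hu')
bayes.train('good', 'mail193.messagelabs.com')
bayes.classify('12ab34cd.adsl.enternet.hu') # => "bad"
```

The appeal over the regex approach is exactly what the numbers above suggest: the token statistics generalise to hostnames never seen in training, so new ISP naming schemes do not each need a hand-written pattern.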