
Tips and Tricks for puppet debugging

We often have people asking ‘will this work…’ or ‘how do I…’ type questions on IRC, usually because it seems like such a big deal to upload a bit of code to your master just to test.

Here are a few quick tips and tricks for testing out bits of puppet code to get a feel for things. I’ll show you how to test code without using your puppetmaster or needing root, so it’s ideal for just playing around in your shell, exploring the language structure and syntax.

Getting more info
Most people know this one, but just running puppetd --test will highlight all the various steps puppet is taking, any actions it’s performing etc. in a nice colored display, handy to just do one-offs and see what is happening.

Testing small bits of code:
Often you’re not sure if you’ve got the right syntax, especially for case statements, selectors and such, or you just want to test out some scenarios. You can’t just dump the stuff into your master because it might not even compile.

Puppet comes with an executable called ‘puppet’ that’s perfect for this task, simply put some manifest code into a file called test.pp and run it:

puppet --debug --verbose test.pp

This will run through your code in test.pp and execute it.  You should be aware that you can’t fetch files from the master in this case since it’s purely local, but see the File types reference for an explanation of how the behavior changes – you can still copy files, just not from the master.

You can do anything that puppet code can do, make classes, do defines, make sub classes, install packages, this is great for testing out small concepts in a safe way.  Everything you see in this article was done in my shell like this.

What is the value of a variable?
You’ve set a variable in some class but you’re not sure if it’s set to what you’re expecting, maybe you don’t know the scoping rules just yet or you just want to log some state back to the node or master log.

In a usual scripting language you’d add some debug prints, puppet is the same.  You can print simple values in the master log by doing this in your manifest:

notice("The value is: ${yourvar}")

Now when you run your node you should see this being printed to the syslog (by default) on your master.

To log something to your client, do this:

notify{"The value is: ${yourvar}": }

Now your puppet logs – syslog usually – will show lines on the client, or you could just do puppetd --test on the client to see it run and see your debug bits.

What is in an array?
You’ve made an array, maybe from some external function, and you want to know what is in it?  This is really an extension of the hint above, which prints garbage when passed an array.

Building on the example above and the fact that puppet loops using resources, let’s make a simple defined type that prints each member of an array out to either the master or the client (use the technique above to choose).

$arr = [1, 2, 3]
 
define print() {
   notice("The value is: '${name}'")
}
 
print{$arr: }

This will print one line for each member in the array – or just one if $arr isn’t an array at all.

$ puppet test.pp
notice: Scope(Print[1]): The value is: '1'
notice: Scope(Print[3]): The value is: '3'
notice: Scope(Print[2]): The value is: '2'

Writing shell scripts with puppet?
Puppet’s a great little language, you might even want to replace some general shell scripts with puppet manifest code, here’s a simple hello world:

#!/usr/bin/puppet
 
notice("Hello world from puppet!")
notice("This is the host ${fqdn}")

If we run this we get the predictable output:

$ ./test.pp
notice: Scope(Class[main]): Hello world from puppet!
notice: Scope(Class[main]): This is the host your.box.com

And note that you can access facts and everything from within this shell script language, really nifty!

Did I get the syntax right?
If I introduce a deliberate error in the code above – remove the closing quote from the first notice – it will blow up. You can test puppet syntax using puppet itself:

$ puppet --parseonly test.pp
err: Could not parse for environment production: Unclosed quote after '' in 'Hello world from puppet!)
' at /home/rip/test.pp:1

You can combine this with a pre-commit hook on your SCM to make sure you don’t check in bogus stuff.
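As a rough sketch of such a hook, the script below takes the changed file paths as arguments and runs the same parse-only check on every manifest among them. How you collect the changed paths depends on your SCM, so that part is left to the hook that calls it. The helper names are made up for this example, and it assumes puppet --parseonly exits non-zero when a manifest fails to parse.

```ruby
#!/usr/bin/env ruby
# Sketch of a syntax check to call from a pre-commit hook.
# Pass the changed file paths as arguments; collecting them is SCM specific.

# Keep only Puppet manifests from a list of paths (hypothetical helper).
def manifests(paths)
  paths.select { |path| path.end_with?(".pp") }
end

# Run the parse-only check on one manifest.
# Assumption: puppet exits non-zero when the manifest does not parse.
def parses?(path)
  system("puppet", "--parseonly", path)
end

# Collect every manifest that fails the check and refuse the commit if any do.
failed = manifests(ARGV).reject { |path| parses?(path) }
unless failed.empty?
  warn "Syntax errors in: #{failed.join(', ')}"
  exit 1
end
```

Your hook would call it with the list of files about to be committed and abort the commit when the script exits with a non-zero status.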

How should I specify the package version? or user properties?
Often you’ve added a package but aren’t 100% sure how to pass the version string to ensure =>, or you’re not sure how to specify password hashes etc. Puppet comes with something called ralsh that can interrogate a running system, some samples below:

% ralsh package httpd
package { 'httpd':
    ensure => '2.2.3-22.el5.centos.2'
}
 
% ralsh package httpd.i386
package { 'httpd.i386':
   ensure => '2.2.3-22.el5.centos.2'
}
 
% ralsh user apache
user { 'apache':
   password => '!!',
   uid => '48',
   comment => 'Apache',
   home => '/var/www',
   gid => '48',
   ensure => 'present',
   shell => '/sbin/nologin'
}

Note that I ran the user example as root, since puppet needs to be able to read shadow and so forth.  The code it’s outputting is valid puppet code that you can put in manifests.

Will Puppet destroy my machine?
Maybe you’re just getting ready to run puppet on a host for the first time, or you’re testing some new code and you want to be sure nothing terrible will happen. puppetd has a no-op option to make it just print what would happen, just run puppetd --test --noop

What files etc are being managed by puppet?
See my previous post for a script that can tell you what puppet is managing on your machine, note this does not yet work on the 0.25.x branch of code.

Are my config changes taking effect?
Often people make puppet.conf changes and they just aren’t working, perhaps because they put them in the wrong [section] in the config file. A simple way to test is to run puppetd --genconfig, this will dump all the active configuration options.

Be careful though, don’t just dump this output over your puppet.conf thinking you’ll have a nice commented file. This won’t work as the option for genconfig will be set to true in the file and you’ll end up with a broken configuration.  In general I recommend keeping puppet.conf simple and short, only showing the things you’re changing away from defaults, that makes it much easier to see what differs from standard behavior when asking for help.

Getting further help
Puppet has an IRC channel, #puppet on freenode, we try to be helpful and welcoming to newcomers, pop by if you have questions.

As always you need to have these wiki pages bookmarked and they should be your daily companions:

Managing web traffic with ruby-pdns

A short while ago I wrote about releasing a Ruby development framework for PowerDNS. The release is still early days, feature complete but in need of some robustness tweaks, and a new release will be out in a week or so to address that.

I wanted though to highlight some success that I’ve had using it.  I have a small static farm for a client that handles around 2MiB/sec of 200×200 jpg files. This setup is for a startup so out of necessity it’s all built to be cheap, I host on networks I don’t own yet I need pretty good control over it, what IPs will be used to serve traffic and so forth.

The graph above shows the situation before, caused by the Windows DNS bug, you’ll see the bottom host is working pretty hard, getting a large chunk of the bandwidth.

This is a problem because come mid month this poor machine has already used up its allocation of 2.5TiB of transfer and I need to move it from the pool.

So my goal was to shift the traffic to the yellow and green machines and just generally balance things out a bit. I used the Weighted Round Robin feature of ruby-pdns to adjust the biases, it took a bit of fiddling because for some other reason even when this machine gets fewer requests per second it still seems to manage more in terms of bandwidth, this is the eventual code snippet:

ips = ["213.x.x.232",                         # dark blue
          "88.x.x.201",                       # lighter blue
          "82.x.x.180",                       # yellow
          "82.x.x.181"].randomize([1,2,2,3])  # green
 
answer.shuffle false
answer.content [:A, ips[0]]
answer.content [:A, ips[1]]
answer.content [:A, ips[2]]

The weights seem odd but that’s what worked after some fiddling, see the graph below.

This is much nicer balanced, it’s not perfect and I doubt I will get it perfect with just 4 machines to play with but I believe it’s already at the point where it means I can use all my machines for the entire month without hitting any limits.

Here’s another graph over the week showing things side by side:

The improvement is very obvious in this graph and you can see I’ve not lost anything in performance between first day and last day on the graph in terms of throughput (the lower days were days where lower traffic is expected).

If I look at my actual transfer used it’s better balanced now, first let’s see the 12th:

08/12/09    12.67 GiB |  46.42 GiB |  59.09 GiB |   5.74 Mbit/s
08/12/09     7.71 GiB |  21.32 GiB |  29.04 GiB |   2.82 Mbit/s
08/12/09     9.05 GiB |  23.05 GiB |  32.10 GiB |   3.12 Mbit/s
08/12/09     6.94 GiB |  16.56 GiB |  23.50 GiB |   2.28 Mbit/s

Again the skew is very clear with 23GiB on the lowest compared to 59GiB on the highest use machine. On the 17th it looked a lot better:

08/17/09     7.84 GiB |  28.55 GiB |  36.39 GiB |   3.53 Mbit/s
08/17/09     8.46 GiB |  25.66 GiB |  34.12 GiB |   3.31 Mbit/s
08/17/09    11.21 GiB |  30.70 GiB |  41.91 GiB |   4.07 Mbit/s
08/17/09    10.25 GiB |  28.20 GiB |  38.46 GiB |   3.73 Mbit/s

This is obviously much better when looking at the 2nd to last column. The first column is received traffic, the increase there is down to a slightly lower hit ratio on the caching proxy on these machines, meaning they fetch more files from origin than the others.
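To put a number on the improvement, here is a small sketch that compares the busiest and quietest machine on each day, using the 2nd to last column from the two listings. The spread helper is just something made up for this post, a ratio of 1.0 would mean a perfectly even split.

```ruby
# Daily totals in GiB, copied from the 2nd to last column of the listings above.
before = [59.09, 29.04, 32.10, 23.50]  # the 12th
after  = [36.39, 34.12, 41.91, 38.46]  # the 17th

# Ratio between the busiest and the quietest machine (hypothetical helper).
def spread(totals)
  totals.max / totals.min
end

puts format("12th: %.2fx, 17th: %.2fx", spread(before), spread(after))
# prints "12th: 2.51x, 17th: 1.23x"
```

So the busiest machine went from moving about two and a half times as much as the quietest one to under a quarter more.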

Overall I am extremely pleased with this solution, I agree one should not be using DNS as a hammer to all your nails but for startups and cloud based people who do not have control over networks, BGP tables and so forth this really does represent a viable option to what would otherwise be an extremely expensive problem to solve.

Ruby PowerDNS Framework

Regular readers here will know I patch bind with GeoIP extensions, this has served me well but my needs have now outgrown simply doing geo related replies.

I’ve had an itch for a long time to be able to do completely custom DNS, maybe responding to monitoring, time of day or geographical location, or even working around some unbelievably annoying bugs in Windows that break all round robin DNS. This has not been possible with Bind.

PowerDNS has a backend that simply speaks via STDIN and STDOUT to any script. The documentation is pretty shoddy but I quickly realized this is the way to go.  Once I figured out all the various weird things about PDNS and the Pipe backend I set about writing a framework to host many records in a single PDNS server, in a way that hides and abstracts all the PowerDNS details from the code.

The end goal is that I would dump some Ruby code into a file on the server and it should just be served, when I get new code I just want to overwrite the old code, no restarts or anything it must just serve it.

I wanted the code to be trivially simple, something like this:

module Pdns
  newrecord("www.your.net") do |query, answer|
    case country(query[:remoteip])
      when "US", "CA"
        answer.content "64.xx.xx.245"

      when "ZA", "ZW"
        answer.content "196.xx.xx.10"

      else
        answer.content "78.xx.xx.140"
    end
  end
end

should be all that is needed to do GeoIP based serving, and really complex things like weighted random round robins that effectively work around the bugs in client resolvers like the windows one above:

        ips = ["1.x.x.x", "2.x.x.x", "3.x.x.x", "4.x.x.x", "5.x.x.x"]

        ips = ips.randomize([1,5,3,3,3])

        answer.shuffle false
        answer.ttl 300
        answer.content ips[0]
        answer.content ips[1]
        answer.content ips[2]

This code will take 5 ip addresses, shuffle them giving the first one least weight, the 2nd one most weight and return only 3 out of the 5 results, this would be impossible in Bind but trivial to imagine coding if only you could hook into the nameserver.
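If you’re wondering what a weighted shuffle like that might look like, here is a minimal standalone sketch. It is not the ruby-pdns implementation of randomize, the weighted_shuffle name is made up for this example and it simply draws one item at a time with a chance proportional to its weight.

```ruby
# Illustrative weighted shuffle; NOT the ruby-pdns randomize implementation.
# Items with a higher weight are more likely to end up near the front.
def weighted_shuffle(items, weights, rng = Random.new)
  raise ArgumentError, "need one weight per item" unless items.size == weights.size

  pool = items.zip(weights)
  result = []

  until pool.empty?
    # Pick a point on the combined weight line and find the item it lands on.
    total = pool.sum { |_, weight| weight }
    pick = rng.rand * total
    index = pool.index { |_, weight| (pick -= weight) < 0 } || pool.size - 1

    # Move the chosen item to the result so it cannot be drawn again.
    result << pool.delete_at(index).first
  end

  result
end

ips = ["1.x.x.x", "2.x.x.x", "3.x.x.x", "4.x.x.x", "5.x.x.x"]
puts weighted_shuffle(ips, [1, 5, 3, 3, 3]).first(3)
```

Taking only the first 3 results mirrors the snippet above, the 2nd address will show up most often and the first one least often.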

Anyway, so I wrote a framework that enables exactly this, the code snippets above are actual working snippets.  The code is hosted on Google Code as ruby-pdns and is at version 0.3 at present.

I’ve released tarball and RPM versions of the code, the code is publicly browsable and licensed under the GPLv2.

At present I think I’ve documented it all fairly well with a good set of Wiki pages, though the install instructions for non RPM based installs leave a bit to be desired, I’ll work on improving that.

I’ve been running this code myself serving 10’s of 1000s of queries a day and have used the technique above to work around Windows bugs.  I’m looking for testers to start using the code and sending me feedback, there are groups, tickets and all set up for that on Google Code.

Google Code

I am working on a new Open Source development framework for PowerDNS and needed somewhere to host the project, typically I’d host my own SVN and Wiki and just take patches via email but I thought that’s a bit stupid for this day and age.

I set up a Google Code project for Ruby PDNS to give it a go and must say I’ve been really impressed with it.

Feature wise it provides most of what you need when combined with Google Groups, Pages, Analytics and so forth but the core feature set is not all bad either. 

  • They support SVN or Mercurial
  • The wiki is (loosely) based on MoinMoin which is nice cos I already had lots of Moin docs for this project.  Crucially you can access the wiki pages over SVN for local editing.
  • The ticketing system is OK, it’s probably the worst part of the project hosting systems but I think I can get used to it for sure.  Specifically I want to be able to add blockers and such when the ticket gets created already.  I also want to look at the ticket and see all commits pertaining to this ticket, not possible it seems.
  • Code commits can interact with the ticketing system, this is great, you can make tickets, comment on tickets, add CC’s etc or even close tickets or send them for code review from inside your commits, I like this a lot.
  • The source browser is good, on par with other self hosted ones I’ve used.
  • Importing my old svn repo into Google Code was easy and kept my timestamps and all which I was very impressed with.

In contrast to using Sourceforge in the past I have to say this is really quite pleasant to use.  The recent re-design of Sourceforge which on the surface looks nice is in fact absolutely horrid, and as it’s the 2nd bad redesign in a row I think it’s time projects get new homes.  For example a recent post to the Bacula lists mentioned their hatred for the new design too and they’ve had enough and will migrate elsewhere.

If you’re looking for code hosting and use either SVN or HG, check out Google Code.

Update: You can update tickets from SVN commits, you just need to be careful about ordering of the text, see the Issue Tracker wiki page.

What does puppet manage on a node?

Sometimes it’s nice to try and figure out what resources of a machine are being managed by puppet.  Puppet keeps a state file called localconfig.yaml in either YAML or Marshal format. It’s full of useful information, so I wrote a quick script to parse it and show you what’s being managed.

Typical output is:

Classes included on this node:
        nephilim.ml.org
        common::linux
        <snip>

Resources managed by puppet on this node:
        service{smokeping: }
                defined in common/modules/smokeping/manifests/service.pp:6

        file{/etc/cron.d/mrtg: }
                defined in common/modules/puppet/manifests/init.pp:201
<snip>

It will show all classes and all resources including where in your manifests the resource comes from.  Unfortunately for resources created by defines it shows the define as the source but I guess you can’t have it all.

You can get the code here, it’s pretty simple, just pass it a path to your localconfig.yaml file, it supports both YAML and Marshal formats.

The file also has every property of the resources in it etc, so you can easily extend this to print a lot of other information, just use something like pp to dump out the contents of Puppet::TransObject objects to see what’s possible.
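One way to handle both formats is to try Marshal first and fall back to YAML when that fails. This is only a sketch of the idea and not necessarily how my script does it, the parse_state name is made up. For a real localconfig.yaml you would also need to load the puppet libraries first so the stored classes exist, and newer Ruby versions restrict which classes YAML.load will create.

```ruby
require "yaml"

# Decode the raw bytes of a state file, trying Marshal first and falling
# back to YAML (hypothetical helper, only use it on files you trust).
def parse_state(bytes)
  Marshal.load(bytes)
rescue TypeError, ArgumentError
  # Marshal rejects data that is not in its binary format, so assume YAML.
  YAML.load(bytes)
end

# Usage, assuming the path is given on the command line:
#   pp parse_state(File.binread(ARGV[0]))
```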

Bayes Host Classification

I run a little anti spam service and often try out different strategies to combat spam.  At present I have a custom nameserver that I wrote that does lots of regex checks against hostnames and tries to determine if a host is a dynamic ip or a static ip.  I use the server in standard RBL lookups.

The theory is that dynamic hosts are suspicious and so they get a greylist penalty. Doing lots of regular expressions though is not the best option and I often have to fiddle with these things to keep them effective.  I thought I’d try a Bayesian approach using Ruby Classifier.

I pulled out 400 known dynamic ips and 400 good ones from my stats and used them to train the classifier:

require 'rubygems'
require 'stemmer'
require 'classifier'

classifier = Classifier::Bayes.new('bad', 'good')

classifier.train_bad("3e70dcb2.adsl.enternet.hu")
.
.

classifier.train_good("mail193.messagelabs.com")
.
.

I then fed 100 of each known good and known bad hostnames – ones not in the initial dataset –  through it and had a 100% hit on good names and only 5 bad hosts classified as good.
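To give a feel for why this works on hostnames, here is a tiny hand-rolled naive Bayes over hostname tokens. It is a stand-in written for illustration, not the Classifier gem and not my real setup: the TinyBayes class is made up, it assumes equal priors for both categories, and host12.adsl.example.hu is an invented test name.

```ruby
# Minimal naive Bayes over hostname tokens; a stand-in for illustration,
# not the Classifier gem used above.
class TinyBayes
  def initialize(*categories)
    @counts = {}              # category => token => times seen
    @totals = Hash.new(0)     # category => total tokens seen
    categories.each { |category| @counts[category] = Hash.new(0) }
  end

  # Split a hostname into lowercase alphabetic tokens like "adsl" or "mail".
  def tokens(hostname)
    hostname.downcase.scan(/[a-z]+/)
  end

  def train(category, hostname)
    tokens(hostname).each do |token|
      @counts[category][token] += 1
      @totals[category] += 1
    end
  end

  # Return the category with the highest log-likelihood, using add-one
  # smoothing so unseen tokens do not zero out a category.
  def classify(hostname)
    vocabulary = @counts.values.flat_map(&:keys).uniq.size
    @counts.keys.max_by do |category|
      tokens(hostname).sum do |token|
        Math.log((@counts[category][token] + 1.0) / (@totals[category] + vocabulary))
      end
    end
  end
end

bayes = TinyBayes.new("bad", "good")
bayes.train("bad", "3e70dcb2.adsl.enternet.hu")
bayes.train("good", "mail193.messagelabs.com")

puts bayes.classify("host12.adsl.example.hu")  # prints "bad"
```

Tokens like adsl and hu only ever show up in the bad training data here, so a new name sharing them leans towards bad even though the exact hostname was never seen.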

This is very impressive and more than acceptable for my needs, now if only there was a good Net::DNS port to Ruby that also included the Nameserver classes.