by R.I. Pienaar | Apr 14, 2010 | Uncategorized
I have finalized the speakers for the next London DevOps get-together. I sent the mail below to the list; looking forward to seeing everyone there!
Hello,
I am glad to announce speakers for our first meet hosted by The Guardian.
We will meet at their shiny new offices in Kings Cross, starting at 7pm; those who went to Scale Camp will know the venue.
We have two talks of roughly 30 minutes each lined up:
We will have some time for a few lightning talks if there's any interest before retiring to a nearby pub. If anyone has pub suggestions, please send them along.
Map and details can be found as usual at http://londondevops.org/meetings/
Thanks again to The Guardian for the venue. If anyone out there wants to sponsor some sodas or something at the venue, please get in contact.
I will try to set up some RSVP system. If you mention the meet on Twitter, please use the #ldndevops hashtag!
by R.I. Pienaar | Apr 3, 2010 | Uncategorized
I just released version 0.4.4 of The Marionette Collective. This is primarily a bug fix release addressing issues with log files, along with general code cleanups.
The biggest change in this release is improved control of the daemon: you can ask it to reload an agent or all agents, and a few other bits. Read all about it on the wiki.
Please see the Release Notes, Changelog and Download List for full details.
For background information about the MCollective project please see the project website.
by R.I. Pienaar | Mar 27, 2010 | Uncategorized
Some time ago I showed some sample code I had for driving MCollective with Cucumber. Today I’ll show how I did that with SimpleRPC.
Cucumber is a testing framework and might not be the perfect fit for systems scripting, but you can achieve a lot if you bend it to your will. Ultimately I am building up to using it for testing, but first we need to cover how to drive MCollective.
The basic idea is that you write a SimpleRPC agent for your needs, like the one I showed here. The specific agent has a number of tasks it can perform:
- Install, Uninstall and Update Packages
- Query NRPE status for a specific NRPE command
- Start, Stop and Restart Services
These features are all baked into a single agent, perfect for driving from a set of Cucumber features. The sample I will show here only drives the IPTables agent, since that code is public and visible.
First I'll show the feature I want to build. We're still concerned with driving the agent here rather than testing as such, though the steps are tested and idempotent:
Feature: Manage the iptables firewall

  Background:
    Given the load balancer has ip address 192.168.1.1
    And I want to update hosts with class /dev_server/
    And I want to update hosts with fact country=de
    And I want to pre-discover the nodes to manage

  Scenario: Manage the firewall
    When I block the load balancer
    Then traffic from the load balancer should be blocked

    # other tasks like package management, service restarts
    # and monitor tasks would go here

    When I unblock the load balancer
    Then traffic from the load balancer should be unblocked
To realize the above we'll need some setup code that fires up our RPC client and manages options in a single place; we'll place this in support/env.rb:
require 'mcollective'

World(MCollective::RPC)

Before do
  @options = {:disctimeout => 2,
              :timeout     => 5,
              :verbose     => false,
              :filter      => {"identity" => [], "fact" => [], "agent" => [], "cf_class" => []},
              :config      => "etc/client.cfg"}

  @iptables = rpcclient("iptables", :options => @options)
  @iptables.progress = false
end
First we load the MCollective code and install it into the Cucumber World; this achieves more or less what include MCollective::RPC would, in a Cucumber-friendly way.
We then set some sane default options and start our RPC client.
Now we can go on to writing some steps, which we store in step_definitions/mcollective_steps.rb. First we want to capture some data like the load balancer IP and the filters:
Given /^the (.+) has ip address (\d+\.\d+\.\d+\.\d+)$/ do |device, ip|
  @ips ||= {}
  @ips[device] = ip
end

Given /I want to update hosts with fact (.+)=(.+)$/ do |fact, value|
  @iptables.fact_filter fact, value
end

Given /I want to update hosts with class (.+)$/ do |klass|
  @iptables.class_filter klass
end

Given /I want to pre-discover the nodes to manage/ do
  @iptables.discover

  raise("Did not find any nodes to manage") if @iptables.discovered.size == 0
end
Here we're just building a table of device names to IPs and manipulating the MCollective filters. Finally we do a discover and check that we are actually matching some hosts; if your filters did not match any nodes, the Cucumber run would bail out.
Now we want to do the work of blocking and unblocking the load balancers:
When /^I block the (.+)$/ do |device|
  raise("Unknown device #{device}") unless @ips.include?(device)

  @iptables.block(:ipaddr => @ips[device])

  raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end

When /^I unblock the (.+)$/ do |device|
  raise("Unknown device #{device}") unless @ips.include?(device)

  @iptables.unblock(:ipaddr => @ips[device])

  raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end
We do some very basic sanity checks here, simply catching nodes that did not respond and bailing out if there are any. The key thing to note is that actually manipulating firewalls on any number of machines is roughly one line of code.
Now that we’re able to block and unblock IPs we also need a way to confirm those tasks were 100% done:
Then /^traffic from the (.+) should be blocked$/ do |device|
  raise("Unknown device #{device}") unless @ips.include?(device)

  unblockedon = @iptables.isblocked(:ipaddr => @ips[device]).inject(0) do |c, resp|
    resp[:data][:output] =~ /is not blocked/ ? c + 1 : c
  end

  raise("Not blocked on: #{unblockedon} / #{@iptables.discovered.size} hosts") if unblockedon > 0
  raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end

Then /^traffic from the (.+) should be unblocked$/ do |device|
  raise("Unknown device #{device}") unless @ips.include?(device)

  blockedon = @iptables.isblocked(:ipaddr => @ips[device]).inject(0) do |c, resp|
    resp[:data][:output] =~ /is blocked/ ? c + 1 : c
  end

  raise("Still blocked on: #{blockedon} / #{@iptables.discovered.size} hosts") if blockedon > 0
  raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end
This code does the actual verification that the clients have the IP blocked or not. It also highlights that my iptables agent could perhaps use some refactoring: I have two blocks that check for a string pattern in the result, and I could instead make the agent return a Boolean in addition to the human readable output. That would make the agent easier to use from a program like this.
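If the agent did return a Boolean, the counting above would no longer need regex matching. A minimal sketch, assuming a hypothetical :blocked Boolean in each reply's :data hash — my current agent does not return this field:

```ruby
# Count replies reporting the IP as not blocked, assuming a hypothetical
# :blocked Boolean field in each RPC reply's :data hash.
def unblocked_count(replies)
  replies.inject(0) do |count, resp|
    resp[:data][:blocked] ? count : count + 1
  end
end

# Stubbed replies standing in for @iptables.isblocked(...) results:
replies = [
  {:data => {:blocked => true,  :output => "192.168.1.1 is blocked"}},
  {:data => {:blocked => false, :output => "192.168.1.1 is not blocked"}},
]

puts unblocked_count(replies)
```

The step would then just compare the count against zero, with no knowledge of the agent's output strings.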
That's all there is to it really; MCollective RPC makes reusing code very easy, and it makes addressing whole networks of machines simple.
Monitoring / Infrastructure Testing
The above code demonstrates how, using MCollective and Cucumber, you can address any number of machines, perform actions and inspect states within a testing framework. This seems an uncomfortable fit, since Cucumber is a testing framework, but it doesn't need to be.
Above I am using Cucumber to drive actions, but it would be great to use this combination to test infrastructure states using something like cucumber-nagios. The great thing MCollective brings to the table here is that you can have sets of tests that change behavior with the environment, while having the ability to break out of the single box barriers.
With this you can easily write a kind of infrastructure test that transcends machine boundaries: you could check the state of one set of variables on one set of machines and, based on their values, check that other machines are in a state that makes those values valid.
We’re able to answer those ‘this machine is doing x, did the admin remember to do y on another machine?’ style questions. Examples of this could be:
- If the backups are running, did the cron job that takes a database out of the service pool get run? This would flag up at any time, even if someone is doing a manual run of backups.
- How many Puppet daemons are currently applying manifests across all our nodes? Alert if more than 10. Even this simple case is hard: you need a real-time view of the status of an application across many nodes, and it requires information from right now rather than the usual 5 minute window of Nagios.
- If there are 10 concurrent puppetd runs happening right now, is the Puppet Master coping? This test would stay green and not care about the master until there are many puppetds doing manifest runs. If your backups or some sysadmin action pushes the load up on the master, the check stays green; it only triggers when many Puppet clients are running. This could be a useful indicator for capacity planning.
These simple cases are generally hard for systems like Nagios: it's hard to track the state of many checks, apply logic, and go CRITICAL only when a combination of factors produces a failure. With MCollective and Cucumber we can build such test cases fairly easily.
The code here does not really show you how to do that per se, but what it does show is how natural and easy it is to interact with your network of hosts via MCollective and Ruby. In future I might post some more code here showing how to build on these ideas and create test suites as described. As an example, a test case for the Puppet Master scenario above might be:
Feature: Monitor the capacity of the Puppet Master

  Background:
    Given we know we can run 10 concurrent Puppet clients
    And the Puppet Master load average should be below 2

  Scenario: Monitor the Puppet Master capacity
    When there are more than usual Puppet clients running
    Then the Puppet Master should have an acceptable load average
Running this under cucumber-nagios would achieve our stated goals.
As a small postscript, figuring out how many Puppet daemons are currently running their manifests is trivial with the Puppet agent:
p = rpcclient("puppetd")
p.progress = false
running = p.status.inject(0) {|c, status| c += status[:data][:running]}
puts("Currently running: #{running}")
$ ruby test.rb
Currently running: 3
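The Background and Scenario above could be backed by steps in the same style as the iptables ones. As a sketch of just the combining logic, with the MCollective calls left out — the helper name and thresholds here are illustrative, not from any real agent:

```ruby
# Combine the two observed states: the master's load only matters while
# more Puppet clients than usual are running.
def master_capacity_ok?(running_clients, usual_clients, load_avg, max_load)
  return true if running_clients <= usual_clients # quiet period, load is irrelevant
  load_avg <= max_load                            # busy period, the master must cope
end

puts master_capacity_ok?(3,  10, 5.0, 2.0)  # quiet, so ok regardless of load
puts master_capacity_ok?(15, 10, 1.2, 2.0)  # busy but coping
puts master_capacity_ok?(15, 10, 5.0, 2.0)  # busy and overloaded
```

In a real step file, running_clients would come from the puppetd status count shown above and load_avg from a fact query against the master.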
by R.I. Pienaar | Mar 17, 2010 | Uncategorized
Scheduling Puppet runs is a hard problem: you either run the daemon or run Puppet from cron, and both have drawbacks. There has been some discussion about decoupling this or improving the remote control abilities of Puppet; this is my entry into that discussion.
Running the daemon, it's mostly about the memory problems of pretty much everything involved; you also suffer if a dom0 reboots, as the 20 domUs on it will pile up and cause huge concurrent runs.
Running from cron you have problems scheduling it nicely. The simplest approach is just to sleep a random period of time, but then clients don't run at a predictable time and you still get concurrency issues.
I've written an MCollective based Command and Control for Puppet that launches Puppet runs. The aim is to spread the CPU load on my masters out evenly so that I can use lower spec machines for masters; or, in my case, re-use my master machines as monitoring and middleware nodes.
It basically has these features:
- Discover the list of nodes to manage based on a supplied filter; I have regional masters, so I will manage groups of Puppet nodes independently
- Evenly spreads out the Puppet runs over an interval, if I have 10 nodes and a 30 minute interval I will get a run every 3 minutes.
- Nodes run at a predictable time every time, even after reboots since the node list is just run through alphabetically. If the node list stays constant you’ll always run at the same time give or take 10 seconds. If nodes get added the behavior will be predictable.
- Before scheduling a run it checks the overall concurrency of Puppet runs; if that exceeds a limit it will skip the background run. I want to give priority to runs I start by hand with --test, and this ensures that happens.
- If the client it is about to run applied its catalog recently, maybe via --test, it will skip that run
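The interval arithmetic from the list above is simple enough to sketch on its own: nodes are sorted alphabetically and the interval is divided evenly between them. The node names below are made up, and the real tool does this inside its run loop rather than up front:

```ruby
# Spread runs evenly: sort the discovered nodes alphabetically so the run
# order is predictable, and give each node a fixed offset into the interval.
def run_offsets(nodes, interval_minutes)
  gap = (interval_minutes * 60.0) / nodes.size
  nodes.sort.each_with_index.map { |node, i| [node, (i * gap).round] }
end

# 3 nodes over a 1 minute interval gives a run roughly every 20 seconds,
# matching the session log below:
run_offsets(%w[dev2.my.net dev1.my.net dev3.my.net], 1).each do |node, offset|
  puts "#{node} runs at +#{offset}s"
end
```

Because the offsets depend only on the sorted node list, a node keeps its slot across restarts as long as the list stays constant.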
The result is pretty good: spreading 6 nodes out over 30 minutes I get a nice even CPU spread. The spike in the graph after the change is when the node itself runs Puppet; the second graph is eth0 network output, and the dip is when localhost is running:
The resulting CPU usage is much smoother, there aren’t periods of no CPU usage and there are no spikes caused by nodes bunching up together.
Below is output from a C&C session managing 3 machines with an interval of 1 minute and a max concurrency of 1. These machines were still running cron based puppetd, so you can see the C&C skipping runs when it hits the concurrency limit due to cron runs:
$ puppetcommander.rb --interval 1 -W /dev_server/ --max-concurrent 1
Wed Mar 17 08:31:29 +0000 2010> Looping clients with an interval of 1 minute(s)
Wed Mar 17 08:31:29 +0000 2010> Restricting to 1 concurrent puppet run(s)
Wed Mar 17 08:31:31 +0000 2010> Found 3 puppet nodes, sleeping for ~20 seconds between runs
Wed Mar 17 08:31:31 +0000 2010> Current puppetds running: 1
Wed Mar 17 08:31:31 +0000 2010> Puppet run for client dev1.my.net skipped due to current concurrency of 1
Wed Mar 17 08:31:31 +0000 2010> Sleeping for 20 seconds
Wed Mar 17 08:31:51 +0000 2010> Current puppetds running: 1
Wed Mar 17 08:31:51 +0000 2010> Puppet run for client dev2.my.net skipped due to current concurrency of 1
Wed Mar 17 08:31:51 +0000 2010> Sleeping for 20 seconds
Wed Mar 17 08:32:12 +0000 2010> Current puppetds running: 0
Wed Mar 17 08:32:12 +0000 2010> Running agent for dev3.my.net
Wed Mar 17 08:32:15 +0000 2010> Sleeping for 16 seconds
There are many advantages to this approach over some others that have been suggested:
- No host lists to maintain, it reconfigures itself dynamically on demand.
- It doesn't rely on some other on-master state like signed certificates, which breaks models where the CA is separate
- It doesn't rely on stored configs, which don't work well at scale or on a setup with many regional masters
- It doesn't suffer from issues when a node in your host lists isn't available
- It understands the state of the entire platform and so you can control concurrency and therefore resources on your master.
- It's easy to extend with your own logic or demands; the current version of the code is only 90 lines of Ruby including CLI option parsing
- Concurrency control can mitigate other problems. Have a cluster of 10 nodes and don't want your config change to restart them all at the same time? No problem, just make sure you only run 2 at a time.
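The rolling restart idea in that last point boils down to batching. A sketch with the actual Puppet runs stubbed out as a puts, and made-up node names:

```ruby
# Walk a cluster two nodes at a time so a config change never restarts
# everything at once; a real version would trigger a Puppet run per batch
# and wait for it to finish before moving on.
nodes = %w[web1 web2 web3 web4 web5]
batches = nodes.each_slice(2).to_a

batches.each do |batch|
  puts "running batch: #{batch.join(', ')}"
end
```

Each batch completes before the next starts, so at most two nodes are ever restarting at once.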
In reality this means I can remove 256MB of RAM from my master, since I can now run fewer puppetmasterd processes. That will save me $15/month in hosting fees on this specific master; it's small change, but it's always good to control my platform costs.
by R.I. Pienaar | Mar 12, 2010 | Uncategorized
I am pleased to announce the next version of my Puppet Concat script: we now have support for 0.24.8 and newer, plus a few smaller bits mentioned below.
For background on what this is about please see my earlier post: Building files from fragments with Puppet
New in this release
Paul Elliot sent in most of the patches that enabled this release; lots of thanks, Paul!
- 0.24.8 and newer is supported
- You can now prepend warnings to generated files as a shell style comment using the warn property
- You can enable the ability to create empty concat files using the force property
- You can configure the location of your sort binary in setup.pp
The code should auto-configure for 0.24.8 use; if it does not work, please see setup.pp.
You can grab the code here.
Known issues
As with my earlier attempts at making a concat tool for 0.24.x, this version will raise some false notifies when used on 0.24. Basically, the method we use to clear the concat store of unmanaged files has a side effect, and on the next run you will get an unneeded notify. Puppet's behavior has improved in 0.25, so it works as expected there; for 0.24 there is no known workaround.
You cannot change the owner of a file. I know how to work around this issue and will have something in the next release.
by R.I. Pienaar | Feb 26, 2010 | Uncategorized
Last year I wrote a tool to parse the localconfig.yaml from Puppet 0.24 and display a list of resources and classes. That script failed when 0.25 came out, so I've updated it for 0.25 support.
The yaml cache has some added features in 0.25, so I can now also show the list of tags on a node. The output looks like:
# parselocalconfig.rb /var/lib/puppet/client_yaml/catalog/fqdn.yaml
Classes included on this node:
fqdn
common::linux
<snip>
Tags for this node:
fqdn
common::linux
<snip>
Resources managed by puppet on this node:
yumrepo{centos-base: }
defined in common/modules/yum/manifests/init.pp:24
file{/root/.ssh: }
defined in common/modules/users/manifests/root.pp:20
<snip>
You can get the script that supports both 0.24 and 0.25 here.