
Xen Live Migration with MCollective

I retweeted this on twitter, but it’s just too good not to show here. Over at rottenbytes.com Nicolas is showing some proof of concept code he wrote with MCollective that monitors the load on his dom0 machines and initiates live migrations of virtual machines to less loaded servers.

This is the kind of crazy functionality I wanted to enable with MCollective and it makes me very glad to see this kind of thing. The server side and client code combined is only 230 lines – very very impressive.

This is part of what VMware DRS does. Nico has some ideas to add other sexy features too, as this was just a proof of concept – the logic for what to base migrations on will be driven by a small DSL, for example.

I asked him how long it took to knock this together: the time to get acquainted with MCollective plus the time to write the agent and client was only 2 days, which is very impressive. He already knew Ruby well though 🙂 and had a Ruby gem to integrate with Xen.

I’m copying the output from his code below, but absolutely head over to his blog to check it out – he has the source up there too:

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds
 
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds
 
[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Authorization plugins for MCollective SimpleRPC

Until now The Marionette Collective has relied on your middleware to provide all authorization and authentication for requests. You’re able to restrict certain middleware users from certain agents, but nothing more fine-grained than that.

In many cases you want to provide much finer-grained control over who can do what; some examples could be:

  • A certain user can only request service restarts on machines with a fact customer=acme
  • A user can do any service restart, but only on machines that have a certain configuration management class
  • You want to deny all users except root from stopping services; others can still start and restart them

This kind of thing is required for large infrastructures with lots of admins, each working in their own group of machines, while a central NOC perhaps needs to be able to work on all the machines; you need fine-grained control over who can do what, and we did not have that until now. It would also be needed if you wanted to give clients control over their own servers but not others.

Version 0.4.5 will have support for this kind of scheme for SimpleRPC agents. We won’t provide an authorization plugin out of the box with the core distribution, but I’ve written one that will be available as a plugin.

So how would you write an auth plugin? First, a typical agent would look like this:

module MCollective
    module Agent
         class Service<RPC::Agent
             authorized_by :action_policy
 
             # ....
         end
    end
end

The new authorized_by keyword tells MCollective to use the class MCollective::Util::ActionPolicy to do any authorization on this agent.

The ActionPolicy class can be pretty simple; if it raises any kind of exception the action will be denied.

module MCollective
    module Util
         class ActionPolicy
              def self.authorize(request)
                  unless request.caller == "uid=500"
                      raise("You are not allowed access to #{request.agent}::#{request.action}")
                  end
              end
         end
    end
end

This simple check will deny all requests from anyone but Unix user id 500.

It’s pretty simple to come up with your own schemes. I wrote one that lets you write policy files like the one below for the service agent:

policy default deny
allow   uid=500 *                    *                *
allow   uid=502 status               *                *
allow   uid=600 *                    customer=acme    acme::devserver

This will allow user 500 to do everything with the service agent. User 502 can get the status of any service on any node. User 600 will be able to do any actions on machines with the fact customer=acme that also has the configuration management class acme::devserver on them. Everything else will be denied.

You can do multiple facts and multiple classes in a simple space separated list. The entire plugin to implement such policy controls was only 120 – heavily commented – lines of code.
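
Just to illustrate the idea, here is a very stripped down sketch of how such a policy check could work. This is not the actual plugin: the policy file location and the Util helper calls are assumptions, and it only handles a single fact and a single class per line, but it shows the general shape:

module MCollective
    module Util
        # Illustrative sketch only, not the real plugin.  It reads lines like:
        #
        #   allow   uid=500   *        *               *
        #   allow   uid=600   restart  customer=acme   acme::devserver
        #
        # and implies "policy default deny" by raising when nothing matched.
        class ActionPolicy
            def self.authorize(request)
                policyfile = "/etc/mcollective/policies/#{request.agent}.policy"

                File.readlines(policyfile).each do |line|
                    next unless line =~ /^allow/

                    _, requestor, actions, fact, klass = line.chomp.split(/\s+/)

                    next unless request.caller == requestor
                    next unless actions == "*" || actions == request.action

                    # "*" matches anything, otherwise the fact and the class
                    # both have to be present on this node
                    fact_ok = if fact == "*"
                        true
                    else
                        name, value = fact.split("=")
                        Util.get_fact(name).to_s == value
                    end

                    klass_ok = (klass == "*" || Util.has_cf_class?(klass))

                    return true if fact_ok && klass_ok
                end

                raise("You are not authorized to call #{request.agent}::#{request.action}")
            end
        end
    end
end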

I think this is an elegant and easy to use layer that provides a lot of functionality. We might in future pass more information about the caller to the nodes. There are some limitations, specifically that the caller information is essentially user provided, so you need to keep that in mind.

As mentioned this will be in MCollective 0.4.5.

MCollective Release 0.4.4

I just released version 0.4.4 of The Marionette Collective. This release is primarily a bug fix release addressing issues with log files and general code cleanups.

The biggest change in this release is that controlling the daemon has become better: you can ask it to reload an agent or all agents, and a few other bits. Read all about it on the wiki.

Please see the Release Notes, Changelog and Download List for full details.

For background information about the MCollective project please see the project website.

Infrastructure testing with MCollective and Cucumber

Some time ago I showed some sample code I had for driving MCollective with Cucumber. Today I’ll show how I did that with SimpleRPC.

Cucumber is a testing framework and might not be the perfect fit for systems scripting, but you can achieve a lot if you bend it a bit to your will. Ultimately I am building up to using it for testing, but we need to start with how to drive MCollective first.

The basic idea is that you write a SimpleRPC Agent for your needs, like the one I showed here. That specific agent has a number of tasks it can perform:

  • Install, Uninstall and Update Packages
  • Query NRPE status for a specific NRPE command
  • Start, Stop and Restart Services

These features are all baked into a single agent, perfect for driving from a set of Cucumber features. The sample I will show here is only driving the IPTables agent since that code is public and visible.

First I’ll show the feature I want to build. We’re still concerned with driving the agent here rather than testing as such – though the steps are tested and idempotent:

Feature: Manage the iptables firewall
 
    Background:
    Given the load balancer has ip address 192.168.1.1
    And I want to update hosts with class /dev_server/
    And I want to update hosts with fact country=de
    And I want to pre-discover the nodes to manage
 
    Scenario: Manage the firewall
        When I block the load balancer
        Then traffic from the load balancer should be blocked
 
        # other tasks like package management, service restarts
        # and monitor tasks would go here
 
        When I unblock the load balancer
        Then traffic from the load balancer should be unblocked

To realize the above we’ll need some setup code that fires up our RPC client and manages options in a single place; we’ll put this in support/env.rb:

require 'mcollective'
 
World(MCollective::RPC)
 
Before do
    @options = {:disctimeout => 2,
                :timeout     => 5,
                :verbose     => false,
                :filter      => {"identity"=>[], "fact"=>[], "agent"=>[], "cf_class"=>[]},
                :config      => "etc/client.cfg"}
 
    @iptables = rpcclient("iptables", :options => @options)
    @iptables.progress = false
end

First we load up the MCollective code and install it into the Cucumber World; this achieves more or less what include MCollective::RPC would, in a Cucumber friendly way.

We then set some sane default options and start our RPC client.

Now we can go on to writing some steps. We store these in step_definitions/mcollective_steps.rb; first we want to capture some data like the load balancer IP and the filters:

Given /^the (.+) has ip address (\d+\.\d+\.\d+\.\d+)$/ do |device, ip|
    @ips = {} unless @ips
 
    @ips[device] = ip
end
 
Given /I want to update hosts with fact (.+)=(.+)$/ do |fact, value|
    @iptables.fact_filter fact, value
end
 
Given /I want to update hosts with class (.+)$/ do |klass|
    @iptables.class_filter klass
end
 
Given /I want to pre-discover the nodes to manage/ do
    @iptables.discover
 
    raise("Did not find any nodes to manage") if @iptables.discovered.size == 0
end

Here we’re just creating a table of device names to IPs and manipulating the MCollective filters. Finally we do a discover and check that we are actually matching some hosts. If your filters matched no nodes the Cucumber run would bail out.

Now we want to do the actual work of blocking and unblocking the load balancer:

When /^I block the (.+)$/ do |device|
    raise("Unknown device #{device}") unless @ips.include?(device)
 
    @iptables.block(:ipaddr => @ips[device]) 
 
    raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end
 
When /^I unblock the (.+)$/ do |device|
    raise("Unknown device #{device}") unless @ips.include?(device)
 
    @iptables.unblock(:ipaddr => @ips[device])
 
    raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end

We do some very basic sanity checks here, simply catching nodes that did not respond and bailing out if there are any. The key thing to note is that actually manipulating firewalls on any number of machines is roughly one line of code.

Now that we’re able to block and unblock IPs we also need a way to confirm those tasks were 100% done:

Then /^traffic from the (.+) should be blocked$/ do |device|
    raise("Unknown device #{device}") unless @ips.include?(device)
 
    # count the nodes that still report this IP as not blocked
    unblockedon = @iptables.isblocked(:ipaddr => @ips[device]).inject(0) do |c, resp|
        resp[:data][:output] =~ /is not blocked/ ? c + 1 : c
    end
 
    raise("Not blocked on: #{unblockedon} / #{@iptables.discovered.size} hosts") if unblockedon > 0
    raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end
 
Then /^traffic from the (.+) should be unblocked$/ do |device|
    raise("Unknown device #{device}") unless @ips.include?(device)
 
    # count the nodes that still report this IP as blocked
    blockedon = @iptables.isblocked(:ipaddr => @ips[device]).inject(0) do |c, resp|
        resp[:data][:output] =~ /is blocked/ ? c + 1 : c
    end
 
    raise("Still blocked on: #{blockedon} / #{@iptables.discovered.size} hosts") if blockedon > 0
    raise("Not all nodes responded") unless @iptables.stats[:noresponsefrom].size == 0
end

This code does actual verification that the clients have the IP blocked or not. It also highlights that perhaps my iptables agent needs some refactoring: I have two blocks that check for the existence of a string pattern in the result, and I could make the agent return a Boolean in addition to the human readable output. That would make the agent easier to use from a program like this.
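
For what it’s worth, such a refactoring could be pretty small. Below is a hedged sketch of what the agent action might look like; ip_blocked? is a hypothetical helper that inspects the live rules, the rest follows the usual SimpleRPC agent conventions:

# Sketch only: a refactored isblocked action that returns a Boolean :blocked
# flag alongside the human readable :output string
def isblocked_action
    validate :ipaddr, /^\d+\.\d+\.\d+\.\d+$/

    blocked = ip_blocked?(request[:ipaddr])    # hypothetical helper

    reply[:blocked] = blocked
    reply[:output]  = "#{request[:ipaddr]} is #{blocked ? 'blocked' : 'not blocked'}"
end

The Then steps above could then simply count resp[:data][:blocked] instead of matching strings in the output.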

That’s all there is to it really, MCollective RPC makes reusing code very easy and it makes addressing networks very easy.

 

Monitoring / Infrastructure Testing

The above code demonstrates how, using MCollective + Cucumber, you can address any number of machines, perform actions and query state from within a testing framework. Driving actions from what is really a testing framework might seem an uncomfortable fit, but it doesn’t need to be.

Above I am using Cucumber to drive actions, but it would be great to use this combination to test infrastructure states using something like cucumber-nagios. The great thing MCollective brings to the table here is that you can have sets of tests that change behavior with the environment, while having the ability to break out of the single-box barrier.

With this you can easily write a kind of infrastructure test that transcends machine boundaries. You could check the state of one set of variables on one set of machines, and based on their values go and check that other machines are in a state that makes those values valid.

We’re able to answer those ‘this machine is doing x, did the admin remember to do y on another machine?’ style questions. Examples of this could be:

  • If the backups are running, did the cron job that takes a database out of the service pool get run? This would flag up at any time, even if someone is doing a manual run of backups.
  • How many Puppet daemons are currently applying manifests across all our nodes? Alert if it’s more than 10. Even this simple case is hard – you need a real-time view of the status of an application across many nodes, with information from right now rather than the usual 5 minute window of Nagios.
  • If there are 10 concurrent puppetd runs happening right now, is the Puppet Master coping? This test would stay green and not care about the master until the time comes that there are many puppetds doing manifest runs. That way, if your backups or some sysadmin action pushes the load up on the master, the check stays green; it only triggers when you’re seeing many Puppet clients running. This could be a useful indicator for capacity planning.

These simple cases are generally hard for systems like Nagios: it’s hard to track the state of many checks, apply logic and then go CRITICAL when a combination of factors adds up to a failure. With MCollective and Cucumber we can build such test cases fairly easily.

The code here does not really show you how to do that per se, but what it does show is how natural and easy it is to interact with your network of hosts via MCollective and Ruby. In future I might post some more code here to show how we can build on these ideas and create test suites as described. As an example, a test case for the above Puppet Master scenario might be:

Feature: Monitor the capacity of the Puppet Master
 
    Background:
    Given we know we can run 10 concurrent Puppet clients
    And the Puppet Master load average should be below 2
 
    Scenario: Monitor the Puppet Master capacity 
        When there are more than usual Puppet clients running
        Then the Puppet Master should have an acceptable load average

Running this under cucumber-nagios we’ll achieve our stated goals.
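
I have not written those steps yet, but a rough sketch of what they could look like is below. It assumes an env.rb like the one earlier that sets up @options; the rpcutil get_fact call, the loadaverage fact and the master’s identity are all assumptions on my part, and the puppetd status counting is the same trick as the snippet in the postscript below:

Given /^we know we can run (\d+) concurrent Puppet clients$/ do |max|
    @max_clients = max.to_i
end
 
Given /^the Puppet Master load average should be below (\d+)$/ do |load|
    @max_load = load.to_f
end
 
When /^there are more than usual Puppet clients running$/ do
    puppet = rpcclient("puppetd", :options => @options)
    puppet.progress = false
 
    # count how many nodes report an active puppetd run right now
    @running = puppet.status.inject(0) {|c, status| c + status[:data][:running]}
end
 
Then /^the Puppet Master should have an acceptable load average$/ do
    # only look at the master once we are over the expected concurrency
    if @running > @max_clients
        util = rpcclient("rpcutil", :options => @options)
        util.progress = false
        util.identity_filter "puppetmaster.my.net"     # hypothetical master identity
 
        util.get_fact(:fact => "loadaverage").each do |resp|
            load = resp[:data][:value].to_f
            raise("Puppet Master overloaded: load average #{load}") if load > @max_load
        end
    end
end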

As a small postscript, figuring out how many Puppet daemons are currently running their manifests is trivial with the Puppet Agent:

require 'mcollective'
include MCollective::RPC
 
p = rpcclient("puppetd")
p.progress = false
 
# each node returns :running as 0 or 1 from the status action, so just sum them
running = p.status.inject(0) {|c, status| c + status[:data][:running]}
puts("Currently running: #{running}")

$ ruby test.rb
Currently running: 3

Scheduling Puppet with MCollective

Scheduling Puppet runs is a hard problem: you either run the daemon or run it through cron, and both have drawbacks. There’s been some discussion about decoupling this or improving the remote control abilities of Puppet; this is my entry into that discussion.

Running the daemon you have to deal with the memory problems of pretty much everything involved; you also suffer if a dom0 reboots, as the 20 domUs on it will pile up and cause huge concurrent runs.

Running from cron you have problems scheduling it nicely; the simplest approach is just to sleep a random period of time, but that means clients don’t always run at a predictable time and you still get concurrency issues.

I’ve written an MCollective based Command and Control for Puppet that launches Puppet runs. The aim is to spread the CPU load on my masters out evenly so that I can use lower spec machines for masters, or in my case re-use my master machines as monitoring and middleware nodes.

It basically has these features (a simplified sketch of the main loop follows the list):

  • Discovers the list of nodes to manage based on a supplied filter; I have regional masters, so I will manage groups of Puppet nodes independently
  • Evenly spreads the Puppet runs out over an interval; if I have 10 nodes and a 30 minute interval I get a run every 3 minutes.
  • Nodes run at a predictable time every time, even after reboots, since the node list is simply run through alphabetically. If the node list stays constant you’ll always run at the same time, give or take 10 seconds; if nodes get added the behavior remains predictable.
  • Before scheduling a run it checks the overall concurrency of Puppet runs; if that exceeds a limit it skips the background run. I want to give priority to runs that I start by hand with --test, and this ensures that happens.
  • If the client it is about to run ran its Catalog recently – maybe via --test – it will skip that run
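
The script itself is not reproduced here, but to make the list above concrete, below is a heavily simplified sketch of the main loop. This is not the actual puppetcommander.rb: option parsing, the recent-run check and error handling are left out, and the runonce and custom_request calls are my best guess at the agent interface:

#!/usr/bin/ruby
# Simplified sketch only, not the real puppetcommander.rb
require 'mcollective'
 
include MCollective::RPC
 
interval       = 30      # minutes
max_concurrent = 4
 
puppet = rpcclient("puppetd")
puppet.progress = false
 
loop do
    nodes = puppet.discover.sort             # alphabetical order == predictable run times
    sleeptime = (interval * 60) / nodes.size
 
    nodes.each do |node|
        # how many puppetds are busy right now across all the managed nodes?
        running = puppet.status.inject(0) {|c, status| c + status[:data][:running]}
 
        if running >= max_concurrent
            puts "Skipping #{node}: already #{running} concurrent run(s)"
        else
            puppet.custom_request("runonce", {}, node, {"identity" => node})
            puts "Triggered a run on #{node}"
        end
 
        sleep sleeptime
    end
end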

The result is pretty good: spreading 6 nodes out over 30 minutes I get a nice even CPU spread. The spike in the graph after the change is when the node itself runs Puppet; the second graph is eth0 network output, where the dip is when localhost is running.

The resulting CPU usage is much smoother, there aren’t periods of no CPU usage and there are no spikes caused by nodes bunching up together.

Below is output from a C&C session managing 3 machines with an interval of 1 minute and a max concurrency of 1. These machines were still running cron-based puppetd, so you can see the C&C not scheduling runs when it hits the concurrency limit due to the cron runs:

$ puppetcommander.rb --interval 1 -W /dev_server/ --max-concurrent 1
Wed Mar 17 08:31:29 +0000 2010> Looping clients with an interval of 1 minute(s)
Wed Mar 17 08:31:29 +0000 2010> Restricting to 1 concurrent puppet run(s)
Wed Mar 17 08:31:31 +0000 2010> Found 3 puppet nodes, sleeping for ~20 seconds between runs
Wed Mar 17 08:31:31 +0000 2010> Current puppetds running: 1
Wed Mar 17 08:31:31 +0000 2010> Puppet run for client dev1.my.net skipped due to current concurrency of 1
Wed Mar 17 08:31:31 +0000 2010> Sleeping for 20 seconds
Wed Mar 17 08:31:51 +0000 2010> Current puppetds running: 1
Wed Mar 17 08:31:51 +0000 2010> Puppet run for client dev2.my.net skipped due to current concurrency of 1
Wed Mar 17 08:31:51 +0000 2010> Sleeping for 20 seconds
Wed Mar 17 08:32:12 +0000 2010> Current puppetds running: 0
Wed Mar 17 08:32:12 +0000 2010> Running agent for dev3.my.net
Wed Mar 17 08:32:15 +0000 2010> Sleeping for 16 seconds

There are many advantages to this approach over some of the others that have been suggested:

  • No host lists to maintain, it reconfigures itself dynamically on demand.
  • It doesn’t rely on some other on-master source like signed certificates, which breaks models where the CA is separate.
  • It doesn’t rely on stored configs, which don’t work well at scale or in a setup with many regional masters.
  • It doesn’t suffer from issues when a node isn’t available but is still in your host lists.
  • It understands the state of the entire platform and so you can control concurrency and therefore resources on your master.
  • It’s easy to extend with your own logic or demands; the current version of the code is only 90 lines of Ruby including CLI option parsing.
  • Concurrency control can mitigate other problems. Have a cluster of 10 nodes and don’t want your config change to restart them all at the same time? No problem, just make sure you only run 2 at a time, as in the example below.
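
Using the same flags as in the session above, that cluster scenario is just something like the following, where the /cluster_a/ filter is a made-up example matching that cluster:

$ puppetcommander.rb --interval 30 -W /cluster_a/ --max-concurrent 2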

In reality this means I can remove 256MB RAM from my master since I can now run fewer puppetmasterd processes. That will save me $15/month in hosting fees on this specific master; it’s small change, but it’s always good to control my platform costs.