
Managing puppetd with mcollective

It’s typical during maintenance windows that you would want to disable puppet, do your work, then enable it again and do a run. Or perhaps you don’t run puppet all the time and just want to kick it off during your maintenance window. Doing this with ssh for loops is slow and annoying; here’s a way to target large numbers of machines for these actions using mcollective.

Using mcollective’s discovery features and a suitable agent this is really easy; I’ve written such an agent and made it available on the mcollective-plugins site.

You can see below a sample session with it. In all of the examples below we’re constraining it to hosts with the roles::dev_server puppet class using mcollective discovery. Not shown here is that you can get status as well as use the splay options provided by puppet; see the wiki page for details on that.

First we’ll make sure it’s enabled.

$ mc-puppetd --with-class roles::dev_server enable
Determining the amount of hosts matching filter for 2 seconds .... 1
 
.
 
Finished processing 1 / 1 hosts in 9.81 ms

Now we’ll disable it.

$ mc-puppetd --with-class roles::dev_server disable
Determining the amount of hosts matching filter for 2 seconds .... 1
 
.
 
Finished processing 1 / 1 hosts in 3252.13 ms

We’ll attempt a runonce; this should fail because we just disabled the agent.

$ mc-puppetd --with-class roles::dev_server runonce -v
Determining the amount of hosts matching filter for 2 seconds .... 1
 
dev1.your.net                      status=false
    Lock file exists                        
 
 
---- puppetd agent stats ----
           Nodes: 1 / 1
      Start Time: Sun Nov 29 23:02:30 +0000 2009
  Discovery Time: 2006.38ms
      Agent Time: 47.62ms
      Total Time: 2054.00ms

Let’s enable it and then try to run again.

$ mc-puppetd --with-class roles::dev_server enable
Determining the amount of hosts matching filter for 2 seconds .... 1
 
.
 
Finished processing 1 / 1 hosts in 9.81 ms
 
$ mc-puppetd --with-class roles::dev_server runonce
Determining the amount of hosts matching filter for 2 seconds .... 1
 
.
 
Finished processing 1 / 1 hosts in 2801.82 ms

I think this is a good way to orchestrate these types of maintenance windows and I hope someone finds it useful.

MCollective Security with ActiveMQ

As part of rolling out mcollective you need to think about security. The various examples in the quick start guide and on this blog have allowed any client to talk to any agent on any node. The problem with this approach is that, should you have untrusted users on a node, they can install the client applications, read the username/password from the server config file, and thus control your entire architecture.

Since revision 71 of trunk the structure of messages has changed to be compatible with the ActiveMQ authorization structure, and I’ve also made the structure of the message targets configurable. The new default format is compatible with ActiveMQ wildcard patterns, so we can now apply fine-grained control over who can speak to what.

General information about ActiveMQ Security can be found on their wiki.

The default message targets look like this:

/topic/mcollective.agentname.command
/topic/mcollective.agentname.reply

The nodes only need read access to the command topics and only write access to the reply topics. The examples below also give them admin access so these topics can be created dynamically. For simplicity we’ll wildcard the agent names; you could go further and limit certain nodes to only run certain agents. Adding these controls effectively means anyone who gets onto one of your nodes will not be able to write to the command topics and so will not be able to send commands to the rest of the collective.

There’s one special case and that’s the registration topic: if you want to enable the registration feature you should give the nodes access to write on the command channel for the registration agent. Nothing should reply on the registration topic, so you can limit that in the ActiveMQ config.

We’ll let the nodes log in as the mcollective user and create a group called mcollectiveusers; we’ll then give the mcollectiveusers group access to run as a typical registration-enabled mcollective node.

The rip user is an mcollective admin and can create commands and receive replies.

First we’ll create the users and groups.

<simpleAuthenticationPlugin>
 <users>
  <authenticationUser username="mcollective" password="pI1SkjRi" groups="mcollectiveusers,everyone"/>
  <authenticationUser username="rip" password="foobarbaz" groups="admins,everyone"/>
 </users>
</simpleAuthenticationPlugin>

Now we’ll create the access rights:

<authorizationPlugin>
  <map>
    <authorizationMap>
      <authorizationEntries>
        <authorizationEntry queue="mcollective.>" write="admins" read="admins" admin="admins" />
        <authorizationEntry topic="mcollective.>" write="admins" read="admins" admin="admins" />
        <authorizationEntry topic="mcollective.*.reply" write="mcollectiveusers" admin="mcollectiveusers" />
        <authorizationEntry topic="mcollective.registration.command" write="mcollectiveusers" read="mcollectiveusers" admin="mcollectiveusers" />
        <authorizationEntry topic="mcollective.*.command" read="mcollectiveusers" admin="mcollectiveusers" />
        <authorizationEntry topic="ActiveMQ.Advisory.>" read="everyone,all" write="everyone,all" admin="everyone,all"/>
      </authorizationEntries>
    </authorizationMap>
  </map>
</authorizationPlugin>

You could give only the specific node that runs the registration agent read access to mcollective.registration.command to keep your node registration data private.
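A sketch of what that could look like, assuming a hypothetical registration group whose only member is the collecting node (the reguser credentials here are made up); all nodes keep write access while only that group may read:

<!-- hypothetical: reguser is the only member of the registration group -->
<authenticationUser username="reguser" password="s3cret" groups="registration,mcollectiveusers,everyone"/>

<!-- replaces the earlier registration entry: all nodes may write, only the registration group may read -->
<authorizationEntry topic="mcollective.registration.command" write="mcollectiveusers" read="registration" admin="mcollectiveusers,registration" />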

Finally the nodes need to be configured; the server.cfg should have at least the following:

topicprefix = /topic/mcollective
topicsep = .
plugin.stomp.user = mcollective
plugin.stomp.password = pI1SkjRi
plugin.psk = aBieveenshedeineeceezaeheer

For my clients I can use the ability to configure the user details in my shell environment:

export STOMP_USER=rip
export STOMP_PASSWORD=foobarbaz
export STOMP_SERVER=stomp1
export MCOLLECTIVE_PSK=aBieveenshedeineeceezaeheer

And finally the rip user, when logged into a shell with these variables, has full access to the various commands. You can now give different users access to the entire collective, or go further and give a certain admin user access to only run certain agents by limiting the command topics they have access to, as sketched below. Doing the user and password settings in shell environments means they are not kept in any config file under /etc/, for example.
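As a sketch of that last idea, assuming a hypothetical puppetadmins group, you could swap the blanket admins entries for entries that only open the puppetd agent’s topics (you would most likely also need to open the discovery agent’s topics so such clients can still discover nodes):

<!-- hypothetical group that may only drive the puppetd agent -->
<authorizationEntry topic="mcollective.puppetd.command" write="puppetadmins" admin="puppetadmins" />
<authorizationEntry topic="mcollective.puppetd.reply" read="puppetadmins" admin="puppetadmins" />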

Registration in MCollective

Since rolling out mcollective to more and more machines I sometimes noticed one or two weren’t checking in, and found it hard to figure out which ones they were. One person evaluating it also expressed interest in some form of registration ability so that they can build up an inventory of what is out there using mcollective.

At first it seemed a bit against what I set out to do – no central database, use discovery instead – but I think the two complement each other well. I still use discovery to actually interact with the network; registration is there to assist in building web interfaces or other inventories.

I added the ability to call a configurable plugin at a configurable interval; whatever data your plugin returns will be sent to the collective, directed at an agent called ‘registration’. A sample plugin is provided that simply returns the list of agents as an array, and you can see how trivial it is to write your own.
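The bundled plugin is roughly the shape below; this is a sketch from memory, so check the file that ships with your version for the exact class hierarchy. The plugin and interval are selected in server.cfg with options along the lines of registration = Agentlist and registerinterval = 300.

module MCollective
    module Registration
        # Whatever body() returns becomes the registration message
        # sent to the 'registration' agent on the collective
        class Agentlist
            def body
                Agents.agentlist
            end
        end
    end
end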

Using the registration system I wrote a plugin that simply keeps a file in a directory for each member, and a simple nagios check will then report if there are any files older than the registration interval + 30. It’s quite simple but works well: the moment one of my machines goes silent the monitor goes red.

You can grab the agent and monitor script here.
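If you only want the monitoring side, the nagios check essentially boils down to a find over that directory; here’s a rough sketch (the directory path and threshold are assumptions, adjust them to your registration interval):

#!/bin/sh
# Assumed: the registration agent touches one file per host in this directory
REGDIR=/var/tmp/mcollective-registration

# Alert on anything older than roughly registerinterval + 30 seconds
STALE=$(find $REGDIR -type f -mmin +6)

if [ -n "$STALE" ]; then
    echo "CRITICAL: hosts gone silent: $STALE"
    exit 2
else
    echo "OK: all registered hosts are checking in"
    exit 0
fi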

Note that whatever work your registration agent does needs to be fast; you’ll be getting a large number of registration messages from all over your network, so if you take many seconds to process each one you’ll run into problems. You can get some more details about registration on the wiki page.

ActiveMQ Clustering

As part of deploying MCollective + ActiveMQ instead of my old Spread based system I need to figure out a multi-location setup. The documentation says it’s possible, so I thought I’d better get down and figure it out.

In my case I will have per-country ActiveMQs; I’ve had the same with Spread in the past and it’s proven reliable enough for my needs. Each ActiveMQ will carry 30 or so nodes.

(Figure: ActiveMQ Cluster)

The above image shows a possible setup; you can go much more complex with typical hub-and-spoke setups, a fully meshed setup, or maybe a local broker in your NOC, etc. ActiveMQ is clever enough not to create message loops or storms if you create loops, so you can build lots of resilient routes.

ActiveMQ calls this a Network of Brokers and the minimal docs can be found here. They also have docs on using SSL for connections; you can encrypt the inter-DC traffic using that.

I’ll show a sample config below for one ActiveMQ node; the other would be identical except for the IP of its partner. The sample uses authentication between links as I think you really should be using auth everywhere.

   <broker xmlns="http://activemq.org/config/1.0" brokerName="your-host" useJmx="true"
      dataDirectory="${activemq.base}/data">
 
      <transportConnectors>
         <transportConnector name="openwire" uri="tcp://0.0.0.0:6166"/>
         <transportConnector name="stomp"   uri="stomp://0.0.0.0:6163"/>
      </transportConnectors>

These are basically your listeners; we want to accept Stomp and OpenWire connections.

Now comes the connection to the other ActiveMQ server:

<networkConnectors>
   <networkConnector name="amq1-amq2" uri="static:(tcp://192.168.1.10:6166)" userName="amq" password="Afuphohxoh"/>
</networkConnectors>

This sets up a connection to the remote server at 192.168.1.10 using username amq and password Afuphohxoh. You can also designate failover and backup links; see the docs for samples and the sketch below. If you’re building lots of servers talking to each other you should give every link on every server a unique name. Here I called it amq1-amq2 for comms from a server called amq1 to amq2, a simple naming scheme that ensures things stay unique.
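For a backup link the static URI can wrap a failover list; this is a sketch only, the exact syntax differs between ActiveMQ versions so check their docs, and 192.168.1.11 is a hypothetical standby broker:

<networkConnectors>
   <!-- hypothetical standby broker 192.168.1.11 for the same remote site -->
   <networkConnector name="amq1-amq2" uri="static:(failover:(tcp://192.168.1.10:6166,tcp://192.168.1.11:6166))" userName="amq" password="Afuphohxoh"/>
</networkConnectors>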

Next up comes the Authentication and Authorization bits; this sets up the amq user and an mcollective user that can use the topics under /topic/mcollective. More about ActiveMQ’s security model can be found here.

    <plugins>
      <simpleAuthenticationPlugin>
        <users>
          <authenticationUser username="amq" password="Afuphohxoh" groups="admins,everyone"/>
          <authenticationUser username="mcollective" password="pI1jweRV" groups="mcollectiveusers,everyone"/>
        </users>
      </simpleAuthenticationPlugin>
      <authorizationPlugin>
        <map>
          <authorizationMap>
            <authorizationEntries>
              <authorizationEntry queue=">" write="admins" read="admins" admin="admins" />
              <authorizationEntry topic=">" write="admins" read="admins" admin="admins" />
              <authorizationEntry topic="mcollective.>" write="mcollectiveusers" read="mcollectiveusers" admin="mcollectiveusers" />
              <authorizationEntry topic="ActiveMQ.Advisory.>" read="everyone,all" write="everyone,all" admin="everyone,all"/>
            </authorizationEntries>
          </authorizationMap>
        </map>
      </authorizationPlugin>
    </plugins>
  </broker>

If you set up the other node with a connection back to this one you will have bi-directional messaging working correctly.

You can now connect your MCollective clients to either one of the servers and everything will work as if you had only one server. ActiveMQ servers will attempt reconnects regularly if the connection breaks.

You can also test using my generic stomp client that I posted in the past.

RightScale facts

I’m trying to build up a nice demo of mcollective and trying to save some effort by using the RightScale CentOS AMIs. I noticed they came with a nice script to pull down the user data and meta data, so I figured I might as well make some facts.

require 'find'

# Create ec2_* facts from the EC2 meta data that the RightScale
# helper script dumps into /var/spool/ec2/meta-data
if File.exists?("/var/spool/ec2/meta-data")
    Find.find("/var/spool/ec2/meta-data") do |path|
        filename = File.basename(path)
        factname = "ec2_#{filename}"

        # the meta data file names contain dashes, use underscores in fact names
        factname.gsub!(/-/, "_")

        if File.file?(path)
            lines = File.readlines(path)

            if lines.size == 1
                # single-line files become a single fact
                Facter.add(factname) do
                    setcode { lines.first.chomp.to_s }
                end
            else
                # multi-line files become one fact per line, suffixed with the index
                lines.each_with_index do |line, i|
                    Facter.add("#{factname}_#{i}") do
                        setcode { lines[i].chomp }
                    end
                end
            end
        end
    end
end

# Any key=val pairs passed in as EC2 user data become facts as-is
if File.exists?("/var/spool/ec2/user-data.raw")
    lines = File.readlines("/var/spool/ec2/user-data.raw")

    lines.each do |l|
        if l.chomp =~ /(.+)=(.+)/
            f = $1; v = $2

            Facter.add(f) do
                setcode { v }
            end
        end
    end
end

If you arrange to run /opt/rightscale/bin/ec2.sh from rc.local and pop the fact code above into your factdir, you should be able to access all the meta data from facter.
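The boot-time hook can be as simple as this (a sketch, assuming the stock RightScale path mentioned above and the usual CentOS rc.local location):

# append to /etc/rc.d/rc.local so the meta data is fetched at boot
/opt/rightscale/bin/ec2.sh

With that in place a facter run picks up the new facts: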

# facter -p
ec2_ami_id => ami-73270c07
ec2_ami_launch_index => 0
ec2_ami_manifest_path => pinetecltd-centos-clustera/cluster-webserver-1257783713.manifest.xml
ec2_ancestor_ami_ids_0 => ami-31c72258
ec2_ancestor_ami_ids_1 => ami-ef01e486
ec2_ancestor_ami_ids_2 => ami-0916f360
ec2_ancestor_ami_ids_3 => ami-c8ac48a1
ec2_ancestor_ami_ids_4 => ami-cd52b6a4
ec2_ancestor_ami_ids_5 => ami-19be966d
ec2_ancestor_ami_ids_6 => ami-65200b11
ec2_ancestor_ami_ids_7 => ami-3d200b49
ec2_ancestor_ami_ids_8 => ami-91200be5
ec2_ancestor_ami_ids_9 => ami-81200bf5
ec2_block_device_mapping_ami => sda1
ec2_block_device_mapping_ephemeral0 => sdb
ec2_block_device_mapping_ephemeral1 => sdc
ec2_block_device_mapping_ephemeral2 => sdd
ec2_block_device_mapping_ephemeral3 => sde
ec2_block_device_mapping_root => /dev/sda1
ec2_block_device_mapping_swap => sda3
ec2_hostname => ip-10-227-43-134.eu-west-1.compute.internal
ec2_instance_action => none
ec2_instance_id => i-9411e7e3
ec2_instance_type => m1.small
ec2_kernel_id => aki-7e0d250a
ec2_local_hostname => ip-10-227-43-134.eu-west-1.compute.internal
ec2_local_ipv4 => 10.227.43.134
ec2_placement_availability_zone => eu-west-1b
ec2_public_hostname => ec2-79-125-33-224.eu-west-1.compute.amazonaws.com
ec2_public_ipv4 => 79.125.33.224
ec2_public_keys_0_openssh_key => ssh-rsa AAA
ec2_ramdisk_id => ari-7d0d2509
ec2_reservation_id => r-c655bab1
ec2_security_groups_0 => rip
ec2_security_groups_1 => default
cluster => a

In addition, if you just pass nice key=val pairs in as user data it will add those as facts too; the last entry above is from that.
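So user data containing lines like these (cluster is the real entry from the output above; role is just a hypothetical extra pair) would each show up directly as facts:

cluster=a
role=webserver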