
MCollective Components, Terminology and Flow

I often see confusion about the terminology in use in MCollective: what the major components are, where software needs to be installed, and so on.

I attempted to address this in a presentation and screencast covering:

  • What middleware is and how we use it.
  • The major components and correct terminology.
  • Anatomy of a request life cycle.
  • And an actual look inside the messages we send and receive.

You can grab the presentation from Slideshare or view a video of it on blip.tv. Below is an embedded version of the Slideshare deck including audio; I suggest you view it full screen as there’s some code in it.

EC2 Bootstrap Helper

I’ve been working a bit on streamlining the builds I do on EC2 and wanted a better way to provision my machines. I use CentOS, and the options for nicely built EC2 images are rough to non-existent. I’ve used the Rightscale ones till now, and while they’re nice they’re also full of code copyrighted by Rightscale.

What I really wanted was something as full featured as Ubuntu’s CloudInit, but I didn’t feel much like touching any Python, so I hacked up something that more or less does what I need. You can get it on GitHub. It’s written and tested on CentOS 5.5.

The idea is that you’ll have a single multi-purpose AMI that you can easily bootstrap onto your Puppet/MCollective infrastructure using this system. Details below.

I prepare my base CentOS AMI with the following mods:

  • Install Facter and Puppet – but do not enable them
  • Install the EC2 utilities
  • Setup the usual getsshkeys script
  • Install the ec2-boot-init RPM
  • Add a custom fact that reads /etc/facts.txt – see later for why. Get one here, or see the sketch below.
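
Such a fact can be trivial. A minimal sketch – assuming /etc/facts.txt holds simple key=value lines, which is how this example treats it – might look like this:

# Hypothetical Facter plugin: expose key=value lines in
# /etc/facts.txt as facts
if File.exist?("/etc/facts.txt")
    File.readlines("/etc/facts.txt").each do |line|
        key, value = line.chomp.split("=", 2)

        next unless key && value

        Facter.add(key) do
            setcode { value }
        end
    end
end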

With this in place you need to create some Ruby scripts that you will use to bootstrap your machines. Examples would be installing MCollective and configuring it to find your current ActiveMQ, or setting up Puppet and doing your initial run.

We host these scripts on any webserver – ideally S3 – so that when a machine boots it can grab the logic you want to execute on it. This way you can bug fix your bootstrapping without having to make new AMIs, as well as add new bootstrap methods to existing AMIs in future.

Here’s a simple example that just runs a shell command:

newaction("shell") do |cmd, ud, md, config|
    if cmd.include?(:command)
        system(cmd[:command])
    end
end

You want to host this on any webserver in a file called shell.rb. Now create a file called list.txt in the same location that just has this in it:

shell.rb

You can list as many scripts as you want. Now when you boot your instance, pass it user data like this:

--- 
:facts: 
  role: webserver
:actions: 
- :url: http://your.net/path/to/actions/list.txt
  :type: :getactions
- :type: :shell
  :command: date > /tmp/test

The above will fetch the list of actions – our shell.rb – from http://your.net/path/to/actions/list.txt and then use the shell action to run the command date > /tmp/test. The actions are run in order, so you probably always want getactions to happen first.
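
How you pass the user data in depends on your tooling; with the classic EC2 API tools it would be something along these lines (the AMI ID is a placeholder):

$ ec2-run-instances ami-xxxxxxxx --user-data-file userdata.txt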

Other actions that this script will take:

  • Cache all the user and meta data in /var/spool/ec2boot
  • Create /etc/facts.txt with all the facts you passed in as well as a flat version of the entire instance meta data – see the example below.
  • Create a MOTD that shows some key data like the AMI ID, zone, and public and private hostnames
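
For the user data above, /etc/facts.txt would contain something along these lines; the key=value layout and the flattened metadata key names shown here are illustrative rather than exact:

role=webserver
instance_id=i-0000000
placement_availability_zone=us-east-1a
public_hostname=ec2-0-0-0-0.compute-1.amazonaws.com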

The boot library provides a few helpers for writing scripts in this environment, specifically around fetching files and logging:

    ["rubygems-1.3.1-1.el5.noarch.rpm",
     "rubygem-stomp-1.1.6-1.el5.noarch.rpm",
     "mcollective-common-#{version}.el5.noarch.rpm",
     "mcollective-#{version}.el5.noarch.rpm",
     "server.cfg.templ"].each do |pkg|
        EC2Boot::Util.log("Fetching pkg #{pkg}")
        EC2Boot::Util.get_url("http://foo.s3.amazonaws.com/#{pkg}", "/mnt/#{pkg}")
     end

This code fetches a bunch of files from an S3 bucket and saves them into /mnt. Each fetch gets logged to the console and syslog. Using this GET helper has the advantage that sane retrying and so forth is already built in for you.

It’s fairly early days for this code but it works and I am using it. I’ll probably be adding a few more features soon; let me know in the comments if you need anything specific or even if you just find it useful.

MCollective Data Definition Language

I mentioned the DDL in my recent post about the MCollective Road Map.

The DDL is used to describe agents in a way that is accessible to other programs, web applications, client libraries and so forth, helping those client tools configure themselves correctly.
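
To give a feel for it, a DDL for a hypothetical service agent might look roughly like this – the names and values here are illustrative only:

metadata :name        => "Service Agent",
         :description => "Agent to manage services",
         :author      => "You",
         :license     => "GPLv2",
         :version     => "1.0",
         :url         => "http://example.com/",
         :timeout     => 60

action "status", :description => "Gets the status of a service" do
    input :service,
          :prompt      => "Service Name",
          :description => "The service to get the status for",
          :type        => :string,
          :validation  => '^[a-zA-Z\-_\d]+$',
          :optional    => false,
          :maxlength   => 30

    output :status,
           :description => "The status of the service",
           :display_as  => "Service Status"
end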

An actual example of a DDL file can be found here if you want to have a good look at one, and full docs are here.

I’ve created a short video showing the DDL and some of the features of the upcoming 0.4.7 release, you probably want to view it full screen to really see what’s going on.

And a quick note about the colors: I know people tend to feel strongly about this kind of thing, so you can disable them in the client’s config file 🙂

This is also my first attempt at using blip.tv, please let me know if you see any problems.

MCollective pgrep

The Unix pgrep utility is great: it lets you grep through your process list and find interesting things. I wanted to do something similar but for my entire server group, so I built something quick on top of MCollective.

I am using the Ruby sys-proctable gem to do the hard work; it returns a massive amount of information about each process, and I have written a simple agent on top of this.
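
As a standalone sketch of the kind of matching involved – this is not the actual agent code – you could do something like this:

require 'sys/proctable'

# Walk the process table and print every process whose
# command line matches the supplied regular expression
pattern = Regexp.new(ARGV[0] || ".")

Sys::ProcTable.ps.each do |p|
    puts "%8d  %s" % [p.pid, p.cmdline] if p.cmdline =~ pattern
end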

The agent supports grepping the process tree but also supports kill and grep+kill, though I have not yet implemented more than the basic grep on the command line. Frankly the grep+kill combination scares me and I might remove it. A simple grep slip-up and you will kill all processes on all your machines 🙂 Sometimes too much power is too much and should just be avoided.

At the moment mc-pgrep outputs a set format, but I intend to make that configurable on the command line. Here’s a sample:

% mc-pgrep -C /dev_server/ ruby
 
 * [ ============================================================> ] 4 / 4
 
dev1.my.com
       root   9833  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
       root  21608  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
 
dev2.my.com
       root  14568  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  31595  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
dev3.my.com
       root   1620  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  14093  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
dev4.my.com
       root   3231  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  20557  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
   ---- process list stats ----
        Matched hosts: 4
    Matched processes: 8
        Resident Size: 37.264KB
         Virtual Size: 629.578MB

You can also limit it to only find zombies with the -z option.
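
For example, to look for zombies everywhere you would run something like this (the exact invocation may differ):

% mc-pgrep -z .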

This has been quite interesting for me: if I limit the pgrep to “.” (the pattern is a regex) every machine will send back a Sys::ProcTable hash for all its processes. This is a 50 to 70 KByte payload per server. I’ve so far seen no problem getting this much traffic through ActiveMQ + MCollective and processing it all in a very short time:

% time mc-pgrep -F "country=/uk|us/" .
 
   ---- process list stats ----
        Matched hosts: 20
    Matched processes: 1958
        Resident Size: 1.777MB
         Virtual Size: 60.072GB
 
mc-pgrep -F "country=/uk|us/" .  0.19s user 0.06s system 7% cpu 3.420 total

That 3.4 seconds includes a 2 second discovery overhead, with the client machine in Germany and the filter matching UK and US machines – all the way to the West Coast – so my biggest delay here is network, not MC or ActiveMQ.

The code can be found in my GitHub account and is still a bit of a work in progress; wiki pages will follow once I am happy with it.

And as an aside, I am slowly migrating at least my code to GitHub, if not wikis and ticketing. So far my plugins have moved; MC will move soon too.

DKIM with CentOS 5 and Exim

DomainKeys Identified Mail – DKIM – is a recent attempt to add some sender verification to email. Read more here, here and in RFC 4871 to get some background info.

If you’re sending any newsletters you really want to be investigating this, if you’re doing anti-spam it’s good to start tracking it, and really everyone should have DKIM on their domains. Exim has decent support for it as of 4.70, but CentOS is still on 4.63 thanks to RHEL.

To get a newer Exim on your CentOS machine I suggest just using ATrpms, which as of writing has 4.71 packages available for Exim and the other bits you need. I needed:

exim-4.71-40.el5.i386.rpm
exim-mysql-4.71-40.el5.i386.rpm
libspf2_2-1.2.5-5.0.el5.i386.rpm
libsrs_alt1-1.0-3_rc1.0.el5.i386.rpm

As well as the 64-bit versions. You can just add ATrpms to your systems, but really you should have your own repos and carefully control the packages that go out to your estate.

Once you have upgraded your stock Exim to these versions – it’s a totally clean and compatible upgrade – configuring Exim to automagically sign outgoing mail with DKIM is pretty easy. We’ll make it look for keys in a specific location based on the outgoing mail domain, so if you’re a relay for many domains you just need to drop in the keys.
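
For two hypothetical domains, the key directory would end up looking like this:

/etc/exim/dkim/example.com.pem
/etc/exim/dkim/example.net.pem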

Put the following near the top of your /etc/exim/exim.conf file; this just sets some macros we’ll use later on:

DKIM_DOMAIN = ${lc:${domain:$h_from:}}
DKIM_FILE = /etc/exim/dkim/${lc:${domain:$h_from:}}.pem
DKIM_PRIVATE_KEY = ${if exists{DKIM_FILE}{DKIM_FILE}{0}}

This will use, based on the sender domain, a private key in /etc/exim/dkim/sender_domain.pem. By default Exim just logs DKIM verification results and doesn’t block any incoming mail; I won’t cover doing blocks here, just sending.

Next find your remote_smtp transport later in the file and change it to look like this:

remote_smtp:
  driver = smtp
  dkim_domain = DKIM_DOMAIN
  dkim_selector = x
  dkim_private_key = DKIM_PRIVATE_KEY
  dkim_canon = relaxed
  dkim_strict = 0

This will make Exim do the DKIM signing on outgoing mail, but only if it can find a key for the sending domain.

Making the keys is pretty easy; we’ll use the domain example.com:

$ mkdir /etc/exim/dkim/ && cd /etc/exim/dkim/
$ openssl genrsa -out example.com.pem 1024 
$ openssl rsa -in example.com.pem -out example.com-public.pem -pubout -outform PEM

All that’s left now is to update your DNS. Sticking with example.com, you’d add something like this to your BIND zone file; the text to add after p= is what you’ll find in the public key – example.com-public.pem in our example:

x._domainkey     IN      TXT     "v=DKIM1\; t=y\; k=rsa\; p=MIGfMA0<snip>AQAB"
_domainkey       IN      TXT     "t=y\; o=~\;"

The x matches up with the dkim_selector in the SMTP transport above. The t=y tells the world you’re still testing your setup, so remove it only when you’re 100% certain everything works. The o=~ tells everyone you will sign only some mail; you can make that o=- if all mail from you will be signed.

You can verify your DNS is right like this:

$ dig +short txt x._domainkey.example.com
"v=DKIM1\; k=rsa\; p=MIGfMA0<snip>AQAB"

And finally if you’re sending mail you should now see a header in the mail like this:

DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=example.com; s=x;
	h=From:To:Message-Id:Date; bh=g3zLY<snip>5uGs=; b=fonAB<snip>bceHhQ==;

You can also send an email to check-auth@verifier.port25.com and it will reply with all sorts of test output about your domain, including DKIM validation details.

Puppet Concat 20100507

I’ve had quite a lot of contributions to my Puppet Concat module and after some testing by various people I’m ready to do a new release.

Thanks to Paul Elliot, Chad Netzer and David Schmitt for patches and assistance.

For background on what this is about, please see my earlier post: Building files from fragments with Puppet

You can download the release here. Please pay special attention to the upgrade instructions below.

Changes in this release

  • Several robustness improvements to the helper shell script.
  • Removed all hard coded paths in the helper script to improve portability.
  • We now use file{} to copy the combined file to its location. This means you can now change the ownership of a file by just changing the owner/group in concat{}.
  • You can specify ensure => “/some/other/file” in concat::fragment to include the contents of another file in the fragment – even files not managed by Puppet. See the sketch after this list.
  • The code is now hosted on GitHub and we’ll accept patches there.
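
As a rough sketch of how the new ensure feature fits in – the resource names and file paths here are made up, see the module documentation for authoritative usage:

concat{"/etc/motd":
    owner => "root",
    group => "root"
}

concat::fragment{"motd_header":
    target  => "/etc/motd",
    order   => "01",
    content => "Welcome to ${fqdn}\n"
}

# pull in an unmanaged file as one of the fragments
concat::fragment{"motd_local":
    target => "/etc/motd",
    ensure => "/etc/motd.local"
}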

Upgrading

When upgrading to this version you need to take particular care: all the fragments are now owned by root, the shell script runs as root, and we use file{} to copy the resulting file out.

This means you’ll see the diff of not just the fragments but also the final file when running puppetd --test. Unfortunately it also means that the first time you run Puppet with the new code it will fire off all the notifies you have on your concat{} resources. You’ll also see a lot of changes to resources in the fragments directory on the first run. This is normal and expected behavior.

So if, say, you’re using concat to create my.cnf and notify the MySQL service to restart automatically, then simply upgrading this module will result in MySQL restarting. This is a one-off notify that happens only the first time; from then on things will be as normal. So when upgrading I’d suggest disabling those notifies until the upgrade has run everywhere, and then putting them back.