This is a post in a series of posts I am doing about MCollective 2.0 and later.
We’ve discussed Direct Addressing Mode before and today I’ll show one of the new features this mode enables.
Overview
MCollective is very fast, which is usually great. Sometimes, though, the speed and concurrency can be a problem, for example when you’re restarting webservers. Restarting all your webservers at the same time is generally a bad idea.
In the past the general way to work around this was to use a fact like cluster=a to cut your server estate into named groups and then address only one group at a time. This worked OK but was clearly not the best possible outcome.
Apart from this, the concurrency also meant that once a request was sent you could not ^C out of it. Any mistake made was final and processing could not be interrupted.
Since MCollective 2.0 has the ability to address nodes directly without broadcasting, it has become much easier to come up with a good solution to these problems. You can now construct RPC requests targeted at 100s of nodes but ask MCollective to communicate with them in smaller batches, with a configurable sleep in between batches. You can ^C at any time and only batches that have already received requests will be affected.
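To make the batching idea concrete, here is a minimal sketch in plain Ruby. It is not MCollective's implementation and makes no MCollective calls: the helper name in_batches, its parameters and the example hostnames are all made up for illustration, and whether the real client sleeps after the final batch is not something this sketch claims to know.

```ruby
# Hypothetical helper, not part of MCollective: walks a node list in
# fixed-size batches and pauses between them.
def in_batches(nodes, batch_size, batch_sleep, sleeper: ->(secs) { sleep(secs) })
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  processed = []
  batches = nodes.each_slice(batch_size).to_a

  batches.each_with_index do |batch, index|
    yield batch                 # hand the current batch to the caller
    processed.concat(batch)     # remember which nodes have been contacted

    # Assumption: pause between batches only, not after the last one.
    sleeper.call(batch_sleep) unless index == batches.size - 1
  end

  processed
end

# Made-up hostnames, 26 of them, batched in groups of 10.
nodes = (1..26).map { |i| format("web%02d.example.com", i) }
sizes = []

# A sleep of 0 keeps the demo instant; a real run would use a few seconds.
in_batches(nodes, 10, 0) { |batch| sizes << batch.size }

p sizes # prints [10, 10, 6]
```

The sleeper argument exists only so the pause can be swapped out, for example in a test, without actually waiting.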
Using on the CLI
Using this feature on the CLI is pretty simple: all RPC clients have two new CLI options, --batch and --batch-sleep.
% mco service restart httpd --batch 10 --batch-sleep 2

Discovering hosts using the mongo method .... 26

 * [============================================================> ] 26 / 26

. . .

Finished processing 26 / 26 hosts in 6897.66 ms
What you will see when running it on the CLI is that the progress bar advances in groups of 10, pauses 2 seconds and then does the next 10. In this case you could ^C at any time and only the machines in earlier batches and the 10 in the current batch will have restarted. Nodes in later batches would not yet be affected in any way.
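The effect of pressing ^C part way through can be sketched as a small simulation. This is plain Ruby with no MCollective calls. The method name restart_in_batches and its interrupt_after parameter are hypothetical, and the interrupt is raised by the code itself to stand in for the operator pressing ^C.

```ruby
# Hypothetical simulation, not MCollective code: shows which nodes have been
# touched when a batched run is interrupted.
def restart_in_batches(nodes, batch_size, interrupt_after: nil)
  restarted = []

  begin
    nodes.each_slice(batch_size).with_index do |batch, index|
      # Stand-in for the operator pressing ^C before this batch is sent.
      raise Interrupt if interrupt_after && index >= interrupt_after

      restarted.concat(batch) # pretend the restart request reached this batch
    end
  rescue Interrupt
    # Stop sending; batches that were never dispatched stay untouched.
  end

  { restarted: restarted, untouched: nodes - restarted }
end

# 26 made-up node ids, batches of 10, interrupted after two batches.
result = restart_in_batches((1..26).to_a, 10, interrupt_after: 2)
puts "#{result[:restarted].size} restarted, #{result[:untouched].size} untouched"
# prints "20 restarted, 6 untouched"
```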
Under the hood MCollective detects that you want to do batching, forces the system into Direct Addressing Mode and makes batches of requests. The requestid stays the same throughout, auditing works, results work exactly as before and display behaviour does not change apart from progressing in steps.
Using in code
Naturally you can also use this from your own code. Here’s a simple script that does the same thing as above.
#!/usr/bin/ruby

require 'mcollective'

include MCollective::RPC

svcs = rpcclient("service")
svcs.batch_size = 10
svcs.batch_sleep_time = 2

printrpc svcs.restart(:service => "httpd")
The key lines here are the two that set batch_size and batch_sleep_time, which have the same behaviour as --batch and --batch-sleep.