{"id":3794,"date":"2018-08-13T12:37:09","date_gmt":"2018-08-13T11:37:09","guid":{"rendered":"https:\/\/www.devco.net\/?p=3794"},"modified":"2018-08-16T14:12:39","modified_gmt":"2018-08-16T13:12:39","slug":"mass-provisioning-choria-servers","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2018\/08\/13\/mass-provisioning-choria-servers.php","title":{"rendered":"Mass Provisioning Choria Servers"},"content":{"rendered":"
The Choria Server is the agent component of the Choria Orchestrator system. It runs on every node and maintains a connection to the middleware.<\/p>\n
Traditionally we’ve configured it using Puppet along with its mcollective compatibility layer. We intend to keep this model for the foreseeable future. Choria Server has many more uses, though: it’s embeddable, so it can be used in IoT, in tools like our go-backplane, as a sidecar in Kubernetes and more. In these and other cases the Puppet model does not work.<\/p>\n
In all these cases there are real, complex problems to solve in configuring Choria Server. We’ve built a system that can help solve them: the Choria Server Provisioner, which this post introduces.<\/p>\n
The Provisioner is inspired by old school bootstrap PXE networks – unconfigured nodes join a VLAN where they network boot and get their configuration; once configured they reboot into the right VLAN, where they become production servers.<\/p>\n
As in that model, Choria has a mode called Provisioning Mode where it uses compiled-in defaults for its bootstrap configuration – essentially to find its equivalent of a PXE VLAN – and then allows programmatic configuration.<\/p>\n
\r\n% choria buildinfo\r\nChoria build settings:\r\n\r\nBuild Data:\r\n Version: 0.5.1\r\n Git SHA: 7d0a215\r\n Build Date: 2018-08-10 10:26:42 +0000\r\n License: Apache-2.0\r\n Go Version: go1.10.2\r\n\r\nNetwork Broker Settings:\r\n Maximum Network Clients: 50000\r\n Embedded NATS Server Version: 1.2.0\r\n\r\nServer Settings:\r\n Provisioning Brokers: prov.example.net:4222\r\n Provisioning Default: true\r\n Default Provisioning Agent: true\r\n Provisioning TLS: false\r\n Provisioning Registration Data: \/etc\/choria\/metadata.json\r\n Provisioning Facts: \/etc\/choria\/metadata.json\r\n Provisioning Token: set\r\n\r\nAgent Providers:\r\n Golang MCollective Agent Compatibility version 0.0.0\r\n\r\nSecurity Defaults:\r\n TLS: true\r\n x509 Security: true\r\n<\/pre>\nHere under Server Settings<\/em> you can see the compiled-in defaults. When this server starts up without a configuration that specifically prevents provisioning mode, it will connect to prov.example.net:4222<\/em> without TLS. In that mode it will only connect to the provisioning<\/em> sub collective and it will periodically publish its \/etc\/choria\/metadata.json<\/em> to the topic choria.provisioning_data<\/em>.<\/p>\nIn the next release of Choria the method for discovering the provisioning broker is pluggable, so you can supply any logic you wish rather than use this single compile time flag.<\/p>\n
It will have an agent choria_provision<\/em> running that exposes actions to request a CSR, configure it, restart it and more.<\/p>\nIt will then wait until some process starts interacting with it and eventually gives it a configuration file and asks it to restart. Once restarted it will join its real home and continue there as a normal server. This is where the Choria Server Provisioner comes in.<\/p>\n
Choria Server Provisioner<\/H2><\/p>\n
As you saw above, the Choria Server will connect to a specific broker and sit in a provisioning<\/em> sub collective waiting to be managed. We wrote a generic, high-performance manager that lets you plug your logic into it and it will configure your nodes. In our tests with a very fast helper script this process is capable of provisioning many thousands of machines a minute – many more than any cloud will allow you to boot.<\/p>\nThe basic flow of the provisioner is this:<\/p>\n
On startup it will:<\/p>\n
\n- start to listen for events on the topic choria.provisioning_data<\/em><\/li>\n
- run a discovery on the provisioning<\/em> sub collective and repeat it at regular intervals<\/li>\n<\/ul>\n
Any nodes identified by either of these two methods are added to the work queue, where one of the configured number of workers will start provisioning them. The per-worker flow is:<\/p>\n
\n- Fetch the inventory using rpcutil#inventory<\/li>\n
- Request a CSR if the PKI feature is enabled using choria_provision#gencsr<\/li>\n
- Call the helper with the inventory and CSR, expecting to be configured<\/li>\n
\n- If the helper sets defer to true the provisioning of the node ends and the next cycle will handle it<\/li>\n
- Helper returns a configuration, signed certificate and CA chain in JSON format<\/li>\n<\/ol>\n
- Configure the node using choria_provision#configure<\/li>\n
- Restart the node using choria_provision#restart<\/li>\n<\/ol>\n
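The per-worker steps listed above can be sketched in a few lines of Ruby. This is only an illustration of the ordering and of the defer short circuit, not the provisioner’s own code: the provision_node method, the RecordingRPC class and the shape of the arguments passed to each action are all hypothetical names and assumptions made for this sketch.<\/p>\n

```ruby
# Hypothetical sketch of the per-worker flow; not the provisioner's real code.
# `rpc` is any object responding to call(node, agent, action, args) and
# `helper` is any callable that maps a request Hash to a reply Hash.
def provision_node(node, rpc:, helper:, pki: true)
  inventory = rpc.call(node, "rpcutil", "inventory", {})
  csr = pki ? rpc.call(node, "choria_provision", "gencsr", {}) : nil

  reply = helper.call("identity" => node, "csr" => csr, "inventory" => inventory)

  # A deferred node is left alone; a later cycle will pick it up again.
  return :deferred if reply["defer"]

  # Assumption: the helper reply is handed to the configure action as-is.
  rpc.call(node, "choria_provision", "configure", reply)
  rpc.call(node, "choria_provision", "restart", {})
  :provisioned
end

# Stand-in RPC client that only records which actions were requested.
class RecordingRPC
  attr_reader :calls

  def initialize
    @calls = []
  end

  def call(_node, agent, action, _args)
    @calls << "#{agent}##{action}"
    {}
  end
end

rpc = RecordingRPC.new
ok_helper = lambda do |request|
  { "defer" => false, "configuration" => { "identity" => request["identity"] } }
end

result = provision_node("node1.example.net", rpc: rpc, helper: ok_helper)
puts result
puts rpc.calls.join(" -> ")
```

A helper that answers with defer set to true stops the flow before the configure and restart actions are ever requested, which is what lets a node wait in the provisioning sub collective until you are ready for it.<\/p>\n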
You can see here this is a generic flow and all the magic is left up to a helper, so let’s look at the helper in detail.<\/p>\n
The helper is simply a script or program, written in any language, that receives node specific JSON on STDIN and returns JSON on its STDOUT.<\/p>\n
The input JSON looks something like this:<\/p>\n
\r\n{\r\n \"identity\": \"node1.example.net\",\r\n \"csr\": {\r\n \"csr\": \"-----BEGIN CERTIFICATE REQUEST-----....-----END CERTIFICATE REQUEST-----\",\r\n \"ssldir\": \"\/path\/to\/ssldir\"\r\n },\r\n \"inventory\": \"{\\\"agents\\\":[\\\"choria_provision\\\",\\\"choria_util\\\",\\\"discovery\\\",\\\"rpcutil\\\"],\\\"facts\\\":{},\\\"classes\\\":[],\\\"version\\\":\\\"0.5.1\\\",\\\"data_plugins\\\":[],\\\"main_collective\\\":\\\"provisioning\\\",\\\"collectives\\\":[\\\"provisioning\\\"]}\"\r\n}\r\n<\/pre>\nIn this example the PKI feature is enabled and the CSR seen here was created by the node in question, which keeps its private key locally and never transfers it anywhere. The inventory is what you would get if you ran mco rpc rpcutil inventory -I node1.example.net<\/em>; the main thing you’d look at here is the facts<\/em>, which would be all the metadata found in \/etc\/choria\/metadata.json<\/em>.<\/p>\nThe helper then is any program that outputs JSON resembling this:<\/p>\n
\r\n{\r\n \"defer\": false,\r\n \"msg\": \"Reason why the provisioning is being deferred\",\r\n \"certificate\": \"-----BEGIN CERTIFICATE-----......-----END CERTIFICATE-----\",\r\n \"ca\": \"-----BEGIN CERTIFICATE-----......-----END CERTIFICATE-----\",\r\n \"configuration\": {\r\n \"plugin.choria.server.provision\": \"false\",\r\n \"identity\": \"node1.example.net\"\r\n }\r\n}\r\n<\/pre>\nHere’s a bit of code showing CFSSL integration and country specific configuration:<\/p>\n
\r\nrequire \"json\"\r\n\r\nrequest = JSON.parse(STDIN.read)\r\nrequest[\"inventory\"] = JSON.parse(request[\"inventory\"])\r\n\r\nreply = {\r\n \"defer\" => false,\r\n \"msg\" => \"\",\r\n \"certificate\" => \"\",\r\n \"ca\" => \"\",\r\n \"configuration\" => {}\r\n}\r\n\r\nidentity = request[\"identity\"]\r\n\r\nif request[\"csr\"] && request[\"csr\"][\"csr\"]\r\n ssldir = request[\"csr\"][\"ssldir\"]\r\n\r\n # save the CSR\r\n File.open(\"%s.csr\" % identity, \"w\") do |f|\r\n f.puts request[\"csr\"][\"csr\"]\r\n end\r\n\r\n # sign the CSR using CFSSL\r\n signed = %x[cfssl sign -ca ca.pem -ca-key ca-key.pem -loglevel 5 #{identity}.csr 2>&1]\r\n signed = JSON.parse(signed)\r\n abort(\"No signed certificate received from cfssl\") unless signed[\"cert\"]\r\n\r\n # Store the CA and the signed cert in the reply\r\n reply[\"ca\"] = File.read(\"ca.pem\")\r\n reply[\"certificate\"] = signed[\"cert\"]\r\n\r\n # Create security configuration customised to the SSL directory the server chose\r\n reply[\"configuration\"].merge!(\r\n \"plugin.security.provider\" => \"file\",\r\n \"plugin.security.file.certificate\" => File.join(ssldir, \"certificate.pem\"),\r\n \"plugin.security.file.key\" => File.join(ssldir, \"private.pem\"),\r\n \"plugin.security.file.ca\" => File.join(ssldir, \"ca.pem\"),\r\n \"plugin.security.file.cache\" => File.join(ssldir, \"cache\")\r\n )\r\nend\r\n<\/pre>\nWith that out of the way let’s create the rest of our configuration. We’re going to look at country specific brokers here:<\/p>\n
\r\ncase request[\"inventory\"][\"facts\"][\"country\"]\r\nwhen \"mt\"\r\n broker = \"choria1.mt.example.net:4223\"\r\nwhen \"us\"\r\n broker = \"choria1.us.example.net:4223\"\r\nelse\r\n broker = \"choria1.global.example.net:4223\"\r\nend\r\n\r\nreply[\"configuration\"].merge!(\r\n \"identity\" => identity,\r\n \"plugin.choria.middleware_hosts\" => broker,\r\n \"classesfile\" => \"\/opt\/puppetlabs\/puppet\/cache\/state\/classes.txt\",\r\n \"collectives\" => \"mcollective\",\r\n \"loglevel\" => \"warn\",\r\n \"plugin.yaml\" => \"\/tmp\/mcollective\/generated-facts.yaml\",\r\n \"plugin.choria.server.provision\" => \"false\",\r\n)\r\n\r\nputs reply.to_json\r\n<\/pre>\nThe configuration is simply Choria configuration as key value pairs – all strings. With the provisioning mode on by default you must disable it specifically, so be sure to set plugin.choria.server.provision=false<\/em>.<\/p>\nYou can see you can potentially integrate with any CA you wish and employ any logic or data source for making the configuration. In this case we used the CFSSL CLI, but in reality you’d use its API. I integrate with our asset databases to ensure a node goes with the rest of its POD – we have multiple networks per DC and this helps our orchestrators perform better. You could perhaps consider using Jerakia<\/a> as a more suitable store for this than the case statement above.<\/p>\nThe provisioner will expose its statistics in Prometheus format and it embeds our Choria Backplane so you can perform actions like Circuit Breaking fleet wide.<\/p>\n
<\/center><\/p>\nThis dashboard is available in the GitHub repository.<\/p>\n
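The helper examples earlier always answer with a configuration, so here is a minimal sketch of the other branch of the helper protocol: deferring a node. It assumes the PKI feature is disabled, so the certificate and ca keys are left out, and the rule it applies (defer until the node publishes a country fact) is purely illustrative. The build_reply name and the broker naming pattern are assumptions made for this sketch.<\/p>\n

```ruby
require "json"

# Builds a helper reply from an already parsed request Hash. Deferring when
# the "country" fact is absent is an assumption made for illustration only.
def build_reply(request)
  # The inventory arrives as a JSON document encoded inside a JSON string.
  inventory = JSON.parse(request["inventory"])
  country = inventory.fetch("facts", {})["country"]

  if country.nil?
    return {
      "defer" => true,
      "msg" => "no country fact published yet for #{request["identity"]}"
    }
  end

  {
    "defer" => false,
    "msg" => "",
    "configuration" => {
      "identity" => request["identity"],
      # Hypothetical naming pattern for the per-country brokers.
      "plugin.choria.middleware_hosts" => "choria1.#{country}.example.net:4223",
      "plugin.choria.server.provision" => "false"
    }
  }
end

# A node whose metadata file does not carry a country yet.
sample = {
  "identity" => "node1.example.net",
  "inventory" => { "facts" => {} }.to_json
}

puts build_reply(sample).to_json
```

In a real helper the request would come from STDIN, as in the examples earlier in this post. Keeping the decision in a method that takes a Hash and returns a Hash makes it easy to exercise the logic without a running provisioner.<\/p>\n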
Demonstration<\/H2><\/p>\n
I made a video explainer that goes into more detail and shows the system in action:<\/p>\n