Since my last post I’ve spoken to a lot of people, all excited to see something fresh in the monitoring space. I’ve learned a lot, primarily that no one tool will please everyone. This is why monitoring systems are so hated: they try to impose their world view, they’re hard to hack on and it’s hard to get data out of them. This only served to reinforce my belief that rather than build a new monitoring system I should build a framework that can build monitoring systems.
DevOps shops who can cut code should be able to build the monitoring they want, not the monitoring their vendor thought they wanted.
Thus my focus has not been on how I can declare relationships between services, or how I can declare an escalation matrix. My focus has been on events and how events relate to each other.
Identifying an Event
Events can come from many places. In the recent video demo I did, you saw events from Nagios and events from MCollective. I also have event bridges for my Apache Blackbox and for SNMP Traps, and it would be trivial to support events from GitHub commit hooks, Amazon SNS and really any conceivable source.
The event you see on the right is a metric event. It doesn’t represent one specific status; it’s a time series event, which in this case got fed into Graphite.
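To make the Graphite hand-off concrete, here is a minimal sketch of turning a metric event into Graphite plaintext protocol lines (`path value timestamp`). The event layout and the field names `subject`, `name`, `time` and `metrics`, as well as the `events` prefix, are assumptions made for illustration; they are not the framework's actual schema.

```python
import time


def to_graphite_lines(event, prefix="events"):
    """Render a metric event as Graphite plaintext lines.

    Each line has the form "<metric.path> <value> <unix_timestamp>".
    """
    ts = int(event.get("time", time.time()))
    # Dots separate path levels in Graphite, so flatten the host name.
    subject = event["subject"].replace(".", "_")
    lines = []
    for metric, value in sorted(event["metrics"].items()):
        lines.append(f"{prefix}.{subject}.{event['name']}.{metric} {value} {ts}")
    return lines


# Hypothetical metric event; the structure is assumed, not taken from the framework.
event = {
    "subject": "web1.example.com",
    "name": "load",
    "time": 1300000000,
    "metrics": {"load1": 0.5, "load15": 0.2},
}
lines = to_graphite_lines(event)
for line in lines:
    print(line)
```

Each resulting line, terminated by a newline, could then be written to a Carbon listener over TCP, which by default accepts the plaintext protocol on port 2003.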
Status events get tracked automatically: a representation is built for each unique event based on its subject and name. This status representation can progress through states like OK, Warning, Critical etc. Events sent from many different sources get condensed and summarized into a single status representing how things look based on the most recently received data, regardless of the source of the data.
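The condensing of events into one status per subject and name can be sketched roughly as follows. This is only an illustration under assumptions: the `severity` and `origin` field names, the numeric severity mapping and the `StatusTracker` class are hypothetical and not the framework's real code.

```python
from dataclasses import dataclass

# Assumed severity-to-state mapping; the post only names the states.
STATE_NAMES = {0: "OK", 1: "Warning", 2: "Critical"}


@dataclass
class Status:
    subject: str
    name: str
    state: str
    origin: str  # source that supplied the latest data


class StatusTracker:
    """Keeps one status per unique (subject, name) pair."""

    def __init__(self):
        self.statuses = {}

    def update(self, event):
        key = (event["subject"], event["name"])
        state = STATE_NAMES.get(event["severity"], "Unknown")
        # The most recently received event wins, whatever its source.
        self.statuses[key] = Status(event["subject"], event["name"], state, event["origin"])
        return self.statuses[key]


tracker = StatusTracker()
tracker.update({"subject": "web1", "name": "load", "severity": 0, "origin": "nagios"})
tracker.update({"subject": "web1", "name": "load", "severity": 2, "origin": "mcollective"})
tracker.update({"subject": "db1", "name": "disk", "severity": 1, "origin": "nagios"})
```

After these three events the tracker holds two statuses: `web1`/`load` is Critical based on the later MCollective event, and `db1`/`disk` is Warning.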
Each state transition and each event with a non-zero severity will raise an Alert, which gets routed to one or more pluggable notification frameworks.
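A rough sketch of that alerting rule is shown below, with notifiers modelled as plain callables. The `AlertRouter` class, the alert dictionary and the decision not to treat the very first event for a status as a transition are all assumptions made for this example.

```python
class AlertRouter:
    """Raises an alert on every state transition and every non-zero severity event."""

    def __init__(self, notifiers):
        self.notifiers = list(notifiers)  # pluggable: any callable taking an alert
        self.last_severity = {}

    def handle(self, event):
        key = (event["subject"], event["name"])
        previous = self.last_severity.get(key)
        severity = event["severity"]
        self.last_severity[key] = severity

        # Assumption: the first event seen for a status is not a transition.
        transitioned = previous is not None and previous != severity
        if not (transitioned or severity != 0):
            return False

        alert = {"subject": key[0], "name": key[1],
                 "previous": previous, "severity": severity}
        for notify in self.notifiers:
            notify(alert)
        return True


sent = []  # stand-in notifier that just records alerts
router = AlertRouter([sent.append])
for sev in [0, 2, 2, 0, 0]:
    router.handle({"subject": "web1", "name": "load", "severity": sev})
```

With the severity sequence 0, 2, 2, 0, 0 this sends three alerts: the change to Critical, the repeated Critical event, and the recovery to OK.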
Event Associations and Metadata
Events can carry a lot of additional data beyond what the framework needs, which is one of the advantages of NoSQL based storage. A good example of this would be a GitHub commit hook. You might want to store this and retain the rich data present in this event.
Thanks to conversations with @unixdaemon I’ve now added the ability to tag events with some additional data. If you are emitting many events from many subsystems on a certain server, you might want to embed the version of the software currently deployed on that machine into the events. This way you can easily identify and correlate events before and after an upgrade.
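As an illustration of how such tags could be used for correlation, the sketch below groups events by a tag and counts problem events per deployed version. The `tags` field, the `release` tag name and the sample events are hypothetical.

```python
from collections import defaultdict


def group_by_tag(events, tag):
    """Group events by the value of one tag; untagged events share a bucket."""
    groups = defaultdict(list)
    for event in events:
        groups[event.get("tags", {}).get(tag, "untagged")].append(event)
    return dict(groups)


# Hypothetical events emitted before and after an upgrade from 1.2.0 to 1.3.0.
events = [
    {"subject": "web1", "name": "http", "severity": 0, "tags": {"release": "1.2.0"}},
    {"subject": "web1", "name": "http", "severity": 2, "tags": {"release": "1.3.0"}},
    {"subject": "web1", "name": "load", "severity": 1, "tags": {"release": "1.3.0"}},
    {"subject": "web1", "name": "disk", "severity": 0},
]

by_release = group_by_tag(events, "release")
problems = {release: sum(1 for e in evs if e["severity"] != 0)
            for release, evs in by_release.items()}
```

Here `problems` shows no problem events under release 1.2.0 and two under 1.3.0, which is the kind of before and after comparison the tags make possible.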