Monitoring
This section will give recommendations for:
-
Monitoring Rudder itself (besides standard monitoring)
-
Monitoring the state of your configuration management
Monitoring Rudder itself
Monitoring a Node
The monitoring of a node mainly consists in checking that the Node can speak with its policy server, and that the agent is run regularly.
You can use the 'rudder agent health' command to check for communication errors. It will check the agent configuration and look for connection errors in the last run logs. By default it will output detailed results, but you can start it with the '-n' option to enable "nrpe" mode (like Nagios plugins, but it can be used with other monitoring tools as well). In this mode, it will display a single line result and exit with:
-
0 for a success
-
1 for a warning
-
2 for an error
If you are using nrpe, you can put this line in your 'nrpe.cfg' file:
command[check_rudder]=/opt/rudder/bin/rudder agent health -n
To get the last run time, you can lookup the modification date of
/var/rudder/cfengine-community/last_successful_inputs_update
.
Monitoring the health of the Rudder server
See the Web application monitoring and relayd monitoring sections.
Monitoring your configuration management
There are two interesting types of information:
-
Events: all the changes made by the the agents on your Nodes
-
Compliance: the current state of your Nodes compared with the expected configuration
Monitor compliance
You can use the Rudder API to get the current compliance state of your infrastructure. It can be used to simply check for configuration errors, or be integrated in other tools.
Here is an very simple example of API call to check for errors (exits with 1 when there is an error):
curl -s -H "X-API-Token: yourToken" -X GET 'https:/your.rudder.server/rudder/api/latest/compliance/rules' | grep -qv '"status": "error"'
See the API documentation for more information about general API usage, and the compliance API documentation for a list of available calls.
Monitor events
The Web interface gives access to this, but we will here see how to process events
automatically. They are available on the root server, in /var/log/rudder/compliance/non-compliant-reports.log
.
This file contains two types of reports about all the nodes managed by this server:
-
All the modifications made by the agent
-
All the errors that prevented the application of a policy
The lines have the following format:
[%DATE%] N: %NODE_UUID% [%NODE_NAME%] S: [%RESULT%] R: %RULE_UUID% [%RULE_NAME%] D: %DIRECTIVE_UUID% [%DIRECTIVE_NAME%] T: %TECHNIQUE_NAME%/%TECHNIQUE_VERSION% C: [%COMPONENT_NAME%] V: [%KEY%] %MESSAGE%
In particular, the 'RESULT' field contains the type of event (change or error, respectively 'result_repaired' and 'result_error').
You can use the following regex to match the different fields:
^\[(?P<Date>[^\]]+)\] N: (?P<NodeUUID>[^ ]+) \[(?P<NodeFQDN>[^\]]+)\] S: \[(?P<Result>[^\]]+)\] R: (?P<RuleUUID>[^ ]+) \[(?P<RuleName>[^\]]+)\] D: (?P<DirectiveUUID>[^ ]+) \[(?P<DirectiveName>[^\]]+)\] T: (?P<TechniqueName>[^/]+)/(?P<TechniqueVersion>[^ ]+) C: \[(?P<ComponentName>[^\]]+)\] V: \[(?P<ComponentKey>[^\]]+)\] (?P<Message>.+)$
Below is a basic Logstash configuration file for parsing Rudder events. You can then use Kibana to explore the data, and create graphs and dashboards to visualize the changes in your infrastructure.
input { file { path => "/var/log/rudder/compliance/non-compliant-reports.log" } } filter { grok { match => { "message" => "^\[%{DATA:date}\] N: %{DATA:node_uuid} \[%{DATA:node}\] S: \[%{DATA:result}\] R: %{DATA:rule_uuid} \[%{DATA:rule}\] D: %{DATA:directive_uuid} \[%{DATA:directive}\] T: %{DATA:technique}/%{DATA:technique_version} C: \[%{DATA:component}\] V: \[%{DATA:key}\] %{DATA:message}$" } } # Replace the space in the date by a "T" to make it parseable by Logstash mutate { gsub => [ "date", " ", "T" ] } # Parse the event date date { match => [ "date" , "ISO8601" ] } # Remove the date field mutate { remove => "date" } # Remove the key field if it has the "None" value if [key] == "None" { mutate { remove => "key" } } } output { stdout { codec => rubydebug } }
← Performance tuning Server installation options →