Advanced node management
Override node FQDN
Sometimes, the node’s FQDN reported in the inventory is not the correct one. For example, the configured FQDN is an internal name, it is incorrectly set up by the OS, or it is not found in a normalized format by the agent.
This can be problematic: for example, it can break the ability for Rudder to trigger an agent run from the node’s compliance page.
In these cases, you can override the FQDN from the node using a node inventory extension.
The inventory extension must define the reserved property key rudder_override_hostname with a string value.
If that key is defined, its value takes precedence over the values retrieved from the agent inventory (<RUDDER><HOSTNAME>, or else <OPERATINGSYSTEM><FQDN>) as the FQDN for the node.
When the override is used, an additional inventory extension key rudder_original_hostname is also defined. That key’s value contains the original node hostname which would have been used without an override.
This feature was introduced in Rudder 7.1.7 and 7.2.1.
The value used to override the FQDN must be valid according to RFC 1123 (section 2.1), otherwise it is not used in place of the original one.
Example inventory hook script to put on the node so that the node will be seen on Rudder with the FQDN node1-overridden.rudder.local.override:
$ cat /var/rudder/hooks.d/override-hostname.sh
echo '{"rudder_override_hostname": "node1-overridden.rudder.local.override"}'
Example of resulting node properties (in Rudder UI and API):
{ "rudder_override_hostname": "node1-overridden.rudder.local.override", "rudder_original_hostname": "agent1.rudder.local" }
Reinitialize policies for a Node
To reinitialize the policies for a Node, delete the local copy of the applied policies fetched from the Rudder Server, and create a new local copy of the initial policies.
rudder agent reset
At the next run of the Rudder Agent (it runs every five minutes), the initial policies will be used.
Use this procedure with caution: the applied policies of a Node should never get broken, unless some major change has occurred on the Rudder infrastructure, like a full reinstallation of the Rudder Server.
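The full sequence, if you do not want to wait for the next scheduled run:
rudder agent reset
# apply the initial policies immediately
rudder agent run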
Completely reinitialize a Node
You may want to completely reinitialize a Node to make it seen as a new node on the server, for example after cloning a VM.
This command will permanently delete your node UUID and keys, and no configuration will be applied before the node is accepted and configured again on the server.
The command to reinitialize a Node is:
rudder agent factory-reset
This command will delete all local agent data, including its UUID and keys, and also reset the agent internal state. The only configuration kept is the server hostname or IP configured in policy_server.dat. It will also send an inventory to the server, which will treat it as a new node inventory.
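To illustrate what is kept and what is reset, using the file locations listed later in this chapter:
# note the current node identifier before the reset
cat /opt/rudder/etc/uuid.hive
rudder agent factory-reset
# the policy server address is preserved
cat /var/rudder/cfengine-community/policy_server.dat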
Change the agent run schedule
By default, the agent runs on all nodes every 5 minutes. You can modify this value in the Settings → General page, in the Agent Run Schedule section, as well as the "splay time" across nodes (a random delay that alters the scheduled run time, intended to spread the load across nodes).
These settings can also be modified Node by Node, allowing you to customize the agent behavior (for example for a Node with few resources like a Raspberry Pi, or with limited bandwidth). To do that, go into the Node details, in the Settings tab.
When notably reducing the run interval length, reporting can stay in the 'No report' state until the next run of the agent, which can take up to the previous (longer) interval.
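As a hedged illustration, the global value can also be read through the settings API, using the same curl pattern as the node API example later in this chapter; the agent_run_interval key name is an assumption to check against your server's settings listing:
# the agent_run_interval key name is an assumption; list all settings on your version to confirm it
curl -k -H "X-API-Token: ......" -X GET 'https://..../rudder/api/latest/settings/agent_run_interval'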
Installation of the Rudder Agent
Static files
At installation of the Rudder Agent, files and directories are created in the following places:
- /etc: Scripts to integrate the Rudder Agent in the system (init, cron).
- /opt/rudder/share/bootstrap-promises: Bootstrapping policies for the Rudder Agent. These policies are used to download a common set of initial policies from the server.
- /opt/rudder/lib/perl5: The FusionInventory inventory tool and its Perl dependencies.
- /opt/rudder/bin/run-inventory: Wrapper script to launch the inventory.
- /opt/rudder/bin: Binaries for the agent.
- /var/rudder/cfengine-community: The working directory for the agent.
Generated files
At the end of installation, the agent’s working directory is populated for first use, and unique identifiers for the Node are generated.
- /var/rudder/cfengine-community/bin/: Agent binaries are copied there.
- /var/rudder/cfengine-community/inputs: Contains the actual working policies. The initial policies are copied here at installation. After validation of the Node, the Applied Policies, which are the policies generated by Rudder for this particular Node, are stored here.
- /var/rudder/cfengine-community/ppkeys: A unique SSL key generated for the Node at installation time.
- /opt/rudder/etc/uuid.hive: A unique identifier for the Node is generated into this file.
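To quickly check that these files and identifiers were generated on a node:
# working policies and keys
ls /var/rudder/cfengine-community/inputs
ls /var/rudder/cfengine-community/ppkeys
# node unique identifier
cat /opt/rudder/etc/uuid.hive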
Services
After all of these files are in place, the agent daemons are launched:
- cf-execd: the daemon that schedules and launches the periodic agent runs.
- cf-serverd: the daemon that serves policy files and listens for remote run requests (see the section about disabling it on nodes below).
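To check that these daemons are running on a node, you can query their systemd units. The rudder-cf-serverd unit name is used later in this chapter; rudder-cf-execd is an assumption by symmetry, and older versions manage both daemons through the rudder-agent service:
systemctl status rudder-cf-serverd.service
# unit name assumed by symmetry with rudder-cf-serverd; adjust to your agent version
systemctl status rudder-cf-execd.service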
Rudder Agent interactive
You can force the Rudder Agent to run from the console and observe what happens.
rudder agent run
Error: the name of the Rudder Root Server can’t be resolved
If the Rudder Root Server name is not resolvable, the Rudder Agent will issue this error:
rudder agent run
Unable to lookup hostname (rudder-root) or cfengine service: Name or service not known
To fix it, either set up the agent to use the IP address of the Rudder Root Server instead of its domain name, or set up the name resolution of your Rudder Root Server accurately, in your DNS server or in the hosts file. The Rudder Root Server name is defined in this file:
echo *IP_of_root_server* > /var/rudder/cfengine-community/policy_server.dat
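To check how the node currently resolves the server name (here rudder-root, the name from the error above):
# query name resolution through the system resolver (DNS and hosts file)
getent hosts rudder-root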
Error: the rudder-agent service is not responding on the Rudder Root Server
If the rudder-agent service is stopped on the Rudder Root Server, you will get this error:
# rudder agent run
!! Error connecting to server (timeout)
!!! System error for connect: "Operation now in progress"
!! No server is responding on this port
Unable to establish connection with rudder-root
Restart the rudder-agent service:
service rudder-agent restart
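On the Rudder Root Server, you can also verify that the policy server daemon is listening on its port (5309, as mentioned later in this chapter):
# list listening sockets and look for the Rudder policy port
ss -tlnp | grep 5309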
Processing new inventories on the server
Verify the inventory has been received by the Rudder Root Server
There is some delay between the time when the first inventory of the Node is sent, and the time when the Node appears in the New Nodes of the web interface. For the brave and impatient, you can check if the inventory was sent by listing incoming Nodes on the server:
ls /var/rudder/inventories/incoming/
Process incoming inventories
On the next run of the agent on the Rudder Root Server, the new inventory will be detected and sent to the Inventory Endpoint. The inventory will then be moved to the directory of received inventories. The Inventory Endpoint does its job and the new Node appears in the interface.
You can force the execution of agent on the console:
rudder agent run
Prepare policies for the Node
Policies are not shared between the Nodes for obvious security and confidentiality reasons. Each Node has its own set of policies. Policies are generated for Nodes in the following cases:
- A new Node was accepted;
- An inventory has changed;
- A Technique has changed;
- A Directive has changed;
- A Group of Nodes has changed;
- A Rule has changed;
- Regeneration was forced by the user.
Agent execution frequency on nodes
Checking configuration
By default, Rudder is configured to check and repair configurations using the agent every 5 minutes, at 5 minutes past the hour, 10 minutes past the hour, etc.
The exact run time on each machine will be delayed by a random interval, in order to "smooth" the load across your infrastructure (also known as "splay time"). This reduces simultaneous connections on relay and root servers (both for the policy server and for sending reports).
See the agent run schedule section above for how to configure it.
Inventory
The FusionInventory agent collects data about the node it’s running on such as machine type, OS details, hardware, software, networks, running virtual machines, running processes, environment variables…
This inventory is scheduled once every 24 hours, and happens between 0:00 and 5:00 AM. The exact time is randomized across nodes to "smooth" the load across your infrastructure.
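You can also trigger an inventory manually at any time, with the same command used in the hook example below:
rudder agent inventory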
Extend node inventory
It is quite common to need to gather information on your nodes that is not present in the standard Rudder inventory.
As of Rudder 4.3.0, you can get more information about a node thanks to inventory hooks. This information will be available as standard node properties.
Starting from 6.0.3, the inventory extension is also available on DSC-based agents.
Overview
On the node, you create executable inventory hooks and place them in /var/rudder/hooks.d. These executables are run in alphanumerical order, only if they are executable, and their output is checked to ensure that it is proper JSON.
On DSC-based agents, hooks must be PowerShell scripts located under C:\Program Files\Rudder\hooks.d.
For example, one hook can output:
{ "my_prop1": ["a", "json", "array"], "my_prop2": {"some": "more", "key": "value"} }
When the node inventory is processed server side, the node properties will get new values, one per first-level key of all hooks.
These node properties are marked as "provided by inventory" and can not be deleted nor overwritten. Apart from that characteristic, they are normal node properties that can be used to create groups, or as variables in Directive parameters.
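For example, a first-level key from the hook output above (here my_prop1) can then be used as a variable in a Directive parameter, with the node properties syntax:
${node.properties[my_prop1]}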
Creating a node inventory hook
An inventory hook can be any kind of executable that can be called without parameters, from a shell script to a C program.
Hooks are located in the directory /var/rudder/hooks.d. You may need to create that directory the first time you want to add hooks:
mkdir /var/rudder/hooks.d
They need to be executable by the Rudder agent.
For example, this hook will create a new "hello_inventory" node property:
% cd /var/rudder/hooks.d
% cat <<EOF > hello-world
#!/bin/sh
echo '{"hello_inventory": "a simple string value from inventory"}'
EOF
% chmod +x hello-world
% rudder agent inventory
And then, after the server has processed the inventory, the node (here with ID '74d10806-b41d-4575-ab86-8becb419949b') has the corresponding property:
% curl -k -H "X-API-Token: ......" -H "Content-Type: application/json" -X GET 'https://..../rudder/api/latest/nodes/74d10806-b41d-4575-ab86-8becb419949b?include=minimal,properties' | jq '.'
{
  "action": "nodeDetails",
  "id": "74d10806-b41d-4575-ab86-8becb419949b",
  "result": "success",
  "data": {
    "nodes": [
      {
        "id": "74d10806-b41d-4575-ab86-8becb419949b",
        ....
        "properties": [
          {
            "name": "hello_inventory",
            "value": "a simple string value from inventory",
            "provider": "inventory"
          }
        ]
      }
    ]
  }
}
Node Lifecycle
Imagine you have a node that you must disconnect for a maintenance period. You know what is happening on the node, and during the maintenance period, you don’t want Rudder to show the node as Not responding and trigger alerts at the global compliance level.
Another common use case is to be able to set specific policies for nodes just after acceptance, used for provisioning, or just before a node’s end of life, to clean it up.
In Rudder 4.3, we introduced a way to manage the Node lifecycle, for both of these use cases:
- nodes disconnected from the Rudder Server can be excluded from policy generation and compliance with the Ignored state,
- the main states of a system’s life can be applied with the 4 states Initializing, Enabled, Preparing End of Life and Empty policies.
The Ignored and Empty policies states automatically change the policy generation and compliance:
- Ignored prevents any new policy generation for the Nodes in this state.
- Empty policies generates a minimal set of policies, only to manage the Rudder Agent itself.
Both states remove the nodes from the compliance.
Nodes with a non-default state appear with a label next to their name in the nodes list to show their state, and their compliance doesn’t show up in Ignored nor Empty policies mode. You can filter by node lifecycle state in that list with the common Filter input field.
Nodes with a given lifecycle state can be searched thanks to the quick search tool in the Rudder status bar. That state can also be used to construct groups (Node state attribute of Node summary), and it also shows up in the API responses concerning node information.
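As a hedged illustration of the last point, the state can be read with the same node API call used in the inventory hook example above; the exact field name and location in the response are assumptions to check against your server's output:
# the lifecycle state appears in the node details; inspect the full response on your version
curl -k -H "X-API-Token: ......" -X GET 'https://..../rudder/api/latest/nodes/74d10806-b41d-4575-ab86-8becb419949b' | jq '.data.nodes[0]'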
Finally, the default state for a Node can be configured in the Settings page, to define which state newly accepted Nodes get.
In the future, these states will be configurable on a per-node basis at acceptance, and the list of lifecycle states will be configurable by users.
Disable the service listening for remote run on nodes
By default, every Rudder node runs a service that listens on port 5309 for remote execution triggers, callable from Rudder API.
On the Rudder root server and relays, this service is necessary and cannot be disabled: it is used to serve configuration policies to managed nodes.
On other nodes, it is totally optional, and you can easily disable it:
- On systemd systems, stop and disable the rudder-cf-serverd service on the nodes where you don’t want it to run (you can do it through Rudder policies):
systemctl stop rudder-cf-serverd.service
systemctl disable rudder-cf-serverd.service
- On systems still using the init script, edit /etc/default/rudder-agent, then uncomment the CFENGINE_COMMUNITY_RUN_1 line and set it to 0:
CFENGINE_COMMUNITY_RUN_1="0"
Single directive execution
Normal rudder agent runs apply all the node configuration to the underlying system.
You may sometimes want to test a single directive, either because it takes long to run or because you want to test it in isolation from the other directives.
To do this, you need to know the list of available directives on your agent with the command:
rudder directive list
Then you can run one with the command:
rudder directive run -u <uuid>