Monitoring Subsystem 3.0

The Information Manager (IM) is in charge of monitoring the hosts. It comes with various sensors, each responsible for a different aspect of the monitored computer (CPU, memory, hostname…). There are also sensors that gather information from the different hypervisors (currently KVM and Xen).


Requirements

The requirements depend on the sensors that make up the IM driver. Mainly, if the KVM or Xen sensors are used, the corresponding hypervisor must be available on the host. Also, as in every OpenNebula configuration, passwordless SSH access from the front-end to the hosts is required.
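A quick way to verify the passwordless SSH requirement before adding a host is a non-interactive SSH call from the front-end, run as the user OpenNebula runs as (typically oneadmin). This is a minimal sketch, not an OpenNebula tool: the function name check_ssh is hypothetical, and it relies on OpenSSH's BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): succeeds only if the current
# user can run a command on the given host over SSH without a password.
check_ssh() {
    host="$1"
    # BatchMode=yes disables password and passphrase prompts, so a missing
    # key makes ssh fail at once instead of waiting for input.
    # ConnectTimeout bounds the wait for an unreachable host.
    # </dev/null keeps ssh from consuming the caller's stdin (e.g. in loops).
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
        echo "OK: passwordless SSH to $host works"
        return 0
    fi
    echo "FAIL: no passwordless SSH access to $host" >&2
    return 1
}
```

For example, check_ssh ursa06 should print an OK line before you register ursa06 with onehost.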

OpenNebula Configuration

The OpenNebula daemon loads its drivers when it starts. The drivers are defined in /etc/one/oned.conf. The following lines configure OpenNebula to use the Xen probes:

IM_MAD = [
    name       = "im_xen",
    executable = "one_im_ssh",
    arguments  = "xen" ]

Similarly, for KVM, add the following to oned.conf:

IM_MAD = [
    name       = "im_kvm",
    executable = "one_im_ssh",
    arguments  = "kvm" ]

And finally for EC2:

IM_MAD = [
    name       = "im_ec2",
    executable = "one_im_ec2",
    arguments  = "im_ec2/im_ec2.conf" ]

Remember that you can add your own custom probes; the information they gather can later be used by other OpenNebula modules, such as the scheduler.

Testing

To test the driver, add a host to OpenNebula with onehost, specifying the IM driver you defined:

<xterm> $ onehost create ursa06 im_xen vmm_xen tm_ssh </xterm>

Now give OpenNebula time to monitor the host (the wait is set by HOST_MONITORING_INTERVAL in /etc/one/oned.conf). After one interval, check the output of onehost list; it should look like this:

<xterm>

ID NAME               RVM   TCPU   FCPU   ACPU   TMEM   FMEM   AMEM   STAT
 0 ursa06               0    800    798    800    16G    14G    16G     on

</xterm>

If the status is err instead of on, you can get the error message with onehost show.
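With many hosts registered, it can be handy to list only those in the err state before running onehost show on each. The sketch below is hypothetical (err_hosts is not an OpenNebula command) and assumes the column layout shown above, with the host ID first, the name second and STAT last; other versions may format the table differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `onehost list` output on stdin and prints
# "ID NAME" for every host whose last column (STAT) is "err".
err_hosts() {
    # NR > 1 skips the header line; $NF is the last field (STAT).
    awk 'NR > 1 && $NF == "err" { print $1, $2 }'
}
```

For example, onehost list | err_hosts prints the ID and name of every failing host, and each ID can then be passed to onehost show.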

Host management information is logged to /var/log/one/oned.log. Successful monitoring produces log lines like these:

Mon Oct  3 15:06:18 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct  3 15:06:18 2011 [InM][D]: Host 0 successfully monitored.

Both lines include the ID of the monitored host (0 in this example).
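To follow a single host in a busy oned.log, the monitoring lines can be filtered by host ID. This is a hypothetical helper (host_monitor_log is not part of OpenNebula) whose patterns are based only on the log lines shown in this guide; lines that do not carry the ID, such as the scp and ssh errors of a failed run, are not matched.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): prints the Information
# Manager ([InM]) lines of oned.log that mention the given host ID, using
# the message forms shown in this guide: "... (ID)" at the end of the line,
# and "Host ID ..." / "host ID ...".
host_monitor_log() {
    id="$1"
    log="${2:-/var/log/one/oned.log}"
    grep '\[InM\]' "$log" | grep -E "\($id\)\$|[Hh]ost $id( |\$)"
}
```

For example, host_monitor_log 0 prints the monitoring history of ursa06; the exact-match patterns keep host 10 out of the results for host 0.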

If there is a problem monitoring the host, its state will be err:

<xterm>

ID NAME               RVM   TCPU   FCPU   ACPU   TMEM   FMEM   AMEM   STAT
 0 ursa06               0      0      0    100     0K     0K     0K    err

</xterm>

To get the host's error message, run onehost show with the host ID:

<xterm>
$ onehost show 0
[…]
MONITORING INFORMATION
ERROR=[
  MESSAGE="Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes",
  TIMESTAMP="Mon Oct  3 15:26:57 2011" ]
</xterm>

The log file is also useful, as it gives more detail about the error:

Mon Oct  3 15:26:57 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct  3 15:26:57 2011 [InM][I]: Command execution fail: scp -r /var/lib/one/remotes/. ursa06:/var/tmp/one
Mon Oct  3 15:26:57 2011 [InM][I]: ssh: Could not resolve hostname ursa06: nodename nor servname provided, or not known
Mon Oct  3 15:26:57 2011 [InM][I]: lost connection
Mon Oct  3 15:26:57 2011 [InM][I]: ExitCode: 1
Mon Oct  3 15:26:57 2011 [InM][E]: Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes

In this case, the host name ursa06 could not be resolved through DNS or /etc/hosts.
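Before monitoring the host again, you can check name resolution from the front-end. This sketch is an assumption, not an OpenNebula tool: check_resolves is a hypothetical name, and it uses getent, which is available on glibc-based Linux systems and consults the sources configured in /etc/nsswitch.conf, such as /etc/hosts and DNS.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): checks whether a host name
# resolves through the system resolver configuration on the front-end.
check_resolves() {
    host="$1"
    if getent hosts "$host" >/dev/null; then
        # Print the first address getent returns for the name.
        echo "OK: $host resolves to $(getent hosts "$host" | awk '{ print $1; exit }')"
        return 0
    fi
    echo "FAIL: $host cannot be resolved; add it to DNS or /etc/hosts" >&2
    return 1
}
```

For example, check_resolves ursa06 should succeed once the entry for ursa06 has been added to DNS or /etc/hosts.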

Tuning & Extending

Changing Monitoring Interval

The host monitoring interval can be changed in oned.conf:

HOST_MONITORING_INTERVAL = 600

The value is expressed in seconds; the default is 600 (10 minutes). You can lower it as far as the value of MANAGER_TIMER (30 seconds by default). For a lower value, you also need to decrease MANAGER_TIMER.
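The relationship between the two values can be checked with a small script. This is a hypothetical sketch (conf_value and check_interval are not OpenNebula tools) that assumes simple KEY = VALUE lines in oned.conf and uses the defaults given above when a key is missing.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OpenNebula) that check the rule above:
# HOST_MONITORING_INTERVAL should not be lower than MANAGER_TIMER.

# conf_value KEY FILE: prints the value of the first "KEY = VALUE" line,
# ignoring comments (#...), spaces and double quotes.
conf_value() {
    awk -v key="$1" '
        {
            line = $0
            sub(/#.*/, "", line)          # drop comments
            eq = index(line, "=")
            if (eq == 0) next             # not an assignment
            k = substr(line, 1, eq - 1)
            v = substr(line, eq + 1)
            gsub(/[ \t]/, "", k)
            gsub(/[ \t"]/, "", v)
            if (k == key) { print v; exit }
        }' "$2"
}

# check_interval [FILE]: falls back to the defaults given in this guide
# (600 and 30 seconds) when a key is not set in the file.
check_interval() {
    conf="${1:-/etc/one/oned.conf}"
    interval=$(conf_value HOST_MONITORING_INTERVAL "$conf" 2>/dev/null) || interval=""
    timer=$(conf_value MANAGER_TIMER "$conf" 2>/dev/null) || timer=""
    interval=${interval:-600}
    timer=${timer:-30}
    if [ "$interval" -lt "$timer" ]; then
        echo "WARN: HOST_MONITORING_INTERVAL ($interval) is lower than MANAGER_TIMER ($timer)" >&2
        return 1
    fi
    echo "OK: HOST_MONITORING_INTERVAL=$interval, MANAGER_TIMER=$timer"
}
```

Running check_interval after editing oned.conf, and before restarting OpenNebula, catches an interval that MANAGER_TIMER would not honor.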

The driver itself accepts the same options as the Virtual Machine driver; see the Virtualization Subsystem guide for details on these options.

Driver Files

This section details the files used by the Information Drivers to monitor the hosts. These files are placed in the following directories:

  • /usr/lib/one/mads/: The driver executable files.
  • /var/lib/one/remotes/im/<virtualizer>.d: Probes specific to each virtualizer.

Note: If you delete the remote files inside the “var” directory, a backup copy of them is kept in the “lib” directory of the OpenNebula installation. Remember, however, that the folder copied to the remote hosts is “var/remotes”.

Common files

These files are used by the IM regardless of the hypervisor present on the machine to be monitored:

  • /usr/lib/one/mads/one_im_ssh : shell script wrapper for the driver itself; it sets up the environment and performs other bootstrap tasks.
  • /usr/lib/one/mads/one_im_ssh.rb : The actual Information driver.
  • /var/lib/one/remotes/im/*.d : home of the sensors, small scripts or binaries that extract information from the remote hosts. Here is a simple one that shows how they work:

<xterm>
$> cat /var/lib/one/remotes/im/kvm.d/name.sh
#!/bin/sh

echo NAME=`uname -n`
</xterm>

It uses the uname command to get the host name of the remote host and prints it as:

NAME=host1.mydomain.org
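Following the same pattern, you can write your own probe. The script below is a hypothetical example: its file name and the attribute names LOAD_AVG_1 and KERNEL are illustrative, not attributes defined by OpenNebula. Once it is placed in the probe directory of the relevant virtualizer and made executable, it still has to be copied to the hosts as described under Probe execution, after which its output can be used by other modules such as the scheduler.

```shell
#!/bin/sh
# Hypothetical custom probe, e.g. saved as
# /var/lib/one/remotes/im/kvm.d/loadavg.sh and made executable.
# Like name.sh, it prints ATTRIBUTE=VALUE lines on stdout. LOAD_AVG_1 and
# KERNEL are illustrative names, not attributes defined by OpenNebula.

probe() {
    # /proc/loadavg exists on Linux; its first field is the
    # 1-minute load average.
    if [ -r /proc/loadavg ]; then
        echo "LOAD_AVG_1=$(cut -d ' ' -f 1 /proc/loadavg)"
    fi
    # Kernel release of the monitored host.
    echo "KERNEL=$(uname -r)"
}

probe
```

Keeping each probe to a few ATTRIBUTE=VALUE lines, as name.sh does, keeps the output easy to combine with that of the other probes.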

Probe execution

Note: Every Information Driver always sets one mandatory attribute, HYPERVISOR, to the kind of hypervisor (XEN, KVM, VMWare, EC2, etc.) present on the host that the driver is monitoring.

The files in remotes/im/<virtualizer>.d are executed on the remote host. You can add more files to this directory to gather more information.
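To try a probe directory on a host before syncing it, the execution step can be approximated locally. This is a simplified sketch under stated assumptions, not the code OpenNebula runs: run_probes is a hypothetical name, it only runs the executable files of one directory and concatenates their ATTRIBUTE=VALUE output, and the real driver also takes care of copying the scripts and reporting errors.

```shell
#!/bin/sh
# Simplified local emulation (NOT the actual OpenNebula driver code):
# runs every executable file in a probe directory and concatenates their
# ATTRIBUTE=VALUE output on stdout.
run_probes() {
    dir="$1"
    for probe in "$dir"/*; do
        # Skip non-regular files and files without the execute bit
        # (this also covers an empty directory, where the glob stays literal).
        [ -f "$probe" ] && [ -x "$probe" ] || continue
        "$probe" || echo "probe $probe failed" >&2
    done
}
```

For example, running run_probes /var/lib/one/remotes/im/kvm.d on a KVM host should print the NAME attribute from name.sh together with the output of the other probes.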

The Information Driver is also responsible for copying all these scripts (and the Virtual Machine Manager driver scripts) to the remote nodes. To make it refresh the probes on the remote nodes, use the following command:

<xterm> $ onehost sync </xterm>

In the next monitoring cycle, the probes and the VMM driver action scripts will then be copied to the nodes again.

The directory on the remote nodes where these files are copied is set in oned.conf with the SCRIPTS_REMOTE_DIR parameter, which defaults to /var/tmp/one.
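The location of a virtualizer's probes on a node can be derived from SCRIPTS_REMOTE_DIR. The failed-monitoring log in the Testing section (scp -r /var/lib/one/remotes/. ursa06:/var/tmp/one) shows the contents of the remotes directory being copied into that directory, so remotes/im/kvm.d should end up as /var/tmp/one/im/kvm.d. The helper below is a hypothetical sketch (remote_probe_dir is not an OpenNebula command) that assumes a simple SCRIPTS_REMOTE_DIR = value line.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): prints the directory where
# the probes of a virtualizer should live on a remote node, i.e.
# <SCRIPTS_REMOTE_DIR>/im/<virtualizer>.d, defaulting to /var/tmp/one.
remote_probe_dir() {
    virt="$1"
    conf="${2:-/etc/one/oned.conf}"
    # Take the first uncommented SCRIPTS_REMOTE_DIR = value line,
    # stripping spaces, double quotes and trailing comments.
    dir=$(awk -F '=' '
        $1 ~ /^[ \t]*SCRIPTS_REMOTE_DIR[ \t]*$/ {
            v = $2
            sub(/#.*/, "", v)
            gsub(/[ \t"]/, "", v)
            print v
            exit
        }' "$conf" 2>/dev/null) || dir=""
    echo "${dir:-/var/tmp/one}/im/${virt}.d"
}
```

For example, ssh ursa06 ls $(remote_probe_dir kvm) lists the KVM probes currently installed on ursa06, which is a quick way to confirm that onehost sync took effect.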