Monitoring Subsystem 3.4

The Information Manager (IM) is in charge of monitoring the hosts. It comes with various sensors, each one responsible for a different aspect of the computer to be monitored (CPU, memory, hostname…). There are also sensors prepared to gather information from different hypervisors.

Requirements

The requirements depend on the sensors that make up the IM driver; the main one is that the hypervisor corresponding to those sensors is available. In addition, as with every OpenNebula configuration, passwordless SSH access to the hosts must be possible.
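
The SSH requirement can be verified before any host is added. The following is a minimal sketch, not part of OpenNebula: the function name check_passwordless_ssh is hypothetical, and it assumes the check is run on the front-end by the user that runs OpenNebula (typically oneadmin). BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# check_passwordless_ssh HOST
# Hypothetical helper: succeeds only if HOST accepts an SSH login without
# asking for a password. BatchMode=yes makes ssh fail instead of prompting.
check_passwordless_ssh() {
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true; then
        echo "$1: passwordless SSH works"
    else
        echo "$1: passwordless SSH failed" >&2
        return 1
    fi
}
```

For example, check_passwordless_ssh ursa06 should print a success line before ursa06 is registered with onehost create.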

OpenNebula Configuration

The OpenNebula daemon loads its drivers whenever it starts. The drivers are defined in /etc/one/oned.conf. The following lines configure OpenNebula to use the Xen probes:

IM_MAD = [
    name       = "im_xen",
    executable = "one_im_ssh",
    arguments  = "xen" ]

Similarly, for VMware, uncomment the following in oned.conf:

IM_MAD = [
      name       = "im_vmware",
      executable = "one_im_sh",
      arguments  = "-t 15 -r 0 vmware" ]

And finally for EC2:

IM_MAD = [
      name       = "im_ec2",
      executable = "one_im_ec2",
      arguments  = "im_ec2/im_ec2.conf" ]

Please remember that you can add your custom probes for later use by other OpenNebula modules like the scheduler.

Testing

In order to test the driver, add a host to OpenNebula using onehost, specifying the defined IM driver:

<xterm> $ onehost create ursa06 --im im_xen --vm vmm_xen --net dummy </xterm>

Now give it time to monitor the host (this time is determined by the value of HOST_MONITORING_INTERVAL in /etc/one/oned.conf). After one interval, check the output of onehost list; it should look like the following:

<xterm> $ onehost list

ID NAME         CLUSTER     RVM   TCPU   FCPU   ACPU   TMEM   FMEM   AMEM STAT
 0 ursa06       -             0    800    798    800    16G    14G    16G   on

</xterm>

Host management information is logged to /var/log/one/oned.log. Correct monitoring log lines look like this:

Mon Oct  3 15:06:18 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct  3 15:06:18 2011 [InM][D]: Host 0 successfully monitored.

Both lines include the ID of the host being monitored (0 in this example).

If there are problems monitoring the host, it will show the err state:

<xterm> $ onehost list

ID NAME         CLUSTER     RVM   TCPU   FCPU   ACPU   TMEM   FMEM   AMEM STAT
 0 ursa06       -             0      0      0    100     0K     0K     0K  err

</xterm>
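
When many hosts are registered, the STAT column can be filtered with standard tools. The following is a minimal sketch, assuming the column layout shown above (ID first, NAME second, STAT last); the function name hosts_in_error is hypothetical and not part of OpenNebula.

```shell
#!/bin/sh
# hosts_in_error
# Hypothetical helper: reads `onehost list` output on stdin and prints the
# ID and NAME of every host whose last column (STAT) is "err".
# The header and blank lines never match, so they are skipped implicitly.
hosts_in_error() {
    awk '$NF == "err" { print $1, $2 }'
}
```

It would be used as onehost list | hosts_in_error.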

To get the error message for the host, use the onehost show command, specifying the host ID or name:

<xterm> $ onehost show 0
[…]
MONITORING INFORMATION
ERROR=[
  MESSAGE="Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes",
  TIMESTAMP="Mon Oct  3 15:26:57 2011" ]

</xterm>

The log file is also useful as it will give you even more information on the error:

Mon Oct  3 15:26:57 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct  3 15:26:57 2011 [InM][I]: Command execution fail: scp -r /var/lib/one/remotes/. ursa06:/var/tmp/one
Mon Oct  3 15:26:57 2011 [InM][I]: ssh: Could not resolve hostname ursa06: nodename nor servname provided, or not known
Mon Oct  3 15:26:57 2011 [InM][I]: lost connection
Mon Oct  3 15:26:57 2011 [InM][I]: ExitCode: 1
Mon Oct  3 15:26:57 2011 [InM][E]: Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes

In this case, the hostname ursa06 could not be resolved through DNS or /etc/hosts.
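
Because oned.log mixes informational and error lines, it can help to show only the errors reported by the Information Manager. The following is a minimal sketch that relies on the [InM][E] tag visible in the excerpt above; the function name monitor_errors is hypothetical.

```shell
#!/bin/sh
# monitor_errors [LOGFILE]
# Hypothetical helper: prints the error-level Information Manager lines of
# an oned log. The default path is the one named in this guide.
# grep -F treats "[InM][E]" as a literal string, not as a pattern.
monitor_errors() {
    grep -F '[InM][E]' "${1:-/var/log/one/oned.log}"
}
```

The informational lines that precede each error (the failed command and its exit code) still have to be read in the log itself.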

Tuning & Extending

Changing Monitoring Interval

The host monitoring interval can be changed in oned.conf:

HOST_MONITORING_INTERVAL = 600

The value is expressed in seconds; the default is 600 (10 minutes). It can be lowered as far as the value of MANAGER_TIMER (30 seconds by default). To go lower than that, MANAGER_TIMER must be lowered as well.
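
The relationship between the two parameters can be checked with a small script. The following is a minimal sketch, not an OpenNebula tool: the function names conf_value and check_intervals are hypothetical, and the parser assumes each parameter appears on its own uncommented "NAME = number" line.

```shell
#!/bin/sh
# conf_value NAME FILE
# Hypothetical helper: prints the numeric value of the last uncommented
# "NAME = number" line in FILE, or nothing if there is none.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$2" | tail -n 1
}

# check_intervals [FILE]
# Fails if HOST_MONITORING_INTERVAL is lower than MANAGER_TIMER.
check_intervals() {
    conf=${1:-/etc/one/oned.conf}
    interval=$(conf_value HOST_MONITORING_INTERVAL "$conf")
    timer=$(conf_value MANAGER_TIMER "$conf")
    # Fall back to the defaults stated in this guide when a key is not set.
    interval=${interval:-600}
    timer=${timer:-30}
    if [ "$interval" -lt "$timer" ]; then
        echo "HOST_MONITORING_INTERVAL ($interval) is below MANAGER_TIMER ($timer)"
        return 1
    fi
    echo "ok: HOST_MONITORING_INTERVAL=$interval MANAGER_TIMER=$timer"
}
```

Running check_intervals with no argument reads /etc/one/oned.conf.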

The driver itself accepts the same options as the Virtual Machine driver; these options are described in the Virtualization Subsystem guide.

Driver Files

This section details the files used by the Information Drivers to monitor the hosts. There are two important kinds of driver files:

  • Driver executable files. There are two basic types: SSH-based and SH-based drivers. SSH-based monitoring logs in to the target host and executes the probe scripts there. SH-based monitoring executes the probes on the OpenNebula front-end, usually to gather monitoring information from a server. These files are /usr/lib/one/mads/one_im_ssh and /usr/lib/one/mads/one_im_sh, respectively.
  • Monitor probes. Probes are defined for each hypervisor and are located in /var/lib/one/remotes/im/<hypervisor>.d. A probe is a small script or binary that extracts information remotely (SSH) or locally (SH). The probe should return each metric in a simple NAME=VALUE format. The following simple probe shows how they work:

<xterm> $ cat /var/lib/one/remotes/im/kvm.d/name.sh
#!/bin/sh

echo HOSTNAME=`uname -n`
</xterm>

This uses the uname command to get the hostname of the remote host, and then outputs the information as:

HOSTNAME=host1.mydomain.org
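
A custom probe follows the same pattern. The following is a minimal sketch of one that reports the free space on the root filesystem. The metric name ROOT_FREE_KB and the function name probe_root_free are made up for illustration, and the sketch assumes the probe is saved as an executable file in the probe directory of the hypervisor in use.

```shell
#!/bin/sh
# Hypothetical custom probe: reports the free space, in KB, of the root
# filesystem. ROOT_FREE_KB is an illustrative metric name, not one that
# OpenNebula defines.
probe_root_free() {
    # df -P prints one line per filesystem in POSIX format and -k forces
    # 1024-byte units; the 4th column of the 2nd line is the free space.
    free_kb=$(df -Pk / | awk 'NR == 2 { print $4 }')
    echo "ROOT_FREE_KB=$free_kb"
}

# Emit the metric in the NAME=VALUE format expected from a probe.
probe_root_free
```

The output is a single line such as ROOT_FREE_KB followed by an equals sign and a number; the actual value depends on the host.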

Probe execution

Note: All Information Drivers always set one mandatory attribute, HYPERVISOR, whose value is the kind of hypervisor (XEN, KVM, VMWare, EC2, etc.) present on the host that the driver is monitoring.

Files contained in /var/lib/one/remotes/im/<hypervisor>.d are executed on the remote host. You can add more files to this directory to gather more information.
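
Before relying on a new probe, it can be useful to preview what the files in a probe directory print. The following is a minimal sketch that runs every executable file in a directory and shows the combined output. It only illustrates the idea and is not how the OpenNebula driver invokes the probes; the function name run_probes is hypothetical, and the default directory is the KVM one used as an example in this guide.

```shell
#!/bin/sh
# run_probes [DIR]
# Hypothetical helper: executes every executable regular file in DIR and
# prints whatever NAME=VALUE lines the probes emit.
run_probes() {
    dir=${1:-/var/lib/one/remotes/im/kvm.d}
    for probe in "$dir"/*; do
        # Skip directories and files without the execute bit.
        [ -f "$probe" ] && [ -x "$probe" ] || continue
        "$probe"
    done
}
```

Run on the front-end, this reports values for the front-end itself, so it is only a check of the output format and not of the values a remote host would return.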

The Information Driver is also responsible for copying all these scripts (and the Virtual Machine Manager driver scripts) to the remote nodes. To make it refresh the probes on the remote nodes, execute the following command on the front-end as oneadmin:

<xterm> $ onehost sync </xterm>

This way, in the next monitoring cycle the probes and VMM driver action scripts will be copied to the node again.

The destination directory for these files on the remote nodes is configured in /etc/one/oned.conf with the SCRIPTS_REMOTE_DIR parameter, which defaults to /var/tmp/one.