Monitoring Subsystem 3.0
The Information Manager (IM) is in charge of monitoring the hosts. It comes with various sensors, each one responsible for a different aspect of the computer to be monitored (CPU, memory, hostname…). There are also sensors prepared to gather information from different hypervisors (currently KVM and Xen).
The requirements depend on the sensors that make up the IM driver. The main one is that the corresponding hypervisor must be available if the KVM or Xen sensors are used. Also, as with every OpenNebula configuration, passwordless SSH access to the hosts must be possible.
The OpenNebula daemon loads its drivers whenever it starts. The driver definitions are in /etc/one/oned.conf. The following line configures OpenNebula to use the Xen probes:
IM_MAD = [ name = "im_xen", executable = "one_im_ssh", arguments = "xen" ]
Equivalently, for KVM you would put the following in oned.conf:
IM_MAD = [ name = "im_kvm", executable = "one_im_ssh", arguments = "kvm" ]
And finally for EC2:
IM_MAD = [ name = "im_ec2", executable = "one_im_ec2", arguments = "im_ec2/im_ec2.conf" ]
Remember that you can add your own custom probes, whose information can later be used by other OpenNebula modules such as the scheduler.
In order to test the driver, add a host to OpenNebula using onehost, specifying the defined IM driver:
<xterm> $ onehost create ursa06 im_xen vmm_xen tm_ssh </xterm>
Now give OpenNebula time to monitor the host (this time is determined by the value of HOST_MONITORING_INTERVAL in /etc/one/oned.conf). After one interval, check the output of onehost list; it should look like the following:
<xterm>
ID NAME   RVM TCPU FCPU ACPU TMEM FMEM AMEM STAT
 0 ursa06   0  800  798  800  16G  14G  16G   on
</xterm>
If the status is err instead of on, you can get the error message using onehost show.
Host management information is logged to /var/log/one/oned.log. Correct monitoring log lines look like this:
Mon Oct 3 15:06:18 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct 3 15:06:18 2011 [InM][D]: Host 0 successfully monitored.
Both lines have the ID of the host being monitored.
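Because both lines carry the host ID, the log can be filtered per host. The following is a minimal sketch that assumes the success and error message formats shown in this guide; host_monitoring_log and host_monitoring_summary are hypothetical helper names, not OpenNebula commands.

```shell
# Hypothetical helper: print the monitoring lines logged for one host ID.
# Assumes the oned.log formats "Monitoring host NAME (ID)",
# "Host ID successfully monitored." and "Error monitoring host ID : ...".
host_monitoring_log() {
    log_file=$1
    host_id=$2
    grep -E "Monitoring host .* \($host_id\)\$|Host $host_id successfully monitored|Error monitoring host $host_id " "$log_file"
}

# Hypothetical helper: count successful and failed monitoring cycles for one host.
# "|| true" keeps a zero count (grep exits 1) from aborting scripts run with set -e.
host_monitoring_summary() {
    ok=$(grep -c "Host $2 successfully monitored" "$1" || true)
    err=$(grep -c "Error monitoring host $2 " "$1" || true)
    echo "host $2: $ok ok, $err errors"
}
```

For example, host_monitoring_summary /var/log/one/oned.log 0 reports how many cycles succeeded or failed for host 0. The trailing space or parenthesis in each pattern keeps host 1 from matching host 10.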
If there are problems monitoring the host, you will get an err state:
<xterm>
ID NAME   RVM TCPU FCPU ACPU TMEM FMEM AMEM STAT
 0 ursa06   0    0    0  100   0K   0K   0K  err
</xterm>
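When many hosts are registered, the ones in err state can be picked out of the onehost list output with awk. This sketch assumes the column layout shown above (STAT as the last column and a header line starting with ID); hosts_in_error is a hypothetical helper, and the layout may differ in other OpenNebula versions.

```shell
# Hypothetical filter: print "ID NAME" for every host whose STAT column is "err".
# Assumes STAT is the last field and the header line's first field is "ID".
hosts_in_error() {
    awk '$1 != "ID" && $NF == "err" { print $1, $2 }'
}

# Usage on a live front-end:
#   onehost list | hosts_in_error
```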
To get the error message for the host, use the onehost show command, specifying the host ID:
<xterm>
$ onehost show 0
[…]
MONITORING INFORMATION
ERROR=[
  MESSAGE="Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes",
  TIMESTAMP="Mon Oct 3 15:26:57 2011" ]
</xterm>
The log file is also useful as it will give you even more information on the error:
Mon Oct 3 15:26:57 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct 3 15:26:57 2011 [InM][I]: Command execution fail: scp -r /var/lib/one/remotes/. ursa06:/var/tmp/one
Mon Oct 3 15:26:57 2011 [InM][I]: ssh: Could not resolve hostname ursa06: nodename nor servname provided, or not known
Mon Oct 3 15:26:57 2011 [InM][I]: lost connection
Mon Oct 3 15:26:57 2011 [InM][I]: ExitCode: 1
Mon Oct 3 15:26:57 2011 [InM][E]: Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes
In this case the node ursa06 could not be found in the DNS or in /etc/hosts.
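Name resolution failures like this one, as well as a missing passwordless SSH setup, can be spotted before adding the host with a quick pre-flight check run from the front-end. This is a minimal sketch assuming a Linux front-end with getent(1) and OpenSSH; check_host is a hypothetical helper name, not part of OpenNebula.

```shell
# Hypothetical pre-flight check, run on the front-end before "onehost create".
# Assumes getent(1) (glibc) for name lookups; ssh BatchMode=yes makes ssh
# fail instead of prompting when a password would be required.
check_host() {
    host=$1
    if ! getent hosts "$host" >/dev/null; then
        echo "$host: hostname does not resolve (check DNS or /etc/hosts)" >&2
        return 1
    fi
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "$host: passwordless SSH login failed" >&2
        return 1
    fi
    echo "$host: OK"
}

# Usage:
#   check_host ursa06
```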
The host monitoring interval can be changed in oned.conf:
HOST_MONITORING_INTERVAL = 600
The value is expressed in seconds; the default is 600 (10 minutes). You can lower it down to the value of MANAGER_TIMER (30 seconds by default). For a lower interval, you also need to decrease MANAGER_TIMER.
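To catch an interval set below MANAGER_TIMER before restarting oned, the relation above can be checked with a small script. This is a hypothetical helper (check_monitoring_interval is not part of OpenNebula) that assumes both parameters appear as plain NAME = value lines in oned.conf and falls back to the defaults given above when a line is absent or commented out.

```shell
# Hypothetical sanity check for oned.conf: HOST_MONITORING_INTERVAL should not
# be lower than MANAGER_TIMER. Assumes "NAME = value" lines; lines starting
# with "#" are ignored, and missing values default to 600 and 30 seconds.
check_monitoring_interval() {
    conf=${1:-/etc/one/oned.conf}
    interval=$(sed -n 's/^[[:space:]]*HOST_MONITORING_INTERVAL[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    timer=$(sed -n 's/^[[:space:]]*MANAGER_TIMER[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    interval=${interval:-600}
    timer=${timer:-30}
    if [ "$interval" -lt "$timer" ]; then
        echo "HOST_MONITORING_INTERVAL ($interval) is below MANAGER_TIMER ($timer)"
        return 1
    fi
    echo "OK: monitoring every $interval s, manager timer $timer s"
}
```

If a parameter is defined more than once, the sketch uses the last definition (tail -n 1).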
The driver itself accepts the same options as the Virtual Machine driver; you can find information on these options in the Virtualization Subsystem guide.
This section details the files used by the Information Drivers to monitor the hosts. These files are placed in the following directories:
/usr/lib/one/mads/: The driver executable files.
/var/lib/one/remotes/im/<virtualizer>.d: Specific probes for a virtualizer.
Note: If you delete the remote files placed inside the “var” directory, there is also a backup of these files inside the “lib” directory of the OpenNebula installation. Remember, however, that the folder copied to the remote hosts is “var/remotes”.
These files are used by the IM regardless of the hypervisor present on the machine to be monitored:
<xterm>
$ cat /var/lib/one/remotes/im/kvm.d/name.sh
#!/bin/sh

echo NAME=`uname -n`
</xterm>
This uses the uname command to get the hostname of the remote host, and then outputs the information as:
NAME=host1.mydomain.org
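Since every probe simply prints KEY=VALUE lines like the one above, you can try a probe directory on a single machine before the probes are copied to the hosts. This is a minimal sketch assuming the probes are executable files in one directory; run_probes is a hypothetical helper, and its format check (a line starting with an uppercase letter and containing "=") is a loose heuristic, not an OpenNebula rule.

```shell
# Hypothetical helper: run every executable file in a probe directory and
# print its output. Lines that do not start with an uppercase letter and
# contain "=" are reported on stderr as suspicious.
run_probes() {
    dir=$1
    for probe in "$dir"/*; do
        # Skip non-files (e.g. the unexpanded glob of an empty directory)
        # and files without the execute bit.
        [ -f "$probe" ] && [ -x "$probe" ] || continue
        "$probe" | while IFS= read -r line; do
            case $line in
                [[:upper:]]*=*) printf '%s\n' "$line" ;;
                *) printf 'bad line from %s: %s\n' "$probe" "$line" >&2 ;;
            esac
        done
    done
}

# Usage:
#   run_probes /var/lib/one/remotes/im/kvm.d
```

Probes that query the hypervisor will only produce meaningful output on a machine where that hypervisor is installed.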
All Information Drivers always set one mandatory attribute, HYPERVISOR, which holds the kind of hypervisor (XEN, KVM, VMWare, EC2, etc.) present on the host that the driver is monitoring.
Files contained in remotes/im/<virtualizer>.d are executed on the remote host. You can add more files to this directory to get more information.
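As an illustration, a new probe only needs to print KEY=VALUE lines, as name.sh does. The sketch below is hypothetical: the file name tmp_disk.sh and the attribute TMP_FREE_KB are illustrative choices, not names used by OpenNebula. It relies on the POSIX df -P output format, where the fourth column is the available space.

```shell
#!/bin/sh
# Hypothetical custom probe, e.g. saved as
# /var/lib/one/remotes/im/kvm.d/tmp_disk.sh (illustrative file name).
# Reports the free space, in KB, of the filesystem holding /var/tmp.
# "df -Pk" prints one header line, then one line per filesystem;
# field 4 of the second line is the available space in 1024-byte blocks.
avail=$(df -Pk /var/tmp | awk 'NR == 2 { print $4 }')
echo "TMP_FREE_KB=$avail"
```

After adding the file on the front-end, run onehost sync so that the probe is copied to the nodes in the next monitoring cycle.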
The Information Driver is also responsible for copying all these scripts (and the Virtual Machine Manager driver scripts) to the remote nodes. To refresh the probes on the remote nodes, use the following command:
<xterm> $ onehost sync </xterm>
In the next monitoring cycle, the probes and the VMM driver action scripts will then be copied to the node again.
The location on the remote nodes where these files are copied is configured in etc/oned.conf through the SCRIPTS_REMOTE_DIR parameter, which is set to /var/tmp/one by default.