Scheduler 4.0

The Scheduler module assigns pending Virtual Machines to known Hosts. OpenNebula's architecture defines this module as a separate process that can be started independently of oned. The scheduling framework is generic by design, so it can be modified easily or replaced by third-party developments.

The Match-making Scheduler

OpenNebula comes with a match-making scheduler (mm_sched) that implements the Rank Scheduling Policy. The goal of this policy is to prioritize the resources most suitable for the VM.

The match-making algorithm works as follows:

First, those hosts that do not meet the VM requirements (see the SCHED_REQUIREMENTS attribute) or do not have enough resources (available CPU and memory) to run the VM are filtered out.
The SCHED_RANK expression is then evaluated on the remaining hosts using the information gathered by the monitor drivers. Any variable reported by the monitor driver can be included in the rank expression.
Those resources with a higher rank are used first to allocate VMs.

This algorithm makes it easy to implement several placement heuristics (see below), depending on the RANK expression used.
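
The three steps above can be sketched in a few lines of Python. This is only an illustration of the filter-then-rank idea, not the mm_sched code: the Host and VM classes, the match_hosts function and the HYPERVISOR attribute are hypothetical names chosen for the example, and the requirements predicate stands in for a SCHED_REQUIREMENTS expression.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Host:
    name: str
    free_cpu: float   # CPU still available on the host
    free_mem: int     # memory still available on the host
    # Values reported by the monitor drivers (hypothetical sample data)
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class VM:
    cpu: float
    mem: int
    # Stand-in for SCHED_REQUIREMENTS: a predicate over a host
    requirements: Callable[[Host], bool]


def match_hosts(vm: VM, hosts: List[Host],
                rank: Callable[[Host], float]) -> List[Host]:
    # Step 1: drop hosts that fail the requirements or lack capacity
    candidates = [
        h for h in hosts
        if vm.requirements(h) and h.free_cpu >= vm.cpu and h.free_mem >= vm.mem
    ]
    # Steps 2 and 3: evaluate the rank for each host, highest rank first
    return sorted(candidates, key=rank, reverse=True)


hosts = [
    Host("host01", 2.0, 4096, {"RUNNING_VMS": 3, "HYPERVISOR": "kvm"}),
    Host("host02", 0.5, 8192, {"RUNNING_VMS": 1, "HYPERVISOR": "kvm"}),  # too little CPU
    Host("host03", 4.0, 2048, {"RUNNING_VMS": 0, "HYPERVISOR": "kvm"}),
    Host("host04", 4.0, 8192, {"RUNNING_VMS": 5, "HYPERVISOR": "xen"}),  # fails requirements
]
vm = VM(cpu=1.0, mem=1024,
        requirements=lambda h: h.attrs.get("HYPERVISOR") == "kvm")

ordered = match_hosts(vm, hosts, rank=lambda h: h.attrs["RUNNING_VMS"])
print([h.name for h in ordered])  # ['host01', 'host03']
```

Swapping the rank callable is all that is needed to move from one placement heuristic to another.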

Configuring the Scheduling Policies

The policy used to place a VM can be configured in two places:

For each VM, as defined by the SCHED_RANK attribute in the VM template. Note that this option is potentially dangerous, so it is available only to users in the oneadmin group.

Globally, for all VMs, in the sched.conf file.

Re-Scheduling Virtual Machines

When a VM is in the running state it can be rescheduled. Issuing the onevm resched command sets the VM's rescheduling flag. In a subsequent scheduling interval, the VM will be considered for rescheduling if:

There is a suitable host for the VM.
The VM is not already running on that host.

This feature can be used by other components to trigger a rescheduling action when certain conditions are met.
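
One way to read the two conditions is as a small decision function. The sketch below is an interpretation, not the scheduler's actual logic: reschedule_target and its parameters are hypothetical names, and it assumes the list of suitable hosts is already ordered by rank.

```python
from typing import Iterable, Optional


def reschedule_target(state: str, resched_flag: bool, current_host: str,
                      ranked_hosts: Iterable[str]) -> Optional[str]:
    """Return the host to migrate to, or None if the VM stays where it is."""
    # Only running VMs with the rescheduling flag set are considered
    if state != "running" or not resched_flag:
        return None
    # Take the best-ranked suitable host that is not the current one
    for host in ranked_hosts:
        if host != current_host:
            return host
    # No suitable host other than the current one: nothing to do
    return None


print(reschedule_target("running", True, "host01", ["host01", "host02"]))  # host02
print(reschedule_target("running", True, "host01", ["host01"]))            # None
```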

Scheduling VM Actions

Users can schedule one or more VM actions to be executed at a certain date and time. The 'schedule' option of the onevm command adds a new SCHED_ACTION attribute to the editable template of the Virtual Machine. Visit the VM guide for more information.

Configuration

The behavior of the scheduler can be tuned to adapt it to your infrastructure with the following configuration parameters defined in /etc/one/sched.conf:

ONED_PORT: Port to connect to the OpenNebula daemon oned (Default: 2633)
SCHED_INTERVAL: Seconds between two scheduling actions (Default: 30)
MAX_VM: Maximum number of Virtual Machines scheduled in each scheduling action (Default: 5000). Use 0 to schedule all pending VMs each time.
MAX_DISPATCH: Maximum number of Virtual Machines actually dispatched to a host in each scheduling action (Default: 30)
MAX_HOST: Maximum number of Virtual Machines dispatched to a given host in each scheduling action (Default: 1)
HYPERVISOR_MEM: Fraction of total MEMORY reserved for the hypervisor. E.g. 0.1 means that only 90% of the total MEMORY will be used
LIVE_RESCHEDS: Perform live (1) or cold (0) migrations when rescheduling a VM
DEFAULT_SCHED: Definition of the default scheduling algorithm.
- POLICY: A predefined policy. It can be set to one of the following values:

POLICY	DESCRIPTION
0	Packing: Minimize the number of hosts in use by packing the VMs in the hosts to reduce VM fragmentation
1	Striping: Maximize resources available for the VMs by spreading the VMs in the hosts
2	Load-aware: Maximize resources available for the VMs by using those nodes with less load
3	Custom: Use a custom RANK

- RANK: Arithmetic expression used to rank suitable hosts based on their attributes.
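
As a quick check of the HYPERVISOR_MEM arithmetic, the following sketch computes how much memory is left for VMs once the hypervisor share is set aside. The usable_memory function is a hypothetical helper, and the memory figure is sample data in arbitrary units.

```python
def usable_memory(total_mem: int, hypervisor_mem: float) -> int:
    """Memory left for VMs after reserving a fraction for the hypervisor."""
    if not 0.0 <= hypervisor_mem < 1.0:
        raise ValueError("HYPERVISOR_MEM must be a fraction in [0, 1)")
    # Round down so the reserved share is never under-counted
    return int(total_mem * (1.0 - hypervisor_mem))


# With HYPERVISOR_MEM = 0.1, 90% of the total memory remains schedulable
print(usable_memory(16384, 0.1))  # 14745
```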

The optimal values of the scheduler parameters depend on the hypervisor, the storage subsystem and the number of physical hosts. The values can be derived by finding the maximum number of VMs that can be started in your setup without getting hypervisor-related errors.
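
To get a feel for how MAX_VM, MAX_DISPATCH and MAX_HOST interact, the sketch below simulates one scheduling action. It is a simplified model under stated assumptions, not the scheduler's code: dispatch_round is a hypothetical function, each pending VM comes with a pre-ranked host list, host capacity is ignored, and a VM whose preferred host is already at its per-host limit is assumed to fall through to the next host in its list.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dispatch_round(pending: List[Tuple[str, List[str]]], max_vm: int,
                   max_dispatch: int, max_host: int) -> Dict[str, str]:
    """Return {vm_id: host} for the VMs dispatched in one scheduling action."""
    # MAX_VM = 0 means every pending VM is considered
    considered = pending if max_vm == 0 else pending[:max_vm]
    per_host: Counter = Counter()
    placed: Dict[str, str] = {}
    for vm_id, ranked_hosts in considered:
        if len(placed) >= max_dispatch:   # MAX_DISPATCH reached for this round
            break
        for host in ranked_hosts:
            if per_host[host] < max_host:  # MAX_HOST not yet reached on this host
                per_host[host] += 1
                placed[vm_id] = host
                break
    return placed


pending = [
    ("vm1", ["host01", "host02"]),
    ("vm2", ["host01", "host02"]),
    ("vm3", ["host01", "host02"]),
    ("vm4", ["host01"]),
]
# With MAX_HOST = 1 only one VM lands on each host per round
print(dispatch_round(pending, max_vm=0, max_dispatch=30, max_host=1))
# {'vm1': 'host01', 'vm2': 'host02'}
```

In this model, the VMs left over simply stay pending and are considered again in the next scheduling interval.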

Sample Configuration:

 
ONED_PORT = 2633

SCHED_INTERVAL = 30

MAX_VM       = 5000
MAX_DISPATCH = 30
MAX_HOST     = 1

LIVE_RESCHEDS  = 0

HYPERVISOR_MEM = 0.1

DEFAULT_SCHED = [
   policy = 3,
   rank   = "- (RUNNING_VMS * 50  + FREE_CPU)"
]
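
To see what the custom rank in this sample does, the expression can be evaluated by hand for a few hosts. The sketch below does that in Python. The host names and monitoring values are made up for illustration, and custom_rank is just a transcription of the expression, not an OpenNebula function.

```python
def custom_rank(running_vms: int, free_cpu: float) -> float:
    # Transcription of: - (RUNNING_VMS * 50 + FREE_CPU)
    return -(running_vms * 50 + free_cpu)


# Hypothetical monitoring data
hosts = {
    "host01": {"RUNNING_VMS": 0, "FREE_CPU": 100},
    "host02": {"RUNNING_VMS": 2, "FREE_CPU": 380},
    "host03": {"RUNNING_VMS": 1, "FREE_CPU": 40},
}

ranks = {
    name: custom_rank(h["RUNNING_VMS"], h["FREE_CPU"])
    for name, h in hosts.items()
}
print(ranks)  # {'host01': -100, 'host02': -480, 'host03': -90}

# Hosts with a higher rank are used first
print(sorted(ranks, key=ranks.get, reverse=True))  # ['host03', 'host01', 'host02']
```

Because the whole expression is negated, each running VM costs a host 50 rank points and each unit of FREE_CPU costs one point.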

Pre-defined Placement Policies

The following list describes the predefined policies that can be configured through the sched.conf file.

Packing Policy

Target: Minimize the number of cluster nodes in use
Heuristic: Pack the VMs in the cluster nodes to reduce VM fragmentation
Implementation: Use those nodes with more VMs running first

RANK = RUNNING_VMS

Striping Policy

Target: Maximize the resources available to VMs in a node
Heuristic: Spread the VMs in the cluster nodes
Implementation: Use those nodes with fewer VMs running first

RANK = "- RUNNING_VMS"

Load-aware Policy

Target: Maximize the resources available to VMs in a node
Heuristic: Use those nodes with less load
Implementation: Use those nodes with more FREECPU first

RANK = FREECPU
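
The difference between the three policies is easiest to see when they are applied to the same set of hosts. The sketch below transcribes each RANK expression as a Python function and picks the top-ranked host. The host data is invented for the example, and POLICIES and first_choice are hypothetical names.

```python
# Each predefined policy expressed as a rank function over host attributes
POLICIES = {
    "packing": lambda h: h["RUNNING_VMS"],      # RANK = RUNNING_VMS
    "striping": lambda h: -h["RUNNING_VMS"],    # RANK = "- RUNNING_VMS"
    "load-aware": lambda h: h["FREECPU"],       # RANK = FREECPU
}

# Hypothetical monitoring data for three suitable hosts
hosts = [
    {"NAME": "host01", "RUNNING_VMS": 4, "FREECPU": 120},
    {"NAME": "host02", "RUNNING_VMS": 1, "FREECPU": 300},
    {"NAME": "host03", "RUNNING_VMS": 2, "FREECPU": 350},
]


def first_choice(policy: str) -> str:
    # The host with the highest rank is used first
    return max(hosts, key=POLICIES[policy])["NAME"]


for policy in POLICIES:
    print(policy, "->", first_choice(policy))
# packing -> host01
# striping -> host02
# load-aware -> host03
```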