Elastic IP and Security Groups using OpenFlow

With Open vSwitch (OVS) support in OpenNebula, it was natural to use OpenFlow to enable advanced network services. OVS support opens the door to a programmable network and to merging the OpenNebula networking paradigm with software-defined networking.

OpenFlow is a protocol through which a network controller programs the forwarding rules that switches such as OVS enforce. The rules control packet flows instead of individual packets and can match on headers at different layers of the networking stack.

In our recent work we deployed an OpenFlow controller (NOX) on our cloud at Clemson University. The Onecloud is made of a couple of KVM hypervisors that use OVS switches and a single NOX controller. On top of this we implemented two features modeled on Amazon's Elastic IP and Security Groups. We modified the econe server in OpenNebula to expose the EC2 query API for these services, and we wrapped an XML-RPC server around NOX to set the proper network rules directly in the controller, which in turn forwards them to the OVS switches on the hypervisors. The figure below summarizes this setup:

What are these rules? For elastic IP, the rules rewrite all packets at L3 and answer ARP requests, so that a public IP leased via OpenNebula can be associated with an instance previously started on a different vnet. Security groups required more changes on the OpenNebula side, such as an extension of the database tables, and their rules operate at L4, allowing flows to a specific port of an instance.
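To make the two kinds of rules concrete, here is a minimal sketch in plain Python of what they do to a packet. It is only an illustration of the idea, not the NOX or OVS code: the `Packet` class, the function names, and the example addresses are all hypothetical, and ARP handling is left out.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of the header fields a rule matches on."""
    src_ip: str
    dst_ip: str
    proto: str = "tcp"
    dst_port: Optional[int] = None


def elastic_ip_rewrite(pkt: Packet, public_ip: str, private_ip: str) -> Packet:
    """L3 rule: translate between the leased public IP and the instance's IP."""
    if pkt.dst_ip == public_ip:
        # Inbound: deliver traffic for the public IP to the instance.
        return replace(pkt, dst_ip=private_ip)
    if pkt.src_ip == private_ip:
        # Outbound: make the instance appear under its public IP.
        return replace(pkt, src_ip=public_ip)
    return pkt


def security_group_allows(pkt: Packet, instance_ip: str,
                          open_ports: Set[Tuple[str, int]]) -> bool:
    """L4 rule: only flows to an explicitly opened (protocol, port) pass."""
    if pkt.dst_ip != instance_ip:
        return True  # not addressed to this instance, rule does not apply
    return (pkt.proto, pkt.dst_port) in open_ports


# Example addresses are made up for illustration.
inbound = Packet(src_ip="198.51.100.7", dst_ip="192.0.2.10", dst_port=22)
delivered = elastic_ip_rewrite(inbound, "192.0.2.10", "10.0.0.5")
allowed = security_group_allows(delivered, "10.0.0.5", {("tcp", 22)})
```

In the real setup the controller installs such rules as flow entries on the switches, so the match and rewrite happen in OVS rather than in Python.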

The end result is that you can now do something like this (using boto) with an OpenNebula cloud:

# Lease an IP address
address = conn.allocate_address()

# Associate IP with VM
conn.associate_address(instance.id, address.public_ip)

# Remove Elastic IP from the instance it is associated with
conn.disassociate_address(address.public_ip)

# Return the Elastic IP address back to address pool
conn.release_address(address.public_ip)

For security groups, if one started an instance with all ports blocked, one would open port 22 exactly the same way as with Amazon EC2:

# Start instance in a security group
reservation = image.run(instance_type="m1.small", security_groups=["sg-000007"])

# Create security group
sg = conn.create_security_group("Test Group", "Description of Test Group")

# Allow access from anyone for HTTP
sg.authorize("tcp", 80, 80, "", None)

# Allow SSH access from private subnet
sg.authorize("tcp", 22, 22, "", None)

The modified NOX controller is available on Google Code, and the modified econe will be available in OpenNebula 3.4. The Onecloud has nice screencasts of these new OpenNebula enhancements in use.

Sebastien Goasguen, Greg Stabler, Aaron Rosen and K. C Wang

Clemson University

Details of CERN’s OpenNebula deployment

Earlier this week, the 2nd Workshop on Adapting Applications and Computing Services to Multi-core and Virtualization Technologies was held at CERN, where we presented the lxcloud project and its application for a virtual batch farm. This post provides a fairly technical overview of lxcloud, its use of OpenNebula (ONE), and the cloud we are building at CERN. More details are available in the slides (Part I and Part II) from our presentations at the workshop.

The figure below shows a high-level architecture of lxcloud.

Physical resources: The cloud we are currently building at CERN is not a production service; it is still being developed and tested for robustness and for potential weaknesses in the overall architecture design. Five hundred servers are being used temporarily to perform scaling tests (not only of our virtualization infrastructure but of other services as well). These servers have eight cores each, and most of them have 24GB RAM and two 500GB disks. They run Scientific Linux CERN (SLC) 5.5 and use Xen. Once KVM becomes more mainstream and CERN moves to SLC6 and beyond, KVM will be used as the hypervisor, but for now the cloud is 99% Xen. All servers are managed by Quattor.

Networking: The virtual machines provisioned by OpenNebula use a fixed lease file populated with private IP addresses routable within the CERN network. Each IP and its corresponding MAC address are stored in the CERN network database (LANDB), and each VM is given a DNS name. To enable auditing, each IP/MAC pair is pinned to a specific host, which means that once a VM obtains a lease from OpenNebula, the lease determines which host it is going to run on. This is very static but required for our regular operations. VMs defined in LANDB can be migrated to another host using an API, but this has not been worked on so far. The hosts run an init script which polls LANDB for the list of IP/MAC pairs the host is allowed to run. This script runs very early in the boot sequence and is also used to call the OpenNebula XML-RPC server and register the host, so host registration is automated when the machines boot. A special ONE probe has been developed to check the list of MACs allowed on each host. Once a host registers, the list of MACs is readily available from the ONE frontend. The scheduler can then place a VM on the host that is allowed to run it.
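The placement constraint described above can be sketched as a simple lookup. This is a toy illustration, not the ONE scheduler or probe code: the function name, host names, and MAC addresses are made up, and the mapping stands in for what the probe reports per host.

```python
from typing import Dict, Optional, Set


def host_for_mac(vm_mac: str,
                 allowed_macs_by_host: Dict[str, Set[str]]) -> Optional[str]:
    """Return the host whose allowed-MAC list contains the VM's MAC.

    Since every IP/MAC pair is pinned to one host, at most one host
    should match; None means no registered host may run this VM yet.
    """
    for host, macs in sorted(allowed_macs_by_host.items()):
        if vm_mac in macs:
            return host
    return None


# Hypothetical data, in the shape a per-host probe might report it.
allowed = {
    "hostA": {"02:00:00:00:00:01", "02:00:00:00:00:02"},
    "hostB": {"02:00:00:00:00:03"},
}
placement = host_for_mac("02:00:00:00:00:03", allowed)
```

A VM whose MAC is not listed on any registered host simply stays pending until the right host boots and registers.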

The image repository/distribution: This component comprises a single server that runs virtual machines managed by the Quattor system. These virtual machines are our “golden nodes”; snapshots of these nodes are taken regularly and pushed to or pulled by all the hypervisors. CERN does not use a shared file system other than AFS, so pre-staging the disk images was needed. Pre-staging the source image of the VM instances saves a lot of time at image instantiation. The pre-staging can be done via sequential scp, via scp-wave, which offers a logarithmic speed-up (very handy when you need to transfer an image to ~500 hosts), or via BitTorrent. The BitTorrent setup is currently being tuned to maximize bandwidth and minimize the time for 100% of the hosts to get the image.
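To see where a logarithmic speed-up comes from, consider an idealized model in which every host that already holds the image copies it to one new host per round, so the number of holders doubles each round. The sketch below counts rounds under that assumption; it is not scp-wave's actual implementation, and the function name is ours.

```python
def wave_rounds(n_targets: int) -> int:
    """Rounds needed to reach n_targets hosts when holders double each round.

    Starts from a single source (the image server), which is not counted
    as a target.
    """
    holders = 1  # only the source has the image at the start
    rounds = 0
    while holders - 1 < n_targets:
        holders *= 2  # each holder copies to one new host
        rounds += 1
    return rounds


rounds_for_500 = wave_rounds(500)  # 9 rounds, versus 500 sequential copies
```

In this model 500 hosts are covered in 9 copy rounds instead of 500 back-to-back transfers, although real transfers are also limited by network and disk bandwidth.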

The disk images themselves are gzip files of LVM volumes created with dd (from the disk images of the golden nodes). When the file arrives on a hypervisor, the inverse operation happens: it is gunzipped and written with dd onto a local LVM volume. Using LVM source images on all the hosts allows us to use the ONE LVM transfer scripts, which create snapshots of the image at instantiation. That way instantiation takes only a couple of seconds. Currently we do not expect to push or pull images very often, but our measurements show that it takes ~20 minutes to transfer an image to ~450 hosts with BitTorrent and ~45 minutes with scp-wave.

OpenNebula: We use the latest development version of ONE, 1.6, with some very recent changes that allow us to scale to ~8,000 VM instances on the current prototype infrastructure. As mentioned earlier, the hosts are Xen hosts that auto-register via the XML-RPC server, and a special information probe reads the allowed MACs on each host so that the scheduler can pin VMs to a particular host. We use the new OpenNebula MySQL backend, which is faster than SQLite when dealing with thousands of VMs. We also use a new scheduler that uses XML-RPC and has solved a lot of the database locking issues we were having. As reported in the workshop, we have tested the OpenNebula econe-server successfully and plan to take advantage of it or to use the vCloud or OCCI interface. The choice of cloud interface for the users is still to be decided. Our tests have shown that OpenNebula can manage several thousand VMs fairly routinely, and we have pushed it to ~8,000 VMs, with the scheduler dispatching the VMs at ~1 VM/sec. This rate is tunable and we are currently trying to increase it. We have not tested the Haizea leasing system yet.
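As a quick back-of-the-envelope check of what the dispatch rate means at this scale, the snippet below converts the figures quoted above (~8,000 VMs at ~1 VM/sec) into wall-clock time. It assumes a constant rate, which is a simplification.

```python
def dispatch_time(n_vms: int, vms_per_second: float):
    """Return (hours, minutes, seconds) to dispatch n_vms at a constant rate."""
    total_seconds = int(round(n_vms / vms_per_second))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


full_pool = dispatch_time(8000, 1.0)  # (2, 13, 20): a bit over two hours
```

Doubling the dispatch rate would halve this time, which is why tuning the rate matters when filling a large pool from scratch.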

Provisioning: In the case of virtual worker nodes, we drive the provisioning of the VMs making full use of the XML-RPC API. The VMs that we start for the virtual batch farm are replicas of our lxbatch worker nodes (the batch cluster at CERN); however, they are not managed by Quattor. To make sure that they do not get out of date, we define a VM lifetime (passed to the VM via contextualization). When a VM has been drained of its jobs, the VM literally “kills itself” by contacting ONE via XML-RPC and requesting to be shut down. In this way the provisioning only has to take care of filling the pool of VMs and enforcing the pool policies. Over time the pool adapts and converges towards the correct mix of virtual machines. The VM callback is implemented as a straightforward Python script triggered by a cron job.
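The callback logic can be sketched roughly as follows. This is not the actual lxcloud script: the function names and parameters are hypothetical, we assume the VM shuts down once its lifetime has expired and no jobs are left, and the XML-RPC call is passed in as a plain callable because the exact method name depends on the OpenNebula version.

```python
import time
from typing import Callable, Optional


def should_shut_down(started_at: float, lifetime_s: float,
                     running_jobs: int, now: Optional[float] = None) -> bool:
    """True once the lifetime has expired and the VM has no jobs left."""
    if now is None:
        now = time.time()
    expired = (now - started_at) >= lifetime_s
    drained = running_jobs == 0
    return expired and drained


def cron_tick(started_at: float, lifetime_s: float, running_jobs: int,
              request_shutdown: Callable[[], None],
              now: Optional[float] = None) -> bool:
    """One cron invocation: ask ONE for a shutdown if the VM is due.

    request_shutdown is a stand-in for the XML-RPC call to the ONE
    frontend (for example built on xmlrpc.client.ServerProxy).
    """
    if should_shut_down(started_at, lifetime_s, running_jobs, now):
        request_shutdown()
        return True
    return False


# Usage with a fake shutdown call that just records the request.
calls = []
fired = cron_tick(started_at=0.0, lifetime_s=3600.0, running_jobs=0,
                  request_shutdown=lambda: calls.append("shutdown"),
                  now=7200.0)
```

Keeping the decision separate from the RPC call makes the script easy to test without a running ONE frontend.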

We hope you found these details interesting,

Sebastien Goasguen (Clemson University and CERN-IT)

Ulrich Schwickerath (CERN-IT)