The Supercomputing Center of Galicia (CESGA) and the Supercomputing Center Foundation of Castilla y León (FCSCL) have built a federation of cloud infrastructures using the hybrid cloud computing functionality provided by OpenNebula. The two organizations collaborated to execute an application that fights malaria across both sites, a very interesting use case of cloud federation in the High Performance Computing field.
Last week at ISC Cloud 2010, Ulrich Schwickerath, from the CERN IT-PES/PS Group, presented the latest benchmarking results of CERN’s OpenNebula cloud for batch processing. The batch computing farm is a critical part of the CERN data centre. By making use of the new IaaS cloud, both the virtual machine provisioning system and the batch application itself have been tested extensively at large scale. The results show OpenNebula managing 16,000 virtual machines to support a virtualized computing cluster that executes 400,000 jobs.
- Interoperability in the private cloud by supporting most common hypervisors, such as KVM, VMware or Xen, and many other virtualization stacks through its libvirt plug-in
- Interoperability in the public cloud by exposing most common cloud interfaces, such as VMware vCloud and Amazon EC2; open community specifications, such as the OGF Open Cloud Computing Interface; and open interfaces, such as libcloud and Deltacloud
- Interoperability in the hybrid cloud by supporting the combination of local private infrastructure with Amazon EC2 and ElasticHosts, and any major cloud provider, such as Rackspace, GoGrid or Terremark, through Red Hat’s Deltacloud adaptor
Because no two data centers are the same, building a cloud computing infrastructure requires the integration and orchestration of the underlying existing IT systems, services and processes. OpenNebula enables interoperability and portability, recognizing that our users have data centers composed of different hardware and software components for security, virtualization, storage, and networking. Its open architecture, interfaces and components provide the flexibility and extensibility that many enterprise IT shops need for internal cloud adoption. You only have to choose the right design and configuration for your Cloud architecture depending on your existing IT architecture and the execution requirements of your service workload.
The D-Grid Resource Center Ruhr (DGRZR) was established in 2008 at Dortmund University of Technology as part of the German Grid initiative D-Grid. In contrast to other resources, DGRZR used virtualization technologies from the start and still runs all Grid middleware, batch system and management services in virtual machines. In 2010, DGRZR was extended by the installation of OpenNebula as its Compute Cloud middleware to manage our virtual machines as a private cloud.
DGRZR consists of 256 HP blade servers with eight CPU cores (2048 cores in total) and 16 gigabytes of RAM each. The disk space per server is about 150 gigabytes, 50% of which is reserved for virtual machine images. The operating system on the physical servers is SUSE Linux Enterprise Server (SLES) 10 Service Pack 3 and will be changed to SLES 11 in the near future. We provide our D-Grid users with roughly 100 terabytes of central storage, mainly for home directories, experiment software and the dCache Grid Storage Element. In 2009, the mass storage was upgraded by adding 25 terabytes of HP Scalable File Share 3.1 (a Lustre-like file system), which is currently being migrated to version 3.2. 250 of the 256 blade servers will typically run virtual worker nodes. The remaining servers run virtual machines for the Grid middleware services (gLite, Globus Toolkit and UNICORE), the batch system server, and other management services.
The network configuration of the resource center is static and assumes a fixed mapping from the MAC address of a virtual machine to its public IP address. For each available node type (worker nodes, Grid middleware services and management services) a separate virtual LAN exists, and DNS names for the possible leases have been set up in advance in the central DNS servers of the university.
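Such a static mapping can be pre-computed before any VM boots. As a minimal sketch (the scheme of embedding the IPv4 octets in the MAC's lower bytes is a common convention and an assumption here, not necessarily the exact one DGRZR uses):

```python
def mac_for_ip(ip: str, prefix: str = "02:00") -> str:
    """Derive a deterministic MAC from an IPv4 address by embedding the
    four address octets in the MAC's lower bytes (hypothetical scheme)."""
    octets = [int(o) for o in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"invalid IPv4 address: {ip}")
    return prefix + "".join(f":{o:02x}" for o in octets)

def build_lease_table(subnet: str, start: int, count: int) -> dict:
    """Pre-compute the fixed MAC -> IP table for one virtual LAN, so DNS
    entries for every possible lease can be registered in advance."""
    leases = {}
    for host in range(start, start + count):
        ip = f"{subnet}.{host}"
        leases[mac_for_ip(ip)] = ip
    return leases
```

Because the mapping is a pure function of the IP address, the DNS entries and the OpenNebula lease file can be generated from the same table and never drift apart.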
Image repository and distribution:
The repository consists of images for the worker nodes, based on Scientific Linux 4.8 and 5.4, and for the UNICORE and Globus Toolkit services. We will soon be working on creating images for the gLite services.
The master images that are cloned to the physical servers are located on an NFS server and are kept up to date manually. The initial creation of such images (including installation and configuration of Grid services) is currently done manually, but will be replaced in the near future by automated workflows. The distribution of these images to the physical servers happens on demand and uses the OpenNebula SSH transfer mechanism. Currently we have no need for pre-staging virtual machine images to the physical servers, but we may add this using scp-wave.
The migration of virtual machines has been tested in conjunction with SFS 3.1, but production usage has been postponed until the completion of the file system upgrade.
The version currently used is an OpenNebula 1.4 Git snapshot from March 2010. Due to some problems of SLES 10 with Xen (e.g. “tap:aio” not really working), modifications to the snapshot were made. In addition, we set up the OpenNebula Management Console and use it as a graphical user interface.
The SQLite3 database back-end performs well for the limited number of virtual machines we are running, but with the upgrade to OpenNebula 1.6 we will migrate to a MySQL back-end to prepare for an extension of our cloud to other clusters. Using Haizea as a lease manager seems out of scope at the moment, but with the upcoming integration of this resource as a D-Grid IaaS resource, scheduler features like advance reservations will become mandatory.
Stefan Freitag (Robotics Research Institute, TU Dortmund)
Florian Feldhaus (ITMC, TU Dortmund)
Earlier this week, the 2nd Workshop on Adapting Applications and Computing Services to Multi-core and Virtualization Technologies was held at CERN, where we presented the lxcloud project and its application for a virtual batch farm. This post provides a fairly technical overview of lxcloud, its use of OpenNebula (ONE), and the cloud we are building at CERN. More details are available in the slides (Part I and Part II) from our presentations at the workshop.
The figure below shows a high level architecture of lxcloud.
Physical resources: The cloud we are currently building at CERN is not a production service and is still being developed and tested for robustness and potential weaknesses in the overall architecture design. Five hundred servers are being used temporarily to perform scaling tests (not only of our virtualization infrastructure but of other services as well); these servers have eight cores, and most have 24GB RAM and two 500GB disks. They run Scientific Linux CERN (SLC) 5.5 and use Xen. Once KVM becomes more mainstream and CERN moves to SLC6 and beyond, KVM will be used as the hypervisor, but for now the cloud is 99% Xen. All servers are managed by Quattor.
Networking: The virtual machines provisioned by OpenNebula use a fixed lease file populated with private IP addresses routable within the CERN network. Each IP and corresponding MAC address is stored in the CERN network database (LANDB), and each VM is given a DNS name. To enable auditing, each IP/MAC pair is pinned to a specific host, which means that the lease a VM obtains from OpenNebula determines which host it is going to run on. This is very static but required for our regular operations. VMs defined in LANDB can be migrated to another host using an API, but this has not been worked on so far. Each host runs an init script which polls LANDB for the list of IP/MAC pairs it is allowed to run. This script runs very early in the boot sequence and is also used to call the OpenNebula XML-RPC server and register the host, so host registration is automated when the machines boot. A special ONE probe has been developed to check the list of MACs allowed on each host. Once a host registers, the list of MACs is readily available from the ONE frontend, and the scheduler can then place a VM on the host that is allowed to run it.
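The pinning step the scheduler performs amounts to filtering hosts by their MAC allow-lists. A minimal illustration (the host names and helper functions are hypothetical, not the actual ONE scheduler code):

```python
def eligible_hosts(vm_mac, host_allow_lists):
    """Hosts whose LANDB-derived allow-list contains the VM's MAC."""
    return [h for h, macs in host_allow_lists.items() if vm_mac in macs]

def place(vm_mac, host_allow_lists, running_vms):
    """Pick the least-loaded host allowed to run this MAC, or None if
    no registered host may run it (the VM then stays pending)."""
    candidates = eligible_hosts(vm_mac, host_allow_lists)
    return min(candidates, key=lambda h: running_vms.get(h, 0)) if candidates else None
```

With a one-to-one pinning, `eligible_hosts` returns a single host and the placement is fully determined by the lease the VM obtained.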
The image repository/distribution: This component comprises a single server that runs virtual machines managed by the Quattor system. These virtual machines are our “golden nodes”; snapshots of these nodes are taken regularly and pushed/pulled to all the hypervisors. CERN does not use a shared file system other than AFS, so pre-staging the disk images was needed. Pre-staging the source image of the VM instances saves a lot of time at image instantiation. The pre-staging can be done via sequential scp, via scp-wave, which offers a logarithmic speed-up (very handy when you need to transfer an image to ~500 hosts), or via BitTorrent. The BitTorrent setup is currently being tuned to maximize bandwidth and minimize the time for 100% of the hosts to get the image.
The disk images themselves are gzip files of LVM volumes created with dd (from the disk images of the golden nodes). When the file arrives on a hypervisor, the inverse operation happens: it is gunzipped and dd‘d onto a local LVM volume. Using LVM source images on all the hosts allows us to use the ONE LVM transfer scripts that create snapshots of the image at instantiation. That way instantiation takes only a couple of seconds. Currently we do not expect to push/pull images very often, but our measurements show that it takes ~20 minutes to transfer an image to ~450 hosts with BitTorrent and ~45 minutes with scp-wave.
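The logarithmic speed-up of scp-wave comes from every host that already holds the image becoming a sender in the next round, so the number of holders roughly doubles each round. A back-of-the-envelope model (assuming uniform per-copy time and ignoring network contention):

```python
import math

def sequential_rounds(n_hosts: int) -> int:
    """Plain scp from the repository to each host in turn:
    n copies, one after another."""
    return n_hosts

def wave_rounds(n_hosts: int) -> int:
    """scp-wave style fan-out: each host that already has the image
    re-sends it, so the holder count roughly doubles every round."""
    return math.ceil(math.log2(n_hosts + 1))
```

For ~450 hosts the fan-out needs only 9 doubling rounds instead of 450 sequential copies, which is why the wave approach is viable at this scale at all.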
OpenNebula: We use the latest development version of ONE, 1.6, with some changes added very recently that allow us to scale to ~8,000 VM instances on the current prototype infrastructure. As mentioned earlier, the hosts are Xen hosts that auto-register via the XML-RPC server, and a special information probe reads the allowed MACs on each host so that the scheduler can pin VMs to a particular host. We use the new OpenNebula MySQL backend, which is faster than SQLite when dealing with thousands of VMs. We also use a new scheduler that uses XML-RPC and has solved a lot of database locking issues we were having. As reported at the workshop, we have tested the OpenNebula econe-server successfully and plan to take advantage of it or use the vCloud or OCCI interface; the choice of cloud interface for the users is still to be decided. Our tests have shown that OpenNebula can manage several thousand VMs fairly routinely, and we have pushed it to ~8,000 VMs, with the scheduler dispatching VMs at ~1 VM/sec. This rate is tunable and we are currently trying to increase it. We have not tested the Haizea leasing system yet.
Provisioning: In the case of virtual worker nodes, we drive the provisioning of the VMs making full use of the XML-RPC API. The VMs that we start for the virtual batch farm are replicas of our lxbatch worker nodes (the batch cluster at CERN); however, they are not managed by Quattor. To make sure that they do not get out of date, we define a VM lifetime (passed to the VM via contextualization). When a VM has been drained of its jobs, it literally “kills itself” by contacting ONE via XML-RPC and requesting to be shut down. In this way the provisioning only has to take care of filling the pool of VMs and enforcing the pool policies. Over time the pool adapts and converges towards the correct mix of virtual machines. The VM callback is implemented as a straightforward Python script triggered by a cron job.
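The drain-then-die logic can be sketched as follows (a minimal illustration; the helper names are hypothetical, and the `one.vm.action` XML-RPC call shape should be checked against the deployed ONE version):

```python
import xmlrpc.client

def should_shut_down(uptime_s: int, lifetime_s: int, running_jobs: int) -> bool:
    """A VM past its contextualized lifetime first drains (accepts no new
    jobs) and only dies once the last job has finished."""
    return uptime_s >= lifetime_s and running_jobs == 0

def request_self_shutdown(endpoint: str, session: str, vm_id: int):
    """Ask ONE over XML-RPC to shut this VM down. Not invoked here;
    in production this would run from the cron-triggered callback."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.one.vm.action(session, "shutdown", vm_id)
```

A cron job evaluates `should_shut_down` with the VM's uptime and batch-queue state and, when it returns true, issues the shutdown request, leaving the pool manager free to start a fresh replacement.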
We hope you found these details interesting,
Sebastien Goasguen (Clemson University and CERN-IT)
Ulrich Schwickerath (CERN-IT)
Future enterprise data centers will look like private clouds supporting a flexible and agile execution of virtualized services, and combining local with public cloud-based infrastructure to enable highly scalable hosting environments. The key component in these cloud architectures will be the cloud management system, also called the cloud operating system (OS), responsible for the secure, efficient and scalable management of the cloud resources. Cloud OSes are displacing “traditional” OSes, which will become part of the application stack.
Flexibility in Cloud Operating Systems
A Cloud OS manages the complexity of a distributed infrastructure in the execution of virtualized service workloads. It administers the servers, hardware devices and infrastructure services that make up a cloud system, giving users the impression that they are interacting with a single, elastic cloud of seemingly infinite capacity. In the same way that a multi-threaded OS defines the thread as the unit of execution and the multi-threaded application as the management entity, supporting communication and synchronization instruments, a multi-tier Cloud OS defines the VM as the basic execution unit and the multi-tier virtualized service (a group of VMs) as the basic management entity, supporting different communication instruments and their auto-configuration at boot time. This concept helps to create scalable applications because you can add VMs as and when needed. Individual multi-tier applications are isolated from each other, but individual VMs in the same application are not, as they may all share a communication network and services as and when needed.
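The multi-tier service as management entity can be modeled very simply: a service groups tiers, and scaling means changing a tier's VM cardinality. A hedged sketch (the `Tier`/`Service` classes and `scale` helper are illustrative data structures, not an OpenNebula API):

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str          # e.g. "frontend", "worker"
    image: str         # VM image this tier boots from
    cardinality: int   # number of VMs currently in the tier

@dataclass
class Service:
    name: str
    tiers: list = field(default_factory=list)

    def scale(self, tier_name: str, delta: int) -> int:
        """Grow or shrink one tier; returns the new cardinality."""
        for t in self.tiers:
            if t.name == tier_name:
                t.cardinality = max(0, t.cardinality + delta)
                return t.cardinality
        raise KeyError(tier_name)
```

Because the service is the management unit, operations such as deploy, scale and shutdown apply to the whole group of VMs, while each VM remains the unit the hypervisor actually executes.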
A Cloud OS has a number of functions:
- Management of the Network, Computing and Storage Capacity: Orchestration of storage, network and virtualization technologies to enable the dynamic placement of the multi-tier services on distributed infrastructures
- Management of VM Life-cycle: Smooth execution of VMs by allocating the resources required for them to operate and by offering the functionality required to implement VM placement policies
- Management of Workload Placement: Support for the definition of workload and resource-aware allocation policies such as consolidation for energy efficiency, load balancing, affinity-aware, capacity reservation…
- Management of VM Images: Exposing of general mechanisms to transfer and clone VM images
- Management of Information and Accounting: Provision of indicators that can be used to diagnose the correct operation of the servers and VMs and to support the implementation of the dynamic VM placement policies
- Management of Security: Definition of security policy on the users of the system, guaranteeing that the resources are used only by users with the relevant authorizations and isolation between workloads
- Management of Remote Cloud Capacity: Dynamic extension of local capacity with resources from remote providers
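The workload-placement function above boils down to ranking candidate hosts under a chosen policy. A minimal sketch of two classic policies (the function and field names are hypothetical, not OpenNebula's scheduler configuration):

```python
def rank_hosts(hosts, policy="stripe"):
    """Order candidate hosts for the next VM.
    'stripe' spreads VMs for load balancing (most free CPU first);
    'pack' consolidates VMs for energy efficiency (least free CPU first,
    so partially used hosts fill up and idle ones can be powered down)."""
    if policy == "stripe":
        return sorted(hosts, key=lambda h: -h["free_cpu"])
    if policy == "pack":
        return sorted(hosts, key=lambda h: h["free_cpu"])
    raise ValueError(f"unknown policy: {policy}")
```

The same ranking skeleton extends to affinity-aware or reservation-aware policies by changing the sort key.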
OpenNebula is an open cloud OS that provides the above functionality on a wide range of technologies. However, in my view, the main differentiation of OpenNebula is not its leading-edge functionality but its open, modular and extensible architecture that enables its seamless integration with any service and component in the ecosystem. The open architecture of OpenNebula provides the flexibility that many enterprise IT shops need for internal cloud adoption. Cloud computing is about integration; one solution does not fit all. Moreover, as pointed out in the CloudScaling “Infrastructure-as-a-Service Builder’s Guide”, the right configuration and components in a Cloud architecture also depend on the execution requirements of the service workload.
Interoperability at the Cloud Management Level
The IEEE defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged”, and Wikipedia introduces interoperability as “the property referring to the ability of diverse systems and organizations to work together (inter-operate)”. As the core component in any cloud solution, the cloud management system must be interoperable for the solution to succeed. We can compare the cloud OS with the kernel in “traditional” operating systems: it implements the basic functions of a cloud and requires well-defined communication with the underlying devices and well-defined interfaces to expose administration and user functionality.
At the cloud management level, interoperability means:
- Modularity and flexibility to easily interface with any service or technology in the virtualization and cloud ecosystem, and
- Standardization to avoid vendor lock-in and to create a healthy community around it
In fact, interoperability should be evaluated from three different angles:
- Infrastructure User Perspective: Users, application developers, integrators and aggregators require a standard interface for the management of virtual machines, network and storage. OCCI is a simple REST API for Infrastructure-as-a-Service clouds being defined in the context of OGF. This interface represents the first standard specification for life-cycle management of virtualized resources. OpenNebula was the first reference implementation of this open cloud interface, and it also implements the Amazon EC2 API.
- Infrastructure Management Perspective: Administrators require a cloud OS to interface with existing infrastructure and management services, thus fitting into any data center. OpenNebula provides a flexible back-end that can be integrated with any service for virtualization, storage and networking.
- Infrastructure Federation Perspective: Administrators are requiring cloud OS to manage resources from partner and commercial clouds
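From the user perspective, an OCCI-style request is little more than a small resource description POSTed to the server. A hedged sketch of building such a payload (the element names follow the COMPUTE shape of OpenNebula's OCCI draft implementation of that era; the endpoint and values are hypothetical):

```python
import xml.etree.ElementTree as ET

def compute_payload(name: str, cpu: int, memory_mb: int, image_href: str) -> str:
    """Build an OCCI-draft-style COMPUTE description to POST to the
    /compute collection of an OCCI server."""
    root = ET.Element("COMPUTE")
    ET.SubElement(root, "NAME").text = name
    ET.SubElement(root, "CPU").text = str(cpu)
    ET.SubElement(root, "MEMORY").text = str(memory_mb)
    disk = ET.SubElement(root, "DISK")
    ET.SubElement(disk, "STORAGE", href=image_href)  # image to attach
    return ET.tostring(root, encoding="unicode")
```

The same document, returned by the server with an added ID and state, then serves as the handle for the rest of the resource's life cycle (GET to monitor, PUT to change state, DELETE to destroy).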
With high-end computing demands, cloud operating systems will continue to be a very active field of research and development. An open and flexible approach to cloud management ensures uptake and simplifies adaptation to different environments, and is therefore key for interoperability. The existence of an open and standards-based cloud management system like OpenNebula provides the foundation for building a complete cloud ecosystem, ensuring that new components and services in the ecosystem have the widest possible market and user acceptance.
Last Friday, the OpenNebula project announced the implementation of the OGF OCCI draft specification. The release, which will be part of OpenNebula 1.4, includes a server implementation, client commands for using the service and accessing the full functionality of the OCCI interface, and several supporting documents. The latest version of this open source toolkit for cloud computing, available for download in beta release, also brings libvirt, the EC2 Query API, and a powerful CLI; all of them can be used on the same OpenNebula instance, so users can use their favorite interface. In fact, OpenNebula provides support for developing other Cloud interfaces. Moreover, all those interfaces can be used on any of the supported virtualization technologies: Xen, KVM and VMware.
The Open Grid Forum (OGF) Open Cloud Computing Interface (OCCI) Working Group was officially launched in April 2009 to deliver an interface specification for managing cloud infrastructure services, also known as Infrastructure as a Service or IaaS. This specification is being driven by the requirements in several use cases. The document Requirements and Use Cases for a Cloud API records the needs of IaaS Cloud computing managers and administrators in the form of Use Cases.
In recent days there has been an intensive discussion on the topic of IaaS Cloud interfaces. There are now three main players in the arena: the Amazon EC2 API, supported by the best-known cloud computing provider; the VMware vCloud API, supported by the leader in virtualization and submitted to the DMTF; and the OGF OCCI API, being defined by an open community in the Open Grid Forum. OpenNebula now implements two of them, EC2 and OCCI, and there is interest in the OpenNebula community in implementing the third, vCloud (after all, OpenNebula 1.4 supports VMware). However, the interest of OpenNebula as an open-source community is not only to implement an interface specification controlled by a company, but also to contribute to its definition by providing feedback and playing an active role in subsequent versions. In this sense, OCCI is the only one of the three being defined by an open standards body.
While some existing open-source technologies are just implementations of commercial products and interfaces, other open-source technologies, such as OpenNebula, are powerful tools for innovation. A Cloud technology should not only be the implementation of an interface, standardized or not. OpenNebula, as a technology developed in the context of the RESERVOIR European flagship project in cloud computing, provides many unique capabilities for the scalable and efficient management of the data center infrastructure. Those are the real differentiation in the cloud and virtualization market.
This is the first post I am writing to illustrate the main novelties of the new version of the OpenNebula Virtual Infrastructure Manager. OpenNebula is an open-source toolkit for building Public, Private and Hybrid Cloud infrastructures based on Xen, KVM and VMware virtualization platforms. OpenNebula v1.4 is available in beta release, incorporating bleeding edge technologies and innovations in many areas of virtual infrastructure management and Cloud Computing.
While previous versions concentrated on functionality for Private and Hybrid Cloud computing, this new version incorporates a new service to expose Cloud interfaces on Private or Hybrid Cloud deployments, providing partners or external users with access to the private infrastructure, for example to sell overcapacity. The new version brings a new framework to easily develop Cloud interfaces and implements, as an example, a subset of the Amazon EC2 Query API. The OpenNebula EC2 Query service is a web service that enables users to launch and manage virtual machines in an OpenNebula installation through the Amazon EC2 Query interface. In this way, besides the OpenNebula CLI or the new libvirt interface, users can use any EC2 Query tool or utility to access their Private Cloud.
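An EC2 Query request is just a signed set of key/value parameters. A hedged sketch of the Signature Version 2 scheme the EC2 Query interface of that era used (endpoint host and parameter values are hypothetical; real tools handle this for you):

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_ec2_query(params: dict, secret: str, host: str,
                   path: str = "/", method: str = "POST") -> dict:
    """Add an AWS Signature Version 2 to an EC2 Query parameter set:
    HMAC-SHA256 over method, host, path and the sorted query string."""
    params = dict(params, SignatureMethod="HmacSHA256", SignatureVersion="2")
    canonical = "&".join(
        f"{urllib.parse.quote(k, safe='-_.~')}={urllib.parse.quote(str(v), safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    to_sign = "\n".join([method, host, path, canonical])
    digest = hmac.new(secret.encode(), to_sign.encode(), hashlib.sha256).digest()
    params["Signature"] = base64.b64encode(digest).decode()
    return params
```

Because the signature covers the sorted canonical query string, any EC2 Query tool pointed at the OpenNebula econe-server produces requests the server can verify with the user's shared secret.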
The OpenNebula team is also developing the RESERVOIR Cloud interface and is planning to develop the OGF OCCI API. Moreover, as it is stated in its Ecosystem page, the team will also collaborate with IaaS Cloud providers interested in an open-source implementation of their Cloud interface to foster adoption of their Cloud services.
Another interesting new feature is support for VMware. The VMware Infrastructure API provides a complete set of language-neutral interfaces to the VMware virtual infrastructure management framework. By targeting the VMware Infrastructure API, the OpenNebula VMware adaptors are able to manage various flavors of VMware hypervisors: ESXi, ESX and VMware Server.
The combination of both innovations allows the creation of a Cloud infrastructure based on VMware that can be interfaced using Amazon EC2 Query API. I will cover more unique features and capabilities in upcoming posts.
Libvirt version 0.6.5 was released last week with a number of bug fixes and new features. The complete list of changes can be viewed at the libvirt web site. This new release includes an OpenNebula driver that provides a libvirt interface to an OpenNebula cluster.
What is it? OpenNebula is a Virtual Infrastructure Manager that controls Virtual Machines (VM) in a pool of distributed resources by orchestrating network, storage and virtualization technologies. The OpenNebula driver lets you manage your private cloud using a standard libvirt interface, including the API as well as the related tools (e.g. virsh) and VM description files.
Why a libvirt interface for your private cloud? Libvirt is evolving into a very rich and widely used interface to manage the virtualization capabilities of a server, including virtual network, storage and domain management. So, libvirt can be a very effective administration interface for a private cloud exposing a complete set of VM and physical node operations. In this way, libvirt + OpenNebula provides a powerful abstraction for your private cloud. More on interfaces for Private Clouds in this post…
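What "a standard libvirt interface" buys you is that the same domain description works whether it targets one server or, via the OpenNebula driver, a whole cluster. A minimal sketch of generating such a description (the values are hypothetical; in practice you would feed the XML to `virsh create` or the libvirt API against the OpenNebula driver URI):

```python
import xml.etree.ElementTree as ET

def domain_xml(name: str, memory_kib: int, vcpus: int) -> str:
    """Build a minimal libvirt domain description. Only the generic
    skeleton is shown; disks, networking and OS boot details are omitted."""
    dom = ET.Element("domain", type="xen")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory").text = str(memory_kib)  # KiB, per libvirt convention
    ET.SubElement(dom, "vcpu").text = str(vcpus)
    return ET.tostring(dom, encoding="unicode")
```

The point of the abstraction is that tools written against libvirt (virsh, virt-manager, monitoring scripts) need no changes to drive a private cloud instead of a single hypervisor.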
An entire ecosystem is evolving around cloud computing. Interface standardization efforts, commercial products, cloud infrastructure and management services, virtual appliance providers and open-source solutions are filling niches in the cloud ecosystem. The role and position of a component or a service in the ecosystem are defined by its capabilities, the consumers of those capabilities and its relationship with other components and services.
This article presents public and private cloud computing from the perspective of their different application scope and interfaces.
Interfaces for Public Cloud Computing
Public or external clouds offer virtualized resources as a service, enabling the deployment of an entire IT infrastructure without the associated capital costs, paying only for the used capacity. Amazon EC2, ElasticHosts, GoGrid and FlexiScale are examples of commercial cloud providers of elastic capacity, offering a public interface for remote management of virtualized server instances within their proprietary infrastructure. With the growing popularity of these cloud offerings, an ecosystem of tools is emerging that can be used to transform an organization’s existing infrastructure into a public cloud. Technologies, such as Globus Nimbus or Eucalyptus, provide an open-source implementation of cloud-like public interfaces, and projects, such as RESERVOIR, are developing open-source toolkits for building any cloud architecture.
The standardization of a public cloud interface is the aim of the OGF Open Cloud Computing Interface Working Group. OCCI-WG is delivering an API specification for remote management of cloud computing infrastructure, allowing for the development of interoperable tools for common tasks on public clouds including deployment, autonomic scaling and monitoring. The main consumers of this API would be service management platforms, technologies for building hybrid clouds, and service providers. The working group keeps a complete list of existing cloud APIs and a list of references to studies comparing the APIs. The requirements for the new specification are being extracted from a collection of use cases contributed by the community. The working group is supported by relevant companies and open-source initiatives in the cloud computing ecosystem.
Interoperability is not only about standardization of interfaces, but also about portability of virtual machines. The DMTF Open Virtualization Format (OVF) can be used as a means for customers of an IaaS provider to express their infrastructural needs. OVF was not designed with cloud computing in mind, so there are issues that need to be solved when applied to this environment, in particular, on automatic elasticity, self-configuration and deployment constraints. In any case, standards for cloud interoperability (OCCI) and virtual machine portability (OVF) are imminent and many providers are planning to adopt them.
Interfaces for Private Cloud Computing
On the other hand, there is a growing interest in tools for leasing compute capacity from the local infrastructure. The aim of these deployments is not to expose to the world a cloud interface to sell capacity over the Internet, but to provide local users with a flexible and agile private infrastructure to run service workloads within the administrative domain. This private or enterprise cloud model is not new, since datacenter management has been around for a while. In fact, I would venture that future datacenters will look like private clouds. Platform VM Orchestrator, VMware vSphere, Citrix Cloud Center, and Red Hat Enterprise Virtualization Manager are commercial tools for the management of virtualized services in the datacenter, and so are aimed at building private clouds. OpenNebula Virtual Infrastructure Engine (now part of Ubuntu) is an open-source alternative for private cloud computing, also supporting hybrid cloud deployments to supplement local infrastructure with computing capacity from an external cloud.
Private cloud interfaces should therefore allow the integration of the virtualized distributed infrastructure into the data-center management stack, including user and administration support. A private cloud interface should provide rich enough semantics, far beyond those provided by public clouds, to ease this integration. Such an interface should provide additional functionality for virtualization, networking, image and physical resource configuration, management, monitoring and accounting, not exposed by public cloud interfaces.
The standardization of a private cloud interface may be the aim of the new DMTF Cloud Computing Incubator, given that, according to its charter, one of its benefits is to enable the use of cloud computing within enterprises. The DMTF Open Cloud Standards Incubator Leadership Board currently includes most of the main providers and integrators of private cloud solutions. On the other hand, although conceived as a library to interface with different virtualization technologies, the libvirt virtualization API could also be used as an interface for private cloud computing. This is the approach taken by the libvirt implementation of OpenNebula. Implementing libvirt on top of a virtual infrastructure manager provides an abstraction of a whole cluster of resources (each with its own hypervisor), so a whole cluster can be managed like any other libvirt node.
About Using Public Interfaces for Private Cloud Deployments
Using public cloud interfaces to access the local infrastructure would reduce the cost of learning a new interface when moving from a private to a public cloud, but at the expense of providing local users with limited functionality, losing the comfort and control of data center operations, and using, within the administrative domain, communication protocols and security mechanisms originally created for remote management. Moreover, several local cloud technologies support cloudbursting to build hybrid clouds, combining local infrastructure with public cloud-based infrastructure to enable highly scalable hosting environments.
That does not mean, of course, that you cannot expose a public interface on top of your private cloud solution, for example if you want to provide partners or external users with access to your infrastructure, or to sell your overcapacity. Obviously, a local cloud solution is the natural back-end for any public cloud.