This upcoming November 5-8, VMworld 2018 will be held in Barcelona. It is a must-attend event for anyone with an interest in virtualization and cloud computing, and a great opportunity to network with industry experts. The OpenNebula team will be there in Barcelona, ready to showcase OpenNebula’s integration with VMware Cloud on AWS, as well as the new features of both OpenNebula 5.6 and vOneCloud 3.2.1.

Join us in Barcelona, make sure to register, and don’t forget to stop by our booth, E422. We can provide a live demo of how a VMware-based infrastructure can be easily turned into a cloud, with a fully functional self-service portal – all in a matter of minutes! At the same time, we will be available to answer any questions you may have and to discuss ongoing developments. We hope to see you there!

Over the last several years we have seen the explosive value that cloud computing has brought to the market, and the ever-growing shift toward centralized data centers to support business processing at every scale. The cloud infrastructure of today provides an extremely effective and economical platform for keeping up with the persistent demand for more storage and computing. With the rapid growth of data comes a corresponding need to process that data. Until now, the prevailing paradigm has been the ability to grow one’s data center swiftly to handle that growing need for processing power, with virtualized data centers and cloud infrastructures as the foundational tools.

However, with the Internet of Things (IoT) and the forthcoming explosion of “everything connected”, we are seeing that centralized cloud infrastructure, on its own, will not be a silver bullet. These mobile devices – which, ironically enough, we continue to call “phones” – continue to evolve, providing an ever-growing range of capabilities and burgeoning processing power. Homes, offices, public buildings, and automobiles now collect and generate huge amounts of data, and as we walk by with our phones, or drive by in our automobiles, we will expect a much more complete, almost inherent, interaction. This is where the current cloud model falls short.

As this explosion of connected data and IoT grows, and interactions between things need to almost mimic human nature, the basic paradigm shifts from a need for scale to a need for speed. Latency becomes paramount in these kinds of “connected” interactions. And here is where we see bringing cloud capabilities closer to the consumer – closer to “the Edge” – as a developing model.

At OpenNebula Systems, we have focused over the last decade on bringing a simple, yet flexible and comprehensive, Virtual Data Center and Cloud Management solution to the market – OpenNebula. And as demands have developed and user needs have changed, we have continued to innovate. Within the last month, we have released the first version of a prototype solution with cloud disaggregation capabilities. This is the first step in our effort to integrate edge computing while maintaining an integrated experience of cloud orchestration and resource management.

With this prototype, we have carried out a simple but illustrative use case, demonstrating the value of being able to “disaggregate” one’s cloud infrastructure – for now, we have introduced support for the Packet and AWS EC2 bare-metal providers – and bring it closer to the user.

We assumed that a fictitious company, ACME Corporation, was located in Sacramento, California, where we instantiated an OpenNebula node to emulate an on-premises private cloud for the company. The story begins with ACME realizing that it is getting a lot of system traffic, not only within the California region but also from users in France. With OpenNebula and the newly introduced Host Provisioning capabilities, ACME Corporation can now:

  • deploy new physical hosts on selected bare-metal cloud providers
  • install and configure them as KVM hypervisors
  • and add them into existing OpenNebula clusters as independent hosts,

all within minutes.

In terms of Host Provisioning, for this exercise we utilized bare-metal servers from Packet. Here we deployed and configured two separate edge nodes – one in Los Angeles, California, and the other in Marseille, France.

Edge Node / Location | Deployment time | Configuration time
Node 1 – Los Angeles, CA | 5 minutes | 3 minutes
Node 2 – Marseille, France | 5 minutes | 7 minutes

Essentially, within 8 minutes and 12 minutes, respectively, we were able to deploy two physical hosts on bare-metal servers and configure each of them as a KVM hypervisor.

The next step was to deploy a Virtual Machine. In this case, we utilized Alpine Linux virtual router appliances with a physical size of 71 MiB. (Deployment time covers the total time between the deploy order and the VM entering the running state, without counting the initial image transfer time, which is required only the first time the VM is deployed in a new location.)

Edge Node / Location | Deployment time | Image transfer time
Node 1 – Los Angeles, CA | 1 second | 3 seconds
Node 2 – Marseille, France | 9 seconds | 15 seconds
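The step above could be scripted roughly as follows. This is an illustrative sketch only – the template name, VM name, and host ID are placeholders rather than the ones used in the exercise, and both commands are part of the standard OpenNebula CLI.

# Illustrative sketch only: instantiate a hypothetical "alpine-vrouter"
# template and place the resulting VM on a specific edge host. Names and IDs
# are placeholders.
import subprocess

def run(cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Instantiate the appliance template once per edge location.
run(["onetemplate", "instantiate", "alpine-vrouter", "--name", "vrouter-la"])

# Place the pending VM on an explicit host (here, ID 3 would be the
# Los Angeles edge node added by the provisioning step).
run(["onevm", "deploy", "vrouter-la", "3"])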

So, within a matter of minutes, ACME Corporation was able to deploy two separate edge nodes and their appliances – all controlled from a single, centrally managed OpenNebula private cloud. And here is where the rubber meets the road.

We measured latencies for the following situations to demonstrate the centralized cloud use case:

Use Case | Infrastructure arrangement | Latency
User in Los Angeles, CA | Between the user and the on-premises cloud (node in Sacramento, CA) | 12 milliseconds
User in Marseille, France | Between the user and the on-premises cloud (node in Sacramento, CA) | 174 milliseconds

We then measured latencies for the following disaggregated cloud infrastructure:

Use Case | Infrastructure arrangement | Latency
User in Los Angeles, CA | Between the user and the edge (node in Los Angeles, CA) | 9 milliseconds
User in Marseille, France | Between the user and the edge (node in Marseille, France) | 10 milliseconds
User in Paris, France | Between the user and the edge (node in Marseille, France) | 12 milliseconds

The result is simple. By using OpenNebula’s ability to easily provision a separate, fully functional node on a bare-metal server from a provider such as Packet, geographically closer to the end user, one can achieve a significant improvement in latency. In this case, ACME Corporation was able to reduce the latency for users in France from 174 milliseconds to 10 milliseconds. And in a world with an increasing focus on connected data, gaming, and IoT, this will become more and more critical.
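As a side note for anyone reproducing this kind of comparison: the post does not say which tool was used for the measurements, but a minimal probe can simply time TCP connections from the client’s location to each node. The hostnames below are placeholders.

# A simple TCP round-trip probe, similar in spirit to the numbers above.
# Host addresses are placeholders; ICMP ping works just as well.
import socket
import time

def tcp_rtt(host, port=22, samples=5):
    """Average time to open a TCP connection to host:port, in milliseconds."""
    total = 0.0
    for _ in range(samples):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            pass
        total += time.monotonic() - start
    return 1000 * total / samples

for name, addr in [("on-premises (Sacramento)", "onprem.example.com"),
                   ("edge (Los Angeles)", "edge-la.example.com"),
                   ("edge (Marseille)", "edge-mrs.example.com")]:
    print(f"{name}: {tcp_rtt(addr):.1f} ms")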

While this OpenNebula Host Provisioning prototype is an initial step in our focused development in Edge Computing and Disaggregated Clouds, OpenNebula Systems is also heavily involved in building out similar capability in its collaboration with the telecommunications giant, Telefónica, and their Central Office Re-architected as a Datacenter (CORD) initiative, called “OnLife”.  Read here for additional details about Telefónica’s “OnLife” initiative.

Stay connected with developments at OpenNebula Systems. Don’t forget to join our Newsletter, or reach out to me directly (mabdou@opennebula.systems) for any questions or suggestions. We maintain and nurture a strong Community of Users, and we’d love to hear your feedback and insight.

We want to let you know that OpenNebula Systems has just announced the availability of vOneCloud version 3.2.1.

vOneCloud 3.2.1 is based on OpenNebula 5.6.1 and as such it includes all the bug fixes and functionalities introduced in 5.6.1: OpenNebula 5.6.1 Release Notes.

vOneCloud 3.2.1 is a maintenance release with the following minor improvements:

  • The order of elements in list API calls can be selected (ascending or descending).
  • XMLRPC calls can report the client IP and port.
  • New quotas allow you to configure limits for “running” VMs.
  • Virtual Machines associated with a Virtual Router now have all actions allowed except nic-attach/detach.

Version 3.2.1 also features the following bugfixes:

  • User quotas error.
  • Migrating vCenter machines now provides feedback to oned.
  • Fixed a problem migrating vCenter machines to a cluster with many ESXi hosts.
  • Improved feedback for the ‘mode’ option in the Sunstone server.
  • Accounting data does not display.
  • Spurious syntax help on onehost delete.
  • No way to hide the Lock/Unlock button for a VM in a Sunstone view.
  • Updated the LDAP driver to use the new escaping functionality.
  • Start script base64 encoding fails when using non-UTF-8 characters.
  • Error when creating a vnet from Sunstone using advanced mode.
  • Restricted attributes not enforced on the attach disk operation.
  • Improved the dialog when attaching a NIC or instantiating a VM in the network tab.
  • VNC on ESXi can break the firewall.
  • Slow monitoring of live-migrating VMs on the destination host.
  • onehost sync should ignore vCenter hosts.
  • NIC model is ignored on a vCenter VM Template.
  • Unable to query VMs with non-ASCII characters.
  • vCenter unimported resources cache not working as expected.
  • Refactored wild VM importation from vCenter hosts.
  • Removing a CD-ROM from a vCenter imported template breaks the template.
  • Error with restricted attributes when instantiating a VM.
  • A few improvements and examples added to the onevcenter CLI tool.
  • OPENNEBULA_MANAGED deleted when updating a VM Template.
  • Unable to update the Running Memory quota.
  • Monitoring VMs fails when there is no datastore associated.


A reminder that OpenNebulaConf 2018 will take place in Amsterdam on November 12-13, and we would like nothing more than for you to be part of our team of Sponsors and to benefit from being an official supporter of the event!

In previous editions of OpenNebulaConf, we offered agendas packed with Hands-on Deployment and Operations Tutorials, Developer working sessions, Networking sessions and talks covering OpenNebula case studies and much more. We enjoyed presentations from notable OpenNebula users and industry leaders like Akamai, Produban – Santander Bank, CentOS, Runtastic, Puppet Labs, Cloudweavers, RedHat, Deutsche Post, Unity Technologies, BlackBerry, Rental, Citrix, LRZ, FermiLab, Harvard, Trivago and the European Space Agency. We typically draw a highly networked international audience with a strong interest in the Open Source community. Take advantage, and reap the benefits of sponsorship!

 

What you will get by becoming a Sponsor

Having a presence at OpenNebulaConf 2018 is a great way to get your company in front of the OpenNebula community. There are three available levels of sponsorship: Platinum, Gold, and Silver. The table below shows the cost of each sponsorship package and what is included.


Do you have further questions?
Have a look at the OpenNebulaConf web page or write us an email.

Thank you for your interest in sponsoring OpenNebulaConf 2018!

 

Our monthly newsletter contains the major achievements of the OpenNebula project and its community during the month of September 2018.

Technology

This month the team released OpenNebula 5.6.1 – a new maintenance release of the 5.6 “Blue Flash” series, which delivers several bug fixes as well as various feature enhancements. A few examples include:

  • List subcommands use pagination when in an interactive shell.
  • The order of elements in list API calls can be selected.
  • XMLRPC calls report the client IP and port.
  • New quotas allow you to configure limits for “running” VMs.
  • Host hook triggers have been updated to include all possible states.
  • ‘onezone set’ now allows temporary zone changes.
  • VMs associated with a Virtual Router now feature all lifecycle actions.

OpenNebula 5.6.1 is the first version to include “Enterprise Add-ons”, extended capabilities available to customers of OpenNebula Systems with an active support subscription.

For more details of what is included in OpenNebula 5.6.1, check the Release Notes.

We also closed out the month with the release of our Host Provisioning prototype! This is a project we have been working on to bring additional flexibility and improved efficiency to data center configurations, and it is just the first step in OpenNebula’s effort to support the deployment of Edge Computing environments. You can review the details of this “oneProvision” prototype here.

We continue to move ahead with OpenNebula 5.8.  One of the key features that will be included in this upcoming version is OpenNebula’s support for managing LXD containers (Operating System-level Virtualization).  Some of the details and key benefits are outlined in this earlier post.

For vOneCloud, we are currently working on vOneCloud 3.2.1, which will incorporate the same functionalities and advances that we included in the OpenNebula 5.6.1 release.

Lastly, our push for completeness and flexibility continues as we work on incorporating Python bindings (in addition to Ruby and Java) within OpenNebula.
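As a taste of what that enables, the sketch below lists VMs from a short Python script using the community pyone package. The endpoint and credentials are placeholders, and the call signatures mirror the XML-RPC API, so check the binding’s documentation before relying on them.

# Rough sketch using the pyone Python bindings (pip install pyone).
# Endpoint and credentials are placeholders; method names mirror the
# XML-RPC API (one.vmpool.info, one.host.info, ...).
import pyone

one = pyone.OneServer("http://frontend.example.com:2633/RPC2",
                      session="oneadmin:password")

# -2 = all VMs visible to the user, -1/-1 = full ID range, -1 = any state
vmpool = one.vmpool.info(-2, -1, -1, -1)
for vm in vmpool.VM:
    print(vm.ID, vm.NAME)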

Community

This section is where we, at OpenNebula, get most excited! As we all continue to work on bringing real value to the market through the OpenNebula platform, one of our key goals is to foster genuine interest and engagement amongst our Community. So much of what we are able to achieve depends on you, the OpenNebula users and contributors. So when we see involvement and commitment across the Community, we know that the OpenNebula project is bound for continued growth.

For example, check out this great blog contribution by Inoreader, outlining their step-by-step evolution from a bare-metal server architecture to a completely virtualized infrastructure built on OpenNebula and KVM, along with StorPool for storage.

Simon Haly, the CEO of LizardFS, published a brief about their collaboration with Nodeweaver to create a plugin to “scale [your] OpenNebula cloud to Petabytes and beyond.” https://www.prurgent.com/2018-09-18/pressrelease447786.htm

Additionally, in a tweet earlier this month – both simple and direct – LINBIT highlighted the ease with which one can use the OpenNebula image driver… and then posted a video demo:

“The image driver for @opennebula makes #DRBD volumes easily to place VMs on.”

Even beyond the direct impacts of our code, many contributors from the OpenNebula Community provided feedback and insight into a recent European Union publication on how to bring Standards and the Open Source community closer together.

And lastly, in the Community space, OpenNebula introduced Michael Abdou, their new Community and Customer Success Manager…read here.

Outreach

In the last week of September, we collaborated with LINBIT as they hosted an OpenNebula TechDay in Frankfurt, Germany!  In addition to offering a FREE Hands-on Tutorial of OpenNebula, we saw presentations and demos carried out by LINBIT, Mellanox, and Hystax.  Here you can check out how things fared at the Frankfurt TechDay.

Keep an eye out for upcoming TechDays, as we try to schedule these sessions periodically throughout the year, in various locations – all free of charge. Or even partner with us to host one of your own!!  OpenNebula TechDay info.

Don’t forget!  The OpenNebula Conference 2018 is right around the corner. On November 12-13 we will be in Amsterdam, and there will be plenty to take advantage of – from Hands-on Tutorials and Keynotes to Lightning Talks and Community discussions. Take a look at the agenda and plan on joining us in Amsterdam!  http://2018.opennebulaconf.com/

Join our Sponsor Program and be sure to maximize your benefits from the Conference.  The OpenNebula Conference 2018 is currently sponsored by StorPool, LINBIT and NTS as Platinum Sponsors and Virtual Cable SLU and ROOT as Silver Sponsors.  We would love to see you be a part of our team of sponsors!

One more item to put on your calendar is VMworld in Barcelona, on November 5-8, 2018.  We will be there with an exhibit for all things OpenNebula and vOneCloud.  Make sure to swing by booth E422!

More to come in October!!  Stay connected!

Recently, we shared our vision for the disaggregated cloud approach in the blog post A Sneak Preview of the Upcoming Features for Cloud Disaggregation. Today we are very happy to announce the initial release of the Host Provision prototype, which provides the capability to deploy new physical hosts on selected bare-metal cloud providers, to install and configure them as KVM hypervisors, and to add them into existing OpenNebula clusters as independent hosts.

Everything included in a single run of the new command “oneprovision”!

For the initial release, we support the Packet Host and Amazon EC2 (i3.metal instance type) bare-metal providers. The tool is distributed as an operating system package for RHEL/CentOS 7, Debian 9, and Ubuntu 16.04 and 18.04, or as a source code archive. The package should be installed on your OpenNebula front-end and run with the privileges of the oneadmin user.
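As a rough sketch of what a single run can look like, the snippet below writes a provision description and hands it to oneprovision. The YAML keys, facility and plan values, and the --debug flag are illustrative assumptions; the authoritative schema and options are in the documentation linked below.

# Illustrative only: the provision description schema and CLI flags shown here
# are approximations -- consult the oneprovision documentation for the real
# format. Run on the OpenNebula front-end as the oneadmin user.
import pathlib
import subprocess

TEMPLATE = """\
name: edge-la
defaults:
  provision:
    driver: packet              # or the EC2 driver for an i3.metal instance
    facility: <PACKET_FACILITY> # placeholder, e.g. a US-West facility
    plan: <PACKET_PLAN>         # placeholder bare-metal plan
    packet_token: <API_TOKEN>   # provider credentials
hosts:
  - im_mad: kvm                 # monitor and run the new host as KVM
    vm_mad: kvm
"""

path = pathlib.Path("/tmp/edge-la.yaml")
path.write_text(TEMPLATE)

# One run deploys the bare-metal machine, configures it as a KVM hypervisor,
# and registers it as a host in the existing OpenNebula cluster.
subprocess.run(["oneprovision", "create", str(path), "--debug"], check=True)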

Source code and packages are ready to download from our GitHub repository.

Detailed documentation which covers how to install, use or customize the tooling is available here.

Last, but not least – it’s all open source!

We would love to hear your comments and suggestions, and to help with any problems you may experience. Please use our community forum to get in touch with us.

Here’s a quick “Thank you” to LINBIT for hosting the OpenNebula TechDay in Frankfurt.  Thank you for your continued partnership and support!   It was a great opportunity for everyone to meet up and share insights and experiences.

We are glad to let everyone know that the slide presentations are now available!

Here’s a link to the complete agenda.

Tino kicking off the OpenNebula TechDay in Frankfurt.

 

Sergio walking the group through the Hands-On Tutorial.

 

A live-demo of OpenNebula and LINSTOR integration done by Philipp Reisner.

The OpenNebula team is pleased to announce the availability of OpenNebula 5.6.1, a new maintenance release of the 5.6 ‘Blue Flash’ series. This version fixes multiple bugs and adds some minor features, such as:

  • List subcommands use pagination when in an interactive shell.
  • The order of elements in list API calls can be selected.
  • XMLRPC calls report the client IP and port.
  • New quotas allow you to configure limits for “running” VMs.
  • Host hook triggers have been updated to include all possible states.
  • ‘onezone set’ now allows temporary zone changes.
  • VMs associated with a Virtual Router now feature all lifecycle actions.

Check the release notes for the complete set of changes, including bugfixes.

OpenNebula 5.6.1 features the first release of the enterprise addons, available for customers of OpenNebula Systems with an active support subscription.


Hello OpenNebula Community.

I want to take a brief moment to introduce myself, as I have recently joined the OpenNebula project and will be working very closely with you. My name is Michael Abdou, and I am the new Community and Customer Success Manager at OpenNebula. I am thrilled to join this Community, to have the opportunity to help foster a dynamic and collaborative environment, and to be part of this effort to bring value and innovation to the marketplace. I have worked most of my career in the United States for a Fortune 100 insurance company in the IT delivery space. I started out as a BI developer and later moved into the management track, leading various teams across development, analysis, and quality assurance. In the last few years, much of my focus shifted from “conventional, mainstream” technologies and delivery toward innovative and emerging technologies – in my case, helping make the transition to a Big Data architecture, as well as developing Platform as a Service (PaaS) solutions.

A main focus of my job here at OpenNebula will be to help foster an environment of pride and passion around our project, to make sure that everyone has a practical and convenient channel to contribute, to promote and cultivate our spirit of collaboration, and to always keep the growth and success of the OpenNebula project within sight. I am here to support you, the Community, and to make sure we all have an exceptional user experience. What I ask of you is that you continue to be curious and open-minded. Share your experiences and insights. In the long run, this will only help our Community grow. If you have any questions, concerns, or suggestions, please always feel free to reach out to me.

I really look forward to working together with you.

Best regards,
Michael


Prolog

Building and maintaining a cloud RSS reader requires resources. Lots of them! Behind the deceptively simple user interface there is a complex backend with a huge datastore that must be able to fetch millions of feeds in time, store billions of articles indefinitely, and make any of them available in just milliseconds – either by searching or simply by scrolling through lists. Even calculating the unread counts for millions of users is enough of a challenge that it deserves a dedicated module for caching and maintenance. The very basic feature that every RSS reader should have – being able to filter only unread articles – requires so much resource power that it accounts for around 30% of the storage pressure on our first-tier databases.

Until recently we were using bare-metal servers to operate our infrastructure, meaning we deployed services like database and application servers directly on the operating system of each server. We were not using virtualization except for some really small micro-services, and that was practically one physical server with local storage broken down into several VMs. Last year we reached a point where we had a 48U (rack-unit) rack full of servers. More than half of those servers were databases, each with its own storage – usually 4 to 8 spinning disks in RAID-10, with expensive RAID controllers equipped with cache modules and BBUs. All of this was required to keep up with the needed throughput.

There is one big issue with this setup. Once a database server fills up (usually at around 3TB) we buy another one, and the old one becomes read-only. The CPUs and memory on those servers remain heavily underutilized while the storage is full. For a long time we knew we had to do something about it, otherwise we would soon have needed to rent a second rack, which would have doubled our bill. The cost was not the primary concern. It just didn’t feel right to have a rack full of expensive servers that we couldn’t fully utilize because their storage was full.

Furthermore, redundancy was an issue too. We had redundancy on the application servers, but for databases of this size it’s very hard to keep everything redundant and fully backed up. Two years ago we had a major incident that almost cost us an entire server with 3TB of data, holding several months’ worth of articles. We completely recovered all the data, but it was a close call.

 

Big changes were needed!

While the development of new features is important, we had to stop for a while and rethink our infrastructure. After some long sessions and meetings with vendors, we made a final decision:

We will completely virtualize our infrastructure and we will use OpenNebula + KVM for virtualization and StorPool for distributed storage.

 

 

Cloud Management

We have chosen this solution not only because it is practically free if you don’t need enterprise support, but also because it has proven to be very effective. OpenNebula is now mature enough and has so many use cases that it’s hard to ignore. It is completely open source, with a big community of experts, and it offers optional enterprise support. KVM is now used as the primary hypervisor for EC2 instances in Amazon AWS. This alone speaks volumes, and OpenNebula is primarily designed to work with KVM too. Our experience with OpenNebula in the past few months hasn’t made us regret this decision even once.

 

Storage

Now, a crucial part of any virtualized environment is the storage layer. You aren’t really getting anywhere if you are still using the local storage on your servers. The whole idea of virtualization is that your physical servers are expendable. You should be able to tolerate a server outage without any data loss or service downtime. How do you achieve that? With separate, ultra-high-performance, fault-tolerant storage connected to each server via a redundant 10G network.

There’s EMC‘s enterprise solution, which can cost millions and uses proprietary hardware, so it’s out of our league. Also, big vendors don’t usually play well with small clients like us. There’s a chance that we would just have to sit and wait for a ticket resolution if something broke, which contradicts our vision.

Then there’s RedHat’s Ceph, which comes completely free of charge, but we were a bit afraid to use it since nobody on the team had the expertise required to run it in production without any doubt that, in the event of a crash, we would be able to recover all our data. We were on a very tight schedule with this project, so we didn’t have time to send someone for training. The performance figures were also not very clear to us and we didn’t know what to expect. So we decided not to risk it for our main datacenter. We are now using Ceph in our backup datacenter, but more on that later.

Finally, there’s one still relatively small vendor that just so happens to be located some 15 minutes away from us – StorPool. They were recommended to us by colleagues running similar services, and we had a quick kick-start meeting with them. After the meeting it was clear to us that these guys know what they are doing at the lowest possible level.
Here’s what they do in a nutshell (quote from their website):

StorPool is a block-storage software that uses standard hardware and builds a storage system out of this hardware. It is installed on the servers and creates a shared storage pool from their local drives in these servers. Compared to traditional SANs, all-flash arrays, or other storage software StorPool is faster, more reliable and scalable.

It doesn’t sound very different from Ceph, so why did we choose them? Here are just some of the reasons:

  • They offer full support for a very reasonable monthly fee, saving us the need to have a trained Ceph expert onboard.
  • They promise higher performance than Ceph.
  • They have their own OpenNebula storage addon (yeah, Ceph does too, I know).
  • They are a local company, and we can always pick up the phone and resolve any issues in minutes rather than hours or days, as it usually ends up with big vendors.

 

The migration

You can read the full story of our migration, with pictures and detailed explanations, on our blog.

I will try to keep it short and tidy here. Basically, we managed to slim down our inventory to half of the previous rack space. This allowed us to reduce our costs and create enough room for later expansion, while immediately and greatly increasing our compute and storage capacities. We mostly reused our old servers in the process, with some upgrades to make the whole OpenNebula cluster homogeneous – the same CPU model and memory across all servers – which allowed us to use “host=passthrough” to improve VM performance without the risk of a VM crash during a live migration. The process took us less than 3 months, with the actual migration happening in around two weeks. While we waited for the hardware to arrive, we had enough time to play with OpenNebula in different scenarios, try out VM migrations and different storage drivers, and overall try to break it while it was still in a test environment.

 

The planning phase

So after we made our choice for virtualization, it was time to plan the project. This happened in November 2017, so not very long ago. We rented a second rack in our datacenter. The plan was to install the StorPool nodes there and gradually move servers over, converting them into hypervisors. Once we had moved everything, we would remove the old rack.

We ordered 3 servers for the StorPool storage. Each of those servers has room for 16 hard disks. We ordered only half of the needed hard disks, because we knew that once we started virtualizing servers, we would salvage a lot of drives that wouldn’t be needed otherwise.

We also ordered the 10G network switches for the storage network and new Gigabit switches for the regular network, to upgrade our old switches. For the storage network we chose the Quanta LB8. Those beasts are equipped with 48x 10G SFP+ ports, which is more than enough for a single rack. For the regular Gigabit network, we chose the Quanta LB4-M. They have additional 2x 10G SFP+ modules, which we used to connect the two racks via optic cable.

We also ordered a lot of other smaller items like 10G network cards and a lot of CPUs and DDR memory. Initially we didn’t plan to upgrade the servers before converting them to hypervisors, in order to cut costs. However, after some benchmarking we found that our current CPUs were not up to the task. We were using mostly dual-CPU servers with Intel Xeon E5-2620 (Sandy Bridge) processors, and they were already dragging even before the Meltdown patches. After some research we chose to upgrade all servers to the E5-2650 v2 (Ivy Bridge), an 8-core (16 threads with Hyper-Threading) CPU with a turbo frequency of 3.4 GHz. We already had two of these, and benchmarks showed a two-fold increase in performance compared to the E5-2620.

We also decided to boost all servers to 128GB of RAM. We had different configurations, but most servers had 16-64GB and only a handful were already at 128GB. So we made some calculations and ordered 20+ CPUs and 500+GB of memory.

After we placed all the orders we had about a month before everything arrived, so we used that time to prepare what we could without the additional hardware.

 

The preparation phase

We used the whole of December and part of January, while waiting for our equipment to arrive, to prepare for the coming big migration. We learned how OpenNebula works, and tried everything we could think of to break it and to see how it behaves in different scenarios. This was very important for avoiding production mistakes and downtime later.
We didn’t just wait for our hardware to arrive, though. We purchased one old but still powerful server with lots of memory to temporarily hold some virtual machines. The idea was to free up some physical servers so we could shut them down, upgrade them, and convert them into hypervisors in the new rack.

 

The execution phase

After the hardware arrived it was time to install it in the new rack. We started with the StorPool nodes and the network. This way we were able to bring up the storage cluster prior to adding any hypervisor hosts.
      
Now it was time for StorPool to finalize the configuration of the storage cluster and give us the green light to connect our first hypervisor to it. Needless to say, they were quick about it, and the next day we were able to bring in two servers from the old rack and start our first real OpenNebula instance with StorPool as the storage.

After we had our shiny new OpenNebula cluster with StorPool storage fully working, it was time to migrate the virtual machines that were still running on local storage. The guys from StorPool helped us a lot here by providing a migration strategy that we had to execute for each VM. If there is interest, we can describe the whole process in a separate post.

From here on we gradually migrated physical servers to virtual machines. The strategy was different for each server – some of them were databases, others application and web servers. We managed to migrate all of them with anywhere from a few seconds of downtime to no downtime at all. At first we didn’t have much space for virtual machines, since we had only two hypervisors, but with each iteration we were able to convert more and more servers at once.

     

After that, each server went through a complete overhaul. CPUs were upgraded to 2x E5-2650 v2 and memory was bumped to 128GB. The expensive RAID controllers were removed from the expansion slots and 10G network cards were installed in their place. Large (>2TB) hard drives were removed, and smaller drives were installed just for the OS. After the servers were re-equipped, they were installed in the new rack and connected to the OpenNebula cluster. The guys from StorPool configured each server’s connection to the storage and verified that it was ready for production use. The first 24 leftover 2TB hard drives were immediately put to work in our StorPool.

 

The result

In just a couple of weeks of hard work we managed to migrate everything!

In the new rack we have a total of 120TB of raw storage, 1.5TB of RAM and 400 CPU cores. Each server is connected to the network with 2x10G network interfaces.

That’s roughly 4 times the capacity and 10 times the network performance of our old setup with only half the physical servers!

The flexibility of OpenNebula and StorPool allows us to use the hardware very efficiently. We can spin up virtual machines in seconds with any combination of CPU, memory, storage and network interfaces, and later we can change any of those parameters just as easily. It’s DevOps heaven!

This setup will be enough for our needs for a long time, and we have more than enough room for expansion if the need arises.

 

Our OpenNebula cluster

We now have more than 60 virtual machines, because we have split some physical servers into several smaller VMs with load balancers for better load distribution, and we have allocated more than 38TB of storage.

We have 14 hypervisors with plenty of resources available on each of them. All of them use the same CPU model, which gives us the ability to use the “host=passthrough” setting of QEMU to improve VM performance without the risk of a VM crash during a live migration.
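As a point of reference (not necessarily exactly how we configured it), one way to express CPU passthrough in an OpenNebula VM template is through the RAW attribute, which passes extra libvirt/QEMU configuration straight through. The sketch below is a hedged example with a placeholder template ID; verify the exact RAW syntax for your OpenNebula version before using it.

# Sketch only: append a RAW section so libvirt/QEMU exposes the host CPU model
# to the guest. Template ID 42 is a placeholder; double-check the RAW syntax
# for your OpenNebula version.
import subprocess
import tempfile

RAW_CPU = 'RAW = [ TYPE = "kvm", DATA = "<cpu mode=\'host-passthrough\'/>" ]\n'

with tempfile.NamedTemporaryFile("w", suffix=".tpl", delete=False) as f:
    f.write(RAW_CPU)

# --append merges the new attribute into the existing template contents.
subprocess.run(["onetemplate", "update", "42", "--append", f.name], check=True)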

We are very happy with this setup. Whenever we need to start a new server, it only takes minutes to spin up a new VM instance with whatever CPU and memory configuration we need. If a server crashes, all VMs will automatically migrate to another server. OpenNebula makes it really easy to start new VMs, change their configurations, manage their lifecycle and even completely manage your networks and IP address pools. It just works!

StorPool on the other hand takes care that we have all the needed IOPS at our disposal whenever we need them.

 

Goodies

We are using Graphite + Grafana to plot some really nice graphs for our cluster.

We have borrowed the solution from here. That’s what’s so great about open software!
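For flavor only (this is not the borrowed solution itself): getting a custom number into Graphite so Grafana can plot it is as simple as writing one line per sample to Graphite’s plaintext port. The host and metric names below are placeholders.

# Minimal example of feeding a custom metric to Graphite's plaintext listener
# (port 2003). Host and metric names are placeholders, not the actual
# dashboards referenced above.
import socket
import time

def send_metric(path, value, host="graphite.example.com", port=2003):
    line = f"{path} {value} {int(time.time())}\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode())

# e.g. the number of running VMs in the cluster, gathered by another script
send_metric("opennebula.cluster.vms_running", 63)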

Our team is constantly informed about the health and utilization of our cluster. A glance at our wall-mounted TV screen is enough to tell that everything is alright. We can see both our main and backup data centers, both running OpenNebula. It’s usually all green :)

 

StorPool is also using Grafana for their performance monitoring and they have also provided us with access to it, so we can get insights about what the storage is doing at the moment, which VMs are the biggest consumers, etc. This way we can always know when a VM has gone rogue and is stealing our precious IOPS.

 

Epilog

If you made it this far – Congratulations! You have geeked out as much as we did building this infrastructure with the latest and greatest technologies like OpenNebula and StorPool.