Not so long ago, the guys from LINBIT presented their new SDS solution, Linstor. It is a fully free storage system built on proven technologies: DRBD, LVM and ZFS. Linstor combines simplicity with a well-designed architecture, which lets it achieve stability and quite impressive results.
Today I would like to tell you a little about it and show how easily it can be integrated with OpenNebula using linstor_un, a new driver that I developed specifically for this purpose.
Linstor in combination with OpenNebula will allow you to build a high-performance and reliable cloud, which you can easily deploy on your own infrastructure.
Linstor is neither a file system nor block storage by itself. It is an orchestrator that provides an abstraction layer to automate creating volumes on LVM or ZFS and replicating them with DRBD9.
But wait, DRBD? – Why automate it and how will it work at all?
Let’s remember the past, when DRBD8 was quite popular. Its standard usage meant creating one large block device and cutting it into many small pieces with LVM, a lot like mdadm RAID-1 but with network replication.
This approach is not without drawbacks, and therefore, with the advent of DRBD9, the principles of building storage have changed. Now a separate DRBD device is created for each new virtual machine.
The approach with independent block devices makes better use of space in the cluster and adds a number of features. For example, for each such device you can set the number of replicas, their location and individual settings. Devices are easy to create, delete, snapshot, resize, encrypt and much more. It is worth noting that DRBD9 also supports quorum.
Resources and backends
When creating a new block device, Linstor places the required number of replicas on different nodes in the cluster. Each such replica is called a DRBD resource.
There are two types of resources:
- Data resource: a DRBD device created on a node and backed by an LVM or ZFS volume. LVM, ThinLVM and ZFS backends are currently supported, and the number of backends keeps growing. The last two allow you to create and use snapshots.
- Diskless resource: a DRBD device created on a node without any backend. It can still be used like a regular block device; all read/write operations are redirected to the data replicas. The closest analogue to a diskless resource is an iSCSI LUN.
Each DRBD resource can have up to 8 replicas, and by default only one of them, the Primary, can be active. All others are Secondary: they cannot be used as long as at least one Primary exists, and they simply replicate the data between themselves.
When a DRBD device is mounted, it automatically becomes Primary, so even a diskless resource can be Primary in DRBD terminology.
So why is Linstor needed?
As it delegates all resource-intensive tasks to the kernel, Linstor is essentially a regular Java application that allows you to easily automate the creation and management of DRBD resources. At the same time, each resource created by Linstor will be an independent DRBD cluster, which can work independently of the state of the control-plane and other DRBD resources.
Linstor consists of only two components:
- Linstor-controller: the main controller, which provides an API for creating and managing resources. It also communicates with the satellites, checks their free space and schedules new resources on them. It runs as a single instance and uses a database that can be either internal (H2) or external (PostgreSQL, MySQL, MariaDB).
- Linstor-satellite: installed on all storage nodes. It reports free space to the controller and performs the tasks received from the controller, creating and deleting volumes and the DRBD devices on top of them.
Linstor uses the following key terms:
- Node: a physical server where DRBD resources will be created and used.
- Storage Pool: an LVM or ZFS pool created on a node and used to place new DRBD resources. A diskless pool is also possible; such a pool can contain only diskless resources.
- Resource Definition: essentially a prototype of a resource that describes its name and all of its properties.
- Volume Definition: each resource can consist of several volumes, and each volume must have a size. These parameters are described in the volume definition.
- Resource: the created instance of the block device. Each resource must be placed on a certain node and in some storage pool.
I recommend using Ubuntu as the base system, because it has a ready-made PPA repository:
add-apt-repository ppa:linbit/linbit-drbd9-stack
apt-get update
Or Debian, where Linstor can be installed from the official repository for Proxmox:
wget -O- https://packages.linbit.com/package-signing-pubkey.asc | apt-key add -
PVERS=5 && echo "deb http://packages.linbit.com/proxmox/ proxmox-$PVERS drbd-9.0" > \
    /etc/apt/sources.list.d/linbit.list
apt-get update
Installing the controller is simple:
apt-get install linstor-controller linstor-client
systemctl enable linstor-controller
systemctl start linstor-controller
Currently, the Linux kernel ships with an in-tree DRBD8 module. Unfortunately, it does not suit us, so we need to install DRBD9:
apt-get install drbd-dkms
As practice shows, most of the difficulties arise precisely because the DRBD8 module is loaded into the system, not DRBD9. However, this is easily checked by executing:
modprobe drbd
cat /proc/drbd
If you see version: 9, everything is fine. If you see version: 8, something went wrong and you need to take additional steps to find out where the problem is.
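The same check can be scripted, for example to run it on every storage node. This is only a minimal sketch: the function names drbd_major_version and check_drbd9 are my own, and it assumes the first line of /proc/drbd starts with "version: " followed by a dotted version number.

```shell
#!/bin/sh
# Print the major version of the loaded DRBD module.
# Reads /proc/drbd by default; another file can be passed, e.g. for testing.
drbd_major_version() {
    status_file="${1:-/proc/drbd}"
    # Assumed format: "version: 9.0.x-1 (api:...)"; keep the digits before the first dot.
    sed -n 's/^version: \([0-9][0-9]*\)\..*/\1/p' "$status_file" | head -n 1
}

# Succeed only when the loaded module reports major version 9.
check_drbd9() {
    major="$(drbd_major_version "$@")"
    if [ "$major" = "9" ]; then
        echo "DRBD9 module is loaded"
    else
        echo "Expected DRBD9 but found version '${major:-unknown}'" >&2
        return 1
    fi
}
```

Calling check_drbd9 after modprobe drbd returns a non-zero exit code when the DRBD8 module was loaded instead, which makes it easy to use in a provisioning script.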
Now install linstor-satellite and drbd-utils:
apt-get install linstor-satellite drbd-utils
systemctl enable linstor-satellite
systemctl start linstor-satellite
Storage pools and nodes
Let’s take ThinLVM as the backend, because it is the easiest and supports snapshots.
Install the lvm2 package if you haven’t already done so, and create a ThinLVM pool on all the storage nodes:
sudo vgcreate drbdpool /dev/sdb
sudo lvcreate -L 800G -T drbdpool/thinpool
All further actions should be performed directly on the controller:
Add the nodes:
linstor node create node1 127.0.0.11
linstor node create node2 127.0.0.12
linstor node create node3 127.0.0.13
Create storage pools:
linstor storage-pool create lvmthin node1 data drbdpool/thinpool
linstor storage-pool create lvmthin node2 data drbdpool/thinpool
linstor storage-pool create lvmthin node3 data drbdpool/thinpool
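With more than a few nodes, typing these commands by hand gets tedious, so they can be wrapped in a small loop. The following is just a sketch: add_storage_node, add_storage_nodes and the LINSTOR variable are names I made up for illustration, while the linstor commands themselves are the ones shown above.

```shell
#!/bin/sh
# Command used to talk to the controller.
# Set LINSTOR="echo linstor" to print the commands instead of running them.
LINSTOR="${LINSTOR:-linstor}"

# Register a node and attach its thin pool as the "data" storage pool.
# Usage: add_storage_node <name> <ip> [<vg/thinpool>]
add_storage_node() {
    name="$1"
    ip="$2"
    thin_pool="${3:-drbdpool/thinpool}"
    # stdin is redirected so the client cannot swallow the caller's node list.
    $LINSTOR node create "$name" "$ip" </dev/null || return 1
    $LINSTOR storage-pool create lvmthin "$name" data "$thin_pool" </dev/null
}

# Read "name ip" pairs, one per line, from standard input.
add_storage_nodes() {
    while read -r name ip; do
        [ -n "$name" ] || continue   # skip empty lines
        add_storage_node "$name" "$ip" || return 1
    done
}
```

Feeding add_storage_nodes the three "name ip" pairs used above, for instance through a here-document, replays the same six commands and stops at the first one that fails.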
Now let’s check the created pools:
linstor storage-pool list
If everything is done correctly, then we should see something like:
+-------------------------------------------------------------------------------------------------------+
| StoragePool | Node  | Driver   | PoolName          | FreeCapacity | TotalCapacity | SupportsSnapshots |
|-------------------------------------------------------------------------------------------------------|
| data        | node1 | LVM_THIN | drbdpool/thinpool |       64 GiB |        64 GiB | true              |
| data        | node2 | LVM_THIN | drbdpool/thinpool |       64 GiB |        64 GiB | true              |
| data        | node3 | LVM_THIN | drbdpool/thinpool |       64 GiB |        64 GiB | true              |
+-------------------------------------------------------------------------------------------------------+
Now let’s try to create our new DRBD-resource:
linstor resource-definition create myres
linstor volume-definition create myres 1G
linstor resource create myres --auto-place 2
Check created resources:
linstor resource list
Fine! We see that the resource was created on the first two nodes. We can also try to create a diskless resource on the third one:
linstor resource create --diskless node3 myres
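The whole sequence, from the resource definition to the optional diskless attachment, can be collected into one helper. Treat it as a sketch: create_replicated_volume and the LINSTOR variable are hypothetical names of mine, and the helper only chains the commands already shown in this section.

```shell
#!/bin/sh
# Set LINSTOR="echo linstor" to print the commands instead of running them.
LINSTOR="${LINSTOR:-linstor}"

# Create a resource with one volume, place the data replicas automatically
# and optionally attach diskless resources on the listed nodes.
# Usage: create_replicated_volume <name> <size> <replicas> [<diskless-node>...]
create_replicated_volume() {
    if [ "$#" -lt 3 ]; then
        echo "usage: create_replicated_volume <name> <size> <replicas> [<diskless-node>...]" >&2
        return 2
    fi
    res="$1"
    size="$2"
    replicas="$3"
    shift 3
    $LINSTOR resource-definition create "$res" || return 1
    $LINSTOR volume-definition create "$res" "$size" || return 1
    $LINSTOR resource create "$res" --auto-place "$replicas" || return 1
    # Every remaining argument is a node that gets a diskless resource.
    for node in "$@"; do
        $LINSTOR resource create --diskless "$node" "$res" || return 1
    done
}
```

For the example above, the call would be create_replicated_volume myres 1G 2 node3.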
You can always find this device on the nodes under
This is how Linstor works. You can get more information from the official documentation.
Now I’ll show how to integrate it with OpenNebula.
I will not go deep into the process of setting up OpenNebula, because all the steps are described in detail in the official documentation, which I recommend you consult. I will only cover the integration of OpenNebula with Linstor.
To reach this goal, I wrote my own driver, linstor_un. At the moment it is available as an add-on and is installed separately.
The entire installation is done on the frontend OpenNebula nodes, and does not require additional actions on the compute nodes.
First of all, we need to make sure that we have jq and linstor-client installed:
apt-get install jq linstor-client
The linstor node list command must display the list of nodes. All OpenNebula compute nodes must be added to the Linstor cluster.
Download and install the addon:
curl -L https://github.com/OpenNebula/addon-linstor_un/archive/master.tar.gz | tar -xzvf - -C /tmp
mv /tmp/addon-linstor_un-master/vmm/kvm/* /var/lib/one/remotes/vmm/kvm/
mkdir -p /var/lib/one/remotes/etc/datastore/linstor_un
mv /tmp/addon-linstor_un-master/datastore/linstor_un/linstor_un.conf /var/lib/one/remotes/etc/datastore/linstor_un/linstor_un.conf
mv /tmp/addon-linstor_un-master/datastore/linstor_un /var/lib/one/remotes/datastore/linstor_un
mv /tmp/addon-linstor_un-master/tm/linstor_un /var/lib/one/remotes/tm/linstor_un
rm -rf /tmp/addon-linstor_un-master
Now we need to add it to the OpenNebula config; follow the simple steps described here to do so.
Then restart OpenNebula:
systemctl restart opennebula
Then add our datastores. First the system one:
cat > system-ds.conf <<EOT
NAME="linstor-system"
TYPE="SYSTEM_DS"
STORAGE_POOL="data"
AUTO_PLACE="2"
CLONE_MODE="snapshot"
CHECKPOINT_AUTO_PLACE="1"
BRIDGE_LIST="node1 node2 node3"
TM_MAD="linstor_un"
EOT

onedatastore create system-ds.conf
And then the images one:
cat > images-ds.conf <<EOT
NAME="linstor-images"
TYPE="IMAGE_DS"
STORAGE_POOL="data"
AUTO_PLACE="2"
BRIDGE_LIST="node1 node2 node3"
DISK_TYPE="BLOCK"
DS_MAD="linstor_un"
TM_MAD="linstor_un"
EOT

onedatastore create images-ds.conf
AUTO_PLACE sets the number of data replicas that will be created for each new image in OpenNebula.
CLONE_MODE sets the mechanism used to clone images during virtual machine creation:
snapshot: creates a snapshot of the image and then deploys the virtual machine from this snapshot,
copy: creates a full copy of the image for each virtual machine.
BRIDGE_LIST: it is recommended to list all nodes that will be used to perform image cloning operations.
The full list of supported options is given in the project’s README file.
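If you maintain several clusters, the system datastore template can be generated from a few arguments instead of being edited by hand. A minimal sketch follows; system_ds_template is a helper name I invented, and it only emits the attributes used in the example above, with CHECKPOINT_AUTO_PLACE fixed at 1 as in that example.

```shell
#!/bin/sh
# Print a system datastore template for the linstor_un driver.
# Usage: system_ds_template <name> <storage-pool> <replicas> <clone-mode> <node>...
system_ds_template() {
    if [ "$#" -lt 5 ]; then
        echo "usage: system_ds_template <name> <storage-pool> <replicas> <clone-mode> <node>..." >&2
        return 2
    fi
    ds_name="$1"
    pool="$2"
    replicas="$3"
    clone_mode="$4"
    shift 4
    # Only the two clone modes described above are accepted.
    case "$clone_mode" in
        snapshot|copy) ;;
        *) echo "CLONE_MODE must be 'snapshot' or 'copy'" >&2; return 1 ;;
    esac
    # The remaining arguments are the bridge nodes, joined with spaces.
    bridge_list="$*"
    cat <<EOT
NAME="$ds_name"
TYPE="SYSTEM_DS"
STORAGE_POOL="$pool"
AUTO_PLACE="$replicas"
CLONE_MODE="$clone_mode"
CHECKPOINT_AUTO_PLACE="1"
BRIDGE_LIST="$bridge_list"
TM_MAD="linstor_un"
EOT
}
```

Redirecting the output of system_ds_template linstor-system data 2 snapshot node1 node2 node3 into system-ds.conf gives the same file as the hand-written one, ready for onedatastore create.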
The installation is finished. Now you can download an appliance from the official OpenNebula Marketplace and instantiate VMs from it.
Link to the project: