Libvirt - The Unsung Hero of Cloud Computing
My original intention was to write a round-up of open source Cloud Management Platforms (CMPs), but while researching I found one software library so fundamental that it holds the key to the very existence of cloud computing services and platforms as we know them today (including Amazon AWS, OpenStack and CloudStack). So I postponed that idea and started writing about this unsung hero of cloud computing, libvirt, which I believe many people won’t have heard of. Talking about a software library tends to get technical, so this article has a technical tone, but I will try to keep it readable for everyone interested in cloud computing.
Libvirt is an open source API, daemon and management tool for managing platform virtualization, and its APIs are widely used in the orchestration layer of Cloud Management Platforms. Libvirt makes it possible to control and manage millions of compute nodes, storage and network devices through a common programmable interface. It is like being able to control and orchestrate a fleet of millions of cars, irrespective of manufacturer, model or engine, through a common interface from a single car (or more than one car, for redundancy and high availability). What started as a management API for Xen has since been extended to support the major components of cloud computing platforms.
Libvirt Goals & Architecture
Libvirt defines the following terms for its goals:
- Node is a single physical machine.
- Hypervisor is a layer of software that virtualizes a node into a set of virtual machines, possibly with configurations different from that of the node itself.
- Domain is an instance of an operating system (or subsystem, in the case of container virtualization such as OpenVZ and LXC) running on a virtualized machine provided by the hypervisor.
Based on the above terms, “The goal of libvirt is to provide a common and stable layer sufficient to securely manage domains on a node, possibly remote”. So libvirt should provide all the APIs needed for management, such as provisioning, creating, modifying, monitoring, controlling, migrating and stopping domains, within the limits of what the hypervisor supports for those operations. This implies the following sub-goals:
- All APIs can be carried out remotely through secure APIs.
- While most APIs will be generic in terms of hypervisor or host OS, some APIs may target a single virtualization environment, as long as the semantics of the operations are clear from a domain management perspective.
- The API should make it possible to do efficiently and cleanly all the operations needed to manage domains on a node, including resource provisioning and setup.
- The API will not try to provide high level virtualization policies or multi-node management features like load balancing, but it should be sufficient for them to be implemented on top of libvirt.
- Stability of the API is a big concern: libvirt should isolate applications from the frequent changes expected at the lower level of the virtualization framework.
- The node being managed may be on a different physical machine than the management program using libvirt. To this end libvirt supports remote access, but should only do so using secure protocols.
- Libvirt will provide APIs to enumerate, monitor and use the resources available on the managed node, including CPUs, memory, storage, networking and NUMA partitions.
So libvirt is intended to be a building block for higher level management tools and for applications focusing on virtualization of a single node (the only exception being domain migration between nodes, which by nature involves more than one node).
Libvirt Driver Based Architecture
To support a wide variety of hypervisors, libvirt implements a driver-based architecture. In terms of the car analogy, this means delegating the actual control of each car to a driver designed specifically for its make, model and engine. Libvirt currently supports:
Hypervisor drivers:
- LXC - Linux Containers
- Test - Used for testing
- UML - User Mode Linux
- VMware ESX
- VMware Workstation/Player
- Microsoft Hyper-V
- IBM PowerVM (phyp)
- Remote - Accessing libvirt on a remote node through libvirtd (the libvirt daemon)
Storage drivers:
- Directory backend
- Local filesystem backend
- Network filesystem backend
- Logical Volume Manager (LVM) backend
- Disk backend
- iSCSI backend
- SCSI backend
- Multipath backend
- RBD (RADOS Block Device) backend
- Sheepdog backend
Networking:
- VEPA (Virtual Ethernet Port Aggregator)
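To make the driver idea concrete, here is a minimal sketch in Python of how a common entry point can pick a driver from the scheme of a connection URI. It only illustrates the pattern and is not libvirt's actual internal code: the class names, the `DRIVERS` registry and `open_connection` are all hypothetical, and the two toy drivers return canned data.

```python
from urllib.parse import urlparse


class Driver:
    """Common interface: every driver exposes the same operations."""
    name = "base"

    def list_domains(self):
        raise NotImplementedError


class MockHypervisorDriver(Driver):
    """Toy stand-in for a 'test' style driver that needs no hypervisor."""
    name = "test"

    def list_domains(self):
        return ["test"]


class LxcDriver(Driver):
    """Toy stand-in for a container driver."""
    name = "lxc"

    def list_domains(self):
        # A real driver would ask the container runtime here.
        return []


# Hypothetical registry mapping a URI scheme to the driver that handles it.
DRIVERS = {cls.name: cls for cls in (MockHypervisorDriver, LxcDriver)}


def open_connection(uri):
    """Pick a driver from the URI scheme, ignoring any '+transport' suffix."""
    scheme = urlparse(uri).scheme.split("+")[0]
    try:
        return DRIVERS[scheme]()
    except KeyError:
        raise ValueError(f"no driver registered for {scheme!r}") from None


if __name__ == "__main__":
    for uri in ("test:///default", "lxc:///"):
        conn = open_connection(uri)
        print(uri, "->", type(conn).__name__, conn.list_domains())
```

The calling code never mentions a specific hypervisor: it asks for a connection and uses the common interface, which is the property that lets a management platform sit on top of many different backends.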
Libvirt API structure 
The figure above shows the five main objects exported by the API:
- virConnectPtr - Represents the connection to a hypervisor. Use one of the virConnectOpen functions to obtain a connection to the hypervisor, which is then passed as a parameter to the other connection APIs.
- virDomainPtr - Represents one domain, either active or defined (i.e. existing as a permanent config file and storage but not currently running on that node). The function virConnectListAllDomains lists all the domains for the hypervisor.
- virNetworkPtr - Represents one network, either active or defined (i.e. existing as a permanent config file and storage but not currently activated). The function virConnectListAllNetworks lists all the virtualization networks for the hypervisor.
- virStorageVolPtr - Represents one storage volume, generally used as a block device available to one of the domains. The function virStorageVolLookupByPath finds the storage volume object based on its path on the node.
- virStoragePoolPtr - Represents a storage pool, which is a logical area used to allocate and store storage volumes. The function virConnectListAllStoragePools lists all of the virtualization storage pools on the hypervisor. The function virStoragePoolLookupByVolume finds the storage pool containing a given storage volume.
These names follow C conventions, but developers of cloud computing platforms and applications do not need to use C directly, because language bindings are available for the major languages. Currently libvirt API language bindings are available for C#, Java, OCaml, Perl, PHP, Python and Ruby.
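As a rough illustration of what the bindings look like, the sketch below uses the Python binding (the libvirt-python package) to list domains and their state. The helper names `summarize_domains` and `list_local_domains` are mine, and the default URI `test:///default` is an assumption, chosen because libvirt's built-in test driver needs no real hypervisor.

```python
def summarize_domains(conn):
    """Return sorted (name, state) pairs for every domain on a connection.

    `conn` is expected to behave like the object returned by libvirt.open();
    listAllDomains(), name() and isActive() are methods of the Python binding.
    """
    summary = []
    for dom in conn.listAllDomains():
        state = "active" if dom.isActive() else "defined"
        summary.append((dom.name(), state))
    return sorted(summary)


def list_local_domains(uri="test:///default"):
    """Open a read-only connection and summarize its domains.

    Requires libvirt-python to be installed. The URI is an assumption;
    replace it with the one for your hypervisor.
    """
    import libvirt  # imported lazily so summarize_domains has no dependency

    conn = libvirt.openReadOnly(uri)
    try:
        return summarize_domains(conn)
    finally:
        conn.close()
```

On a machine with libvirt-python installed, `print(list_local_domains())` should print the domains of the test driver. The connection object plays the role of the connection handle described above, and each element returned by `listAllDomains()` corresponds to one domain object.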
Domain Management Architecture
There are two distinct modes of domain management using the libvirt API.
1. Single node domain management
As illustrated in the figure above, in this mode the applications (cloud management platform, i.e. CMP, applications) and the domains exist on the same node. The applications work directly through the libvirt API on the host operating system (OS) to control and manage the local domains.
2. Multi node domain management
As shown in the figure above, in this mode the applications (CMP applications) using the libvirt API and the domains they manage or control are on separate nodes. A special daemon called libvirtd (the libvirt daemon) needs to run on the remote nodes. The management application uses the underlying network to communicate with the remote libvirtd through its local libvirt, using a custom protocol. Specifically, libvirt uses the Remote driver to communicate with the remote node, and remote API calls are handled synchronously. The Remote driver supports a range of transports:
- tls - TLS 1.0 (SSL 3.1) authenticated and encrypted TCP/IP socket, usually listening on a public port number. To use this you will need to generate client and server certificates. The standard port is 16514. This is the default transport if no other is specified.
- unix - Unix domain socket. Since this is only accessible on the local machine, it is not encrypted, and uses Unix permissions or SELinux for authentication. The standard socket names are /var/run/libvirt/libvirt-sock and /var/run/libvirt/libvirt-sock-ro (the latter for read-only connections).
- ssh - Transported over an ordinary ssh (secure shell) connection. Requires Netcat (nc) to be installed and libvirtd to be running on the remote machine. You should use some form of ssh key management (e.g. ssh-agent), otherwise programs which use this transport will stop to ask for a password.
- ext - Any external program which can make a connection to the remote machine by means outside the scope of libvirt.
- tcp - Unencrypted TCP/IP socket. Not recommended for production use; it is normally disabled, but an administrator can enable it for testing or for use over a trusted network. The standard port is 16509.
- libssh2 - Transport over the SSH protocol using libssh2 instead of the OpenSSH binary. This transport uses the libvirt authentication callback for all ssh authentication calls and therefore supports keyboard-interactive authentication even with graphical management applications. As with the classic ssh transport, netcat is required on the remote side.
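In practice the transport is chosen through the connection URI, which for remote access has the general shape driver[+transport]://[user@][host][:port]/[path]. The following is a small sketch of assembling such URIs in Python. The helper `remote_uri`, the host name `node1.example.com` and the `qemu` driver with its `/system` path are assumptions for illustration; the transport names (tls, unix, ssh, ext, tcp, libssh2) and default ports are the ones listed above.

```python
# Standard ports for the two socket-based transports described above.
DEFAULT_PORTS = {"tls": 16514, "tcp": 16509}
TRANSPORTS = {"tls", "unix", "ssh", "ext", "tcp", "libssh2"}


def remote_uri(driver, host="", transport=None, user=None, port=None,
               path="/system"):
    """Build a URI of the form driver[+transport]://[user@][host][:port]/path.

    Leaving `transport` as None omits it from the URI, in which case
    the default transport (tls) applies.
    """
    if transport is not None and transport not in TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    scheme = driver if transport is None else f"{driver}+{transport}"
    netloc = host
    if user:
        netloc = f"{user}@{netloc}"
    if port is not None:
        netloc = f"{netloc}:{port}"
    return f"{scheme}://{netloc}{path}"


if __name__ == "__main__":
    # Hypothetical host; 'qemu' and '/system' are illustrative assumptions.
    print(remote_uri("qemu", "node1.example.com", "ssh", user="root"))
    print(remote_uri("qemu", "node1.example.com", "tcp",
                     port=DEFAULT_PORTS["tcp"]))
    print(remote_uri("qemu", transport="unix"))
```

A URI built this way can be handed to whichever binding you use to open the connection, so switching from an unencrypted test setup to ssh or TLS is a change to one string rather than to the application logic.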
Libvirt Project 
According to the statistics on ohloh, libvirt in a nutshell:
- 15,188 commits made by 331 contributors representing 481,506 lines of code
- Mostly written in C
- Established, mature codebase maintained by a very large development team with increasing year on year commits.
- Estimated 128 years of effort (COCOMO model)
Libvirt is a very important library on whose giant shoulders cloud computing services and platforms like Amazon AWS, Google Compute Engine, OpenStack, CloudStack, Eucalyptus and numerous others are standing. This API also enables developers and companies to build new and innovative cloud computing services or platforms, and to build awesome applications or services on top of them. Libvirt started in 2005, and with the growing popularity of cloud computing the project will continue to grow. Yet in most of the conferences, talks and papers related to cloud computing I did not find much coverage of libvirt, so while researching Cloud Management Platforms I thought of writing an article on it. Kudos to all the libvirt code contributors for building a beautiful abstraction layer and making life easier for developers of cloud computing services and platforms. Although libvirt does not get as much press and coverage as mainstream cloud computing platforms like OpenStack, CloudStack or Eucalyptus, the growth of the project, its community and its code commits is heartening.
References
- libvirt - Virtualization API
- libvirt - Terminology and goals
- libvirt - API concepts
- libvirt - Bindings for other languages
- libvirt project statistics on ohloh
- libvirt - Remote Support