Unikernels are the next step in virtualized computing. There is a lot of hubbub right now in the tech-o-sphere about unikernels. In fact, in many circles unikernels are thought to be the Next Big Thing, and recent news justifies this perspective. Docker, a key player in Linux containers, has acquired Unikernel Systems. MirageOS, the library operating system that you use to create unikernels, has solid backing from Xen and the Linux Foundation as an incubator project.
Unikernels are not going away anytime soon. Having a clear understanding of unikernel technology is important to anybody working in large-scale, distributed computing. Thus, I share. In this article I am going to explain the basic concepts behind unikernels and describe how they sit in the landscape of virtualized computing. I am also going to talk about an approach to logging from within unikernels that I learned from industry expert Adam Wick, Ph.D.
So let’s begin.
The Path From Virtual Machines to Unikernels
A unikernel is a unit of binary code that runs directly on a hypervisor, very much in the same way that a virtual machine (VM) runs on a hypervisor. However, there is a difference. When you create a virtual machine, you are for all intents and purposes creating an abstract computer. As with any computer, your virtual machine will have RAM, disk storage, CPUs and network I/O. (Please see Figure 1, below.)
When you create a virtual machine, the resources you allocate from the host computer are static. For example, as shown in Figure 1, when you create a VM with 16 GB of RAM, a 4-core CPU and 1 TB of storage, that configuration is set. The VM owns the resources and will not give them back. Also, the operating system on a VM can be different from the OS on the host. For example, if your host computer is using Ubuntu, creating a VM that uses Windows is no problem at all.
The important thing to remember is that a VM is a full-blown computer with dedicated resources. However, there is a drawback. At runtime, should your 4-core, 16 GB VM use only 1 CPU and 1 GB of RAM, the other resources will sit around unused. Remember, there’s no giving back. Your VM owns the resources whether it uses them or not. Thus, you can end up paying for computing that is never used. Inefficient use of resources is a very real hazard on the virtual machine landscape.
While full-blown VMs may be useful for some work, there are situations where you don’t need all the overhead that a VM requires. Let’s say you publish a web service that recommends stocks to buy given hourly activity on the Internet. What do you need? You can get by with threading capability, a web client, a web server and some code that provides analysis intelligence. That’s it. This is where a unikernel comes in.
As mentioned above, a unikernel is a unit of binary code that runs directly on a hypervisor. Think of a unikernel as an application that has been pulled into the operating system’s kernel. However, a unikernel is a special type of application. It’s compiled code that contains not only application code, but also the parts of the operating system that are essential to its purpose. A unikernel thinks it is the only show in town. And because it is compiled, you don’t have to go through a whole lot of installation hocus pocus. It’s a single file. (Please see Figure 2, below.)
Unikernels are very small. For example, a DNS server weighs in at around 449 KB; a web server at ~674 KB. A unikernel contains only the pieces of the operating system required, no more, no less. There are no unnecessary drivers, utilities or graphical rendering components.
Unikernels load into memory very fast. Start times can be measured in milliseconds. Thus, they are well suited to provide transient microservice functionality. A transient microservice is a microservice that is loaded into and unloaded from memory to meet an immediate need. The service is not meant to have a long lifespan. It’s loaded, work gets done and then the service is terminated.
Given the small size of a unikernel and the fast loading speed, you can deploy thousands of them on a single machine, bringing them online and offline as the need requires. And because they are compiled binary artifacts, they are less susceptible to security breaches, provided the base compilation is clean.
A unikernel can be assigned a unique IP address. A unique IP address means that the unikernel is discoverable on the Internet. The implication is that you can have thousands of unikernels running on a machine, each communicating with the others via standard network protocols such as HTTP.
What about Containers?
A container is a virtualization technology in which components are assembled into a single deployment unit called a container. You do not have to use yum or apt-get to install application dependencies. Everything the container needs exists in the container.
A container is an isolated process. Thus, conceptually a container is like a VM in that it thinks that it’s the only show in town.
A container leverages the operating system of the host computer. Hence, there is no mixing and matching. You cannot have a Windows host computer running a Linux container.
Similar to a unikernel, a container uses only the resources it needs. Containers are highly efficient.
You can read more about containers here.
As mentioned above, a microservice lives behind an IP address that makes it available to other services. Typically the microservice provides a single piece of functionality that is stateless. Unlike VMs and Containers that can provide their own storage and authentication mechanisms, the architectural practice for unikernels is to have a strong boundary around the unikernel’s area of concern. The best practice is to have a unikernel use other services to meet needs beyond its concern boundary. For example, going back to the stock analysis service we described above, the purpose of that unikernel is to recommend stocks. If we want to make sure that only authorized parties are using the service we’d make it so our Stock Recommender unikernel uses an authorization/authentication service to allow users access to the service. (Please see Figure 3, below.)
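To make the concern boundary concrete, here is a minimal sketch in Python. The language is only for illustration, and everything in it is hypothetical: the `StockRecommender` class, the `is_authorized` callable that stands in for a call to a separate authentication/authorization service, and the placeholder ranking logic. The point is only that the recommender asks another service whether a caller is allowed in and keeps no authorization logic of its own.

```python
from typing import Callable, Dict, List


class StockRecommender:
    """Recommends stocks and delegates authorization to another service."""

    def __init__(self, is_authorized: Callable[[str], bool]):
        # is_authorized is a stand-in for a network call to a separate
        # authentication/authorization service (hypothetical).
        self._is_authorized = is_authorized

    def recommend(self, token: str, hourly_activity: Dict[str, int]) -> List[str]:
        # The recommender never decides who is allowed in; it only asks.
        if not self._is_authorized(token):
            raise PermissionError("token rejected by the authorization service")
        # Placeholder analysis: rank symbols by activity count, highest first.
        ranked = sorted(hourly_activity, key=hourly_activity.get, reverse=True)
        return ranked[:3]
```

Swapping in a different authorization service means passing a different callable. The recommender itself does not change, which fits the idea of an immutable, single-purpose binary.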
Unikernel technology is best suited for immutable microservices. The internals of a unikernel are written in stone. You do not update parts; there are no external parts. When it comes time to change a unikernel, you create a new binary and deploy that binary.
So far, so good? Excellent! Now that you have an understanding of the purpose and nature of unikernels, let’s move on to logging from within a unikernel.
The Importance of Logging
A unikernel is a compiled binary, similar to a low-level compilation unit such as a C++ executable. Therefore, as of this writing, debugging is difficult. In languages that compile to bytecode, such as Java and C#, you can decompile the deployment unit (class/jar for Java, dll/exe for C#) to take a look at the code and debug it if necessary. In a unikernel, things are locked up tight. If anything goes wrong, there is no way to peek inside. The only way you have to get a sense of what is going on or what’s gone wrong is to read the logs. So, the question becomes how do you log from within a unikernel?
The answer is, carefully. I know, it’s a smartass answer. But it’s an answer that’s appropriate. Unikernel technology is still emerging. There is a lot of activity around unikernel loggers, but as of this writing no off-the-shelf technology on the order of Log4J, Log4Net or NLog has come forward. However, there are some people working in the space and I had the privilege to talk to one of them, Adam Wick, Ph.D.
Adam is Research Lead at Galois, Inc. The folks at Adam’s company solve big problems for organizations such as DARPA and the Department of Homeland Security. Galois does work that is strategic and complex.
Adam has been working with unikernels for a while and shared some wisdom with me. The first thing that Adam told me is that they stick to the Separation of Concerns mantra that I described above. The unikernels they implement do only one thing, and when it comes to logging they write loggers that do no more than send log entries to a log service. All their loggers do is create log data. Storing and analyzing log data is done by a third-party service. (Logentries is one of the logging services that Adam tests his logger against.) (Please see Figure 4, below.)
In terms of creating the logger for the unikernel, Adam acknowledges that at the moment, it’s still pretty much a roll-your-own affair. Nonetheless, Adam recommends that no matter what logger you create, you will do well to support the syslog protocol. To quote Adam,
“syslog is a tremendously easy protocol to implement. You basically just need to prepare a simple string in the right format and send it to the right place. Not a lot of tricky bits.”
The syslog protocol defines a log entry format that supports a predefined set of fields. The protocol has gone through a revision from RFC 3164 to RFC 5424, and there are some alternative versions out there. So there is a bit of confusion around the implementation. Still, there are usual fields that you can plan to support when implementing log emission. Table 1 below describes these usual fields.
Table 1: The fields of the Syslog protocol
|Field||Description||Examples|
|Facility||Refers to the source of the message, such as a hardware device, a protocol, or a module of the system software.||Log Alert, Line Printer subsystem|
|Severity||The usual severity tags||Debug, Informational, Notice, Warning, Error, Critical, Alert, Emergency|
|Hostname||The host generating the entry||188.8.131.52, MyHostName|
|Timestamp||The timestamp is the local time, in MMM DD HH:MM:SS format||Oct 11 22:14:15|
|Message||A free form message that provides additional information relevant to the log entry||“I am starting up now.”|
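To show how little is involved, here is a minimal sketch of a syslog emitter in the older BSD style (RFC 3164). It is written in Python purely for illustration; it is not Adam’s logger, and a real unikernel would do the same thing in whatever language it is built with. The function names `format_syslog` and `send_syslog`, the `local0` facility default and the `unikernel` tag are my own assumptions. The numeric facility and severity codes and the `<PRI>` calculation (facility × 8 + severity) come from the syslog RFCs.

```python
import socket
from datetime import datetime

# Numeric codes defined by the syslog RFCs (3164 and 5424).
FACILITY_LOCAL0 = 16
SEVERITIES = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}
# Spelled out so the output does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_syslog(severity, hostname, message,
                  facility=FACILITY_LOCAL0, tag="unikernel", now=None):
    """Build an RFC 3164 style line: <PRI>TIMESTAMP HOSTNAME TAG: MESSAGE."""
    now = now or datetime.now()
    # The priority value packs facility and severity into one number.
    pri = facility * 8 + SEVERITIES[severity]
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = f"{MONTHS[now.month - 1]} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}"


def send_syslog(line, host, port=514):
    """Fire-and-forget delivery over UDP, the traditional syslog transport."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Calling `format_syslog("informational", "MyHostName", "I am starting up now.")` on October 11 at 22:14:15 would produce `<134>Oct 11 22:14:15 MyHostName unikernel: I am starting up now.` Hosted log services generally document their own endpoint, port and any required token, so check the service’s documentation before pointing `send_syslog` at it.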
Once you get logging implemented in your unikernel, the question becomes what to log. Of course you’ll log typical events that are important to the service your unikernel represents. For example, if your microservice does image manipulation, you’ll want to log when and how the images are processed. If your unikernel is providing data analysis, you’ll want to log the what and when events relevant to the analysis.
However, remember that your unikernel will be one of thousands sitting on a single computer. More is required. To quote Adam Wick again:
“… the problem that we run into is not logging enough information.
I suspect this is true of many systems — in fact, I know I’ve seen at least a few conference talks on it — but understanding complex distributed systems is just a very hard problem. The more information you have to understand why something failed, or why something bogged down, the better. For example, in some of our early work, we didn’t log a very simple message: a “hello, I’m booted and awake!” message. This turned out to be a problem, later, because we were seeing some weird behavior, and it wasn’t clear if it was because of some sort of deep logic problem, or if it was just because our DHCP server was slow.
In hindsight this is probably obvious, but having information like ‘I’m up and my IP configuration is … ‘ and ‘I’m about to crash.’ have proved to be very useful in a couple different situations. Most high-level programming languages have a mechanism to catch exceptions that occur. Catching and sending them out on a log seems obvious, but we did have to remember to make it a practice.”
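The practice Adam describes can be sketched as a small wrapper around a service’s entry point. This is an illustration in Python, not code from Galois. The `run_with_lifecycle_logging` name and the `emit(severity, message)` callable are hypothetical; `emit` stands in for whatever function sends an entry to your log service.

```python
def run_with_lifecycle_logging(main, emit, ip_config):
    """Run main(), emitting a boot message first and a crash message on failure.

    emit is a hypothetical callable taking (severity, message) that forwards
    the entry to the external log service.
    """
    # The "hello, I'm booted and awake" message, sent before any real work.
    emit("informational", f"I'm up and my IP configuration is {ip_config}")
    try:
        return main()
    except Exception as exc:
        # Report the failure before the service goes away, then re-raise so
        # the crash is not silently swallowed.
        emit("critical", f"I'm about to crash: {type(exc).__name__}: {exc}")
        raise
```

Because the boot message goes out before any real work starts, a missing or late boot entry points at startup or network configuration (a slow DHCP server, for instance) rather than at the service logic.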
Another area that Adam covered is ensuring the availability of storage for your log data. Remember, your unikernel will not store data, only emit it. Thus, as we’ve mentioned, we need to rely upon a service to store the data for later analysis. Logentries allows you to store your data for days or months so you can quickly analyze it and set alerts on your unikernel events. If you need to store the data for longer, you can use the Logentries S3 archiving features to maintain a backup of your log data for years (or indefinitely). And, not only does Logentries provide virtually unlimited storage for an unlimited amount of time, but the service is highly available. It’s always there at the IP endpoint. Internally, if a node within Logentries goes down, the system is designed to transfer traffic to another node immediately without any impact to the service consumer. High availability of the logging service is critical. Remember, the logging service is all your unikernel has to store log data.
Given the closed, compiled nature of unikernels, logging within a unikernel is an important practice. In fact, it’s a ripe opportunity for an entrepreneurial developer. It’s not unreasonable to think that as the technology spreads, someone will publish an easy-to-use unikernel logger. Who knows, that someone might be you.
Putting it all together
Unikernels are going to be with us for a while. They are small and load fast, making them a good choice for implementing transient microservices. And, because they are small, you can load thousands onto a machine. Each unikernel has a unique IP address, which means that you can cluster them to a common purpose behind a load balancer. Also, the IP address allows you to use standard communication protocols to have the unikernels talk to each other.
But, for as much activity as is going on in the space, unikernels are still an emerging technology. The toolset to create unikernels is growing. Players such as Docker and Xen are active in unikernel development.
Given that a unikernel is a compiled binary that is nearly impossible to debug, logging information out of your unikernel implementation becomes critical. However, unikernel loggers are still a roll-your-own undertaking, for now anyway. Those in the know suggest that when you make a logging client, you emit log data according to the syslog protocol and use logging services such as Logentries to store log data for later analysis.
As I’ve mentioned over and over, unikernels are going to be with us for a while. Those who understand the details of unikernel technology and have made a few mission critical microservices using unikernels will be greatly in demand in no time at all. Unikernels are indeed the next big thing and the next big opportunity.