Linux Containers and the Future Cloud

Linux-based container infrastructure is an emerging cloud technology based on fast and lightweight process virtualization. It provides its users with an environment as close as possible to a standard Linux distribution. As opposed to para-virtualization solutions (Xen) and hardware virtualization solutions (KVM), which provide virtual machines (VMs), containers do not create additional instances of the operating system kernel. Because containers are more lightweight than VMs, you can achieve higher densities with containers than with VMs on the same host; practically speaking, you can deploy more container instances than VM instances on a given machine.

Another advantage of containers over VMs is that starting and shutting down a container is much faster than starting and shutting down a VM. All containers on a host run under the same kernel, as opposed to virtualization solutions like Xen or KVM, where each VM runs its own kernel. Sometimes this shared-kernel constraint can be considered a drawback, as can the fact that you cannot run BSD, Solaris, OS X or Windows in a Linux-based container.

The idea of process-level virtualization in itself is not new, and it already was implemented by Solaris Zones as well as BSD jails quite a few years ago. Other open-source projects implementing process-level virtualization have existed for several years. However, they required custom kernels, which was often a major setback. Full and stable support for Linux-based containers on mainstream kernels by the LXC project is relatively recent, as you will see in this article. This makes containers more attractive for the cloud infrastructure. More and more hosting and cloud services companies are adopting Linux-based container solutions. In this article, I describe some open-source Linux-based container projects and the kernel features they use, and show some usage examples. I also describe the Docker tool for creating LXC containers.

The underlying infrastructure of modern Linux-based containers consists mainly of two kernel features: namespaces and cgroups. There are six types of namespaces, which provide per-process isolation of the following operating system resources: mount points (MNT), hostname and domain name (UTS), System V IPC (IPC), process IDs (PID), networking (NET) and users (user namespaces allow mapping of UIDs and GIDs between a user namespace and the global namespace of the host). By using network namespaces, for example, each process can have its own instance of the network stack (network interfaces, sockets, routing tables and routing rules, netfilter rules and so on).

Creating a network namespace is very simple and can be done with the following iproute command: ip netns add myns1. With the ip netns command, it also is easy to move a network interface from one network namespace to another, to monitor the creation and deletion of network namespaces, to find out to which network namespace a specified process belongs and so on. The MNT namespace behaves similarly: when a process mounts a filesystem, other processes will not see this mount. With PID namespaces, running the ps command from inside a PID namespace shows only the processes that were created in that PID namespace.
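
Returning to network namespaces, the following sequence creates one, moves a host interface into it and runs a command inside it (the interface name eth1 is illustrative):

ip netns add myns1                  # create a new network namespace
ip link set eth1 netns myns1        # move eth1 into myns1
ip netns exec myns1 ip link show    # run a command inside the namespace
ip netns list                       # list the named network namespaces
ip netns delete myns1               # remove the namespace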

The cgroups subsystem provides resource management and accounting. It lets you easily define, for example, the maximum amount of memory that a process may use. This is done by using cgroups VFS operations. The cgroups project was started by two Google developers, Paul Menage and Rohit Seth, back in 2006, and it initially was called “process containers”. Neither namespaces nor cgroups intervene in critical paths of the kernel, and thus they do not incur a high performance penalty, except for the memory cgroup, which can incur significant overhead under some workloads.
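
To illustrate the cgroups VFS operations mentioned above, here is how a 256MB memory cap could be imposed directly through the cgroup filesystem, assuming the memory controller is mounted under /sys/fs/cgroup/memory (paths vary by distribution):

mkdir /sys/fs/cgroup/memory/mygroup                                     # create a new cgroup
echo 268435456 > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes   # 256MB cap
echo $$ > /sys/fs/cgroup/memory/mygroup/tasks                           # attach the current shell to it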

Linux-Based Containers

Basically, a container is a Linux process (or several processes) that has special features and that runs in an isolated environment, configured on the host. You might sometimes encounter terms like Virtual Environment (VE) and Virtual Private Server (VPS) for a container.

The features of this container depend on how the container is configured and on which Linux-based container is used, as Linux-based containers are implemented differently in several projects. I mention the most important ones in this article:

  • OpenVZ: the origins of the OpenVZ project are in a proprietary server virtualization solution called Virtuozzo, which originally was started by a company called SWsoft, founded in 1997. In 2005, a part of the Virtuozzo product was released as an open-source project, and it was called OpenVZ. Later, in 2008, SWsoft merged with a company called Parallels. OpenVZ is used for providing hosting and cloud services, and it is the basis of the Parallels Cloud Server. Like Virtuozzo, OpenVZ also is based on a modified Linux kernel. In addition, it has command-line tools (primarily vzctl) for management of containers, and it makes use of templates to create containers for various Linux distributions. OpenVZ also can run on some unmodified kernels, but with a reduced feature set. The OpenVZ project is intended to be fully mainlined in the future, but that could take quite a long time.
  • Google containers: in 2013, Google released the open-source version of its container stack, lmctfy (which stands for Let Me Contain That For You). Right now, it’s still in the beta stage. The lmctfy project is based on using cgroups. Currently, Google containers do not use the kernel namespaces feature, which is used by other Linux-based container projects, but using this feature is on the Google container project roadmap.
  • Linux-VServer: an open-source project that was first publicly released in 2001, it provides a way to partition resources securely on a host. The host should run a modified kernel.
  • LXC: the LXC (LinuX Containers) project provides a set of userspace tools and utilities to manage Linux containers. Many LXC contributors are from the OpenVZ team. As opposed to OpenVZ, it runs on an unmodified kernel. LXC is fully written in userspace and supports bindings in other programming languages like Python, Lua and Go. It is available in most popular distributions, such as Fedora, Ubuntu, Debian and more. Red Hat Enterprise Linux 6 (RHEL 6) introduced Linux containers as a technical preview. You can run Linux containers on architectures other than x86, such as ARM (there are several how-tos on the Web for running containers on the Raspberry Pi, for example).

I also should mention the libvirt-lxc driver, with which you can manage containers. This is done by defining an XML configuration file and then running virsh start, virsh console and virsh destroy to run, access and destroy the container, respectively. Note that there is no common code between libvirt-lxc and the userspace LXC project.

LXC Container Management

First, you should verify that your host supports LXC by running lxc-checkconfig. If everything is okay, you can create a container by using one of several ready-made templates for creating containers. In lxc-0.9, there are 11 such templates, mostly for popular Linux distributions. You easily can tailor these templates according to your requirements, if needed. So, for example, you can create a Fedora container called fedoraCT with:


lxc-create -t fedora -n fedoraCT

 

The container will be created by default under /var/lib/lxc/fedoraCT. You can set a different path for the generated container by adding the --lxcpath PATH option.

The -t option specifies the name of the template to be used (fedora in this case), and the -n option specifies the name of the container (fedoraCT in this case). Note that you also can create containers of other distributions on Fedora, for example Ubuntu (you need the debootstrap package for it), as the example below shows. Not all combinations are guaranteed to work.
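
For instance, after installing debootstrap on the Fedora host, creating an Ubuntu container might look like this (the container name is illustrative):

lxc-create -t ubuntu -n ubuntuCT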

You can pass parameters to lxc-create after adding --. For example, you can create an older release of several distributions with the -R or -r option, depending on the distribution template. To create an older Fedora container on a host running Fedora 20, you can run:


lxc-create -t fedora -n fedora19 -- -R 19

 

You can remove the installation of an LXC container from the filesystem with:


lxc-destroy -n fedoraCT

 

For most templates, when a template is used for the first time, several required package files are downloaded and cached on disk under /var/cache/lxc. These files are used when creating a new container with that same template, and as a result, creating a container that uses the same template will be faster next time.

You can start the container you created with:


lxc-start -n fedoraCT

 

And stop it with:


lxc-stop -n fedoraCT

 

The signal used by lxc-stop is SIGPWR by default. To stop the container immediately with SIGKILL instead, add -k to lxc-stop:


lxc-stop -n fedoraCT -k

 

You also can start a container as a dæmon by adding -d, and then log in to it with lxc-console, like this:


lxc-start -d -n fedoraCT
lxc-console -n fedoraCT

 

The first lxc-console that you run for a given container will connect you to tty1. If tty1 already is in use (because that’s the second lxc-console that you run for that container), you will be connected to tty2 and so on. Keep in mind that the maximum number of ttys is configured by the lxc.tty entry in the container configuration file.
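
For example, a line like the following in the container config file caps the number of ttys at four (the value commonly set by the templates):

lxc.tty = 4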

You can make a snapshot of a non-running container with:


lxc-snapshot -n fedoraCT

 

This will create a snapshot under /var/lib/lxcsnaps/fedoraCT. The first snapshot you create will be called snap0; the second one will be called snap1 and so on. You can restore a snapshot at a later time with the -r option, for example:


lxc-snapshot -n fedoraCT -r snap0 restoredFedoraCT

 

You can list the snapshots with:


lxc-snapshot -L -n fedoraCT

 

You can display the running containers by running:


lxc-ls --active

 

You also can manage containers from scripts, using the language bindings mentioned earlier. For example, this short Python script starts the fedoraCT container:


#!/usr/bin/python3

import lxc

# attach to the existing container by name, then start it
container = lxc.Container("fedoraCT")
container.start()
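
A similar script can stop the container. The running attribute and the stop() method used here are part of the python3-lxc bindings, but verify them against the binding version you have installed:

#!/usr/bin/python3

import lxc

container = lxc.Container("fedoraCT")
if container.running:    # proceed only if the container is up
    container.stop()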

 

Container Configuration

A default config file is generated for every newly created container. This config file is created, by default, in /var/lib/lxc/<containerName>/config, but you can alter that using the --lxcpath PATH option. You can configure various container parameters, such as network parameters, cgroups parameters, device parameters and more. Here are some examples of popular configuration items for the container config file (a combined sketch follows the list):

  • You can set various cgroups parameters by setting values to the lxc.cgroup.[subsystem name] entries in the config file. The subsystem name is the name of the cgroup controller. For example, configuring the maximum memory a container can use to be 256MB is done by setting lxc.cgroup.memory.limit_in_bytes to be 256MB.
  • You can configure the container hostname by setting lxc.utsname.
  • There are five types of network interfaces that you can set with the lxc.network.type parameter: empty, veth, vlan, macvlan and phys. Using veth is very common in order to be able to connect a container to the outside world. By using phys, you can move network interfaces from the host network namespace to the container network namespace.
  • There are features that can be used for hardening the security of LXC containers. You can prevent specified system calls from being invoked from within a container by setting a secure computing mode, or seccomp, policy with the lxc.seccomp entry in the configuration file. You also can remove capabilities from a container with the lxc.cap.drop entry. For example, setting lxc.cap.drop = sys_module will create a container without the CAP_SYS_MODULE capability. Trying to run insmod from inside this container will fail. You also can define AppArmor and SELinux profiles for your container. You can find examples in the LXC README and in man 5 lxc.conf.
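
Putting several of these items together, a container config might include entries like the following (the bridge name virbr0 and the exact values are illustrative):

lxc.utsname = fedoraCT
lxc.network.type = veth
lxc.network.link = virbr0                    # host bridge for the veth pair
lxc.network.flags = up
lxc.cgroup.memory.limit_in_bytes = 256M
lxc.cap.drop = sys_module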

Docker

Docker is an open-source project that automates the creation and deployment of containers. Docker first was released in March 2013 under the Apache License, Version 2.0. It started as an internal project at a Platform-as-a-Service (PaaS) company called dotCloud at the time and now called Docker Inc. The initial prototype was written in Python; later the whole project was rewritten in Go, a programming language first developed at Google. In September 2013, Red Hat announced that it would collaborate with Docker Inc. for Red Hat Enterprise Linux and for the Red Hat OpenShift platform. Docker requires Linux kernel 3.8 (or above). On RHEL systems, Docker runs on the 2.6.32 kernel, as the necessary patches have been backported.

Docker utilizes the LXC toolkit and as such is currently available only for Linux. It runs on distributions like Ubuntu 12.04, 13.04; Fedora 19 and 20; RHEL 6.5 and above; and on cloud platforms like Amazon EC2, Google Compute Engine and Rackspace.

Docker images can be stored on a public repository and can be downloaded with the docker pull command—for example, docker pull ubuntu or docker pull busybox.

To display the images available on your host, you can use the docker images command. You can narrow the output to a specific type of image (fedora, for example) with docker images fedora.

On Fedora, running a Fedora docker container is simple; after installing the docker-io package, you simply start the docker dæmon with systemctl start docker, and then you can start a Fedora docker container with docker run -i -t fedora /bin/bash.
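
Putting those steps together as a root shell session (the package name is as of Fedora 20):

yum install -y docker-io             # install Docker
systemctl start docker               # start the docker daemon
docker run -i -t fedora /bin/bash    # launch a Fedora container with an interactive shell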

Docker has git-like capabilities for handling containers. Changes you make in a container are lost if you destroy the container, unless you commit your changes (much like you do in git) with docker commit <containerId> <containerName/containerTag>. These images can be uploaded to a public registry, and they are available for downloading by anyone who wants to download them. Alternatively, you can set up a private Docker repository.
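
For example, committing a container and uploading the resulting image might look like this (the container ID and repository name are placeholders):

docker commit 5f6e7a8b9c0d myuser/fedora-dev    # save the container's changes as an image
docker push myuser/fedora-dev                   # upload the image to the public registry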

Docker is able to create snapshots using the kernel device mapper feature. In earlier versions, before Docker 0.7, this was done using AUFS (a union filesystem). Docker 0.7 adds “storage plugins”, so people can switch between device mapper and AUFS (if their kernel supports it), allowing Docker to run on RHEL releases that do not support AUFS.

You can create images by running commands manually and committing the resulting container, but you also can describe them with a Dockerfile. Just like a Makefile will compile code into a binary executable, a Dockerfile will build a ready-to-run container image from simple instructions. The command to build an image from a Dockerfile is docker build. There is a tutorial about Dockerfiles and their command syntax on the Docker Web site. For example, the following short Dockerfile is for installing the iperf package for a Fedora image:


FROM fedora
MAINTAINER Rami Rosen
RUN yum install -y iperf
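
Assuming this Dockerfile is in the current directory, building and tagging the resulting image might look like this (the tag name is illustrative):

docker build -t myuser/fedora-iperf .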

 

You can upload and store your images for free on the Docker public index. As with GitHub, storing public images is free; you just need to register an account.

The Checkpoint/Restore Feature

The CRIU (Checkpoint/Restore In Userspace) project is implemented mostly in userspace, with more than 100 small patches scattered in the kernel to support it. There were several attempts to implement Checkpoint/Restore solely in kernel space, some of them by the OpenVZ project. The kernel community rejected all of them, though, as they were too complex.

The Checkpoint/Restore feature enables saving a process state in several image files and restoring this process from the point at which it was frozen, on the same host or on a different host at a later time. This process also can be an LXC container. The image files are created using Google’s protocol buffer (PB) format. The Checkpoint/Restore feature enables performing maintenance tasks, such as upgrading a kernel or hardware maintenance on that host after checkpointing its applications to persistent storage. Later on, the applications are restored on that host.

Another feature that is very important in HPC is load balancing using live migration. The Checkpoint/Restore feature also can be used for creating incremental snapshots, which can be used after a crash occurs. As mentioned earlier, some kernel patches were needed for supporting CRIU; here are some of them:

  • A new system call named kcmp() was added; it compares two processes to determine if they share a kernel resource.
  • A socket monitoring interface called sock_diag was added to UNIX sockets in order to be able to find the peer of a UNIX domain socket. Before this change, the ss tool, which relied on parsing of /proc entries, did not show this information.
  • A TCP connection repair mode was added.
  • A procfs entry was added (/proc/PID/map_files).

Let’s look at a simple example of using the criu tool. First, you should check whether your kernel supports Checkpoint/Restore, by running criu check --ms. Look for a response that says "Looks good."

Basically, checkpointing is done by:


criu dump -t <pid>

 

You can specify a folder where the process state files will be saved by adding -D folderName.

You can restore with criu restore <pid>.
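
Putting it together, a minimal session might look like the following; the PID and image directory are placeholders, and options such as --shell-job (needed when the dumped process was started from a shell) vary between criu versions:

criu dump -t 2441 -D /tmp/criu-img --shell-job      # checkpoint process 2441
criu restore -D /tmp/criu-img --shell-job           # restore it from the saved images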

Summary

In this article, I’ve described what Linux-based containers are, and I briefly explained the underlying cgroups and namespaces kernel features. I have discussed some Linux-based container projects, focusing on the promising and popular LXC project. I also looked at the LXC-based Docker engine, which provides an easy and convenient way to create and deploy LXC containers. Several hands-on examples showed how simple it is to configure, manage and deploy LXC containers with the userspace LXC tools and the Docker tools.

Due to the advantages of the LXC and Docker open-source projects, and due to the convenient and simple tools for creating, deploying and configuring LXC containers described in this article, we presumably will see more and more cloud infrastructures integrating LXC containers instead of virtual machines in the near future. However, as explained in this article, solutions like Xen or KVM have several advantages over Linux-based containers and still are needed, so they probably will not disappear from the cloud infrastructure in the next few years.

Acknowledgements

Thanks to Jérôme Petazzoni from Docker Inc. and to Michael H. Warfield for reviewing this article.

Resources

Google Containers: https://github.com/google/lmctfy

OpenVZ: http://openvz.org/Main_Page

Linux-VServer: http://linux-vserver.org

LXC: http://linuxcontainers.org

libvirt-lxc: http://libvirt.org/drvlxc.html

Docker: https://www.docker.io

Docker Public Registry: https://index.docker.io

(Via LinuxJournal.com)

AWS OpsWorks in the Virtual Private Cloud

I am pleased to announce support for using AWS OpsWorks with Amazon Virtual Private Cloud (Amazon VPC). AWS OpsWorks is a DevOps solution that makes it easy to deploy, customize and manage applications. OpsWorks provides helpful operational features such as user-based ssh management, additional CloudWatch metrics for memory and load, automatic RAID volume configuration, and a variety of application deployment options. You can optionally use the popular Chef automation platform to extend OpsWorks using your own custom recipes. With VPC support, you can now take advantage of the application management benefits of OpsWorks in your own isolated network. This allows you to run many new types of applications on OpsWorks.

For example, you may want a configuration like the following, with your application servers in a private subnet behind a public Elastic Load Balancer (ELB). This lets you control access to your application servers. Users communicate with the Elastic Load Balancer which then communicates with your application servers through the ports you define. The NAT allows your application servers to communicate with the OpsWorks service and with Linux repositories to download packages and updates.

To get started, we’ll first create this VPC. For a shortcut to create this configuration, you can use a CloudFormation template. First, navigate to the CloudFormation console and select Create Stack. Give your stack a name, provide the template URL http://cloudformation-templates-us-east-1.s3.amazonaws.com/OpsWorksinVPC.template and select Continue. Accept the defaults and select Continue. Create a tag with a key of “Name” and a meaningful value. Then create your CloudFormation stack.
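
If you prefer the command line, an equivalent stack creation with the AWS CLI might look like this (the stack name and tag value are placeholders):

aws cloudformation create-stack \
    --stack-name OpsWorksVPC \
    --template-url http://cloudformation-templates-us-east-1.s3.amazonaws.com/OpsWorksinVPC.template \
    --tags Key=Name,Value=opsworks-vpc-demo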

When your CloudFormation stack’s status shows “CREATE_COMPLETE”, take a look at the outputs tab; it contains several IDs that you will need later, including the VPC and subnet IDs.

You can now create an OpsWorks stack to deploy a sample app in your new private subnet. Navigate to the AWS OpsWorks console and click Add Stack. Select the VPC and private subnet that you just created using the CloudFormation template.

Next, under Add your first layer, click Add a layer. For the Layer type box, select PHP App Server. Add the Elastic Load Balancer created by the CloudFormation template to the layer and then click Add layer.

Next, in the layer’s Actions column click Edit. Scroll down to the Security Groups section and select the Additional Group with OpsWorksSecurityGroup in the name. Click the + symbol, then click Save.

Next, in the navigation pane, click Instances, accept the defaults, and then click Add an Instance. This creates the instance in the default subnet you set when you created the stack.

Under PHP App Server, in the row that corresponds to your instance, click start in the Actions column.

You are now ready to deploy a sample app to the instance you created. An app represents code you want to deploy to your servers. That code is stored in a repository, such as Git or Subversion. For this example, we’ll use the SimplePHPApp application from the Getting Started walkthrough. First, in the navigation pane, click Apps. On the Apps page, click Add an app. Type a name for your app, scroll down to Repository URL and set it to git://github.com/amazonwebservices/opsworks-demo-php-simple-app.git, and set Branch/Revision to version1. Accept the defaults for the other fields.

When all the settings are as you want them, click Add app. When you first add a new app, it isn’t deployed yet to the instances for the layer. To deploy your app to the instance in the PHP App Server layer, under Actions, click Deploy.

Once your deployment has finished, in the navigation pane, click Layers. Select the Elastic Load Balancer for your PHP App Server layer. The ELB page shows the load balancer’s basic properties, including its DNS name and the health status of the associated instances. A green check indicates the instance has passed the ELB health checks (this may take a minute). You can then click on the DNS name to connect to your app through the load balancer.

You can try these new features with a few clicks of the AWS Management Console. To learn more about how to launch OpsWorks instances inside a VPC, see the AWS OpsWorks Developer Guide.

You may also want to sign up for our upcoming AWS OpsWorks Webinar on September 12, 2013 at 10:00 AM PT. The webinar will highlight common use cases and best practices for how to set up AWS OpsWorks and Amazon VPC.

– Chris Barclay, Senior Product Manager

(source: Amazon Web Services blog)

 

More Database Power – 20,000 IOPS for MySQL With the CR1 Instance

If you are a regular reader of this blog, you know that I am a huge fan of the Amazon Relational Database Service (RDS). Over the course of the last couple of years, I have seen that my audiences really appreciate the way that RDS obviates a number of tedious yet mission-critical tasks that are an inherent responsibility when running a relational database. There’s no joy to be found in keeping operating systems and database engines current, creating and restoring backups, scaling hardware up and down, or creating an architecture that provides high availability.

Today we are making RDS even more powerful by adding a new high-end database instance class. The new db.cr1.8xlarge instance type gives you plenty of memory, CPU power, and network throughput to allow your MySQL 5.6 applications to perform at up to 20,000 IOPS. This is a 60% improvement over the previous high-water mark of 12,500 IOPS and opens the door to database-driven applications that are even more demanding than before. Here are the specs:

  • 64-bit platform
  • 244 GB of RAM
  • 88 ECU (16 hyperthreaded virtual cores each delivering 2.75 ECU)
  • High-performance networking

This new instance type is available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions and you can start using it today!
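
If you provision from scripts, creating such an instance with the AWS CLI might look roughly like this (the identifier, credentials and storage size are placeholders; Provisioned IOPS requires an appropriate ratio of IOPS to allocated storage):

aws rds create-db-instance \
    --db-instance-identifier mydb \
    --db-instance-class db.cr1.8xlarge \
    --engine mysql \
    --allocated-storage 2000 \
    --iops 20000 \
    --master-username admin \
    --master-user-password MySecretPassword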

(source: Amazon Web Services blog)