Docker – a Linux Container Engine

(Today I read an article about Docker. It’s a good thing, and I want to share it with you.)

About Docker

Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.

Docker containers can encapsulate any payload, and will run consistently on and between virtually any server. The same container that a developer builds and tests on a laptop will run at scale, in production, on VMs, bare-metal servers, OpenStack clusters, public instances, or combinations of the above.

Common use cases for Docker include:

  • Automating the packaging and deployment of applications
  • Creation of lightweight, private PAAS environments
  • Automated testing and continuous integration/deployment
  • Deploying and scaling web apps, databases and backend services


Background

Fifteen years ago, virtually all applications were written using well-defined stacks of services and deployed on a single monolithic, proprietary server. Today, developers build and assemble applications using a multiplicity of the best available services, and must be prepared for those applications to be deployed across a multiplicity of different hardware environments, including public, private, and virtualized servers.

Figure 1: The Evolution of IT

This sets up the possibility for:

  • Adverse interactions between different services and “dependency hell”
  • Challenges in rapidly migrating and scaling across different hardware
  • The impossibility of managing a matrix of multiple different services deployed across multiple different types of hardware

Figure 2: The Challenge of Multiple Stacks and Multiple Hardware Environments

Or, viewed as a matrix, we can see that there is a huge number of combinations and permutations of applications/services and hardware environments that need to be considered every time an application is written or rewritten. This creates a difficult situation for both the developers who are writing applications and the folks in operations who are trying to create a scalable, secure, and highly performant operations environment.

Figure 3: Dynamic Stacks and Dynamic Hardware Environments Create an NxN Matrix

How to solve this problem? A useful analogy can be drawn from the world of shipping. Before 1960, most cargo was shipped break bulk. Shippers and carriers alike needed to worry about bad interactions between different types of cargo (e.g. if a shipment of anvils fell on a sack of bananas). Similarly, transitions between different modes of transport were painful. Up to half the time to ship something could be taken up as ships were unloaded and reloaded in ports, and in waiting for the same shipment to get reloaded onto trains, trucks, etc. Along the way, losses due to damage and theft were large. And, there was an n X n matrix between a multiplicity of different goods and a multiplicity of different transport mechanisms.

Figure 4: Analogy: Shipping Pre-1960

Fortunately, an answer was found in the form of a standard shipping container. Any type of goods, from pistachios to Porsches, can be packaged inside a standard shipping container. The container can then be sealed, and not re-opened until it reaches its final destination. In between, the containers can be loaded and unloaded, stacked, transported, and efficiently moved over long distances. The transfer from ship to gantry crane to train to truck can be automated, without requiring a modification of the container. Many authors credit the shipping container with revolutionizing both transportation and world trade in general. Today, 18 million standard containers carry 90% of world trade.

Figure 5: Solution to Shipping Challenge Was a Standard Container

To some extent, Docker can be thought of as an intermodal shipping container system for code.

Figure 6: The Solution to Software Shipping is Also a Standard Container System

Docker enables any application and its dependencies to be packaged up as a lightweight, portable, self-sufficient container. Containers have standard operations, thus enabling automation. And, they are designed to run on virtually any Linux server. The same container that a developer builds and tests on a laptop will run at scale, in production, on VMs, bare-metal servers, OpenStack clusters, public instances, or combinations of the above.

In other words, developers can build their application once, and then know that it can run consistently anywhere. Operators can configure their servers once, and then know that they can run any application.

Why Should I Care (For Developers)

Build once…run anywhere

“Docker interests me because it allows simple environment isolation and repeatability. I can create a run-time environment once, package it up, then run it again on any other machine. Furthermore, everything that runs in that environment is isolated from the underlying host (much like a virtual machine). And best of all, everything is fast and simple.”

Why Should I Care (For DevOps)

Configure once…run anything

  • Make the entire lifecycle more efficient, consistent, and repeatable
  • Increase the quality of code produced by developers
  • Eliminate inconsistencies between development, test, production, and customer environments
  • Support segregation of duties
  • Significantly improve the speed and reliability of continuous deployment and continuous integration systems
  • Because the containers are so lightweight, address significant performance, cost, deployment, and portability issues normally associated with VMs

What are the Main Features of Docker

It is useful to compare the main features of Docker to those of shipping containers. (See the analogy above).

  • Content agnostic. Physical containers: the same container can hold almost any kind of cargo. Docker: can encapsulate any payload and its dependencies.
  • Hardware agnostic. Physical containers: a standard shape and interface allow the same container to move from ship to train to semi-truck to warehouse to crane without being modified or opened. Docker: using operating system primitives (e.g. LXC), containers can run consistently on virtually any hardware – VMs, bare metal, OpenStack, public IaaS, etc. – without modification.
  • Content isolation and interaction. Physical containers: no worry about anvils crushing bananas; containers can be stacked and shipped together. Docker: resource, network, and content isolation; avoids dependency hell.
  • Automation. Physical containers: standard interfaces make it easy to automate loading, unloading, moving, etc. Docker: standard operations to run, start, stop, commit, search, etc.; perfect for devops: CI, CD, autoscaling, hybrid clouds.
  • Highly efficient. Physical containers: no opening or modification, quick to move between waypoints. Docker: lightweight, virtually no performance or start-up penalty, quick to move and manipulate.
  • Separation of duties. Physical containers: the shipper worries about the inside of the box, the carrier worries about the outside. Docker: the developer worries about code, ops worries about infrastructure.

Figure 7: Main Docker Features

For a more technical view of features, please see the following:

  • Filesystem isolation: each process container runs in a completely separate root filesystem.
  • Resource isolation: system resources like cpu and memory can be allocated differently to each process container, using cgroups.
  • Network isolation: each process container runs in its own network namespace, with a virtual interface and IP address of its own.
  • Copy-on-write: root filesystems are created using copy-on-write, which makes deployment extremely fast, memory-cheap and disk-cheap.
  • Logging: the standard streams (stdout/stderr/stdin) of each process container are collected and logged for real-time or batch retrieval.
  • Change management: changes to a container’s filesystem can be committed into a new image and re-used to create more containers. No templating or manual configuration required.
  • Interactive shell: docker can allocate a pseudo-tty and attach to the standard input of any container, for example to run a throwaway interactive shell.

What are the Basic Docker Functions

Docker makes it easy to build, modify, publish, search, and run containers. The diagram below should give you a good sense of the Docker basics. With Docker, a container comprises both an application and all of its dependencies. Containers can either be created manually or, if a source code repository contains a Dockerfile, automatically. Subsequent modifications to a baseline Docker image can be committed to a new image using docker commit and then pushed to a central registry.

Containers can be found in a Docker registry (either public or private) using docker search. Containers can be pulled from the registry using docker pull and can be run, started, stopped, etc., using docker run commands. Notably, the target of a run command can be your own servers, public instances, or a combination.

Figure 8: Basic Docker Functions

For a full list of functions, please go to: http://docs.docker.io/en/latest/commandline/

Docker runs three ways:

  • as a daemon to manage LXC containers on your Linux host (sudo docker -d)
  • as a CLI which talks to the daemon’s REST API (docker run …)
  • as a client of repositories that let you share what you’ve built (docker pull, docker commit)
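To make these concrete, here is a minimal end-to-end sketch against the public registry; the image and repository names are placeholders, and exact flags may vary slightly across early Docker versions:

$ docker search ubuntu                        # find an image in the public registry
$ docker pull ubuntu                          # download the image locally
$ docker run -i -t ubuntu /bin/bash           # start an interactive container from it
$ docker commit <container_id> myrepo/myapp   # save the container’s changes as a new image
$ docker push myrepo/myapp                    # publish the image to the registry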

How Do Containers Work? (And How are they Different From VMs)

A container comprises an application and its dependencies. Containers isolate processes, which run in userspace on the host’s operating system.

This differs significantly from traditional VMs. Traditional hardware virtualization (e.g. VMware, KVM, Xen, EC2) aims to create an entire virtual machine. Each virtualized application contains not only the application (which may be only tens of MB) and the binaries and libraries needed to run it, but also an entire guest operating system (which may measure in the tens of GB).

The picture below captures the difference.

Figure 9: Containers vs. Traditional VMs

Since all of the containers share the same operating system (and, where appropriate, binaries and libraries), they are significantly smaller than VMs, making it possible to store hundreds of containers on a physical host (versus a strictly limited number of VMs). In addition, since they utilize the host operating system, restarting a container does not mean restarting or rebooting the operating system. Thus, containers are much more portable and much more efficient for many use cases.

With Docker Containers, the efficiencies are even greater. With a traditional VM, each application, each copy of an application, and each slight modification of an application requires creating an entirely new VM.

As shown above, a new application on a host need only have the application and its binaries/libraries. There is no need for a new guest operating system.

If you want to run several copies of the same application on a host, you do not even need to copy the shared binaries.

Finally, if you make a modification of the application, you need only copy the differences.
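This difference tracking is visible from the CLI: docker diff lists the filesystem changes a container has accumulated relative to its base image (the container ID below is a placeholder):

$ docker diff <container_id>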

Figure 10: Mechanism to Make Docker Containers Lightweight

This not only makes it efficient to store and run containers, it also makes it extremely easy to update applications. As shown in the next figure, updating a container only requires applying the differences.

Figure 11: Modifying and Updating Containers

What is the Relationship between Docker and dotCloud?

Docker is an open-source implementation of the deployment engine which powers dotCloud, a popular Platform-as-a-Service. It benefits directly from the experience accumulated over several years of large-scale operation and support of hundreds of thousands of applications and databases. dotCloud is the chief sponsor of the Docker project, and dotCloud’s CTO is the original architect and current overall maintainer. While several dotCloud employees work on Docker full-time, Docker is a true community project, with hundreds of non-dotCloud contributors and a completely open design philosophy. All pulls, pushes, forks, bugs, issues, and roadmaps are available for viewing, editing, and commenting on GitHub.

What Are Some Cool Use Cases For Docker?

Docker is a powerful tool for many different use cases. Here are some great early use cases for Docker, as described by members of our community.

  • Build your own PaaS: Dokku – Docker-powered mini-Heroku, the smallest PaaS implementation you’ve ever seen (http://bit.ly/191Tgsx)
  • Web-based environment for instruction: JiffyLab – web-based environment for the instruction, or lightweight use, of Python and the UNIX shell (http://bit.ly/12oaj2K)
  • Easy application deployment: Deploy Java Apps With Docker = Awesome (http://bit.ly/11BCvvu); Running Drupal on Docker (http://bit.ly/15MJS6B); Installing Redis on Docker (http://bit.ly/16EWOKh)
  • Create secure sandboxes: Docker makes creating secure sandboxes easier than ever (http://bit.ly/13mZGJH)
  • Create your own SaaS: Memcached as a Service (http://bit.ly/11nL8vh)
  • Automated application deployment: Push-button Deployment with Docker (http://bit.ly/1bTKZTo)
  • Continuous integration and deployment: Next Generation Continuous Integration & Deployment with dotCloud’s Docker and Strider (http://bit.ly/ZwTfoy)
  • Lightweight desktop virtualization: Docker Desktop: Your Desktop Over SSH Running Inside Of A Docker Container (http://bit.ly/14RYL6x)

More things you might want to know:

Getting started with Docker

The Docker site has everything you need to get started, including full instructions, code, and documentation. There is also an interactive tutorial to help you get started.

Getting a copy of the source code

The Docker project is hosted on GitHub, where you can browse and clone the repository.

Contribute to the docker community

Head on over to our community page

(via Docker.io)

 

Advanced Hard Drive Caching Techniques

With the introduction of the solid-state Flash drive, performance came to the forefront for data storage technologies. Prior to that, software developers and server administrators needed to devise methods by which they could increase I/O throughput to storage, most of which resulted in low-capacity caching to random access memory (RAM) or a RAM drive. Although not as fast as RAM, the Flash drive was almost a dream come true, but it had its limitations, one of which was the low capacities packaged in its NAND-based chips. The traditional spinning disk drive provided the capacities users desired but lacked speedy accessibility. Even with the 6Gb/s SATA protocol, sequential data access at best performed at approximately 150MB per second (or MB/s) for both read and write operations, while random access varied between 2–5MB/s, as seeking across multiple sectors laid out in multiple tracks across multiple spinning platters proved to be an extremely disruptive bottleneck. The solid-state drive (SSD), with no movable components, significantly decreased these access latencies, thus rendering this bottleneck almost nonexistent.

Even today, the consumer SSD cannot compare to the capacities provided by the magnetic hard disk drive (or HDD), which is why in this article I intend to introduce readers to proven methods for obtaining near-SSD performance with the traditional HDD. Multiple open-source projects exist that can achieve this; all but one of them utilize an SSD as a caching node, and the remaining one caches to RAM. The device drivers I cover here are dm-cache, FlashCache and the RapidDisk/RapidCache suite; I also briefly discuss bcache and EnhanceIO.

Note:

To build the kernel modules shown in this article, you need to have either the full kernel source or the kernel headers installed for your current kernel image revision.
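On most distributions, the headers are one package install away; for example (package names differ by distribution):

$ sudo apt-get install linux-headers-$(uname -r)    # Debian/Ubuntu
$ sudo yum install kernel-devel-$(uname -r)         # RHEL/CentOS/Fedora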

In my examples, I am using a commercial SATA III (6Gbps) SSD with an average performance of the following:

  • Sequential read: 231MB/s
  • Sequential write: 74MB/s
  • Random read: 230MB/s
  • Random write: 72MB/s

This SSD provides the caching layer for a slower mechanical SATA III HDD that performs at the following:

  • Sequential read: 115MB/s
  • Sequential write: 72MB/s
  • Random read: 2MB/s
  • Random write: 2MB/s

In my environment, the SSD is labeled as /dev/sdb, and the HDD is /dev/sda3. These are non-intrusive, transparent caching solutions intended to achieve the performance benefits of SSDs. They can be added to and removed from existing storage targets without issue or data loss (assuming that all cached data has been flushed to disk successfully). Also, all the examples here showcase a write-back caching scheme, with the exception of RapidCache, which instead will be used in write-through mode. In write-back mode, newly written data is cached but not immediately written to the destination target. Write-through mode always will write new data to the target while still maintaining it in cache for future reads.

Note:

The benchmarks shown here were obtained by using FIO, a file I/O benchmarking and test tool designed for data storage technologies. It is maintained by Linux kernel developer Jens Axboe. Unless noted otherwise, all captured I/O is written at the typical 4KB page size, asynchronously to the storage target, 32 transfers at a time (that is, a queue depth of 32).
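A representative FIO invocation matching that description might look like the following; this is a sketch rather than the author’s exact job file, and the target device should be adjusted to your own mapping:

$ sudo fio --name=randread --filename=/dev/mapper/cache \
      --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
      --direct=1 --time_based --runtime=60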

dm-cache

dm-cache has been around for quite some time, at least since 2006. It originally made its debut as a research project developed by Dr Ming Zhao through his summer internship at IBM Research. The dm-cache module was integrated into the Linux kernel tree just recently, as of version 3.9. Whether you choose to enable it in a recently downloaded kernel or compile it from the official project site, the results will be the same. To load the module, you need to invoke modprobe or insmod:

$ sudo modprobe dm-cache

Now that the module is loaded, you need to inform the module of which drive to use for the cache and which to use for the destination. The dm-cache project site provides a Perl script, dmc-setup.pl, to simplify this process. For example, if I wanted to use the entire SSD in write-back caching mode with a 4KB block size, I would type:

$ sudo perl dmc-setup.pl -o /dev/sda3 -c /dev/sdb -n cache -b 8 -w 

This script is a wrapper around the equivalent dmsetup command below:

$ echo 0 20971520 cache /dev/sda3 /dev/sdb 0 8 65536 16 1 | \
    dmsetup create cache

The dm-cache documentation hosted on the project site provides details on each parameter field, so I don’t cover them here.

You may notice that in both examples, I named the mapping of the two drives “cache”. So, when I need to access the drive mapping, I must refer to it as “cache”.

The following mapping passes all data requests to the caching driver, which in turn performs the necessary magic to process each request, either handling it entirely out of the cache or out of both the cache and the slower device:

$ ls -l /dev/mapper
total 0
lrwxrwxrwx 1 root root       7 Jun 30 12:10 cache -> ../dm-0
crw------- 1 root root 10, 236 Jun 30 11:52 control

Just like with any other device-mapper-enabled target, I also can pull up detailed mapping data:

$ sudo dmsetup status cache
0 20971520 cache stats: reads(83), writes(0), cache hits(0, 0.0), replacement(0), replaced dirty blocks(0)

$ sudo dmsetup table cache
0 20971520 cache conf: capacity(256M), associativity(16), block size(4K), write-back

If the target drive already is formatted with data on it, you just need to mount it; otherwise, format it to your specified filesystem:

$ sudo mke2fs -F /dev/mapper/cache 

Remember, these solutions are non-intrusive, so if you have existing data that needs to remain on that disk drive, skip the above step and go straight to mounting it for data accessibility:

$ sudo mount /dev/mapper/cache /mnt/cache
$ df | grep cache
/dev/mapper/cache  10321208 1072632   8724288  11% /mnt/cache

Using a benchmarking utility, the numbers will vary. On read operations, performance is wholly dependent on whether the desired data resides in cache or whether the module needs to retrieve it from the slower disk. On write operations, it depends on the Flash technology itself, and whether it needs to go through a typical program/erase (P/E) cycle to write the new data. Regardless of this, the random read/write access to the slower drive has been increased significantly:
  • Sequential read: 105MB/s
  • Sequential write: 50MB/s
  • Random read: 67MB/s
  • Random write: 51MB/s

You can continue monitoring the cache status by typing:

$ sudo dmsetup status cache 
0 20971520 cache stats: reads(301319), writes(353216), cache hits(24485, 0.3), replacement(345972), replaced dirty blocks(92857)
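If you would rather watch these counters update continuously, the standard watch utility works well here:

$ sudo watch -n 5 dmsetup status cache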

To remove the cache mapping, unmount the drive and invoke dmsetup:

$ sudo umount /mnt/cache
$ sudo dmsetup remove cache

FlashCache

FlashCache is a project developed and maintained by Facebook. It was inspired by dm-cache. Much like dm-cache, it too is built on the device-mapper framework. It currently is hosted on GitHub and can be cloned from there. The repository encompasses the kernel module and administration utilities. Once built and installed, load the kernel module and, in a similar fashion to the previous examples, create a mapping of the SSD and HDD:

$ sudo modprobe flashcache
$ sudo flashcache_create -p back -b 8 cache /dev/sdb /dev/sda3
cachedev cache, ssd_devname /dev/sdb, disk_devname /dev/sda3 cache mode WRITE_BACK block_size 8, md_block_size 8, cache_size 0
FlashCache metadata will use 223MB of your 3944MB main memory
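For reference, the module and utilities can be fetched and built from the GitHub repository like so; exact steps may vary by release and kernel version:

$ git clone https://github.com/facebook/flashcache.git
$ cd flashcache
$ make
$ sudo make install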

The flashcache_create administration utility is similar to the dmc-setup.pl Perl script used for dm-cache. It is a wrapper utility designed to simplify the dmsetup process. As with the dm-cache module, once the mapping has been created, you can view mapping details by typing:

$ sudo dmsetup table cache
0 20971520 flashcache conf:
    ssd dev (/dev/sdb), disk dev (/dev/sda3) cache mode(WRITE_BACK)
    capacity(57018M), associativity(512), data block size(4K), metadata block size(4096b)
    skip sequential thresh(0K)
    total blocks(14596608), cached blocks(83), cache percent(0)
    dirty blocks(0), dirty percent(0)
    nr_queued(0)
Size Hist: 4096:83 
$ sudo dmsetup status cache
0 20971520 flashcache stats: 
    reads(83), writes(0)
    read hits(0), read hit percent(0)
    write hits(0) write hit percent(0)
    dirty write hits(0) dirty write hit percent(0)
    replacement(0), write replacement(0)
    write invalidates(0), read invalidates(0)
    pending enqueues(0), pending inval(0)
    metadata dirties(0), metadata cleans(0)
    metadata batch(0) metadata ssd writes(0)
    cleanings(0) fallow cleanings(0)
    no room(0) front merge(0) back merge(0)
    disk reads(83), disk writes(0) ssd reads(0) ssd writes(83)
    uncached reads(0), uncached writes(0), uncached IO requeue(0)
    disk read errors(0), disk write errors(0) ssd read errors(0) ssd write errors(0)
    uncached sequential reads(0), uncached sequential writes(0)
    pid_adds(0), pid_dels(0), pid_drops(0) pid_expiry(0)

Mount the mapping for file accessibility:

$ sudo mount /dev/mapper/cache /mnt/cache

Using the same benchmarking utility, observe the differences between FlashCache and the previous module:

  • Sequential read: 284MB/s
  • Sequential write: 72MB/s
  • Random read: 284MB/s
  • Random write: 71MB/s

The numbers look more like native SSD performance. However, I want to note that this article is not intended to prove that one solution performs better than another, but instead to enlighten readers about the many methods they can use to accelerate data access in existing and slower configurations.

To unmount and remove the drive mapping, type the following in the terminal:

$ sudo umount /mnt/cache
$ sudo dmsetup remove /dev/mapper/cache

RapidDisk and RapidCache

Currently at version 2.9, RapidDisk is an advanced Linux RAM disk whose features include the capability to allocate RAM dynamically as a block device, use RAM disks as standalone disk drives, or even map them as caching nodes to slower local disk drives via RapidCache (the latter of which was inspired by FlashCache and uses the device-mapper framework). RAM handles the data storage, with memory pages allocated as they are needed. It is a volatile form of storage, so if power is removed or if the computer is rebooted, all data stored within RAM will not be preserved. This is why the RapidCache module was designed to handle only read-through/write-through caching, which means that whatever is intended to be written to the slower storage device will be cached to RapidCache and written immediately to the hard drive. And, if data is being requested from the hard drive and it does not already exist in the RapidCache node, the data will be read from the slower device and then cached to the RapidCache node. This method retains the same write performance as the hard drive, but significantly increases sequential and random read performance to cached data.

Once the package, which consists of two kernel modules and an administration utility, is built and installed, you need to insert the modules by typing the following on the command line:

$ sudo modprobe rxdsk
$ sudo modprobe rxcache

Let’s assume that you’re running on a computer that contains 4GB of RAM, and you confidently can say that at least 1GB of that RAM is never used by the operating system and its applications. Using RapidDisk to create a RAM drive of 1GB in size, you would type:

$ sudo rxadm --attach 1024

Remember, RapidDisk will not pre-allocate this storage. It will allocate RAM only as it is used.
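You can observe this dynamic allocation yourself by checking memory usage before and after writing to the RAM drive; this is just a sanity check, not part of the rxadm tooling:

$ free -m
$ sudo dd if=/dev/zero of=/dev/rxd0 bs=1M count=256
$ free -m    # used memory should grow by roughly 256MB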

A quick benchmark test of just the RAM drive produces some overwhelmingly fast results with 4KB I/O transfers:

  • Sequential read: 1.6GB/s
  • Sequential write: 1.6GB/s
  • Random read: 1.3GB/s
  • Random write: 1.1GB/s

It produces the following with 1MB I/O transfers:

  • Sequential read: 4.9GB/s
  • Sequential write: 4.3GB/s
  • Random read: 4.9GB/s
  • Random write: 4.0GB/s

Impressive, right? To utilize such a speedy RAM drive as a caching node to a slower drive, a mapping must be created, where /dev/rxd0 is the node used to access the RAM drive, and /dev/mapper/rxc0 is the node used to access the mapping of the two drives:

$ sudo rxadm --rxc-map rxd0 /dev/sda3 4

You can get a list of attached devices and mappings by typing:

$ sudo rxadm --list
rxadm 2.9
Copyright 2011-2013 Petros Koutoupis

List of rxdsk device(s):

 RapidDisk Device 1: rxd0
    Size: 1073741824

List of rxcache mapping(s):

 RapidCache Target 1: rxc0
0 20971519 rxcache conf:
    rxd dev (/dev/rxd0), disk dev (/dev/sda3) mode (WRITETHROUGH)
    capacity(1024M), associativity(512), block size(4K)
    total blocks(262144), cached blocks(0)
 Size Hist: 512:663 

As with the previous device-mapper-based solutions, you even can list detailed information of the mapping by typing:

$ sudo dmsetup table rxc0
0 20971519 rxcache conf:
    rxd dev (/dev/rxd0), disk dev (/dev/sda3) mode (WRITETHROUGH)
    capacity(1024M), associativity(512), block size(4K)
    total blocks(262144), cached blocks(0)
 Size Hist: 512:663 

$ sudo dmsetup status rxc0
0 20971519 rxcache stats: 
    reads(663), writes(0)
    cache hits(0) replacement(0), write replacement(0)
    read invalidates(0), write invalidates(0)
    uncached reads(663), uncached writes(0)
    disk reads(663), disk writes(0)
    cache reads(0), cache writes(0)

Format the mapping if needed and mount it:

$ sudo mount /dev/mapper/rxc0 /mnt/cache
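If the mapping had needed a fresh filesystem, the step would mirror the earlier mke2fs example (destructive, so only run it on an empty target):

$ sudo mke2fs -F /dev/mapper/rxc0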

A benchmark test produces the following results:

  • Sequential read: 794MB/s
  • Sequential write: 70MB/s
  • Random read: 901MB/s
  • Random write: 2MB/s

Notice that the write performance is not very great, and that’s because it is not meant to be. Write-through mode promises only faster read performance of cached data and consistent write performance to the original drive. The read performance, however, shows significant improvement when accessing cached data.

To remove the mapping and detach the RAM drive, type the following:

$ sudo umount /mnt/cache
$ sudo rxadm --rxc-unmap rxc0
$ sudo rxadm --detach rxd0

Other Solutions Worth Mentioning

bcache:

bcache is relatively new to the hard drive caching scene. It offers all the same features and functionalities as the previous solutions, with the addition of being able to map one or more SSDs as the cache for one or more HDDs, instead of one volume to one volume. The project’s maintainer does, however, tout its superiority over the other solutions when it comes to data access performance from the cache. From what I can tell, bcache is unlike the previous solutions in that it does not rely on the device-mapper framework; instead, it is a standalone module. At the time of this writing, it is set to be integrated into release 3.10 of the Linux kernel tree. Unfortunately, I haven’t had the opportunity or the appropriate setup to test bcache. As a result, I haven’t been able to dive any deeper into this solution and benchmark its performance.

EnhanceIO:

EnhanceIO is an SSD caching solution produced by STEC, Inc., and hosted on GitHub. It was greatly inspired by the work done by Facebook for FlashCache, and although it’s open-source, a commercial version is offered by the company for those seeking additional support. STEC did not simply modify a few lines of code of FlashCache and republish it. Instead, STEC rewrote the write-back caching logic while also improving other areas, such as memory footprint, failure handling and more. As with bcache, I haven’t had the opportunity to install and test EnhanceIO.

Summary

These solutions are intended to provide users with near-SSD speeds and HDD capacities at a significantly reduced cost. From the data center to your home office, these solutions can be deployed almost anywhere. They also can be tuned to operate more appropriately in their intended environments. Some of them even offer a variety of caching algorithm options, such as Least Recently Used (LRU), Most Recently Used (MRU), hybrids of the two, or just a simple first-in first-out (FIFO) caching scheme. The first three options can be expensive in terms of performance, as they require tracking which cached data sets have been accessed and how recently, in order to determine which to discard. FIFO, however, functions as a circular buffer in which the oldest cached data set is discarded first. With the exception of RapidCache, the SSD-focused modules also preserve cache metadata to ensure that any disruptions, including power cycles/outages, don’t compromise the integrity of the data.

Resources

dm-cache: http://visa.cs.fiu.edu/tiki/dm-cache

FlashCache: https://github.com/facebook/flashcache

EnhanceIO: https://github.com/stec-inc/EnhanceIO

bcache: http://bcache.evilpiepirate.org

RapidDisk: http://www.rapiddisk.org

FIO Git Repository: http://git.kernel.dk/?p=fio.git;a=summary

Wikipedia Page on Caching Algorithms: http://en.wikipedia.org/wiki/Cache_algorithms

 (source: LinuxJournal.com)

Time-Saving Tricks on the Command Line

I remember the first time a friend of mine introduced me to Linux and showed me how I didn’t need to type commands and path names fully—I could just start typing and use the Tab key to complete the rest. That was so cool. I think everybody loves Tab completion because it’s something you use pretty much every minute you spend in the shell. Over time, I discovered many more shortcuts and time-saving tricks, many of which I have come to use almost as frequently as Tab completion.

In this article, I highlight a set of tricks for common situations that make a huge difference for me:

  • Working in screen sessions: core features that will get you a long way.
  • Editing the command line: moving around quickly and editing quickly.
  • Viewing files or man pages using less.
  • E-mailing yourself relevant log snippets or alerts triggered by events.

While reading the article, it would be best to have a terminal window open so you can try using the tips right away. All the tips should work in Linux, UNIX and similar systems without any configuration.

Working in Screen Sessions

Screen has been covered in Linux Journal before (see Resources), but to put it simply, screen lets you have multiple “windows” within a single terminal application. The best part is that you can detach and reattach to a running screen session at any time, so you can continue your previous work exactly where you left off. This is most useful when working on a remote server.

Luckily, you really don’t need to master screen to benefit from it greatly. You already can enjoy its most useful benefits by using just a few key features, namely the following:

  • screen -R projectx: reattach to the screen session named “projectx” or create it fresh now.
  • Ctrl-a c: create a new window.
  • Ctrl-a n: switch to the next window.
  • Ctrl-a p: switch to the previous window.
  • Ctrl-a 0: switch to the first window; use Ctrl-a 1 for the second window, and so on.
  • Ctrl-a w: view the list of windows.
  • Ctrl-a d: detach from this screen session.
  • screen -ls: view the list of screen sessions.

Note: in the above list, “Ctrl-a c” means pressing the Ctrl and a keys at the same time, followed by c. Ctrl-a is called the command key, and all screen commands start with this key sequence.

Let me show all of these in the context of a realistic example: debugging a Django Web site on my remote hosting server, which usually involves the following activities:

  • Editing the configuration file.
  • Running some commands (performing Django operations).
  • Restarting the Web site.
  • Viewing the Web site logs.

Of course, I could do all these things one by one, but it’s a lot more practical to have multiple windows open for each. I could use multiple real terminal windows, but reopening them every time I need to do this kind of work would be tedious and slow. Screen can make this much faster and easier.

Starting Screen:

Before you start screen, it’s good to navigate to the directory where you expect to do most of your work first. This is because new windows within screen will all start in that directory. In my example, I first navigate to my Django project’s directory, so that when I open new screen windows, the relevant files will be right there in front of me.

There are different ways of starting screen, but I recommend this one:


screen -R mysite

 

When you run this the first time, it creates a screen session named “mysite”. Later you can use this same command to reconnect to this session again. (The -R flag stands for reattach.)

Creating Windows:

Now that I’m in screen, let’s say I start editing the configuration of the Django Web site:


vim mysite/settings.py

 

Let’s say I made some changes, and now I want to restart the site. I could exit vim or put it in the background in order to run the command to restart the site, but I anticipate I will need to make further changes right here. It’s easier just to create a new window now, using the screen command Ctrl-a c.

It’s easy to create another window every time you start doing something different from your current activity. This is especially useful when you need to change the directory between commands. For example, if you have script files in /some/long/path/scripts and log files in /other/long/path/logs, then instead of jumping between directories, just keep a separate window for each.

In this example, first I started looking at the configuration files. Next, I wanted to restart the Web site. Then I wanted to run some Django commands, and then I wanted to look at the logs. All these are activities I tend to do many times per debugging session, so it makes sense to create a separate window for each activity.

The cost of creating a new window is so small, you can do it without thinking. Don’t interrupt your current activity; fire up another window with Ctrl-a c and rock on.

Switching between Windows:

The windows you create in screen are numbered starting from zero. You can switch to a window by its number—for example, jump to the first window with Ctrl-a 0, the second window with Ctrl-a 1 and so on. It’s also very convenient to switch to the next and previous windows with Ctrl-a n and Ctrl-a p, respectively.

Listing Your Windows:

If you’re starting to lose track of which window you are in, check the list of windows with Ctrl-a w or Ctrl-a “. The former shows the list of windows in the status line (at the bottom) of the screen, showing the current window marked with a *. The latter shows the list of windows in a more user-friendly format as a menu.

Detaching from and Reattaching to a Session:

The best time-saving feature of screen is reattaching to existing sessions. You can detach cleanly from the current screen session with Ctrl-a d. But you don’t really need to. You could just as well simply close the terminal window.

The great thing about screen sessions is that whatever way you disconnected from them, you can reattach later. At the end of the day, you can shut down your local PC without closing a remote screen session and come back to it the next day by running the same command you used to start it, as in this example with screen -R mysite.

You might have multiple screen sessions running for different purposes. You can list them all with:


screen -ls

 

If you are disconnected from screen abruptly, sometimes it may think you are still in an attached state, which will prevent you from reattaching with the usual command screen -R label. In that case, you can append a -D flag to force detach from any existing connections—for example:


screen -R label -D

 

Learning More about Screen:

If you want to learn more, see the man page and the links in the Resources section. The built-in cheat sheet of shortcuts also comes in handy; you can view it with Ctrl-a ?.

I also should mention one of screen’s competitors: tmux. I chose screen for this article because, in my experience, it is more often available on systems I cannot control. You can do everything I covered above with tmux as well. Use whichever is available in the remote system in which you find yourself.

Finally, you can get the most out of screen when working on a remote system—for example, over an SSH session. When working locally, it’s probably more practical to use a terminal application with tabs. That’s not exactly the same thing, but probably close enough.

Editing the Command Line

Many highly practical shortcuts can make you faster and more efficient on the command line in different ways:

  • Find and re-run or edit a long and complex command from the history.
  • Edit much more quickly than just using the backspace key and retyping text.
  • Move around much faster than just using the left- and right-arrow keys.

Finding a Command in the History:

If you want to repeat a command you executed recently, it may be easy enough just to press the up-arrow key a few times until you find it. If the command was more than a few steps back, though, this becomes unwieldy. Very often, it’s much more practical to use the Ctrl-r shortcut instead to find a specific command by a fragment.

To search for a command in the past, press Ctrl-r and start typing any fragment you remember from it. As you type, the most recent matching line will appear on the command line. This is an incremental search, which means you can keep typing or deleting letters, and the matched command will change dynamically.

Let’s try this with an example. Say I ran these commands yesterday, which means they are still in my recent history but too far away simply to use the up arrow:


...
cd ~/dev/git/github/bashoneliners/
. ~/virtualenv/bashoneliners/bin/activate
./run.sh pip install --upgrade django
git push beta master:beta
git push release master:release
git status
...

 

Let’s say I want to activate the virtualenv again. That’s a hassle to type again, because I have to type at least a few characters at each path segment, even with Tab completion. Instead, it’s a lot easier to press Ctrl-r and start typing “activate”.

For a slightly more complex example, let’s say I want to run a git push command again, but I don’t remember exactly which one. So I press Ctrl-r and start typing “push”. This will match the most recent command, but I actually want the one before that, and I don’t remember a better fragment to type. The solution is to press Ctrl-r again, in the middle of my current search, as that jumps to the next matching command.

This is really extremely useful, saving not only the time of typing, but also often the time of thinking too. Imagine one of those long one-liners where you processed a text file through a long sequence of pipes with sed, awk, Perl and whatnot; or an rsync command with many flags, filters and exclusions; or complex loops using “for” and “while”. You can bring those back to your command line quickly using Ctrl-r and some fragment you remember from them.

Here are a few other things to note:

  • The search is case-sensitive.
  • You can abort the search with Ctrl-c.
  • To edit the line before running it, press any of the arrow keys.

This trick can be even more useful if you pick up some new habits. For example, when referring to a path you use often, type the absolute path rather than a relative path. That way, the command will be reusable later from any directory.

Moving Around Quickly and Editing Quickly:

Basic editing on the command line involves moving around with the arrow keys and deleting characters with Backspace or Delete. When there are more than only a few characters to move or delete, using these basic keys is just too slow. You can do the same much faster by knowing just a handful of interesting shortcuts:

  • Ctrl-w: cut text backward until space.
  • Esc-Backspace: cut one word backward.
  • Esc-Delete: cut one word forward.
  • Ctrl-k: cut from current position until the end of the line.
  • Ctrl-y: paste the most recently cut text.

Not only is it faster to delete portions of a line chunk by chunk like this, but an added bonus is that text deleted this way is saved in a register so that you can paste it later if needed. Take, for example, the following sequence of commands:


git init --bare /path/to/repo.git
git remote add origin /path/to/repo.git

 

Notice that the second command uses the same path at the end. Instead of typing that path twice, you could copy and paste it from the first command, using this sequence of keystrokes:

  1. Press the up arrow to bring back the previous command.
  2. Press Ctrl-w to cut the path part: “/path/to/repo.git”.
  3. Press Ctrl-c to cancel the current command.
  4. Type git remote add origin, and press Ctrl-y to paste the path.

Some of the editing shortcuts are more useful in combination with moving shortcuts:

  • Ctrl-a: jump to the beginning of the line.
  • Ctrl-e: jump to the end of the line.
  • Esc-b: jump one word backward.
  • Esc-f: jump one word forward.

Jumping to the beginning is very useful if you mistype the first words of a long command. You can jump to the beginning much faster than with the left-arrow key.

Jumping forward and backward is very practical when editing the middle part of a long command, such as the middle of long path segments.

Putting It All Together:

A good starting point for learning these little tricks is to stop some old inefficient habits:

  • Don’t clear the command line with the Backspace key. Use Ctrl-c instead.
  • Don’t delete long arguments with the Backspace key. Use Ctrl-w instead.
  • Don’t move to the beginning or the end of the line using the left- and right-arrow keys. Jump with Ctrl-a and Ctrl-e instead.
  • Don’t move over long terms using the arrow keys. Jump over terms with Esc-b and Esc-f instead.
  • Don’t press the up arrow 20 times to find a not-so-recent previous command. Jump to it directly with Ctrl-r instead.
  • Don’t type anything twice on the same line. Copy it once with Ctrl-w, and reuse it many times with Ctrl-y instead.

Once you get the hang of it, you will start to see more and more situations where you can combine these shortcuts in interesting ways and minimize your typing.

Learning More about Command-Line Editing:

If you want to learn more, see the bash man page and search for “READLINE”, “Commands for Moving” and “Commands for Changing Text”.

Viewing Files or man Pages with less

The less command is a very handy tool for viewing files, and it’s the default application for viewing man pages in many modern systems. It has many highly practical shortcuts that can make you faster and more efficient in different ways:

  • Searching forward and backward.
  • Moving around quickly.
  • Placing markers and jumping to markers.

Searching Forward and Backward:

You can search forward for some text by typing / followed by the pattern to search for. To search backward, use ? instead of /. The search pattern can be a basic regular expression. If your terminal supports it, the search results are highlighted with inverted foreground and background colors.

You can jump to the next result by pressing n, and to the previous result by pressing N. The direction of next and previous is relative to the direction of the search itself. That is, when searching forward with /, pressing n will move you forward in the file, and when searching backward with ?, pressing n will move you backward in the file.

If you use the vim editor, you should feel right at home, as these shortcuts work the same way as in vim.

Searching is case-sensitive by default, unless you specify the -i flag when starting less. When reading a file, you can toggle between case-sensitive and insensitive modes by typing -i.

Moving Around Quickly:

Here are a couple shortcuts to help you move around quickly:

  • g: jump to the beginning of the file.
  • G: jump to the end of the file.
  • space: move forward by one window.
  • b: move backward by one window.
  • d: move down by a half-window.
  • u: move up by a half-window.

Using Markers:

Markers are extremely useful in situations when you need to jump between two or more different parts within the same file repeatedly.

For example, let’s say you are viewing a server log with initialization information near the beginning of the file and some errors somewhere in the middle. You need to switch between the two parts while trying to figure out what’s going on, but using search repeatedly to find the relevant parts is very inconvenient.

A good solution is to place markers at the two locations so you can jump to them directly. Markers work similarly as in the vim editor: you can mark the current position by pressing m followed by a lowercase letter, and you can jump to a marker by pressing ‘ followed by the same letter. In this example, I would mark the initialization part with mi and the part with the error with me, so that I could jump to them easily with ‘i and ‘e. I chose the letters as the initials of what the locations represent, so I can remember them easily.

Learning More Shortcuts:

If you are interested in more, see the man page for the less command. The built-in cheat sheet of shortcuts also comes in handy; you can view it by pressing h.

E-mailing Yourself

When working on a remote server, getting data back to your PC can be inconvenient sometimes—for example, when your PC is NAT-ed and the server cannot connect to it directly with rsync or scp. A quick alternative might be sending data by e-mail instead.

Another good scenario for e-mailing yourself is to use alerts triggered by something you were waiting for, such as a crashed server coming back on-line or other particular system events.

E-mailing a Log Snippet:

Let’s say you found the log of errors crashing your remote service, and you would like to copy it to your PC quickly. Let’s further assume the relevant log spans multiple pages, so it would be inconvenient to copy and paste from the terminal window. Let’s say you can extract the relevant part using a combination of the head, tail and grep commands. You could save the log snippet in a file and run rsync on your local PC to copy it, or you could just mail it to yourself by simply piping it to this command:


mailx -s 'error logs' me@example.com

 

Depending on your system, the mailx command might be different, but the parameters are probably the same: -s specifies the subject (optional), the remaining arguments are the destination e-mail addresses, and the standard input is used as the message body.
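For example, the whole extraction and mailing can be a single pipeline (the log path here is hypothetical):

tail -n 500 /var/log/myservice.log | grep -i error | \
    mailx -s 'error logs' me@example.com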

Triggering an E-mail Alert after a Long Task:

When you run a long task, such as copying a large file, it can be annoying to wait and keep checking whether it has finished. It’s better to arrange to trigger an e-mail to yourself when the copying is complete—for example:


the_long_task; date | mailx -s 'job done' me@example.com

 

That is, when the long task has completed, the e-mail command will run. In this example, the message body simply will be the output of the date command. In a real situation, you probably will want to use something more interesting and relevant as the message—for example ls -lh on the file that was copied or even multiple commands grouped together like this:


the_long_task; { df -h; tail some.log; } | \
    mailx -s 'job done' me@example.com

 

Triggering an E-mail Alert by Any Kind of Event:

Have you ever been in one of the following situations?

  • You are waiting for crashed serverX to come back on-line.
  • You are tailing a server log, waiting for a user to test your new evolution, which will trigger a particular entry in the log.
  • You are waiting for another team to deploy an updated .jar file.

Instead of staring at the screen or checking repeatedly whether the event you are waiting for has happened, you could use this kind of one-liner:


while :; do date; CONDITION && break; sleep 300; \
done; MAILME

 

This is essentially an infinite loop, with an appropriate CONDITION in the middle to exit the loop and, thus, trigger the e-mail command. Inside the loop, I print the date, just so that I can see the loop is alive, and sleep for five minutes (300 seconds) in each cycle to avoid overloading the machine I’m on.

CONDITION can be any shell command, and its exit code will determine whether the loop should exit. For the situations outlined above, you could write the CONDITION like this (a complete worked example follows the list):

  • ping -c1 serverX: emit a single ping to serverX. If it responds, ping will exit with success, ending the loop.
  • grep pattern /path/to/log: search for the expected pattern in the log. If the pattern is found, grep will exit with success, ending the loop.
  • find /path/to/jar -newer /path/to/jar.marker: this assumes that before starting the infinite loop, you created a marker file like this: touch -r /path/to/jar /path/to/jar.marker, in order to save a copy of the exact same timestamp as the .jar file you want to monitor. The find command will exit with success after the .jar file has been updated.
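Putting it all together, the first scenario becomes this complete one-liner, with the mailx command from the previous section standing in for MAILME:

while :; do date; ping -c1 serverX >/dev/null && break; sleep 300; \
done; date | mailx -s 'serverX is back' me@example.com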

In short, don’t wait for a long-running task or some external event. Set up an infinite loop, and alert yourself by e-mail when there is something interesting to see.

Conclusion

All the tips in this article are standard features and should work on Linux, UNIX and similar systems. I have barely scratched the surface here, highlighting the minimal set of features in each area that should provide the biggest bang for your buck. Once you get used to using them, these little tricks will make you a real ninja in the shell, jumping around and getting things done lightning fast with minimal typing.

Resources

Type man screen.

Type man bash, and search for “READLINE”, “Commands for Moving” and “Commands for Changing Text”.

Type man less.

(via Linuxjournal.com)