ejabberd Massive Scalability: 1 Node — 2+ Million Concurrent Users

How Far Can You Push ejabberd?

From our experience, we know that ejabberd is massively scalable. However, we wanted to provide benchmark results and hard figures to demonstrate our outstanding performance level and give a baseline of what to expect in simple cases.

That’s how we ended up with the challenge of fitting a very large number of concurrent users on a single ejabberd node.

It turns out you can get very far with ejabberd.

Scenario and Platforms

Here is our benchmark scenario: the target was to reach 2,000,000 concurrent users, each with 18 contacts on the roster and a session lasting around 1 hour. The scenario involves 2.2M registered users, so almost all contacts are online at peak load. This means presence packets were broadcast for those users, adding traffic on top of the packets handling user connections and session management. In that situation, the scenario produced 550 connections per second and thus 550 logins per second.

Database for authentication and roster storage was MySQL, running on the same node as ejabberd.

For the benchmark itself, we used Tsung, a tool dedicated to generating large loads to test server performance. We used a single large instance to generate the load.

Both ejabberd and the test platform were running on Amazon EC2 instances. ejabberd was running on a single node of instance type m4.10xlarge (40 vCPU, 160 GiB). Tsung instance was identical.

Regarding ejabberd software itself, the test was made with ejabberd Community Server version 16.01. This is the standard open source version that is widely available and widely used across the world.

The connections did not use TLS, to make sure we were testing ejabberd itself and not OpenSSL performance.

Code snippets and comments regarding the Tsung scenario are available for download: tsung_snippets.md

Overall Benchmark Results


We managed to surpass the target: we supported more than 2 million concurrent users on a single ejabberd node.

For XMPP servers, the main limitation to handle a massive number of online users is usually memory consumption. With proper tuning, we managed to handle the traffic with a memory footprint of 28KB per online user.
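A quick back-of-the-envelope check of that figure against the instance size:

    2,000,000 users × 28 KB ≈ 56 GB

which fits comfortably within the 160 GiB of the m4.10xlarge instance, leaving headroom for MySQL and the operating system running on the same node.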

The 40 CPUs were almost evenly used, with the exception of the first core, which handled all the network interrupts: it was more heavily loaded by the operating system and thus less available to the Erlang VM.

In the process, we also optimized our XML parser, released now as Fast XML, a high-performance, memory efficient Expat-based Erlang and Elixir XML parser.

Detailed Results

ejabberd Performance

[Graph: ejabberd performance]

The benchmark shows that we reached 2 million concurrent users after one hour. We were logging in about 33k users per minute, producing session traffic of a bit more than 210k XMPP packets per minute (this includes the stanzas for SASL authentication, binding, roster retrieval, etc.). The maximum number of concurrent users is reached shortly after the 2 million mark, by design of the scenario. At this point we still connect new users, but, as the first users start disconnecting, the number of concurrent users stabilizes.

As we try to reproduce common client behavior, we set up Tsung to send “keepalive pings” on the connections. Since each session sends one such whitespace ping every minute, the number of these requests grows proportionally with the number of connected users. And while idle connections consume few resources on the server, at this scale they start to become noticeable: once you have 2M users, you are handling about 33K such pings per second just from idle connections. They are not represented on the graphs, but they are an important part of the real-life traffic we were generating.
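To make that idle traffic concrete, here is a minimal sketch (not taken from the actual Tsung scenario or ejabberd code) of what a per-session whitespace keepalive looks like in Erlang, one process per open XMPP connection:

-module(keepalive).
-export([loop/1]).

%% Send an XMPP whitespace keepalive on Socket once per minute until told to stop.
loop(Socket) ->
    receive
        stop -> ok
    after 60000 ->
        ok = gen_tcp:send(Socket, <<" ">>),
        loop(Socket)
    end.

At 2 million sessions, that single gen_tcp:send/2 per connection per minute is exactly the 33K requests per second mentioned above.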

ejabberd Health

[Graph: ejabberd health]

At all times, ejabberd’s health was fine. Typically, when ejabberd is overloaded, TCP connection establishment and authentication times tend to grow to an unacceptable level. In our case, both of those operations performed very fast throughout the benchmark, in under 10 milliseconds. There were almost no errors (the rare occurrences are artefacts of the benchmark process).

Platform Behavior

[Graph: platform behavior (CPU and memory)]

Good health and performance are confirmed by the state of the platform. CPU and memory consumption were totally under control, as shown in the graph. CPU consumption stays far from system limits. Memory grows proportionally to the number of concurrent users.

We also need to mention that the CPU values are slightly overestimated as seen by the OS, because Erlang schedulers do a bit of busy waiting when they run out of work.

Challenge: The Hardest Part

The hardest part is definitely tuning the Linux system, both for ejabberd and for the benchmark tool, to overcome the default limits. By default, Linux servers are not configured to let you handle, or even generate, 2 million TCP sockets. It required quite a bit of network setup to avoid running out of ephemeral ports on the Tsung side.
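On the Erlang side, the VM also ships with conservative defaults. As a purely illustrative sketch (not the exact settings used in this benchmark), the limits that matter when every connection is an Erlang process holding a TCP port can be inspected like this, and raised at startup with the +P (max processes) and +Q (max ports) emulator flags, alongside the usual kernel-side knobs such as file-descriptor limits and the local port range:

-module(limits).
-export([check/0]).

%% Report the VM limits relevant at this scale: one Erlang process and one
%% port (socket) per connection. Raise them with "erl +P <N> +Q <N>".
check() ->
    [{process_limit, erlang:system_info(process_limit)},
     {port_limit,    erlang:system_info(port_limit)}].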

On a similar topic, we worked with the Amazon server team, as we have been pushing the limits of their infrastructure like no one before. For example, we had to use a second Ethernet adapter with multiple IP addresses (2 x 15 IPs, spread across 2 NICs). It also helped a lot to use the latest Enhanced Networking drivers from Intel.

All in all, it was a very interesting process that helped make progress on Amazon Web Services by testing and tuning the platform itself.

What’s Next?

This benchmark was intended to demonstrate that ejabberd can scale to a large volume and serve as a baseline reference for more complex and full-featured platforms.

The next step is to keep going with our benchmark and optimization iterations. Our next target is to benchmark Multi-User Chat and PubSub performance. The goal is to find the limits, optimize, and demonstrate that massive internet scale can be reached with these ejabberd components as well.

(via Blog.process-one.net)

Google Launches Cloud CDN Alpha

Earlier this month, Google announced an alpha of its Cloud Content Delivery Network (CDN) offering. The service aims to serve static content closer to end users by caching it in Google’s globally distributed edge caches. Google provides many more edge caches than it does data centers, and as a result content can be served more quickly than by making a full round trip to a Google data center. In total, Google provides over 70 edge points of presence, which will help address customer CDN needs.

In order to use Cloud CDN, you must use Google’s Compute Engine HTTP(S) load balancers in front of your instances. Enabling an HTTP(S) load balancer is achieved through a simple command.

In a recent post, Google explains the mechanics of the service in the following way: “When a user requests content from your site, that request passes through network locations at the edges of Google’s network, usually far closer to the user than your actual instances. The first time that content is requested, the edge cache sees that it can’t fulfill the request and forwards the request on to your instances. Your instances respond back to the edge cache, and the cache immediately forwards the content to the user while also storing it for future requests. For subsequent requests for the same content that pass through the same edge cache, the cache responds directly to the user, shortening the round trip time and saving your instances the overhead of processing the request.”

The following image illustrates how Google leverages Edge point of presence caches to improve responsiveness.

[Image: Cloud CDN architecture]

Image Source: https://cloud.google.com/compute/docs/load-balancing/http/cdn

Once the CDN service has been enabled, caching happens automatically for all cacheable content. Cacheable content is typically content requested via an HTTP GET request. The service respects explicit Cache-Control headers, taking expiration and max-age directives into account. Some responses will not be cached, including those that set a Set-Cookie header, those whose message bodies exceed 4 MB, and those where caching has been explicitly disabled through no-cache directives. A complete list of caching rules can be found in Google’s documentation.
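Expressed as code, those rules boil down to a simple predicate. The sketch below is purely illustrative (it is not Google’s implementation) and assumes response headers are given as a list of lowercase binary name/value pairs:

-module(cdn_rules).
-export([cacheable/3]).

%% Restate the caching rules described above: GET responses only, no
%% Set-Cookie header, body no larger than 4 MB, no "no-cache" directive.
cacheable(Method, RespHeaders, BodySize) ->
    Method =:= get
        andalso BodySize =< 4 * 1024 * 1024
        andalso not lists:keymember(<<"set-cookie">>, 1, RespHeaders)
        andalso not no_cache(RespHeaders).

no_cache(Headers) ->
    case lists:keyfind(<<"cache-control">>, 1, Headers) of
        {_, Value} -> binary:match(Value, <<"no-cache">>) =/= nomatch;
        false      -> false
    end.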

Google has traditionally partnered with 3rd parties in order to speed up the delivery of content to consumers.  These partnerships include Akamai, Cloudflare, Fastly, Level 3 Communications and Highwinds.

Other cloud providers also have CDN offerings including Amazon’s CloudFront and Microsoft’s Azure CDN.  Google will also see competition from Akamai, one of the aforementioned partners, who has approximately 16.3% CDN market share of the Alexa top 1 million sites.
(via InfoQ.com)

How Wistia Handles Millions Of Requests Per Hour And Processes Rich Video Analytics

This is a guest repost by Christophe Limpalair, based on his interview with Max Schnur, Web Developer at Wistia.

Wistia is video hosting for business. They offer video analytics like heatmaps, and they give you the ability to add calls to action, for example. I was really interested in learning how all the different components work and how they’re able to stream so much video content, so that’s what this episode focuses on.

What Does Wistia’s Stack Look Like?

As you will see, Wistia is made up of different parts. Here are some of the technologies powering these different parts, all of which come up later in the interview:

  • HAProxy and Nginx for routing and serving files
  • Fastly as one of their CDNs
  • Amazon S3 for permanent video storage
  • Chef for provisioning
  • Rackspace for the underlying infrastructure

What Scale Are You Running At?

Wistia has three main parts to it:

  1. The Wistia app (The ‘Hub’ where users log in and interact with the app)
  2. The Distillery (stats processing)
  3. The Bakery (transcoding and serving)

Here are some of their stats:

  • 1.5 million player loads per hour (loading a page with a player on it counts as one. Two players counts as two, etc…)
  • 18.8 million peak requests per hour to their Fastly CDN
  • 740,000 app requests per hour
  • 12,500 videos transcoded per hour
  • 150,000 plays per hour
  • 8 million stats pings per hour

They are running on Rackspace.

What Challenges Come From Receiving Videos, Processing Them, Then Serving Them?

  1. They want to balance quality and deliverability, which has two sides to it:
    1. Encoding derivatives of the original video
    2. Knowing when to play which derivative

     

    Derivatives, in this context, are different versions of a video. Having different quality versions is important, because it allows you to have smaller file sizes. These smaller video files can be served to users with less bandwidth. Otherwise, they would have to constantly buffer. Have enough bandwidth? Great! You can enjoy higher quality (larger) files.

    Having these different versions and knowing when to play which version is crucial for a good user experience.

  2. Lots of I/O. When you have users uploading videos at the same time, you end up having to move many heavy files across clusters. A lot of things can go wrong here.
  3. Volatility. There is volatility in both the number of requests and the number of videos to process. They need to be able to sustain these changes.
  4. Of course, serving videos is also a major challenge. Thankfully, CDNs have different kinds of configurations to help with this.
  5. On the client side, there are a ton of different browsers, device sizes, etc… If you’ve ever had to deal with making websites responsive or work on older IE versions, you know exactly what we’re talking about here.

How Do You Handle Big Changes In The Number Of Video Uploads?

They have boxes which they call Primes. These Prime boxes are in charge of receiving files from user uploads.

If uploads start eating up all the available disk space, they can just spin up new Primes using Chef recipes. It’s a manual job right now, but they don’t usually get anywhere close to the limit. They have lots of space.


What Transcoding System Do You Use?

The transcoding part is what they call the Bakery.

The Bakery comprises the Primes we just looked at, which receive and serve files. They also have a cluster of workers. Workers process tasks and create derivative files from uploaded videos.

Beefy systems are required for this part, because it’s very resource intensive. How beefy?

We’re talking about several hundred workers. Each worker usually performs two tasks at one time. They all have 8 GB of RAM and a pretty powerful CPU.

What kinds of tasks and processing go on with workers? They encode video primarily with x264, which is a fast encoder. Videos can usually be encoded in about half or a third of the video length.

Videos must also be resized and they need different bitrate versions.

In addition, there are different encoding profiles for different devices–like HLS for iPhones. These encodings need to be doubled for Flash derivatives that are also x264 encoded.

What Does The Whole Process Of Uploading A Video And Transcoding It Look Like?

Once a user uploads a video, it is queued and sent to workers for transcoding, and then slowly transferred over to S3.

Instead of sending a video to S3 right away, it is pushed to Primes in the clusters so that customers can serve videos right away. Then, over a matter of a few hours, the file is pushed to S3 for permanent storage and cleared from Prime to make space.

How Do You Serve The Video To A Customer After It Is Processed?

When you’re hitting a video hosted on Wistia, you make a request to embed.wistia.com/video-url, which actually hits a CDN (they use a few different ones; I just tried it and mine went through Akamai). On a cache miss, that CDN goes back to origin, which is the Primes we just talked about.

Primes run HAProxy with a special balancer called Breadroute. Breadroute is a routing layer that sits in front of Primes and balances traffic.

Wistia might have the file stored locally in the cluster (which would be faster to serve), and Breadroute is smart enough to know that. If that is the case, then Nginx will serve the file directly from the file system. Otherwise, they proxy S3.
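A hypothetical sketch of that routing decision (names invented for illustration, not Wistia’s code):

-module(breadroute_sketch).
-export([route/1]).

%% If the requested derivative is still on the local Prime's disk, serve it
%% locally; otherwise fall back to proxying the permanent copy in S3.
route(LocalPath) ->
    case filelib:is_regular(LocalPath) of
        true  -> {serve_local, LocalPath};
        false -> {proxy_s3, filename:basename(LocalPath)}
    end.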

How Can You Tell Which Video Quality To Load?

That’s primarily decided on the client side.

Wistia will receive data about the user’s bandwidth immediately after they hit play. Before the user even hits play, Wistia receives data about the device, the size of the video embed, and other heuristics to determine the best asset for that particular request.
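As a rough illustration of that kind of heuristic (the thresholds and names below are invented, not Wistia’s), asset selection from measured bandwidth and embed size might be sketched as:

-module(asset_pick).
-export([choose/2]).

%% Pick a derivative from the measured bandwidth (kbit/s) and the pixel
%% width of the embed. Real players use richer heuristics than this.
choose(BandwidthKbps, EmbedWidth) when BandwidthKbps > 5000, EmbedWidth >= 1280 ->
    '1080p';
choose(BandwidthKbps, EmbedWidth) when BandwidthKbps > 2500, EmbedWidth >= 854 ->
    '720p';
choose(BandwidthKbps, _EmbedWidth) when BandwidthKbps > 1000 ->
    '480p';
choose(_BandwidthKbps, _EmbedWidth) ->
    '360p'.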

The decision of whether to go HD or not is only debated when the user goes into full screen. This gives Wistia a chance to not interrupt playback when a user first clicks play.

How Do You Know When Videos Don’t Load? How Do You Monitor That?

They have a service they call pipedream, which they use within their app and within embed codes to constantly send data back.

If a user clicks play, they get information about the size of their window, where it was, and if it buffers (if it’s in a playing state but the time hasn’t changed after a few seconds).

A known problem with videos is slow load times. Wistia wanted to know when a video was slow to load, so they tried tracking metrics for that. Unfortunately, they only knew if it was slow if the user actually waited around for the load to finish. If users have really slow connections, they might just leave before that happens.

The answer to this problem? Heartbeats. If they receive heartbeats and they never receive a play, then the user probably bailed.
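In other words, the classification needs only two signals per session. A minimal sketch of that rule (illustrative only, not Wistia’s code):

%% Heartbeats without a play event indicate a viewer who gave up before
%% playback ever started.
classify(GotHeartbeat, GotPlay) ->
    case {GotHeartbeat, GotPlay} of
        {true, false} -> bailed_before_play;
        {true, true}  -> played;
        {false, _}    -> no_data
    end.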

What Other Analytics Do You Collect?

They also collect analytics for customers.

Analytics like play, pause, seeks, conversions (entering an email, or clicking a call to action). Some of these are shown in heatmaps. They also aggregate those heatmaps into graphs that show engagement, like the areas that people watched and rewatched.


How Do You Get Such Detailed Data?

When the video is playing, they use a video tracker, which is an object bound to plays, pauses, and seeks. This tracker collects events into a data structure. Once every 10-20 seconds, it pings back to the Distillery, which in turn figures out how to turn the data into heatmaps.
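That batching behaviour can be sketched as a small event-collecting loop (illustrative only; send_to_distillery/1 is a made-up placeholder for the actual stats ping):

-module(tracker_sketch).
-export([loop/1]).

%% Collect player events and flush them roughly every 15 seconds, mirroring
%% the 10-20 second ping interval described above.
loop(Events) ->
    receive
        {event, E} -> loop([E | Events])
    after 15000 ->
        send_to_distillery(lists:reverse(Events)),
        loop([])
    end.

send_to_distillery(_Batch) ->
    ok.  %% placeholder for the HTTP ping to the Distillery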

Why doesn’t everyone do this? I asked him that, because I don’t even get this kind of data from YouTube. The reason, Max says, is because it’s pretty intensive. The Distillery processes a ton of stuff and has a huge database.

What Is Scaling Ease Of Use?

Wistia has about 60 employees. Once you start scaling up customer base, it can be really hard to make sure you keep having good customer support.

Their highest touch points are playback and embeds. These have a lot of factors that are out of their control, so they must make a choice:

  1. Don’t help customers
  2. Help customers, but it takes a lot of time

Those options weren’t good enough. Instead, they went after the source that caused the most customer support issues–embeds.

Embeds have gone through multiple iterations. Why? Because they often broke. Whether it was WordPress doing something weird to an embed, or Markdown breaking the embed, Wistia has had a lot of issues with this over time. To solve this problem, they ended up dramatically simplifying embed codes.

These simpler embed codes run into fewer problems on the customer’s end. While it does add more complexity behind the scenes, it results in fewer support tickets.

This is what Max means by scaling ease of use. Make it easier for customers, so that they don’t have to contact customer support as often. Even if that means more engineering challenges, it’s worth it to them.

Another example of scaling ease of use is with playbacks. Max is really interested in implementing a kind of client-side CDN load balancer that determines the lowest latency server to serve content from (similar to what Netflix does).

What Projects Are You Working On Right Now?

Something Max is planning on starting soon is what they’re calling the upload server. This upload server is going to give them the ability to do a lot of cool stuff around uploads.

As we talked about, and with most upload services, you have to wait for the file to get to a server before you can start transcoding and doing things with it.

The upload server will make it possible to transcode while uploading. This means customers could start playing their video before it’s even completely uploaded to their system. They could also get the still capture and embed almost immediately.

Conclusion

I have to try and fit as much information in this summary as possible without making it too long. This means I cut out some information that could really help you out one day. I highly recommend you view the full interview if you found this interesting, as there are more hidden gems.

You can watch it, read it, or even listen to it. There’s also an option to download the MP3 so you can listen on your way to work. Of course, the show is also on iTunes and Stitcher.

Thanks for reading, and please let me know if you have any feedback!
– Christophe Limpalair (@ScaleYourCode)

(via HighScalability.com)

Intel will ship Xeon Phi-equipped workstations starting in 2016

Intel has announced that its second-generation Xeon Phi hardware (codenamed Knights Landing) is now shipping to early customers. Knights Landing is built on 14nm process technology, with up to 72 Silvermont-derived CPU cores. While the design is derived from Atom, there are some obvious differences between these cores and the chips Intel uses in consumer hardware. Traditional Atom doesn’t support Hyper-Threading on the consumer side, while Knights Landing supports four threads per core. Knights Landing also supports AVX-512 extensions.


The new Xeon Phi runs at roughly 1.3GHz and is split into tiles, as Anandtech reports. There are two cores (eight threads) per tile along with two VPUs (Vector Processing Units, aka AVX-512 units). Each tile shares 1MB of L2 cache (36MB of cache total). Unlike first-gen Xeon Phi, Knights Landing can actually run the OS natively, and out-of-order performance is supposedly much improved compared to the P54C-derived chips that powered Knights Ferry. The chip includes ten memory controllers: two DDR4 controllers providing six channels of DDR4-2400 (up to 384GB, according to Anandtech) and eight MCDRAM controllers providing a total of 16GB of on-chip memory.

[Image: MCDRAM memory modes]

Memory accesses can be mapped in different ways, depending on which mode best suits the target workload. The 72 CPU cores can treat the entire MCDRAM space as a giant cache, but if they do, accessing main memory incurs a greater penalty in the event of a cache miss. Alternatively, data can be flat-mapped to both the DDR4 and the MCDRAM and accessed that way. Finally, some MCDRAM can be mapped as a cache (with a higher-latency DDR4 fallback) while the rest is mapped as main memory with less overall latency. The card connects to the rest of the system via 36 PCIe 3.0 lanes, which works out to roughly 36GB/s of bandwidth in each direction (about 72GB/s in total), assuming all 36 lanes can be dedicated to a single co-processor.
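For reference, PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so the per-direction figure works out as:

    36 lanes × (8 GT/s × 128/130) / 8 bits per byte ≈ 35.4 GB/s

per direction, or roughly 71 GB/s bidirectional, matching the rounded numbers above.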


The overall image Intel is painting is that of a serious computing powerhouse, with far more horsepower than the previous generation. According to the company, at least some Xeon Phi workstations are going to ship next year. Intel will target researchers who want to work on Xeon Phi but don’t have access to a supercomputer for testing their software. With 3TFLOPS of double precision floating point performance, Xeon Phi can lay fair claim to the title of “Supercomputer on a PCB.” 3TFLOPs might not seem like much compared to the modern TOP500, but it’s more than enough to evaluate test cases and optimizations.

Intel has no plans to offer Xeon Phi in wide release (at least not right now), but if this program proves successful, we could see a limited run of smaller Xeon Phi coprocessors for application acceleration in other contexts. In theory, any well-parallelized workload that can run on x86 should perform well on Xeon Phi, and while we don’t see Intel making a return to the graphics market, it would be interesting to see the chip deployed as a rendering accelerator.

As far as comparisons to Nvidia are concerned, the only Nvidia Tesla that comes close to 3TFLOPS is the dual-GPU K80 GPU compute module. It’s not clear if that solution can match a single Xeon Phi, given that the Nvidia Tesla is scaling across two discrete chips. Future Nvidia products based on Pascal are expected to pack up to 32GB of on-board memory and should substantially improve the relative performance between the two, but we don’t know when that hardware will hit the market.

(Via Extremetech.com)

Why unikernels might kill containers in five years

Sinclair Schuller is the CEO and cofounder of Apprenda, a leader in enterprise Platform as a Service.

Container technologies have received explosive attention in the past year – and rightfully so. Projects like Docker and CoreOS have done a fantastic job at popularizing operating system features that have existed for years by making those features more accessible.

Containers make it easy to package and distribute applications, which has become especially important in cloud-based infrastructure models. Being slimmer than their virtual machine predecessors, containers also offer faster start times and maintain reasonable isolation, ensuring that one application shares infrastructure with another application safely. Containers are also optimized for running many applications on single operating system instances in a safe and compatible way.

So what’s the problem?
Traditional operating systems are monolithic and bulky, even when slimmed down. If you look at the size of a container instance – hundreds of megabytes, if not gigabytes, in size – it becomes obvious there is much more in the instance than just the application being hosted. Having a copy of the OS means that all of that OS’ services and subsystems, whether they are necessary or not, come along for the ride. This massive bulk conflicts with trends in the broader cloud market, namely the trend toward microservices, the need for improved security, and the requirement that everything operate as fast as possible.

Containers’ dependence on traditional OSes could be their demise, leading to the rise of unikernels. Rather than needing an OS to host an application, the unikernel approach allows developers to select just the OS services from a set of libraries that their application needs in order to function. Those libraries are then compiled directly into the application, and the result is the unikernel itself.

The unikernel model removes the need for an OS altogether, allowing the application to run directly on a hypervisor or server hardware. It’s a model where there is no software stack at all. Just the app.

There are a number of extremely important advantages for unikernels:

  1. Size – Unlike virtual machines or containers, a unikernel carries with it only what it needs to run that single application. While containers are smaller than VMs, they’re still sizeable, especially if one doesn’t take care of the underlying OS image. Applications that may have had an 800MB image size could easily come in under 50MB. This means moving application payloads across networks becomes very practical. In an era where clouds charge for data ingress and egress, this could not only save time, but also real money.
  2. Speed – Unikernels boot fast. Recent implementations have unikernel instances booting in under 20 milliseconds, meaning a unikernel instance can be started inline to a network request and serve the request immediately. MirageOS, a project led by Anil Madhavapeddy, is working on a new tool named Jitsu that allows clouds to quickly spin unikernels up and down.
  3. Security – A big factor in system security is reducing surface area and complexity, ensuring there aren’t too many ways to attack and compromise the system. Given that unikernels compile only what is necessary into the application, the surface area is very small. Additionally, unikernels tend to be “immutable,” meaning that once built, the only way to change one is to rebuild it. No patches or untrackable changes.
  4. Compatibility – Although most unikernel designs have been focused on new applications or code written for specific stacks that are capable of compiling to this model, technologies such as Rump Kernels offer the ability to run existing applications as a unikernel. Rump kernels work by componentizing various subsystems and drivers of an OS, and allowing them to be compiled into the app itself.

These four qualities align nicely with the development trend toward microservices, making discrete, portable application instances with breakneck performance a reality. Technologies like Docker and CoreOS have done fantastic work to modernize how we consume infrastructure so microservices can become a reality. However, these services will need to change and evolve to survive the rise of unikernels.

The power and simplicity of unikernels will have a profound impact during the next five years, which at a minimum will complement what we currently call a container, and at a maximum, replace containers altogether. I hope the container industry is ready.
(Via Gigaom.com)

Concurrency in Erlang & Scala: The Actor Model

Applications are becoming increasingly concurrent, yet the traditional way of doing this, threads and locks, is very troublesome. This article highlights Erlang and Scala, two programming languages that use a different approach to concurrency: the actor model.

The continuous increase of computer processor clock rate has recently slowed down, in part due to problems with heat dissipation. As the clock rate of a processor is increased, the heat generated increases too. Increasing current clock rates even further is troublesome, as the heat generated would push computer chips towards the limits of what is physically possible. Instead, processor manufacturers have turned towards multi-core processors, which are processors capable of doing multiple calculations in parallel. There has been an increased interest in software engineering techniques to fully utilize the capabilities offered by these processors.

At the same time, applications are more frequently built in a distributed way. In order to perform efficiently, without wasting most of the time on waiting for I/O operations, applications are forced to do more and more operations concurrently, to obtain maximum efficiency.

However, reasoning about concurrent systems can be far from trivial, as there can be a large number of interacting processes with no predefined execution order. This article discusses the concurrency model in Erlang and Scala, two languages that have recently gained in popularity, in part due to their support for scalable concurrency. The first section introduces both languages by describing their key features.

Erlang

Erlang is a concurrent programming language originally designed by Ericsson (the name refers both to A. K. Erlang, the Danish mathematician, and to “Ericsson Language”). It was designed to be distributed and fault-tolerant, for use in highly-available (non-stop) soft real-time telecom applications.

Erlang is a pure functional language: it features single assignment and eager evaluation. It also has built-in language constructs for distribution and concurrency. Erlang has a dynamic type system, where the typing of expressions is checked at run-time [1]. Typing information is fully optional and only used by the static type checker; the run-time environment even allows running applications with invalid typing specifications. Run-time type checking is always performed by looking at the types of the data itself, which usually allows applications to run correctly even when encountering unexpected data types.
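For example, the factorial function in the listing below could carry an optional type specification; it is used by the static checker but ignored by the run-time:

%% Optional typing information, checked statically and ignored at run-time.
-spec fact(non_neg_integer()) -> pos_integer().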

The listings below show a simple program, written in Erlang and its execution in the Erlang shell. This program performs a recursive calculation of the factorial function.

A simple program in Erlang
-module(factorial).
-export([fact/1]).

fact(0) -> 1;
fact(N) -> N * fact(N-1).
Using the Erlang program
$ erl
Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0]
                                     [hipe] [kernel-poll:false]

Eshell V5.6.3  (abort with ^G)
1> c(factorial).
{ok,factorial}
2> factorial:fact(8).
40320
3>

To allow for non-stop applications, Erlang supports hot swapping of code. This means that it is possible to replace code while executing a program, making it possible to do upgrades and maintenance without interrupting the running system.

Scala

Scala, which stands for Scalable Language, is a programming language that aims to provide both object-oriented and functional programming styles, while staying compatible with the Java Virtual Machine (JVM). It was designed in 2001 by Martin Odersky and his group at EPFL in Lausanne, Switzerland, by combining the experiences gathered from designing multiple other languages.

The Scala language has been designed to be scalable [2], as it is supposed to scale along with the needs of its users. Offering functional constructs allows the developer to write short and concise code, whereas having object-oriented concepts allows the language to be used for large, complex projects. Scala is fully interoperable with the Java language, which allows developers to use all Java libraries from within Scala. As such, it does not form a separate language community, but rather lets you take advantage of the enormous Java ecosystem. There is also an initiative to build a Scala version that runs on top of the .NET Common Language Runtime (CLR) [3], which ensures the portability of Scala to other underlying platforms.

Scala tries to stay close to pure object-oriented programming and therefore does not have constructs like static fields and methods. Every value in Scala is an object and every operation is a method call [4]. Scala allows you to define your own operators and language constructs. This makes it possible to extend the Scala language according to your specific needs, which again helps it to grow (scale up) along with the project.

Scala uses static typing, yet allows most type annotations to be omitted. When type information is not specified, the compiler performs type inference and attempts to derive this information from the code itself. This saves programming effort and allows for more generic code. Type information is only required when the compiler cannot prove that correct type usage will happen.

The listing below shows a simple program in Scala, which demonstrates some of its functional features as well as some object-oriented features. As can be seen, an object with a method is defined. The definition of this method, however, is done in a pure functional style: there is no need to write a return statement, as the value of the last expression is the return value.

A simple program in Scala
object TestFactorial extends Application {
    def fact(n: Int): Int = {
        if (n == 0) 1
        else n * fact(n-1)
    }

    println("The factorial of 8 is "+fact(8))
    // Output: The factorial of 8 is 40320
}

 

A different way of concurrency: The Actor Model

The problem with threads

The traditional way of offering concurrency in a programming language is by using threads. In this model, the execution of the program is split up into concurrently running tasks. It is as if the program were being executed multiple times, the difference being that each of these copies operates on shared memory.

This can lead to a series of hard-to-debug problems, as can be seen below. The first problem is the lost-update problem. Suppose two processes try to increment the value of a shared object acc. They both retrieve the value of the object, increment the value and store it back into the shared object. As these operations are not atomic, it is possible that their execution gets interleaved, leading to an incorrectly updated value of acc, as shown in the example.

The solution to these problems is the use of locks. Locks provide mutual exclusion, meaning that only one process can acquire the lock at a time. By using a locking protocol, making sure the right locks are acquired before using an object, lost-update problems are avoided. However, locks have their own share of problems. One of them is the deadlock problem, shown in the second example below. Here, two processes try to acquire the same two locks A and B. When both do so, but in a different order, a deadlock occurs: both wait for the other to release the lock, which will never happen.

These are just some of the problems that might occur when attempting to use threads and locks.

Lost Update Problem

  Process 1            Process 2
  a = acc.get()
  a = a + 100          b = acc.get()
                       b = b + 50
                       acc.set(b)
  acc.set(a)

Deadlock Problem

  Process 1            Process 2
  lock(A)              lock(B)
  lock(B)              lock(A)

  … Deadlock! …

 

Both Erlang and Scala take a different approach to concurrency: the Actor Model. It is necessary to look at the concepts of the actor model first, before studying the peculiarities of the languages themselves.

The Actor Model

The Actor Model was first proposed by Carl Hewitt in 1973 [5] and later improved upon by, among others, Gul Agha [6]. It takes a different approach to concurrency, which should avoid the problems caused by threading and locking.

In the actor model, each object is an actor. This is an entity that has a mailbox and a behaviour. Messages can be exchanged between actors, which will be buffered in the mailbox. Upon receiving a message, the behaviour of the actor is executed, upon which the actor can: send a number of messages to other actors, create a number of actors and assume new behaviour for the next message to be received.

Of importance in this model is that all communication is performed asynchronously. This implies that the sender does not wait for a message to be received upon sending it; it immediately continues its execution. There are no guarantees about the order in which messages will be received by the recipient, but they will eventually be delivered.

A second important property is that all communications happen by means of messages: there is no shared state between actors. If an actor wishes to obtain information about the internal state of another actor, it will have to use messages to request this information. This allows actors to control access to their state, avoiding problems like the lost-update problem. Manipulation of the internal state also happens through messages.

Each actor runs concurrently with other actors: it can be seen as a small independently running process.

Actors in Erlang

In Erlang, which is designed for concurrency, distribution and scalability, actors are part of the language itself. Due to its roots in the telecom industry, where a very large number of concurrent processes is normal, it is almost impossible to think of Erlang without actors, which are also used to provide distribution. Actors in Erlang are called processes and are started using the built-in spawn function.

A simple application that uses an actor can be seen below. In this application, an actor is defined which acts as a basic counter. We send 100,000 increment messages to the actor and then request it to print its internal value.

Actors in Erlang
-module(counter).
-export([run/0, counter/1]).

run() ->
    S = spawn(counter, counter, [0]),
    send_msgs(S, 100000),
    S.

counter(Sum) ->
    receive
        value -> io:fwrite("Value is ~w~n", [Sum]);
        {inc, Amount} -> counter(Sum+Amount)
    end.

send_msgs(_, 0) -> true;
send_msgs(S, Count) ->
    S ! {inc, 1},
    send_msgs(S, Count-1).

% Usage:
%    1> c(counter).
%    2> S = counter:run().
%       ... Wait a bit until all children have run ...
%    3> S ! value.
%    Value is 100000

Lines 1 & 2 define the module and the exported functions. Lines 4 to 7 contain the run function, which starts a counter process and starts sending increment messages. Sending these messages happens in lines 15 to 18, using the message-passing operator (!). As Erlang is a purely functional language, it has no loop constructs, so this has to be expressed using recursion. Such deep recursion would lead to stack overflows in Java, but these calls are in tail position and Erlang optimizes tail calls, so the stack does not grow. The increment message in this example also carries a parameter, to show Erlang’s parameter-passing capabilities. The state of the counter is also maintained using recursion: upon receiving an inc message, the counter calls itself with the new value, which causes it to receive the next message. If no messages are available yet, the counter will block and wait for the next message.

Actor scheduling in Erlang

Erlang uses a preemptive scheduler for its processes [7]. When a process has executed for too long (measured in the number of function calls made or the amount of CPU time used), or when it enters a receive statement with no messages available, the process is halted and placed on a scheduling queue.

This allows for a large number of processes to run, with a certain amount of fairness. Long running computations will not cause other processes to become unresponsive.

Starting with release R11B, which appeared in May 2006, the Erlang run-time environment has support for symmetric multiprocessing (SMP) [8]. This means that it is able to schedule processes in parallel on multiple CPUs, allowing it to take advantage of multi-core processors. The functional nature of Erlang allows for easy parallelization. An Erlang lightweight process (actor) will never run in parallel on multiple processors, but using a multi-threaded run-time allows multiple processes to run at the same time. Big performance speedups have been observed using this technique.

Actors in Scala

Actors in Scala are available through the scala.actors library. Their implementation is a great testament to the expressiveness of Scala: all functionality, operators and other language constructs included, is implemented in pure Scala, as a library, without requiring changes to Scala itself.

The same sample application, this time written in Scala can be seen below:

 

Actors in Scala
import scala.actors.Actor
import scala.actors.Actor._

case class Inc(amount: Int)
case class Value

class Counter extends Actor {
    var counter: Int = 0;

    def act() = {
        while (true) {
            receive {
                case Inc(amount) =>
                    counter += amount
                case Value =>
                    println("Value is "+counter)
                    exit()
            }
        }
    }
}

object ActorTest extends Application {
    val counter = new Counter
    counter.start()

    for (i <- 0 until 100000) {
        counter ! Inc(1)
    }
    counter ! Value
    // Output: Value is 100000
}

 

We can see the following code: lines 1 & 2 import the abstract Actor class (note: this is actually a trait, a special composition mechanism used by Scala, but that is outside the scope of this article) and its members (we need the ! operator for sending messages). Lines 4 & 5 define the Inc and Value case classes, which will be used as message identifiers. The increment message has a parameter, as an example to demonstrate this ability.

Lines 7 till 21 define the Counter actor, as a subclass of Actor. The act() method is overridden, which provides the behavior of the actor (lines 10-20). This version of the counter actor is written using a more object-oriented style (though Scala fully supports the pure functional way, as shown in the Erlang example too). The state of the actor is maintained in an integer field counter. In the act() method, an endless receive loop is executed. This block processes any incoming messages, either by updating the internal state, or by printing its value and exiting.

Finally, on lines 23 to 32, we find the main application, which first constructs a counter, then sends 100,000 Inc messages, and finally sends it the Value message. The ! operator is used to send a message to an actor, a notation that was borrowed from Erlang.

The output of this program shows that the counter has been incremented up to a value of 100,000. This means that in this case all our messages were delivered in order. This might not always be the case: recall that there are no guarantees on the order of message delivery in the actor model.

The example above shows the ease with which actors can be used in Scala, even though they are not part of the language itself.

It also shows the similarities between the actors library in Scala and the Erlang language constructs. This is no coincidence, as the Scala actors library was heavily inspired by Erlang. The Scala developers have, however, expanded upon Erlang’s concepts and added a number of features, which are highlighted in the following sections.

Replies

The authors of scala.actors noticed a recurring pattern in the usage of actors: a request/reply pattern [9]. This pattern is illustrated below. Often, a message is sent to an actor and in that message, the sender is passed along. This allows the receiving actor to reply to the message.

To facilitate this, a reply() construct was added. This removes the need to send the sender along in the message and provides easy syntax for replying.

Normal request/reply
receive {
    case Msg(sender, value) =>
        val r = process(value)
        sender ! Response(r)
}
Request/reply using reply()
receive {
    case Msg(value) =>
        val r = process(value)
        reply(Response(r))
}

The reply() construct is not present in Erlang, where you are forced to include the sender in the message each time you want to be able to receive replies. This is not necessarily a bad thing, however: Scala messages always carry the identity of the sender with them to enable this functionality, which causes a tiny bit of extra overhead that might be too much in performance-critical applications.

Synchronous messaging

Scala contains a construct which can be used to wait for message replies. This allows for synchronous invocation, which is more like method invocation. The syntax to do this is shown below, to the right, contrasted by the normal way of writing this to the left.

When entering the receive block, or upon using the !? operator, the actor waits until it receives a message matched by one of the case clauses. When the actor receives a message that is not matched, it stays in the mailbox and is retried when a new receive block is entered.

Waiting for a reply
myService ! Msg(value)
receive {
    case Response(r) => // ...
}
Synchronous invocation using !?
myService !? Msg(value) match {
    case Response(r) => // ...
}

Channels

Messages, as used in the previous examples, are used in a somewhat loosely typed fashion. It is possible to send any kind of message to an actor. Messages that are not matched by any of the case clauses will remain in the mailbox, rather than causing an error.

Scala has a very rich type system and the Scala developers wanted to take advantage of this. Therefore they added the concept of channels [9]. These allow you to specify the type of messages that can be accepted using generics. This enables type-safe communication.

The mailbox of an actor is a channel that accepts any type of message.

Thread-based vs. Event-based actors

Scala makes the distinction between thread-based and event-based actors.

Thread-based actors are actors which each run in their own JVM thread. They are scheduled by the Java thread scheduler, which uses a preemptive priority-based scheduler. When the actor enters a receive block, the thread is blocked until messages arrive. Thread-based actors make it possible to do long-running computations, or blocking I/O operations inside actors, without hindering the execution of other actors.

There is an important drawback to this method: each thread is relatively heavy-weight, uses a certain amount of memory and imposes some scheduling overhead. When large numbers of actors are started, the virtual machine might run out of memory, or it might perform suboptimally due to the scheduling overhead.

In situations where this is unacceptable, event-based actors can be used. These actors are not implemented by means of one thread per actor; instead, they all run on the same thread. An actor that waits for a message to be received is not represented by a blocked thread, but by a closure. This closure captures the state of the actor, such that its computation can be continued upon receiving a message [10]. The execution of this closure happens on the thread of the sender.

Event-based actors provide a more light-weight alternative, allowing for very large numbers of concurrently running actors. They should however not be used for parallelism: since all actors execute on the same thread, there is no scheduling fairness.

A Scala programmer can use event-based actors by using a react block instead of a receive block. There is one big limitation to using event-based actors: upon entering a react block, the control flow can never return to the enclosing actor. In practice, this does not prove to be a severe limitation, as the code can usually be rearranged to fit this scheme. This property is enforced through the advanced Scala type system, which allows specifying that a method never returns normally (the Nothing type). The compiler can thus check if code meets this requirement.

Actor scheduling in Scala

Scala allows programmers to mix thread-based actors and event-based actors in the same program: this way programmers can choose whether they want scalable, lightweight event-based actors, or thread-based actors that allow for parallelism, depending on the situation in which they are needed.

Scala uses a thread pool to execute actors. This thread pool will be resized automatically whenever necessary. If only event-based actors are used, the size of this thread pool will remain constant. When blocking operations are used, like receive blocks, the scheduler (which runs in its own separate thread) will start new threads when needed. Periodically, the scheduler checks if there are runnable tasks in the task queue; it then checks whether all worker threads are blocked and starts a new worker thread if needed.

Evaluation

The next section will evaluate some of the features in both languages. It aims to show that while some of these features facilitate the implementation, their power also comes with a risk. One should be aware of the possible drawbacks of the chosen technologies, to avoid potential pitfalls.

The dangers of synchronous message passing

The synchronous message passing style available in Scala (using !?) provides programmers with a convenient way of doing messaging round-trips. This allows for a familiar style of programming, similar to remote method invocation.

It should however be used with great care. Due to the very selective nature of the match clause that follows the use of !? (it usually matches only one type of message), the actor is effectively blocked until a suitable reply is received. This is implemented using a private return channel [9], which means that the progress of the actor is fully dependent on the actor from which it awaits a reply: it cannot handle any messages other than the expected reply, not even if they come from the actor from which it awaits reply.

This is a dangerous situation, as it might lead to deadlocks. To see this, consider the following example:

Actor deadlock: Actor A
actorB !? Msg1(value) match {
    case Response1(r) =>
      // ...
}

receive {
    case Msg2(value) =>
      reply(Response2(value))
}
Actor deadlock: Actor B
actorA !? Msg2(value) match {
    case Response2(r) =>
      // ...
}

receive {
    case Msg1(value) =>
      reply(Response1(value))
}

In this example, each actor synchronously sends a message to the other actor. Neither will ever receive an answer, as each actor is first awaiting a different message and cannot reach the receive block that would handle the other’s request. If it were implemented using a message loop, as you would generally do in Erlang, no problem would arise. This is shown below. This does not mean that synchronous message passing should be avoided: in certain cases it is necessary, and there the extra syntax makes it much easier to program. It is however important to be aware of the potential problems caused by this programming style.

Safe loop: Actor A
actorB ! Msg1(value)
while (true) {
    receive {
        case Msg2(value) =>
          reply(Response2(value))
        case Response1(r) =>
          // ...
    }      
}
Safe loop: Actor B
actorA ! Msg2(value)
while (true) {
    receive {
        case Msg1(value) =>  
          reply(Response1(value))
        case Response2(r) =>
          // ...
    }      
}

The full source code for these examples can be found online, see the appendix for more information.

Safety in Scala concurrency

Another potential pitfall in Scala comes from the fact that it mixes actors with object-oriented programming. It is possible to expose the internal state of an actor through publicly available methods for retrieving and modifying this state. When doing so, it is possible to modify an object by directly invoking its methods, that is, without using messages. Doing so means that you no longer enjoy the safety provided by the actor model.

Erlang on the other hand, due to its functional nature, strictly enforces the use of messages between processes: there is no other way to retrieve and update information in other processes.

This illustrates possibly the biggest trade-off between Erlang and Scala: having a pure functional language, like Erlang, is safe, but more difficult for programmers to use. The object-oriented, imperative style of Scala is more familiar and makes programming easier, yet requires more discipline and care to produce safe and correct programs.

Summary

This article described the actor model for the implementation of concurrency in applications, as an alternative to threading and locking. It highlighted Erlang and Scala, two languages with an implementation of the actor model and showed how these languages implement this model.

Erlang is a pure functional language, providing little more than the basic features of functional languages. This should certainly not be seen as a weakness though: this simplicity allows it to optimize specifically for the cases for which it was defined as well as implement more advanced features like hot-swapping of code.

Scala on the other hand uses a mix of object-oriented and functional styles. This makes it easier for a programmer to write code, especially given the extra constructs offered by Scala, but this flexibility comes with a warning: discipline should be used to avoid inconsistencies.

The differences between these languages should be seen and evaluated in their design context. Both however provide an easy to use implementation of the actor model, which greatly facilitates the implementation of concurrency in applications.

 

References

Appendix: Source code + PDF

The source code for the sample programs and a PDF version can be found right here (tar.gz download). It has been tested on Ubuntu Linux 8.10, with Erlang 5.6.3 and Scala 2.7.2 final (installed from Debian packages). It should work on any platform.

(Via rocketeer.be)

CloudI: Bringing Erlang’s Fault-Tolerance to Polyglot Development

Clouds must be efficient to provide useful fault-tolerance and scalability, but they also must be easy to use.

CloudI (pronounced “cloud-e” /klaʊdi/) is an open source cloud computing platform built in Erlang that is most closely related to the Platform as a Service (PaaS) clouds. CloudI differs in a few key ways, most importantly: software developers are not forced to use specific frameworks, slow hardware virtualization, or a particular operating system. By allowing cloud deployment to occur without virtualization, CloudI leaves development process and runtime performance unimpeded, while quality of service can be controlled with clear accountability.

What makes a cloud a cloud?

The word “cloud” has become ubiquitous over the past few years. And its true meaning has become somewhat lost. In the most basic technological sense, these are the properties that a cloud computing platform must have:

  • Fault-tolerance
  • Scalability

And these are the properties that we’d like a cloud to have:

  • Easy integration
  • Simple deployment

My goal in building CloudI was to bring together these four attributes.


I began by looking at the Erlang programming language (on top of which CloudI is built). The Erlang virtual machine provides fault-tolerance features and a highly scalable architecture while the Erlang programming language keeps the required source code small and easy to follow.

It’s important to understand that few programming languages can provide real fault-tolerance with scalability. In fact, I’d say Erlang is roughly alone in this regard. Let me take a detour to explain why and how.


What is fault-tolerance in cloud computing?

Fault-tolerance is robustness to error. That is, fault-tolerant systems are able to continue operating relatively normally even in the event of (hopefully isolated) errors.

Here, we see service C sending requests to services A and B. Although service B crashes temporarily, the rest of the system continues, relatively unimpeded.


Erlang is known for achieving 9x9s of reliability (99.9999999% uptime, so less than 31.536 milliseconds of downtime per year) with real production systems (within the telecommunications industry). Normal web development techniques only achieve 5x9s reliability (99.999% uptime, so about 5.256 minutes of downtime per year), if they are lucky, due to slow update procedures and complete system failures. How does Erlang provide this advantage?

The Erlang virtual machine implements what is called an “Actor model”, a mathematical model for concurrent computation. Within the Actor model, Erlang’s lightweight processes are a concurrency primitive of the language itself. That is, within Erlang, we assume that everything is an actor. By definition, actors perform their actions concurrently; so if everything is an actor, we get inherent concurrency. (For more on Erlang’s Actor model, there’s a longer discussion here.)

As a result, Erlang software is built with many lightweight processes that keep process state isolated while providing extreme scalability. When an Erlang process needs external state, a message is normally sent to another process, so that the message queuing can provide the Erlang processes with efficient scheduling. To keep the Erlang process state isolated, the Erlang virtual machine does garbage collection for each process individually so that other Erlang processes can continue running concurrently without being interrupted.

The Erlang virtual machine’s garbage collection is an important difference from the Java virtual machine’s, because Java depends on a single heap and so lacks the isolated state provided by Erlang. This difference means that Java is unable to provide basic fault-tolerance guarantees simply because of the virtual machine’s garbage collection, even if libraries or language support were developed on top of the Java virtual machine. There have been attempts to develop fault-tolerance features in Java and other JVM-based languages, but they continue to be failures due to the Java virtual machine’s garbage collection.


Basically, building real-time fault-tolerance support on top of the JVM is by definition impossible, because the JVM itself is not fault-tolerant.

Erlang processes

At a low level, what happens when we get an error in an Erlang process? The language itself uses message passing between processes to ensure that any error’s scope is limited to the process in which it occurs. This works by storing data types as immutable objects; these objects are copied to limit the scope of the process state (large binaries are a special exception, because they are reference counted to conserve memory).

In basic terms, that means that if we want to send variable X to another process P, we have to copy over X as its own immutable variable X’. We can’t modify X’ from our current process, and so even if we trigger some error, our second process P will be isolated from its effects. The end result is low-level control over the scope of any errors due to the isolation of state within Erlang processes. If we wanted to get even more technical, we’d mention that Erlang’s lack of mutability gives it referential transparency unlike, say, Java.
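
Erlang’s isolation is built into its virtual machine, but the idea can be illustrated outside Erlang. Below is a rough Python analogy (not Erlang internals and not CloudI code): each worker runs in its own operating system process, so data “sent” to it over a Queue is serialized and copied, and a crash inside the worker cannot touch the parent’s copy of the data.

# Rough analogy only: OS processes plus copied messages, standing in for
# Erlang's lightweight processes and immutable message passing.
from multiprocessing import Process, Queue

def worker(inbox):
    x_copy = inbox.get()          # a deserialized *copy* of X arrives here
    x_copy['value'] = 'mutated'   # only the worker's copy changes
    raise RuntimeError('boom')    # the error stays inside this process

if __name__ == '__main__':
    x = {'value': 'original'}
    inbox = Queue()
    p = Process(target=worker, args=(inbox,))
    p.start()
    inbox.put(x)                  # X is pickled (copied) into the worker
    p.join()                      # the worker crashed; we keep running
    print(x, p.exitcode)          # {'value': 'original'} 1

The analogy is loose, since Erlang processes are far lighter than operating system processes and are scheduled by the Erlang VM itself, but the effect is the same in spirit: the sender’s data and the sender’s execution survive the receiver’s failure.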

This type of fault-tolerance is deeper than just adding try-catch statements and exceptions: exceptions cover errors you expect, whereas fault-tolerance is about handling unexpected errors, keeping your code running even when one of your variables unexpectedly explodes.

Erlang’s process scheduling provides extreme scalability for minimal source code, making the system simpler and easier to maintain. While it is true that other programming languages have been able to imitate the scalability found natively within Erlang, by providing libraries with user-level threading (possibly combined with kernel-level threading) and data exchange (similar to message passing) to implement their own Actor model for concurrent computation, those efforts have been unable to replicate the fault-tolerance provided within the Erlang virtual machine.

This leaves Erlang alone amongst programming languages as being both scalable and fault-tolerant, making it an ideal development platform for a cloud.

Taking advantage of Erlang

So, with all that said, I can make the claim that CloudI brings Erlang’s fault-tolerance and scalability to various other programming languages (currently C++/C, Erlang (of course), Java, Python, and Ruby), implementing services within a Service Oriented Architecture (SOA).

Every service executed within CloudI interacts with the CloudI API. All of the non-Erlang programming language services are handled using the same internal CloudI Erlang source code. Since the same minimal Erlang source code is used for all the non-Erlang programming languages, other programming language support can easily be added with an external programming language implementation of the CloudI API. Internally, the CloudI API is only doing basic serialization for requests and responses. This simplicity makes CloudI a flexible framework for polyglot software development, providing Erlang’s strengths without requiring the programmer to write or even understand a line of Erlang code.

The service configuration specifies startup parameters and fault-tolerance constraints so that service failures can occur in a controlled and isolated way. The startup parameters define the executable and any arguments it needs, the default timeouts used for service requests, the method used to find a service (called the “destination refresh method”), simple allow and deny access control lists (ACLs) to block outgoing service requests, and optional parameters that affect how service requests are handled. The fault-tolerance constraints are simply two integers (MaxR: maximum restarts, and MaxT: maximum time period in seconds) that control a service in the same way an Erlang supervisor behavior (an Erlang design pattern) controls Erlang processes. The service configuration provides explicit constraints for the lifetime of the service, which helps make service execution easy to understand, even when errors occur.

To keep the service memory isolated during runtime, separate operating system processes are used for each non-Erlang service (referred to as “external” services) with an associated Erlang process (for each non-Erlang thread of execution) that is scheduled by the Erlang VM. The Erlang CloudI API creates “internal” services, which are also associated with an Erlang process, so both “external” services and “internal” services are processed in the same way within the Erlang VM.


In cloud computing, it’s also important that your fault-tolerance extends beyond a single computer (i.e., distributed system fault-tolerance). CloudI uses distributed Erlang communication to exchange service registration information, so that services can be utilized transparently on any instance of CloudI by specifying a single service name for a request made with the CloudI API. All service requests are load-balanced by the sending service and each service request is a distributed transaction, so the same service with separate instances on separate computers is able to provide system fault-tolerance within CloudI. If necessary, CloudI can be deployed within a virtualized operating system to provide the same system fault-tolerance while facilitating a stable development framework.

For example, if an HTTP request needs to store some account data in a database, it could be made to the configured HTTP service (provided by CloudI), which would send the request to an account data service for processing (based on the HTTP URL, which is used as the service name); the account data would then be stored in a database. Every service request receives a Universally Unique IDentifier (UUID) upon creation, which can be used to track the completion of the service request, making each service request within CloudI a distributed transaction. So, in the account data service example, it is possible to make other service requests either synchronously or asynchronously and use the service request UUIDs to handle the response data before utilizing the database. Having explicit tracking of each individual service request helps ensure that service requests are delivered within the timeout period of a request, and it also provides a way to uniquely identify the response data (the service request UUIDs are unique among all connected CloudI nodes).
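
As a hedged sketch of that flow (the 'db/account/get' and 'db/account/log' service names below are hypothetical, and self.__api is assumed to be a CloudI API object held by the service, as in the Python example later in this section), a service callback could issue asynchronous requests and correlate the responses by their transaction UUIDs with the CloudI Python API’s send_async and recv_async calls:

# Sketch only: hypothetical service names; timeouts and error handling omitted.
def __handle_account(self, command, name, pattern, request_info, request,
                     timeout, priority, trans_id, pid):
    # each send_async() returns the UUID (trans_id) of that service request
    id_get = self.__api.send_async('db/account/get', request)
    id_log = self.__api.send_async('db/account/log', request)
    # recv_async(trans_id=...) retrieves the response matching that UUID,
    # so the two replies can be handled even if they arrive out of order
    (_, account_data, _) = self.__api.recv_async(trans_id=id_get)
    self.__api.recv_async(trans_id=id_log)
    return account_data             # becomes the reply to the original request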

With a typical deployment, each CloudI node could contain a configured instance of the account data service (which may utilize multiple operating system processes with multiple threads that each have a CloudI API object) and an instance of the HTTP service. An external load balancer would easily split the HTTP requests between the CloudI nodes and the HTTP service would route each request as a service request within CloudI, so that the account data service can easily scale within CloudI.

CloudI in action

CloudI lets you take unscalable legacy source code, wrap it with a thin CloudI service, and then execute the legacy source code with explicit fault-tolerance constraints. This particular development workflow is important for fully utilizing multicore machines while providing a distributed system that is fault-tolerant during the processing of real-time requests. Creating an “external” service in CloudI is simply instantiating the CloudI API object in all the threads that have been configured within the service configuration, so that each thread is able to handle CloudI requests concurrently. A simple service example can utilize the single main thread to create a single CloudI API object, like the following Python source code:

import sys
sys.path.append('/usr/local/lib/cloudi-1.2.3/api/python/')
from cloudi_c import API

class Task(object):
    def __init__(self):
        self.__api = API(0) # first/only thread == 0

    def run(self):
        self.__api.subscribe('hello_world_python/get', self.__hello_world)
        self.__api.poll()

    def __hello_world(self, command, name, pattern, request_info, request,
                      timeout, priority, trans_id, pid):
        return 'Hello World!'

if __name__ == '__main__':
    assert API.thread_count() == 1 # simple example, without threads
    task = Task()
    task.run()

The example service simply returns “Hello World!” to an HTTP GET request, by first subscribing with a service name and a callback function. When the service starts processing incoming CloudI service bus requests within the CloudI API poll function, incoming requests from the “internal” service that provides an HTTP server are routed to the example service based on the service name, because of the subscription.

The service could also have returned no data as a response, if the request were meant to behave like a publish message in a typical distributed messaging API with publish/subscribe functionality. Here the example service does want to reply, so that the HTTP server can deliver the response to the HTTP client, making the request a typical request/reply transaction. Because a service callback can return either data or no data, the callback function (not the calling service) controls the messaging paradigm used, and a request is able to route through as many services as necessary to produce a response within the timeout specified for the request. Timeouts are very important for real-time event processing, so each request specifies an integer timeout that follows the request as it routes through any number of services. The end result is that real-time constraints are enforced alongside fault-tolerance constraints, to provide a dependable service in the presence of any number of software errors.
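
The configuration-driven threading mentioned earlier can be sketched as well. The following is an illustrative variation of the example above (not taken from the CloudI sources): when the service configuration requests more than one thread, each thread creates its own CloudI API object with its thread index and polls independently, so requests are handled concurrently within a single operating system process.

import sys
import threading
sys.path.append('/usr/local/lib/cloudi-1.2.3/api/python/')
from cloudi_c import API

class Worker(threading.Thread):
    def __init__(self, thread_index):
        threading.Thread.__init__(self)
        self.__api = API(thread_index)  # one CloudI API object per thread

    def run(self):
        self.__api.subscribe('hello_world_python/get', self.__hello_world)
        self.__api.poll()               # each thread processes requests concurrently

    def __hello_world(self, command, name, pattern, request_info, request,
                      timeout, priority, trans_id, pid):
        return 'Hello World!'

if __name__ == '__main__':
    # thread_count() reports the thread count from the service configuration
    threads = [Worker(i) for i in range(API.thread_count())]
    for t in threads:
        t.start()
    for t in threads:
        t.join()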

The presence of bugs in any source code should be accepted as a fact, one that can only be mitigated with fault-tolerance constraints. Software development can reduce the number of bugs as software is maintained, but it can also add bugs as features are added. CloudI provides cloud computing that is able to address these important fault-tolerance concerns within real-time distributed systems software development. As I have demonstrated in this CloudI and Erlang tutorial, the cloud computing that CloudI provides is minimal, so that efficiency is not sacrificed for the benefits of cloud computing.

(Via toptal.com)