Data Warehouse and Analytics Infrastructure at Viki

At Viki, we use data to power product, marketing and business decisions. We use an in-house analytics dashboard to expose all the data we collect to various teams through simple table and chart based reports. This allows them to monitor all our high level KPIs and metrics regularly.

Data also powers more heavy duty stuff – like our data-driven content recommendation system, or predictive models that help us forecast the value of content we’re looking to license. We’re also constantly looking at data to determine the success of new product features, tweak and improve existing features and even kill stuff that doesn’t work. All of this makes data an integral part of the decision making process at Viki.

To support all these functions, we need a robust infrastructure below it, and that’s our data warehouse and analytics infrastructure.

This post is the first of a series about our data warehouse and analytics infrastructure. In this post we’ll cover the high-level pipeline of the system and goes into details about how we collect and batch-process our data. Do expect a lot of detailed-level discussions.

About Viki: We’re a online TV site, with fan-powered translations in 150+ different languages. To understand more what Viki is, watch this short video (2 minutes).

Part 0: Overview (or TL;DR)

Our analytics infrastructure, following most common sense approach, is broken down into 3 steps:

  • Collect and Store Data
  • Process Data (batch + real-time)
  • Present Data


Collect and Store Data

  1. Logs (events) are sent by different clients to a central log collector
  2. Log collector forward events to a Hydration service, here the events get enriched with more time-sensitive information; the results are stored to S3

Batch Processing Data

  1. There is an hourly job that runs to take data from S3, apply further transformations (read: cleaning up bad data) and store the results to our cloud-based Hadoop cluster
  2. We run multiple MapReduce (Hive) jobs to aggregate data from Hadoop and write them to our central analytics database (Postgres)
  3. Another job takes a snapshot of our production databases and restore into our analytics database

Presenting Data

  1. All analytics/reporting-related activities are then done at our master analytics database. The results (meant for report presentation) are then sent to our Reporting DB
  2. We run an internal reporting dashboard app on top of our Reporting DB; this is where end-users log in to see their reports

Real-time Data Processing

  1. The data from Hydration service is also multiplexed and sent to our real-time processing pipeline (using Apache Storm)
  2. At Storm, we write custom job that does real-time aggregation of important metrics. We also write a real-time alerting system to inform ourselves when traffic goes bad

Part 1: Collecting, Pre-processing and Storing Data


We use fluentd to receive event logs from different platforms (web, Android, iOS, etc) (through HTTP). We set up a cluster of 2 dedicated servers running multiple fluentd instances inside Docker, load-balanced through HAproxy. When the message hits our endpoint, fluentd then buffers the messages and batch-forward them to our Hydration System. At the moment, our cluster is doing 100M messages a day.

A word about fluentd: It’s a robust open-source log collecting software that has a very healthy and helpful community around it. The core is written in C so it’s fast and scalable, with plugins written in Ruby, making it easy to extend.

What data do we collect? We collect everything that we think is useful for the business: a click event, an ad impression, ad request, a video play, etc. We call each record an event, and it’s stored as a JSON string like this:

  "time":1380452846, "event": "video_play",
  "video_id":"1008912v", "user_id":"5298933u",
  "app_id":"100004a", "app_ver":"",
  "device":"iPad", "stream_quality":"720p",
  "ip":"", "country":"ca",
  "city_name":"Toronto", "region_name":"ON"

Pre-processing Data – Hydration Service


Data Hydration ServiceThe data after collected is sent to a Hydration Service for pre-processing. Here, the message is enriched with time-sensitive information.

For example: When a user watches a video (thus a video_play event sent), we want to know if it’s a free user or a paid user. Since the user could be a free user today and upgrade to paid tomorrow, the only way to correctly attribute the play event to free/paid bucket is to inject that status right right into the message when it’s received. In short, the service translates this:

{ "event":"video_play", "user_id":"1234" }

into this:

{ "event":"video_play", "user_id":"1234", "user_status":"free" }

For non time-sensitive operations (fixing typo, getting country from IP, etc), there is different process for that (discussed below)

Storing to S3

From the Hydration Service, the message is buffered and then stored to S3 – our source of truth. The data is gzip-compressed and stored into hour bucket, making it easy and fast to retrieve them per hour time-period.

Part 2: Batch-processing Data

The processing layer has 2 components, the batch-processing and the real-time processing component.

This section focuses mainly on our batch-processing layer – our main process of transforming data for reporting/presentation. We’ll cover our real-time processing layer in another post.


Batch-processing Data Layer

Cleaning Data Before Importing Into Hadoop

Those who have worked with data before know this: Cleaning data takes a lot of time. In fact: Cleaning and preparing data will take most of your time. Not the actual analysis.

What is unclean data (or bad data)? Data that is logged incorrectly. It comes in a lot of different forms. E.g

  • Typo mistake, send ‘clickk’ event instead of ‘click’
  • Clients send event twice, or forgot to send event

When bad data enters your system, it stays there forever, unless you purposely find a way to clean it/take it out

So how do we clean up bad data?

Previously when we receive a record, we write directly to our Hadoop cluster and make a backup to S3. This makes it difficult to correct the bad data, due to the append-only nature of Hadoop.

Now all the data is first stored into S3. And we have hourly process that takes data from S3, apply cleanup/transformations and load them into Hadoop (insert-overwrite).


Storing Data, Before and AfterThe process is similar in nature to the hydration process, but this time we look at 1 hour block at a time, rather than per record. This approach has many great benefits:

  • The pipeline is more linear, thus prevents from the threat of data discrepancy (between S3 and Hadoop).
  • The data is not tied down to being stored in Hadoop. If we want to load our data into other data storage, we’d just write another process that transform S3 data and dump somewhere else.
  • When a bad logging happens causing unclean data, we can modify the transformation code and rerun the data from the point of bad logging. Because the process isidempotent, we can perform the reprocessing as many times as we want without double-logging the data.


Our S3 to Hadoop Transform and Load ProcessIf you’ve studied this article The Log from the Data Engineering folks at LinkedIn, you’d notice that the approach is very similar (replacing Kafka with S3, and per-message processing with per hour processing). Indeed our paradigm is inspired by The Log architecture. However due to our needs, our system chose S3 because:

  1. When it comes to batch (re)processing, we do it in a time-period manner (eg. process 1 hour of data). Kafka use natural number to order message, thus if we use Kafka we’ll have to build another service to translate [beg_timestamp, end_timestamp) into [beg_index, end_index).
  2. Kafka can only retain up to X days due to the disk-space limitation. As we want the ability to reprocess data further back, employing Kafka means we need another strategy to cater to these cases.

Aggregating Data from Hadoop into Postgres

Once the data got into Hadoop, we’d have these daily aggregation jobs to aggregate data into fewer dimensions and port them into our analytics master database (PostgreSQL)

For example, to aggregate a table of video starts data together with some video and user information, we run this Hive query (MapReduce job):

-- The hadoop's events table would contain 2 fields: time (int), v (json)
  SUBSTR(FROM_UNIXTIME(time), 0, 10) AS date_d,
  v['platform'] AS platform,
  v['country'] AS country,
  v['video_id'] AS video_id,
  v['user_id'] AS user_id
  COUNT(1) AS cnt
FROM events
WHERE time >= BEG_TS
  AND time <= END_TS
  AND v['event'] = 'video_start'
GROUP BY 1,2,3,4,5

and load the results into an aggregated.video_starts table in Postgres:

       Table "aggregated.video_starts"
   Column    |          Type          | Modifiers 
 date_d      | date                   | not null
 platform    | character varying(255) | 
 country     | character(3)           | 
 video_id    | character varying(255) | 
 user_id     | character varying(255) | 
 cnt         | bigint                 | not null

Further querying and reporting of video_starts will be done out of this table. If we need more dimensions, we either rebuild this table with more dimensions, or build a new table from Hadoop.

If it’s a one-time ad-hoc analysis, we’d just run the queries directly against Hadoop.

Table Partitioning:



Also, we’re making use of Postgres’ Table Inheritance feature to partition our data into multiple monthly tables with a parent table on top of all. Your query just needs to hit the parent table and the engine will know which underlying monthly tables to hit to get your data.

This makes our data very easy to maintain, with small indexes and better rebuild process. We’d have fast (SSD) drives that host the recent tables, and move the older ones to slower (but bigger) drives for semi-archiving purpose.

Centralizing All Data



We dump our production databases into our master analytics database on a daily basis.

Also, we use a lot of 3rd party vendors (Adroll, DoubleClick, Flurry, GA, etc). For each of these services, we write a process to ping their API and import the data into our master analytics database.

These data, together with the aggregated data from Hadoop, allow us to produce meaningful analysis combining data from multiple sources.

For example, to break down our video starts by different genre, we would write some query that joins data from prod.videos table with aggregated.video_starts table:

-- Video Starts by genre
SELECT V.genre, SUM(cnt) FROM aggregated.video_starts VS
LEFT JOIN prod.videos V ON VS.video_id =
WHERE VS.date_d = '2014-06-01'

The above is made possible because we have both sources of data (event tracking data + production data) in 1 place.

Centralizing data is a very important concept in our pipeline because it makes it simple and pain-free for us to connect, report and corroborate our numbers across many different data sources.

Managing Job Dependencies

We started with simple crontab to schedule our hourly/daily jobs. When the jobs grew complicated, we’d end up with very long crontab:


Crontab also doesn’t support graph-based job flow (e.g run A and B at the sametime, when both finishes, run C)


So we looked around for a solution. We considered Chronos (by Airbnb), but their use-case is more complicated than what we needed, plus the need to setup ZooKeeper and all that.

We ended up using Azkaban by LinkedIn, it has everything we need: Crontab with graph-based job flow, it also tell you the runtime history of your job. And when a job flow fails, you can restart them, running only tasks that failed/haven’t run.

It’s pretty awesome.

Making Sure Your Numbers Tie

One of the things I see being less discussed in analytics infrastructure talk/blog is making sure your data don’t drop half-way during transportation, resulting in data inconsistency in different storages.

We have a process that runs after every data transportation, it counts number of records in both source and destination storage and prints errors when they don’t match. These check-total processes sound a little tedious to do, but it proved crucial to our system; it gives us the confidence in the accurary of the numbers we report to management.

Case in point, we had a process that dumps data from Postgres to CSV, then compresses and uploads to S3 and loads them into Amazon Redshift (using COPY command). So technically we have the exact same table in both Postgres and Redshift. One day our analyst pointed out that the data in Redshift is significantly less than in Postgres. Upon investigation, there was bug that cause CSV file to be truncated and thus not fully loaded into Redshift tables. It was because for this particular process we didn’t have the check-totals in place.

Using Ruby as the Scripting Language

When we started we are primarily a Ruby shop, so going ahead with Ruby was a natural choice. “Isn’t it slow?”, you might say. But we use Ruby not to process data (i.e it rarely holds any large amount of data), but to facilitate and coordinate the process.

We have written an entire library to support doing data pipeline in Ruby. For example, we extended pg gem to make it more object-oriented to Postgres. It allows us to do table creation, table hot-swapping, upsert, insert-overwriting, copying tables between databases, etc, all without having to touch SQL code. It has become a nice, productive abstraction on top of SQL and Postgres. Think ORM for data-warehousing purpose.

Example: The below code will create a data.videos_by_genre table holding the result of a simple aggregation query. The process works on a temporary table and eventually it’ll perform a table hot-swap with the main one; this is to avoid any data disruption being made if we would have done it on the main table from the beginning.

columns = [
  {name: 'genre', data_type: 'varchar'},
  {name: 'video_count', data_type: 'int'},
indexes = [{columns: ['genre']}]
table = 'data.videos_by_genre', columns, indexes

table.with_temp_table do |temp_t|
  temp_t.drop(check_exists: true)
  connection.exec <<-SQL.strip_heredoc
    INSERT INTO #{temp_t.qualified_name}
    SELECT genre, COUNT(1) FROM prod.videos
    GROUP BY 1


(the above example could also be done using a MATERIALIZED VIEW btw)

Having this set of libraries has proven very critical to our data pipeline process, since it allows us to write extensible and maintainable code that perform all sort of data transformations.


We rely mostly on free and open-source technologies. Our stack is:

  • fluentd (open-source) for collecting logs
  • Cloud-based Hadoop + Hive (TreasureData – 3rd party vendor)
  • PostgreSQL + Amazon Redshift as central analytics database
  • Ruby as scripting language
  • NodeJS (worker process) with Redis (caching)
  • Azkaban (job flow management)
  • Kestrel (message queue)
  • Apache Storm (real-time stream processing)
  • Docker for automated deployment
  • HAproxy (load balancing)
  • and lots of SQL (huge thanks to Postgres, one of the best relational databases ever made)


The above post went through the overall architecture of our analytics system. It also went into details the Collecting layer and Batch-processing layer. In later blog posts we’ll cover the remaining, specifically:

  • Our Data Presentation layer. And how Stuti, our analyst, built our funnel analysis, fan-in and fan-out tools, all with SQL. And it updates automatically (very funnel. wow!)
  • Our Real-time traffic alert/monitoring system (using Apache Storm)
  • turing: our feature roll-out, A/B testing framework

( Via Engineering. )

Nifty Architecture Tricks From Wix – Building A Publishing Platform At Scale


Wix operates websites in the long tale. As a HTML5 based WYSIWYG web publishing platform, they have created over 54 million websites, most of which receive under 100 page views per day. So traditional caching strategies don’t apply, yet it only takes four web servers to handle all the traffic. That takes some smart work.

Aviran Mordo, Head of Back-End Engineering at Wix, has described their solution in an excellent talk: Wix Architecture at Scale. What they’ve developed is in the best tradition of scaling is specialization. They’ve carefully analyzed their system and figured out how to meet their aggressive high availability and high performance goals in some most interesting ways.

Wix uses multiple datacenters and clouds. Something I haven’t seen before is that they replicate data to multiple datacenters, to Google Compute Engine, and to Amazon. And they have fallback strategies between them in case of failure.

Wix doesn’t use transactions. Instead, all data is immutable and they use a simple eventual consistency strategy that perfectly matches their use case.

Wix doesn’t cache (as in a big caching layer). Instead, they pay great attention to optimizing the rendering path so that every page displays in under 100ms.

Wix started small, with a monolithic architecture, and has consciously moved to a service architecture using a very deliberate process for identifying services that can help anyone thinking about the same move.

This is not your traditional LAMP stack or native cloud anything. Wix is a little different and there’s something here you can learn from. Let’s see how they do it…


  • 54+ million websites, 1 million new websites per month.

  • 800+ terabytes of static data, 1.5 terabytes of new files per day

  • 3 data centers + 2 clouds (Google, Amazon)

  • 300 servers

  • 700 million HTTP requests per day

  • 600 people total, 200 people in R&D

  • About 50 services.

  • 4 public servers are needed to serve 45 million websites


  • MySQL

  • Google and Amazon clouds

  • CDN

  • Chef


  • Simple initial monolithic architecture. Started with one app server. That’s the simplest way to get started. Make quick changes and deploy. It gets you to a particular point.

    • Tomcat, Hibernate, custom web framework

    • Used stateful logins.

    • Disregarded any notion of performance and scaling.

  • Fast forward two years.

    • Still one monolithic server that did everything.

    • At a certain scale of developers and customers it held them back.

    • Problems with dependencies between features. Changes in one place caused deployment of the whole system. Failure in unrelated areas caused system wide downtime.

  • Time to break the system apart.

    • Went with a services approach, but it’s not that easy. How are you going to break functionality apart and into services?

    • Looked at what users are doing in the system and identified three main parts: edit websites, view sites created by Wix, serving media.

    • Editing web sites includes data validation of data from the server, security and authentication, data consistency, and lots of data modification requests.

    • Once finished with the web site users will view it. There are 10x more viewers than editors. So the concerns are now:

      • high availability. HA is the most important feature because it’s the user’s business.

      • high performance

      • high traffic volume

      • the long tail. There are a lot of websites, but they are very small. Every site gets maybe 10 or 100 page views a day. The long tail make caching not the go to scalability strategy. Caching becomes very inefficient.

    • Media serving is the next big service. Includes HTML, javascript, css, images. Needed a way to serve files the 800TB of data under a high volume of requests. The win is static content is highly cacheable.

    • The new system looks like a networking layer that sits below three segment services: editor segment (anything that edits data), media segment (handles static files, read-only), public segment (first place a file is viewed, read-only).

Guidelines For How To Build Services

  • Each service has its own database and only one service can write to a database.

  • Access to a database is only through service APIs. This supports a separation of concerns and hiding the data model from other services.

  • For performance reasons read-only access is granted to other services, but only one service can write. (yes, this contradicts what was said before)

  • Services are stateless. This makes horizontal scaling easy. Just add more servers.

  • No transactions. With the exception of billing/financial transactions, all other services do not use transactions. The idea is to increase database performance by removing transaction overhead. This makes you think about how the data is modeled to have logical transactions, avoiding inconsistent states, without using database transactions.

  • When designing a new service caching is not part of the architecture. First, make a service as performant as possible, then deploy to production, see how it performs, only then, if there are performance issues, and you can’t optimize the code (or other layers), only then add caching.

Editor Segment

  • Editor server must handle lots of files.

  • Data stored as immutable JSON pages (~2.5 million per day) in MySQL.

  • MySQL is a great key-value store. Key is based on a hash function of the file so the key is immutable. Accessing MySQL by primary key is very fast and efficient.

  • Scalability is about tradeoffs. What tradeoffs are we going to make? Didn’t want to use NoSQL because they sacrifice consistency and most developers do not know how to deal with that. So stick with MySQL.

  • Active database. Found after a site has been built only 6% were still being updated. Given this then these active sites can be stored in one database that is really fast and relatively small in terms of storage (2TB).

  • Archive database. All the stale site data, for sites that are infrequently accessed, is moved over into another database that is relatively slow, but has huge amounts of storage. After three months data is pushed to this database is accesses are low. (one could argue this is an implicit caching strategy).

  • Gives a lot of breathing room to grow. The large archive database is slow, but it doesn’t matter because the data isn’t used that often. On first access the data comes from the archive database, but then it is moved to the active database so later accesses are fast.

High Availability For Editor Segment

  • With a lot of data it’s hard to provide high availability for everything. So look at the critical path, which for a website is the content of the website. If a widget has problems most of the website will still work. Invested a lot in protecting the critical path.

  • Protect against database crashes. Want to recover quickly. Replicate databases and failover to the secondary database.

  • Protect against data corruption and data poisoning.  Doesn’t have to be malicious, a bug is enough to spoil the barrel. All data is immutable. Revisions are stored for everything. Worst case  if corruption can’t be fixed is to revert to version where the data was fine.

  • Protect against unavailability. A website has to work all the time. This drove an investment in replicating data across different geographical locations and multiple clouds. This makes the system very resilient.

    • Clicking save on a website editing session sends a JSON file to the editor server.

    • The server sends the page to the active MySQL server which is replicated to another datacenter.

    • After the page is saved to locally, an asynchronous process is kicked upload the data to a static grid, which is the Media Segment.

    • After data is uploaded to the static grid, a notification is sent to a archive service running on the Google Compute Engine. The archive goes to the grid, downloads a page, and stores a copy on the Google cloud.

    • Then a notification is sent back to the editor saying the page was saved to GCE.

    • Another copy is saved to Amazon from GCE.

    • One the final notification is received it means there are three copies of the current revision of data: one in the database, the static grid, and on GCE.

    • For the current revision there are three copies. For old revision there two revisions (static grid, GCE).

    • The process is self-healing. If there’s a failure the next time a user updates their website everything that wasn’t uploaded will be uploaded again.

    • Orphan files are garbage collected.

Modeling Data With No Database Transactions

  • Don’t want a situation where a user edit two pages and only one page is saved in the database, which is an inconsistent state.

  • Take all the JSON files and stick them in the database one after the other. When all the files are saved another save command is issued which contains a manifest of all the IDs (which is hash of the content which is the file name on the static server) of the saved pages that were uploaded to the static servers.

Media Segment

  • Stores lots of files. 800TB of user media files, 3M files uploaded daily, and 500M metadata records.

  • Images are modified. They are resized for different devices and sharpened. Watermarks can be inserted and there’s also audio format conversion.

  • Built an eventually consistent distributed file system that is multi datacenter aware with automatic fallback across DCs. This is before Amazon.

  • A pain to run. 32 servers, doubling the number every 9 months.

  • Plan to push stuff to the cloud to help scale.

  • Vendor lock-in is a myth. It’s all APIs. Just change the implementation and you can move to different clouds in weeks.

  • What really locks you down is data. Moving 800TB of data to a different cloud is really hard.

  • They broke Google Compute Engine when they moved all their data into GCE. They reached the limits of the Google cloud. After some changes by Google it now works.

  • Files are immutable so the are highly cacheable.

  • Image requests first go to a CDN. If the image isn’t in the CDN the request goes to their primary datacenter in Austin. If the image isn’t in Austin the request then goes to Google Cloud. If it’s not in Google cloud it goes to a datacenter in Tampa.

Public Segment

  • Resolve URLs (45 million of them), dispatch to the appropriate renderer, and then render into HTML, sitemap XML, or robots TXT, etc.

  • Public SLA is that response time is < 100ms at peak traffic. Websites have to be available, but also fast. Remember, no caching.

  • When a user clicks publish after editing a page, the manifest, which contains references to pages, are pushed to Public. The routing table is also published.

  • Minimize out-of-service hops. Requires 1 database call to resolve the route. 1 RPC call to dispatch the request to the renderer. 1 database call to get the site manifest.

  • Lookup tables are cached in memory and are updated every 5 minutes.

  • Data is not stored in the same format as it is for the editor. It is stored in a denormalized format, optimized for read by primary key. Everything that is needed is returned in a single request.

  • Minimize business logic. The data is denormalized and precalculated. When you handle large scale every operation, every millisecond you add, it’s times 45 million, so every operation that happens on the public server has to be justified.

  • Page rendering.

    • The html returned by the public server is bootstrap html. It’s a shell with JavaScript imports and JSON data with references to site manifest and dynamic data.

    • Rendering is offloaded to the client. Laptops and mobile devices are very fast and can handle the rendering.

    • JSON was chosen because it’s easy to parse and compressible.

    • It’s easier to fix bugs on the client. Just redeploy new client code. When rendering is done on the server the html will be cached, so fixing a bug requires re-rendering millions of websites again.

High Availability For Public Segment

  • Goal is to be always available, but stuff happens.

  • On a good day: a browser makes a request, the request goes to a datacenter, through a load balancer, goes to a public server, resolves the route, goes to the renderer, the html goes back to the browser, and the browser runs the javascript. The javascript fetches all media files and the JSON data and renders a very beautiful web site. The browser then make a request to the Archive service. The Archive service replays the request in the same way the browser does and stores the data in a cache.

  • On a bad day a datacenter is lost, which did happen. All the UPSs died and the datacenter was down. The DNS was changed and then all the requests went to the secondary datacenter.

  • On a bad day Public is lost. This happened once when a load balancer got half of a configuration so all the Public servers were gone. Or a bad version can be deployed that starts returning errors. Custom code in the load balancer handles this problem by routing to the Archive service to fetch the cached if the Public servers are not available. This approach meant customers were not affected when Public went down, even though the system was reverberating with alarms at the time.

  • On a bad day the Internet sucks. The browser makes a request, goes to the datacenter, goes to the load balancer, gets the html back. Now the JavaScript code has to fetch all the pages and JSON data. It goes to the CDN, it goes to the static grid and fetches all the JSON files to render the site. In these processes Internet problems can prevent files from being returned. Code in JavaScript says if you can’t get to the primary location, try and get it from the archive service, if that fails try the editor database.

Lessons Learned

  • Identify your critical path and concerns. Think through how your product works. Develop usage scenarios. Focus your efforts on these as they give the biggest bang for the buck.

  • Go multi-datacenter and multi-cloud. Build redundancy on the critical path (for availability).

  • De-normalize data and Minimize out-of-process hops (for performance). Precaluclate and do everything possible to minimize network chatter.

  • Take advantage of client’s CPU power. It saves on your server count and it’s also easier to fix bugs in the client.

  • Start small, get it done, then figure out where to go next. Wix did what they needed to do to get their product working. Then they methodically moved to a sophisticated services architecture.

  • The long tail requires a different approach. Rather than cache everything Wix chose to optimize the heck out of the render path and keep data in both an active and archive databases.

  • Go immutable. Immutability has far reaching consequences for an architecture. It affects everything from the client through the back-end. It’s an elegant solution to a lot of problems.

  • Vendor lock-in is a myth. It’s all APIs. Just change the implementation and you can move to different clouds in weeks.

  • What really locks you down is data. Moving lots of data to a different cloud is really hard.

( Via )

For Understanding Microservices

“Microservices” – yet another new term on the crowded streets of software architecture. Although our natural inclination is to pass such things by with a contemptuous glance, this bit of terminology describes a style of software systems that we are finding more and more appealing. We’ve seen many projects use this style in the last few years, and results so far have been positive, so much so that for many of our colleagues this is becoming the default style for building enterprise applications. Sadly, however, there’s not much information that outlines what the microservice style is and how to do it.

In short, the microservice architectural style [1] is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare mininum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

To start explaining the microservice style it’s useful to compare it to the monolithic style: a monolithic application built as a single unit. Enterprise Applications are often built in three main parts: a client-side user interface (consisting of HTML pages and javascript running in a browser on the user’s machine) a database (consisting of many tables inserted into a common, and usually relational, database management system), and a server-side application. The server-side application will handle HTTP requests, execute domain logic, retrieve and update data from the database, and select and populate HTML views to be sent to the browser. This server-side application is a monolith – a single logical executable[2]. Any changes to the system involve building and deploying a new version of the server-side application.

Such a monolithic server is a natural way to approach building such a system. All your logic for handling a request runs in a single process, allowing you to use the basic features of your language to divide up the application into classes, functions, and namespaces. With some care, you can run and test the application on a developer’s laptop, and use a deployment pipeline to ensure that changes are properly tested and deployed into production. You can horizontally scale the monolith by running many instances behind a load-balancer.

Monolithic applications can be successful, but increasingly people are feeling frustrations with them – especially as more applications are being deployed to the cloud . Change cycles are tied together – a change made to a small part of the application, requires the entire monolith to be rebuilt and deployed. Over time it’s often hard to keep a good modular structure, making it harder to keep changes that ought to only affect one module within that module. Scaling requires scaling of the entire application rather than parts of it that require greater resource.


Figure 1: Monoliths and Microservices

These frustrations have led to the microservice architectural style: building applications as suites of services. As well as the fact that services are independently deployable and scalable, each service also provides a firm module boundary, even allowing for different services to be written in different programming languages. They can also be managed by different teams .

We do not claim that the microservice style is novel or innovative, its roots go back at least to the design principles of Unix. But we do think that not enough people consider a microservice architecture and that many software developments would be better off if they used it.

Characteristics of a Microservice Architecture

We cannot say there is a formal definition of the microservices architectural style, but we can attempt to describe what we see as common characteristics for architectures that fit the label. As with any definition that outlines common characteristics, not all microservice architectures have all the characteristics, but we do expect that most microservice architectures exhibit most characteristics. While we authors have been active members of this rather loose community, our intention is to attempt a description of what we see in our own work and in similar efforts by teams we know of. In particular we are not laying down some definition to conform to.

Componentization via Services

For as long as we’ve been involved in the software industry, there’s been a desire to build systems by plugging together components, much in the way we see things are made in the physical world. During the last couple of decades we’ve seen considerable progress with large compendiums of common libraries that are part of most language platforms.

When talking about components we run into the difficult definition of what makes a component. Our definition is that a component is a unit of software that is independently replaceable and upgradeable.

Microservice architectures will use libraries, but their primary way of componentizing their own software is by breaking down into services. We define libraries as components that are linked into a program and called using in-memory function calls, while services are out-of-process components who communicate with a mechanism such as a web service request, or remote procedure call. (This is a different concept to that of a service object in many OO programs [3].)

One main reason for using services as components (rather than libraries) is that services are independently deployable. If you have an application [4] that consists of a multiple libraries in a single process, a change to any single component results in having to redeploy the entire application. But if that application is decomposed into multiple services, you can expect many single service changes to only require that service to be redeployed. That’s not an absolute, some changes will change service interfaces resulting in some coordination, but the aim of a good microservice architecture is to minimize these through cohesive service boundaries and evolution mechanisms in the service contracts.

Another consequence of using services as components is a more explicit component interface. Most languages do not have a good mechanism for defining an explicit Published Interface. Often it’s only documentation and discipline that prevents clients breaking a component’s encapsulation, leading to overly-tight coupling between components. Services make it easier to avoid this by using explicit remote call mechanisms.

Using services like this does have downsides. Remote calls are more expensive than in-process calls, and thus remote APIs need to be coarser-grained, which is often more awkward to use. If you need to change the allocation of responsibilities between components, such movements of behavior are harder to do when you’re crossing process boundaries.

At a first approximation, we can observe that services map to runtime processes, but that is only a first approximation. A service may consist of multiple processes that will always be developed and deployed together, such as an application process and a database that’s only used by that service.

Organized around Business Capabilities

When looking to split a large application into parts, often management focuses on the technology layer, leading to UI teams, server-side logic teams, and database teams. When teams are separated along these lines, even simple changes can lead to a cross-team project taking time and budgetary approval. A smart team will optimise around this and plump for the lesser of two evils – just force the logic into whichever application they have access to. Logic everywhere in other words. This is an example of Conway’s Law[5]in action.

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.

– Melvyn Conway, 1967


Figure 2: Conway’s Law in action

The microservice approach to division is different, splitting up into services organized around business capability. Such services take a broad-stack implementation of software for that business area, including user-interface, persistant storage, and any external collaborations. Consequently the teams are cross-functional, including the full range of skills required for the development: user-experience, database, and project management.


Figure 3: Service boundaries reinforced by team boundaries

One company organised in this way is Cross functional teams are responsible for building and operating each product and each product is split out into a number of individual services communicating via a message bus.

Large monolithic applications can always be modularized around business capabilities too, although that’s not the common case. Certainly we would urge a large team building a monolithic application to divide itself along business lines. The main issue we have seen here, is that they tend to be organised around too many contexts. If the monolith spans many of these modular boundaries it can be difficult for individual members of a team to fit them into their short-term memory. Additionally we see that the modular lines require a great deal of discipline to enforce. The necessarily more explicit separation required by service components makes it easier to keep the team boundaries clear.

Products not Projects

Most application development efforts that we see use a project model: where the aim is to deliver some piece of software which is then considered to be completed. On completion the software is handed over to a maintenance organization and the project team that built it is disbanded.

Microservice proponents tend to avoid this model, preferring instead the notion that a team should own a product over its full lifetime. A common inspiration for this is Amazon’s notion of “you build, you run it” where a development team takes full responsibility for the software in production. This brings developers into day-to-day contact with how their software behaves in production and increases contact with their users, as they have to take on at least some of the support burden.

The product mentality, ties in with the linkage to business capabilities. Rather than looking at the software as a set of functionality to be completed, there is an on-going relationship where the question is how can software assist its users to enhance the business capability.

There’s no reason why this same approach can’t be taken with monolithic applications, but the smaller granularity of services can make it easier to create the personal relationships between service developers and their users.

Smart endpoints and dumb pipes

When building communication structures between different processes, we’ve seen many products and approaches that stress putting significant smarts into the communication mechanism itself. A good example of this is the Enterprise Service Bus (ESB), where ESB products often include sophisticated facilities for message routing, choreography, transformation, and applying business rules.

The microservice community favours an alternative approach: smart endpoints and dumb pipes. Applications built from microservices aim to be as decoupled and as cohesive as possible – they own their own domain logic and act more as filters in the classical Unix sense – receiving a request, applying logic as appropriate and producing a response. These are choreographed using simple RESTish protocols rather than complex protocols such as WS-Choreography or BPEL or orchestration by a central tool.

The two protocols used most commonly are HTTP request-response with resource API’s and lightweight messaging[6]. The best expression of the first is

Be of the web, not behind the web

– Ian Robinson

Microservice teams use the principles and protocols that the world wide web (and to a large extent, Unix) is built on. Often used resources can be cached with very little effort on the part of developers or operations folk.

The second approach in common use is messaging over a lightweight message bus. The infrastructure chosen is typically dumb (dumb as in acts as a message router only) – simple implementations such as RabbitMQ or ZeroMQ don’t do much more than provide a reliable asynchronous fabric – the smarts still live in the end points that are producing and consuming messages; in the services.

In a monolith, the components are executing in-process and communication between them is via either method invocation or function call. The biggest issue in changing a monolith into microservices lies in changing the communication pattern. A naive conversion from in-memory method calls to RPC leads to chatty communications which don’t perform well. Instead you need to replace the fine-grained communication with a coarser -grained approach.

Decentralized Governance

One of the consequences of centralised governance is the tendency to standardise on single technology platforms. Experience shows that this approach is constricting – not every problem is a nail and not every solution a hammer. We prefer using the right tool for the job and while monolithic applications can take advantage of different languages to a certain extent, it isn’t that common.

Splitting the monolith’s components out into services we have a choice when building each of them. You want to use Node.js to standup a simple reports page? Go for it. C++ for a particularly gnarly near-real-time component? Fine. You want to swap in a different flavour of database that better suits the read behaviour of one component? We have the technology to rebuild him.

Of course, just because you can do something, doesn’t mean you should- but partitioning your system in this way means you have the option.

Teams building microservices prefer a different approach to standards too. Rather than use a set of defined standards written down somewhere on paper they prefer the idea of producing useful tools that other developers can use to solve similar problems to the ones they are facing. These tools are usually harvested from implementations and shared with a wider group, sometimes, but not exclusively using an internal open source model. Now that git and github have become the de facto version control system of choice, open source practices are becoming more and more common in-house .

Netflix is a good example of an organisation that follows this philosophy. Sharing useful and, above all, battle-tested code as libraries encourages other developers to solve similar problems in similar ways yet leaves the door open to picking a different approach if required. Shared libraries tend to be focused on common problems of data storage, inter-process communication and as we discuss further below, infrastructure automation.

For the microservice community, overheads are particularly unattractive. That isn’t to say that the community doesn’t value service contracts. Quite the opposite, since there tend to be many more of them. It’s just that they are looking at different ways of managing those contracts. Patterns like Tolerant Reader and Consumer-Driven Contracts are often applied to microservices. These aid service contracts in evolving independently. Executing consumer driven contracts as part of your build increases confidence and provides fast feedback on whether your services are functioning. Indeed we know of a team in Australia who drive the build of new services with consumer driven contracts. They use simple tools that allow them to define the contract for a service. This becomes part of the automated build before code for the new service is even written. The service is then built out only to the point where it satisfies the contract – an elegant approach to avoid the ‘YAGNI’[9] dilemma when building new software. These techniques and the tooling growing up around them, limit the need for central contract management by decreasing the temporal coupling between services.

Perhaps the apogee of decentralised governance is the build it / run it ethos popularised by Amazon. Teams are responsible for all aspects of the software they build including operating the software 24/7. Devolution of this level of responsibility is definitely not the norm but we do see more and more companies pushing responsibility to the development teams. Netflix is another organisation that has adopted this ethos[11]. Being woken up at 3am every night by your pager is certainly a powerful incentive to focus on quality when writing your code. These ideas are about as far away from the traditional centralized governance model as it is possible to be.

Decentralized Data Management

Decentralization of data management presents in a number of different ways. At the most abstract level, it means that the conceptual model of the world will differ between systems. This is a common issue when integrating across a large enterprise, the sales view of a customer will differ from the support view. Some things that are called customers in the sales view may not appear at all in the support view. Those that do may have different attributes and (worse) common attributes with subtly different semantics.

This issue is common between applications, but can also occur withinapplications, particular when that application is divided into separate components. A useful way of thinking about this is the Domain-Driven Design notion of Bounded Context. DDD divides a complex domain up into multiple bounded contexts and maps out the relationships between them. This process is useful for both monolithic and microservice architectures, but there is a natural correlation between service and context boundaries that helps clarify, and as we describe in the section on business capabilities, reinforce the separations.

As well as decentralizing decisions about conceptual models, microservices also decentralize data storage decisions. While monolithic applications prefer a single logical database for persistant data, enterprises often prefer a single database across a range of applications – many of these decisions driven through vendor’s commercial models around licensing. Microservices prefer letting each service manage its own database, either different instances of the same database technology, or entirely different database systems – an approach called Polyglot Persistence. You can use polyglot persistence in a monolith, but it appears more frequently with microservices.


Decentralizing responsibility for data across microservices has implications for managing updates. The common approach to dealing with updates has been to use transactions to guarantee consistency when updating multiple resources. This approach is often used within monoliths.

Using transactions like this helps with consistency, but imposes significant temporal coupling, which is problematic across multiple services. Distributed transactions are notoriously difficult to implement and and as a consequence microservice architectures emphasize transactionless coordination between services, with explicit recognition that consistency may only be eventual consistency and problems are dealt with by compensating operations.

Choosing to manage inconsistencies in this way is a new challenge for many development teams, but it is one that often matches business practice. Often businesses handle a degree of inconsistency in order to respond quickly to demand, while having some kind of reversal process to deal with mistakes. The trade-off is worth it as long as the cost of fixing mistakes is less than the cost of lost business under greater consistency.

Infrastructure Automation

Infrastructure automation techniques have evolved enormously over the last few years – the evolution of the cloud and AWS in particular has reduced the operational complexity of building, deploying and operating microservices.

Many of the products or systems being build with microservices are being built by teams with extensive experience of Continuous Delivery and it’s precursor, Continuous Integration. Teams building software this way make extensive use of infrastructure automation techniques. This is illustrated in the build pipeline shown below.


Figure 5: basic build pipeline

Since this isn’t an article on Continuous Delivery we will call attention to just a couple of key features here. We want as much confidence as possible that our software is working, so we run lots of automated tests. Promotion of working software ‘up’ the pipeline means we automate deployment to each new environment.

A monolithic application will be built, tested and pushed through these environments quite happlily. It turns out that once you have invested in automating the path to production for a monolith, then deploying more applications doesn’t seem so scary any more. Remember, one of the aims of CD is to make deployment boring, so whether its one or three applications, as long as its still boring it doesn’t matter[12].

Another area where we see teams using extensive infrastructure automation is when managing microservices in production. In contrast to our assertion above that as long as deployment is boring there isn’t that much difference between monoliths and microservices, the operational landscape for each can be strikingly different.


Figure 6: Module deployment often differs

Design for failure

A consequence of using services as components, is that applications need to be designed so that they can tolerate the failure of services. Any service call could fail due to unavailability of the supplier, the client has to respond to this as gracefully as possible. This is a disadvantage compared to a monolithic design as it introduces additional complexity to handle it. The consequence is that microservice teams constantly reflect on how service failures affect the user experience. Netflix’s Simian Army induces failures of services and even datacenters during the working day to test both the application’s resilience and monitoring.

This kind of automated testing in production would be enough to give most operation groups the kind of shivers usually preceding a week off work. This isn’t to say that monolithic architectural styles aren’t capable of sophisticated monitoring setups – it’s just less common in our experience.

Since services can fail at any time, it’s important to be able to detect the failures quickly and, if possible, automatically restore service. Microservice applications put a lot of emphasis on real-time monitoring of the application, checking both architectural elements (how many requests per second is the database getting) and business relevant metrics (such as how many orders per minute are received). Semantic monitoring can provide an early warning system of something going wrong that triggers development teams to follow up and investigate.

This is particularly important to a microservices architecture because the microservice preference towards choreography and event collaboration leads to emergent behavior. While many pundits praise the value of serendipitous emergence, the truth is that emergent behavior can sometimes be a bad thing. Monitoring is vital to spot bad emergent behavior quickly so it can be fixed.

Monoliths can be built to be as transparent as a microservice – in fact, they should be. The difference is that you absolutely need to know when services running in different processes are disconnected. With libraries within the same process this kind of transparency is less likely to be useful.

Microservice teams would expect to see sophisticated monitoring and logging setups for each individual service such as dashboards showing up/down status and a variety of operational and business relevant metrics. Details on circuit breaker status, current throughput and latency are other examples we often encounter in the wild.

Evolutionary Design

Microservice practitioners, usually have come from an evolutionary design background and see service decomposition as a further tool to enable application developers to control changes in their application without slowing down change. Change control doesn’t necessarily mean change reduction – with the right attitudes and tools you can make frequent, fast, and well-controlled changes to software.

Whenever you try to break a software system into components, you’re faced with the decision of how to divide up the pieces – what are the principles on which we decide to slice up our application? The key property of a component is the notion of independent replacement and upgradeability[13] – which implies we look for points where we can imagine rewriting a component without affecting its collaborators. Indeed many microservice groups take this further by explicitly expecting many services to be scrapped rather than evolved in the longer term.

The Guardian website is a good example of an application that was designed and built as a monolith, but has been evolving in a microservice direction. The monolith still is the core of the website, but they prefer to add new features by building microservices that use the monolith’s API. This approach is particularly handy for features that are inherently temporary, such as specialized pages to handle a sporting event. Such a part of the website can quickly be put together using rapid development languages, and removed once the event is over. We’ve seen similar approaches at a financial institution where new services are added for a market opportunity and discarded after a few months or even weeks.

This emphasis on replaceability is a special case of a more general principle of modular design, which is to drive modularity through the pattern of change [14]. You want to keep things that change at the same time in the same module. Parts of a system that change rarely should be in different services to those that are currently undergoing lots of churn. If you find yourself repeatedly changing two services together, that’s a sign that they should be merged.

Putting components into services adds an opportunity for more granular release planning. With a monolith any changes require a full build and deployment of the entire application. With microservices, however, you only need to redeploy the service(s) you modified. This can simplify and speed up the release process. The downside is that you have to worry about changes to one service breaking its consumers. The traditional integration approach is to try to deal with this problem using versioning, but the preference in the microservice world is to only use versioning as a last resort. We can avoid a lot of versioning by designing services to be as tolerant as possible to changes in their suppliers.

Are Microservices the Future?

Our main aim in writing this article is to explain the major ideas and principles of microservices. By taking the time to do this we clearly think that the microservices architectural style is an important idea – one worth serious consideration for enterprise applications. We have recently built several systems using the style and know of others who have used and favor this approach.

Those we know about who are in some way pioneering the architectural style include Amazon, Netflix, The Guardian, the UK Government Digital Service,, Forward and The conference circuit in 2013 was full of examples of companies that are moving to something that would class as microservices – including Travis CI. In addition there are plenty of organizations that have long been doing what we would class as microservices, but without ever using the name. (Often this is labelled as SOA – although, as we’ve said, SOA comes in many contradictory forms. [15])

Despite these positive experiences, however, we aren’t arguing that we are certain that microservices are the future direction for software architectures. While our experiences so far are positive compared to monolithic applications, we’re conscious of the fact that not enough time has passed for us to make a full judgement.

Often the true consequences of your architectural decisions are only evident several years after you made them. We have seen projects where a good team, with a strong desire for modularity, has built a monolithic architecture that has decayed over the years. Many people believe that such decay is less likely with microservices, since the service boundaries are explicit and hard to patch around. Yet until we see enough systems with enough age, we can’t truly assess how microservice architectures mature.

There are certainly reasons why one might expect microservices to mature poorly. In any effort at componentization, success depends on how well the software fits into components. It’s hard to figure out exactly where the component boundaries should lie. Evolutionary design recognizes the difficulties of getting boundaries right and thus the importance of it being easy to refactor them. But when your components are services with remote communications, then refactoring is much harder than with in-process libraries. Moving code is difficult across service boundaries, any interface changes need to be coordinated between participants, layers of backwards compatibility need to be added, and testing is made more complicated.

Another issue is If the components do not compose cleanly, then all you are doing is shifting complexity from inside a component to the connections between components. Not just does this just move complexity around, it moves it to a place that’s less explicit and harder to control. It’s easy to think things are better when you are looking at the inside of a small, simple component, while missing messy connections between services.

Finally, there is the factor of team skill. New techniques tend to be adopted by more skillful teams. But a technique that is more effective for a more skillful team isn’t necessarily going to work for less skillful teams. We’ve seen plenty of cases of less skillful teams building messy monolithic architectures, but it takes time to see what happens when this kind of mess occurs with microservices. A poor team will always create a poor system – it’s very hard to tell if microservices reduce the mess in this case or make it worse.

One reasonable argument we’ve heard is that you shouldn’t start with a microservices architecture. Instead begin with a monolith, keep it modular, and split it into microservices once the monolith becomes a problem. (Although this advice isn’t ideal, since a good in-process interface is usually not a good service interface.)

So we write this with cautious optimism. So far, we’ve seen enough about the microservice style to feel that it can be a worthwhile road to tread. We can’t say for sure where we’ll end up, but one of the challenges of software development is that you can only make decisions based on the imperfect information that you currently have to hand.

( Via )


Introducing Proxygen, Facebook’s C++ HTTP framework

We are excited to announce the release of Proxygen, a collection of C++ HTTP libraries, including an easy-to-use HTTP server. In addition to HTTP/1.1, Proxygen (rhymes with “oxygen”) supports SPDY/3 and SPDY/3.1. We are also iterating and developing support for HTTP/2.

Proxygen is not designed to replace Apache or nginx — those projects focus on building extremely flexible HTTP servers written in C that offer good performance but almost overwhelming amounts of configurability. Instead, we focused on building a high performance C++ HTTP framework with sensible defaults that includes both server and client code and that’s easy to integrate into existing applications. We want to help more people build and deploy high performance C++ HTTP services, and we believe that Proxygen is a great framework to do so. You can read the documentation and send pull requests on our Github site.


Proxygen began as a project to write a customizable, high-performance HTTP(S) reverse-proxy load balancer nearly four years ago. We initially planned for Proxygen to be a software library for generating proxies, hence the name. But Proxygen has evolved considerably since the early days of the project. While there were a variety of software stacks that provided similar functionality to Proxygen at the time (Apache, nginx, HAProxy, Varnish, etc), we opted go in a different direction.

Why build our own HTTP stack?

  • Integration.The ability to deeply integrate with existing Facebook infrastructure and tools was a driving factor. For instance, being able to administer our HTTP infrastructure with tools such as Thrift simplifies integration with existing systems. Being able to easily track and measure Proxygen with systems like ODS, our internal monitoring tool, makes rapid data correlation possible and reduces operational overhead. Building an in-house HTTP stack also meant that we could leverage improvements made to these Facebook technologies.
  • Code reuse. We wanted a platform for building and delivering event-driven networking libraries to other projects. Currently, more than a dozen internal systems are built on top of the Proxygen core code, including parts of systems like Haystack, HHVM, our HTTP load balancers, and some of our mobile infrastructure. Proxygen provides a platform where we can develop support for a protocol like HTTP/2 and have it available anywhere that leverages Proxygen’s core code.
  • Scale. We tried to make some of the existing software stacks work, and some of them did for a long time. Operating a huge HTTP infrastructure built on top of lots of workarounds eventually stopped scaling well for us.
  • Features.There were a number of features that didn’t exist in other software (some still don’t) that seemed quite difficult to integrate in existing projects but would be immediately useful for Facebook. Examples of features that we were interested in included SPDY, WebSockets, HTTP/1.1 (keep-alive), TLS false start, and Facebook-specific load-scheduling algorithms. Building an in-house HTTP stack gave us the flexibility to iterate quickly on these features.

Initially kicked off in 2011 by a few engineers who were passionate about seeing HTTP usage at Facebook evolve, Proxygen has been supported since then by a team of three to four people. Outside of the core development team, dozens of internal contributors have also added to the project. Since the project started, the major milestones have included:

  • 2011 – Proxygen development began and started taking production traffic
  • 2012 – Added SPDY/2 support and started initial public testing
  • 2013 – SPDY/3 development/testing/rollout, SPDY/3.1 development started
  • 2014 – Completed SPDY/3.1 rollout, began HTTP/2 development

There are a number of other milestones that we are excited about, but we think the code tells a better story than we can.

We now have a few years of experience managing a large Proxygen deployment. The library has been battle-tested with many, many trillions of HTTP(S) and SPDY requests. Now we’ve reached a point where we are ready to share this code more widely.


The central abstractions to understand in proxygen/lib are the session, codec, transaction, and handler. These are the lowest level abstractions, and we don’t generally recommend building off of these directly.

When bytes are read off the wire, the HTTPCodec stored inside HTTPSession parses these into higher level objects and associates with it a transaction identifier. The codec then calls into HTTPSessionwhich is responsible for maintaining the mapping between transaction identifier and HTTPTransactionobjects. Each HTTP request/response pair has a separate HTTPTransaction object. Finally,HTTPTransaction forwards the call to a handler object which implements HTTPTransaction::Handler. The handler is responsible for implementing business logic for the request or response.

The handler then calls back into the transaction to generate egress (whether the egress is a request or response). The call flows from the transaction back to the session, which uses the codec to convert the higher level semantics of the particular call into the appropriate bytes to send on the wire.

The same handler and transaction interfaces are used to both create requests and handle responses. The API is generic enough to allow both. HTTPSession is specialized slightly differently depending on whether you are using the connection to issue or respond to HTTP requests.



Moving into higher levels of abstraction, proxygen/httpserver has a simpler set of APIs and is the recommended way to interface with proxygen when acting as a server if you don’t need the full control of the lower level abstractions.

The basic components here are HTTPServer, RequestHandlerFactory, and RequestHandler. AnHTTPServer takes some configuration and is given a RequestHandlerFactory. Once the server is started, the installed RequestHandlerFactory spawns a RequestHandler for each HTTP request.RequestHandler is a simple interface users of the library implement. Subclasses of RequestHandlershould use the inherited protected member ResponseHandler* downstream_ to send the response.

HTTP Server

The server framework we’ve included with the release is a great choice if you want to get set up quickly with a simple, fast-out-of-the-box event driven server. With just a few options, you are ready to go. Check out the echo server example to get a sense of what it is like to work with. For fun, we benchmarked the echo server on a 32 logical core Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz with 16 GiB of RAM,* *varying the number of worker threads from one to eight. We ran the client on the same machine to eliminate network effects, and achieved the following performance numbers.

    # Load test client parameters:
    # Twice as many worker threads as server
    # 400 connections open simultaneously
    # 100 requests per connection
    # 60 second test
    # Results reported are the average of 10 runs for each test
    # simple GETs, 245 bytes of request headers, 600 byte response (in memory)
    # SPDY/3.1 allows 10 concurrent requests per connection

While the echo server is quite simple compared to a real web server, this benchmark is an indicator of the parsing efficiency of binary protocols like SPDY and HTTP/2. The HTTP server framework included is easy to work with and gives good performance, but focuses more on usability than the absolute best possible performance. For instance, the filter model in the HTTP server allows you to easily compose common behaviors defined in small chunks of code that are easily unit testable. On the other hand, the allocations associated with these filters may not be ideal for the highest-performance applications.


Proxygen has allowed us to rapidly build out new features, get them into production, and see the results very quickly. As an example, we were interested in evaluating the compression format HPACK, however we didn’t have any HTTP/2 clients deployed yet and the HTTP/2 spec itself was still under heavy development. Proxygen allowed us to implement HPACK, use it on top of SPDY, and deploy that to mobile clients and our servers simultaneously. The ability to rapidly experiment with a real HPACK deployment enabled us to quickly understand performance and data efficiency characteristics in a production environment. As another example, the internal configuration systems at Facebook continue to evolve and get better. As these systems are built out, Proxygen is able to quickly take advantage of them as soon as they’re available. Proxygen has made a significant positive impact on our ability to rapidly iterate with our HTTP infrastructure.

Open Source

The Proxygen codebase is under active development and will continue to evolve. If you are passionate about HTTP, high performance networking code, and modern C++, we would be excited to work with you! Please send us pull requests on GitHub.

We’re committed to open source and are always looking for new opportunities to share our learnings and software. The traffic team has now open sourced Thrift as well as Proxygen, two important components of the network software infrastructure at Facebook. We hope that these software components will be building blocks for other systems that can be open sourced in the future.

( Via Facebook)

Google Launches Managed Service For Running Docker-Based Applications On Its Cloud Platform


Google today announced the alpha launch of Google Container Engine, a new managed service for building and running Docker container-based applications on its cloud platform.

Docker is probably the hottest technology in developer circles these days — it’s almost impossible to have a discussion with a developer without it coming up — and Google’s Cloud Platform team has decided to go all in on this technology that makes it easier for developers to run distributed applications.

In essence, this new service is a “Cluster-as-a-Service” platform based on Google’s open source Kubernetes project. Kubernetes, which helps developers manage their container clusters, is based on Google’s own work with containers in its massive data centers. In this new service, Kubernetes dynamically manages the different Docker containers that make up an application for the user.

Google says the combination of “fast booting, efficient VM hosts and seamless virtualized network integration” will make its cloud computing service “the best place to run container-based applications.” The company’s competitors would likely argue with that, but none of them offer a similar service at this point.

Google initially launched support for Docker images in May as part of its new Managed VM service. These managed VMs are coming out of Google’s limited alpha — with the addition of auto-scaling support — and with that, developers can now use Docker containers on Google’s platform without having to jump through any hoops. Managed VMs will remain in beta for now, though, which in Google’s new language means there is no access control and that charges may be waived, but there is no SLA or technical support obligation either.

And remember, because this new service is officially in alpha, it isn’t feature-complete and the whole infrastructure could melt down at any minute.


Announcing Kylin: Extreme OLAP Engine for Big Data

We (from Ebay) are very excited to announce that eBay has released to the open-source community our distributed analytics engine: Kylin ( Designed to accelerate analytics on Hadoop and allow the use of SQL-compatible tools, Kylin provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop to support extremely large datasets.

Kylin is currently used in production by various business units at eBay. In addition to open-sourcing Kylin, we are proposing Kylin as an Apache Incubator project.


The challenge faced at eBay is that our data volume has become bigger while our user base has become more diverse. Our users – for example, in analytics and business units – consistently ask for minimal latency but want to continue using their favorite tools, such as Tableau and Excel.

So, we worked closely with our internal analytics community and outlined requirements for a successful product at eBay:

  1. Sub-second query latency on billions of rows
  2. ANSI-standard SQL availability for those using SQL-compatible tools
  3. Full OLAP capability to offer advanced functionality
  4. Support for high cardinality and very large dimensions
  5. High concurrency for thousands of users
  6. Distributed and scale-out architecture for analysis in the TB to PB size range

We quickly realized nothing met our exact requirements externally – especially in the open-source Hadoop community. To meet our emergent business needs, we decided to build a platform from scratch. With an excellent team and several pilot customers, we have been able to bring the Kylin platform into production as well as open-source it.

Feature highlights

Kylin is a platform offering the following features for big data analytics:

  • Extremely fast OLAP engine at scale: Kylin is designed to reduce query latency on Hadoop for 10+ billion rows of data.
  • ANSI SQL on Hadoop: Kylin supports most ANSI SQL query functions in its ANSI SQL on Hadoop interface.
  • Interactive query capability: Users can interact with Hadoop data via Kylin at sub-second latency – better than Hive queries for the same dataset.
  • MOLAP cube query serving on billions of rows: Users can define a data model and pre-build in Kylin with more than 10+ billions of raw data records.
  • Seamless integration with BI Tools: Kylin currently offers integration with business intelligence tools such as Tableau and third-party applications.
  • Open-source ODBC driver: Kylin’s ODBC driver is built from scratch and works very well with Tableau. We have open-sourced the driver to the community as well.
  • Other highlights: 
  • Job management and monitoring
  • Compression and encoding to reduce storage
  • Incremental refresh of cubes
  • Leveraging of the HBase coprocessor for query latency
  • Approximate query capability for distinct counts (HyperLogLog)
  • Easy-to-use Web interface to manage, build, monitor, and query cubes
  • Security capability to set ACL at the cube/project level
  • Support for LDAP integration

The fundamental idea

The idea of Kylin is not brand new. Many technologies over the past 30 years have used the same theory to accelerate analytics. These technologies include methods to store pre-calculated results to serve analysis queries, generate each level’s cuboids with all possible combinations of dimensions, and calculate all metrics at different levels.

For reference, here is the cuboid topology:



When data becomes bigger, the pre-calculation processing becomes impossible – even with powerful hardware. However, with the benefit of Hadoop’s distributed computing power, calculation jobs can leverage hundreds of thousands of nodes. This allows Kylin to perform these calculations in parallel and merge the final result, thereby significantly reducing the processing time.

From relational to key-value

As an example, suppose there are several records stored in Hive tables that represent a relational structure. When the data volume grows very large – 10+ or even 100+ billions of rows – a question like “how many units were sold in the technology category in 2010 on the US site?” will produce a query with a large table scan and a long delay to get the answer. Since the values are fixed every time the query is run, it makes sense to calculate and store those values for further usage. This technique is called Relational to Key-Value (K-V) processing. The process will generate all of the dimension combinations and measured values shown in the example below, at the right side of the diagram. The middle columns of the diagram, from left to right, show how data is calculated by leveraging MapReduce for the large-volume data processing.



Kylin is based on this theory and is leveraging the Hadoop ecosystem to do the job for huge volumes of data:

  1. Read data from Hive (which is stored on HDFS)
  2. Run MapReduce jobs to pre-calculate
  3. Store cube data in HBase
  4. Leverage Zookeeper for job coordination


The following diagram shows the high-level architecture of Kylin.



This diagram illustrates how relational data becomes key-value data through the Cube Build Engine offline process. The yellow lines also illustrate the online analysis data flow. The data requests can originate from SQL submitted using a SQL-based tool, or even using third-party applications via Kylin’s RESTful services. The RESTful services call the Query Engine, which determines if the target dataset exists. If so, the engine directly accesses the target data and returns the result with sub-second latency. Otherwise, the engine is designed to route non-matching dataset queries to SQL on Hadoop, enabled on a Hadoop cluster such as Hive.

Following are descriptions of all of the components the Kylin platform includes.

  • Metadata Manager: Kylin is a metadata-driven application. The Metadata Manager is the key component that manages all metadata stored in Kylin, including the most important cube metadata. All other components rely on the Metadata Manager.
  • Job Engine: This engine is designed to handle all of the offline jobs including shell script, Java API, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and handles failures.
  • Storage Engine: This engine manages the underlying storage – specifically the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase – the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis.
  • REST Server: The REST Server is an entry point for applications to develop against Kylin. Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on.
  • ODBC Driver: To support third-party tools and applications – such as Tableau – we have built and open-sourced an ODBC Driver. The goal is to make it easy for users to onboard.
  • Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user.

In Kylin, we are leveraging an open-source dynamic data management framework called Apache Calcite to parse SQL and plug in our code. The Calcite architecture is illustrated below. (Calcite was previously called Optiq, which was written by Julian Hyde and is now an Apache Incubator project.)



Kylin usage at eBay

At the time of open-sourcing Kylin, we already had several eBay business units using it in production. Our largest use case is the analysis of 12+ billion source records generating 14+ TB cubes. Its 90% query latency is less than 5 seconds. Now, our use cases target analysts and business users, who can access analytics and get results through the Tableau dashboard very easily – no more Hive query, shell command, and so on.

What’s next

  • Support TopN on high-cardinality dimension: The current MOLAP technology is not perfect when it comes to querying on a high-cardinality dimension – such as TopN on millions of distinct values in one column. Similar to search engines (as many researchers have pointed out), the inverted index is the reasonable mechanism to use to pre-build such results.
  • Support Hybrid OLAP (HOLAP): MOLAP is great to serve queries on historical data, but as more and more data needs to be processed in real time, there is a growing requirement to combine real-time/near-real-time and historical results for business decisions. Many in-memory technologies already work on Relational OLAP (ROLAP) to offer such capability. Kylin’s next generation will be a Hybrid OLAP (HOLAP) to combine MOLAP and ROLAP together and offer a single entry point for front-end queries.

Open source

Kylin has already been open-sourced to the community. To develop and grow an even stronger ecosystem around Kylin, we are currently working on proposing Kylin as an Apache Incubator project. With distinguished sponsors from the Hadoop developer community supporting Kylin, such as Owen O’Malley (Hortonworks co-founder and Apache member) and Julian Hyde (original author of Apache Calcite, also with Hortonworks), we believe that the greater open-source community can take Kylin to the next level.

We welcome everyone to contribute to Kylin. Please visit the Kylin web site for more details:

To begin with, we are looking for open-source contributions not only in the core code base, but also in the following areas:

  1. Shell Client
  2. RPC Server
  3. Job Scheduler
  4. Tools

For more details and to discuss these topics further, please follow us on twitter @KylinOLAP and join our Google group:!forum/kylin-olap


Kylin has been deployed in production at eBay and is processing extremely large datasets. The platform has demonstrated great performance benefits and has proved to be a better way for analysts to leverage data on Hadoop with a more convenient approach using their favorite tool. We are pleased to open-source Kylin. We welcome feedback and suggestions, and we look forward to the involvement of the open-source community.

( Via )

New Windows Server containers and Azure support for Docker

In June, Microsoft Azure added support for Docker containers on Linux VMs, enabling the broad ecosystem of Dockerized Linux applications to run within Azure’s industry-leading cloud. Today, Microsoft and Docker Inc. are jointly announcing we are bringing the Windows Server ecosystem to the Docker community, through 1) investments in the next wave of Windows Server, 2) open-source development of the Docker Engine for Windows Server, 3) Azure support for the Docker Open Orchestration APIs and 4) federation of Docker Hub images in to the Azure Gallery and Portal.

Many customers are running a mix of Windows Server and Linux workloads and Microsoft Azure offers customers the most choice of any cloud provider. By supporting Docker containers on the next wave of Windows Server, we are excited to make available Docker open solutions across both Windows Server and Linux. Applications can themselves be mixed; bringing together the best technologies from the Linux ecosystem and the Windows Server ecosystem. Windows Server containers will run in your datacenter, your hosted datacenter, or any public cloud provider – and of course, Microsoft Azure.




Windows Server Containers

Windows Server containers provide applications an isolated, portable and resource controlled operating environment. This isolation enables containerized applications to run without risk of dependencies and environmental configuration affecting the application. By sharing the same kernel and other key system components, containers exhibit rapid startup times and reduced resource overhead. Rapid startup helps in development and testing scenarios and continuous integration environments, while the reduced resource overhead makes them ideal for service-oriented architectures.

The Windows Server container infrastructure allows for sharing, publishing and shipping of containers to anywhere the next wave of Windows Server is running. With this new technology millions of Windows developers familiar with technologies such as .NET, ASP.NET, PowerShell, and more will be able to leverage container technology. No longer will developers have to choose between the advantages of containers and using Windows Server technologies.






Windows Server containers in the Docker ecosystem

Docker has done a fantastic job of building a vibrant open source ecosystem based on Linux container technologies, providing an easy user experience to manage the lifecycle of containers drawn from a huge collection of open and curated applications in Docker Hub. We will bring Windows Server containers to the Docker ecosystem to expand the reach of both developer communities.

As part of this, Docker Engine for Windows Server containers will be developed under the aegis of the Docker open source project, where Microsoft will participate as an active community member. Windows Server container images will also be available in the Docker Hub alongside the 45,000 and growing Docker images for Linux already available.

Finally, we are working on supporting Docker client natively on Windows Server. As a result, Windows customers will be able to use the same standard Docker client and interface on multiple development environments.



You can find more about Microsoft’s work with the Docker open source project on the MS Open Tech blog here.

Docker on Microsoft Azure

Earlier this year, Microsoft released Docker containers for Linux on Azure, offering the first enterprise-ready version of the Docker open platform on Linux Virtual Machines on Microsoft Azure, leveraging the Azure extension model and Azure Cross Platform CLI to deploy the latest and greatest Docker Engine on each requested VM. We have seen lots of excitement from customers deploying Docker containers in Azure as part of our Linux support.

As part of the announcement today, we will be contributing support for multi-container Docker applications on Azure through the Docker Open Orchestration APIs. This will enable users to deploy Docker applications to Azure directly from the Docker client. This results in a dramatically simpler user experience for Azure customers; we are looking forward to demonstrating this new joint capability at Docker’s Global Hack Day as well as at the upcoming Microsoft TechEd Europe conference in Barcelona.

Furthermore, we hope to energize Windows Server and Linux customers by integrating Docker Hub into the Azure Gallery and Management Portal experience. This means that Azure customers will be able to interact directly with repositories and images on Docker Hub, enabling rich composition of content both from the Azure Gallery and Docker Hub.

In summary, today we announced a partnership with Docker Inc. to bring Windows Server to the Docker ecosystem and improve Azure’s support for the Docker Engine and Orchestration APIs and to integrate Docker Hub with the Azure Gallery and Management Portal.

Azure is placing a high priority on developer choice and flexibility including first-class support for Linux and Windows Server. This expanded partnership builds on the Azure’s current support for Docker on Linux and will bring the richness of the Windows Server and .NET ecosystem to the Docker community. It is an exciting time to be in the Azure cloud!

( Via )