Server-Side Architecture. Front-End Servers And Client-Side Random Load Balancing


Chapter by chapter Sergey Ignatchenko is putting together a wonderful book on the Development and Deployment of Massively Multiplayer Games, though it has much broader applicability than games. Here’s a recent chapter from his book.

Enter Front-End Servers

[Enter Juliet]
Thou art as sweet as the sum of the sum of Romeo and his horse and his black cat! Speak thy mind!
[Exit Juliet]

— a sample program in Shakespeare Programming Language



Our Classical Deployment Architecture (especially if you do use FSMs) is not bad, and it will work, but there is still quite a bit of room for improvement for most of the games out there. More specifically, we can add another row of servers in front of the Game Servers, as shown on Fig VI.8:


As you see, compared to the Classical Deployment Architecture (see Fig VI.4 above) we’ve just added a row of Front-End Servers in front of our Game Servers. These additional Front-End Servers are intended to deal with all the communication stuff when it comes from the clients. All those pesky “whether the player is connected or not” questions (including keep-alive handing where applicable, see Chapter [[TODO]] for details on keep-alives), all that client-to-server encryption (if applicable), with all those keys etc., all those rather more-or-less strange reliable-UDP protocols (again, if applicable), and of course, routing messages between the clients and different Game Servers – all the communication with clients is handled here.


In addition, usually these Front-End servers store a copy of relevant Game Worlds when it is necessary, and are acting as “concentrators” for the game world updates

 In addition, usually these Front-End servers store a copy of relevant Game Worlds when it is necessary, and are acting as “concentrators” for the game world updates; i.e. even if a Game Server has 100’000 people watching some game (like final of some tournament or something), it will need to send updates only to a few Front-End servers, and Front-End servers will take care of data distribution to all the 100’000 people. This ability comes really handy when you have some kind of Big Final game, with thousands of people willing to watch it (and you don’t really need to make it a video broadcast, which is not-so-convenient for existing players and damn expensive, but you can do it right within your client). More on it below, and implementation of this “concentrator” paradigm is discussed in more detail in Chapter [[TODO]].

We’ll discuss the implementation of our Front-End servers a bit later, but for now let’s note that most importantly,

Front-End Servers MUST Be Easily Replaceable Without Significant Inconveniences To Players

That is, if any of Front-End Servers fails for whatever reason – the most a player should see, is a disconnect for a few seconds. While still disruptive, it is very much better than scenarios such as “the whole game world went down and we need to restore it from backup”. In other words, whenever Front-End server crashes for whatever reason, all the clients who were connected there, need to detect the crash (or even worse, “black hole”) and automagically reconnect to some other Front-End server; in this case all the player can see, is a momentarily disconnect (which is also a nuisance, but is much better than to see your game hang).

Front-End Servers: Benefits

Whenever we’re adding another layer of complexity, there is always a question “Do we really need it?” From what I’ve seen, having easily replaceable Front-End Servers in front of your Game Servers is very valuable and provides quite a few benefits. More specifically:

  • Front-End Servers take some load off your Game Servers, while being easily replaceable
    • it means that you can have less Game Servers
      • this, combined with the observation that Front-End Servers are easily replaceable, means that you improve reliability of your site as a whole; instances when some of your Game World Servers go down, will occur more rarely (!)
    • having a copy of relevant game world(s) on your Front-End Servers, takes even more load off your Game Servers, and makes Game Server load independent on the number of observers
  • you can use really cheap boxes for your Front-End Servers; strictly speaking, you don’t even need ECC and RAID for them (and you certainly do need them for your Game Servers). As noted above, Front-End Servers are easily replaceable, so if one goes down – its load is automagically redistributed among the others (see Chapter [[TODO]] for further details). If you’re going to deploy into the cloud – you may want to consider cheaper offers for your front-end servers (even if they’re coming from different CSP).1
  • they allow your client to have a single connection point to the whole site; benefits of this approach include better control over player’s “last mile” so that priorities between different data streams can be controlled, eliminating difficult-to-analyze “partial connections”, and hiding more implementation details of your site from the hostile world outside; more on single client connection in Chapter [[TODO]]
  • they allow for trivial client-side load balancing (no hardware load balancers needed, etc. etc.), more discussion on the load balancing below in “On Client-Side Load Balancing and Law of Big Numbers” section below
  • having a copy of relevant game world(s) on your Front-End Servers allows to have virtually unlimited number of observers who want to watch some of the games being played on your site (such as a Big Final or something2) Best of all, this will happenwithout affecting game server’s performance (!). Moreover, usually you won’t need to organize anything for your Big Final, the system (if built properly) can take care of it itself, in (roughly) the following manner:
    • whenever somebody comes to watch a certain game, his client requests this game from the Front End Server
    • if Front End Server doesn’t have a copy of the requested game, it requests it from the relevant Game Server, alongside with updates to the game world state
    • from this point on, Front End Server will keep an “in-sync” copy of the game world, providing it (with updates) to all the clients which have requested it
    • it means that from this point on, even if you have 100’000 observers watching some game on this Game Server, all the additional load is handled by your Front-End Servers, without affecting your Game Server
    • for further details, see Chapter [[TODO]].
  • Front-End Servers allow for better security later on (acting essentially as a kind of DMZ, see Chapter [[TODO]] for details).

1 keep in mind that you still need top-notch connectivity

2 and as Big Finals are a good way to attract attention, this does provide you an edge over your competitors, etc. etc.


Front-End Servers: Latencies And Inter-Player Latency Differences


You can have processing time of your Front End server application-layer of the order of single-digit microseconds.

 As for the negative side of having Front End Servers, I can think only of two such drawbacks. The first one is additional latency introduced by your Front End Server. More specifically, we’re speaking about the time which is necessary for the packet incoming from a client at application layer, to get processed by your Front End Server, to go into TCP stack on Front End Server side,3 to get out of TCP stack on Game Server side, and to reach application layer in your Game Server (plus the time necessary to go in the opposite direction).

Let’s take a look at this additional latency. From my experience, if you’re using a reasonably good communication layer library, you can have processing time of your Front End server application-layer of the order of single-digit microseconds.4 Then, we have an end-to-end TCP connection from your Front End Server to your Game Server; latencies of such a connection (over 10GB Ethernet) have been measured at around 8 µs [Larsen2007]. Adding these two delays together and multiplying it by two to get RTT, would mean that we’re still staying well below 100 µs. However, there are some further considerations (such as switch delays, differences between different operating systems, differences between games, etc.) which make me uncomfortable to say that you will have no problem achieving 100 µs delay (i.e. either you may, or you may not). On the other hand, I am ready to say that if you’re careful enough with your implementation, reducing the delay introduced by Front-End Servers, down to 1ms is achievable in all but most weird cases.

To summarize:

  • if additional latency of around 1 millisecond is ok for you – don’t worry about additional latencies and go for Front-End Servers; this certainly covers all genres with the only potential exception being MMOFPS
  • if additional latency you can live with, is well below 1 millisecond (which is difficult for me to imagine as it is still over an order of magnitude less that 1/60 sec frame update time, but in MMOFPS world pretty much anything can happen) – think about it a bit more and try to find out (ideally – experimentally) what kind of latency you can achieve in practice; if your experiments show that latencies are indeed unacceptable, you MIGHT need to drop those Front-End Servers because of the latency they’re introducing5
  • YMMV, no warranties of any kind, batteries not included

The second (IMHO more theoretical, but as usual, YMMV) potential issue with having Front-End Servers would arise if some of your Front-End Servers are overloaded (or they’re running using significantly different hardware), so those players connected to less-loaded Front-End Servers, will have lower latencies, and therefore will have an advantage.


If/when such inter-player latency becomes a real problem, you MAY need to implement some kind of affinity for players of certain Game Worlds to certain Front-End servers

 On the one hand, I didn’t see situations where it makes any practical difference in real-world deployments (i.e. as I’ve seen it, if some of the Front-End Servers are overloaded, it means that most of the other ones are already at 90%+ of capacity, which you should avoid anyway; see [[TODO!]] section for further discussion of load balancing). On the other hand, YMMV and in theory you might get hit by such an effect (though I certainly don’t see it coming into play for anything but MMOFPS).

If such inter-player latency differences become the case (and only when/if it becomes a real problem), you MAY need to implement some kind of affinity for players of certain Game Worlds to certain Front-End servers (more on affinity in “On Affinity” section below). However, keep in mind that large-scale affinity tends to remove most of the benefits provided by Front-End Servers, so if you feel that you’re going to implement affinity for each-and-every-game – you’ll probably be better without Front-End Servers (implementing affinity only for a small percentage of your games, such as “high profile tournaments” will cause less trouble, see “On Affinity” section below for further discussion).

3 yes, I’m arguing for TCP connections for inter-server communications in most cases, see “On Inter-Server Communication” section above. On the other hand, UDP is also possible if you really really prefer it.

4 note that this might become a non-trivial exercise, see further discussion in Chapter [[TODO]]. On the other hand, I’ve done it myself.

5 in theory, you may also want to experiment with something like Infiniband, which BTW would fit nicely in overall QnFSM architecture with communications neatly isolated from the rest of the code, but most likely it won’t be worth the trouble

Client-Side Random Balancing And Law Of Big Numbers

As soon as you have several Front-End servers where your clients are coming, you have a question “how to ensure that all the Front-End Servers are loaded equally”, i.e. a typical load balancing question. Load balancing in general is quite a big topic at least over last 20 years. Three most common techniques out there are the following: DNS Round-Robin, Client-Side Random Balancing, and Server-Side (usually hardware-based) Load Balancers. With the industry producing those hardware boxes behind the last one, there is no wonder that it becomes more and more popular at least in the enterprise world. Still, let’s take a closer look at these load balancing solutions.

DNS Round-Robin


one of these returned IPs can get cached by a Big Fat DNS server, and then get distributed to many thousands of clients

 DNS round-robin is based on a traditional DNS requests. Whenever a client requests address to be resolved into IP address, a DNS request is sent (this stands with or without DNS round-robin) to your (or “your DNS provider’s”) DNS server. If DNS server is configured for DNS round-robin, it returns different IP addresses to different DNS requests, in a round-robin fashion6 hence the name.

DNS Round-Robin, when applied to balancing browsers across different web servers, has two major disadvantages. First of all, there is a problem with caching DNS servers along the path of the request (which is a very standard part of DNS handling). That is, even if your server is faithfully returning all your IPs in a round robin fashion, one of these returned IPs can get cached by a Big Fat DNS server (think Comcast or AT&T), and then get distributed to many thousands of clients; in this case distribution of your clients across your servers will be skewed towards that “lucky” IP which got cached by the Big Fat DNS server :-( . The second problem with using DNS round-robin for web servers, if that if one of your servers is down, usual web browser won’t try another server on the list, so usually in web server realm round-robin DNS doesn’t provide server fault tolerance.

Fortunately, as we DO have a client, we can solve both these problems very easily. Moreover, these techniques will also work for your browser-based games (that is, after you’ve got your JS loaded and it started execution).

6 strictly speaking, it is a little bit more complicated than that, as DNS packets contain a list of servers, but as virtually everybody out there ignores all the entries in returned packet except for the very first one, it is more or less equivalent to returning only one IP per request – that is, unless you have your own client which can do the choice itself, see “Client-Side Balancing”


Client-Side Random Balancing


Client simply takes random item from the IP list, and tries connecting to this randomly chosen IP.

 To improve on DNS round-robin, a very simple idea can be used. We won’t rotate anything on the server side; instead, we will distribute exactly the same list of servers to all the clients. This list may be hardcoded into your clients (and that’s what I’ve used personally with big success), or the list can be distributed via DNS as a simple list of IPs for desired name (and retrieved on client via getaddrinfo() or equivalent). Which way to prefer – doesn’t matter to us now, but we’ll discuss relevant issues in Chapter [[TODO]].

As soon as the client gets the list of IPs, everything is very simple. Client simply takes random item from the IP list, and tries connecting to this randomly chosen IP. If connection attempt is unsuccessful (or connection is lost, etc.) – client gets another random item from the list and tries connecting again.

One note of caution – while you don’t really need a cryptographic-quality random generator to choose the IP from the list, you DO want to avoid situations when your random number generator (the one used for this purpose) is essentially just some function of coarse-grained time. One Really Bad example would be something like

int myrand() {//DON'T DO THIS!
  return rand();

In such a case, if you get mass disconnect (and as a result all your players will attempt to reconnect at about the same time), your IP distribution will likely get skewed due to too few differences between the clients trying to get their IP addresses; if all the clients attempt to connect within 5 seconds, with such a bad myrand() function you’ll get at most 5 different IPs (less if you’re unlucky). Other than such extremely bad cases, pretty much any RNG should be fine for this purpose. Even a trivial linear congruential generator, seeded with time(0) at the moment when the program was launched (and NOT at the moment of request, as in example above), should do in practice, though adding some kind of milliseconds or some other randomly looking or client-specific data to the mix is advisable “just in case”.

Law of Large Numbers. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.— Wikipedia —

Client-Side Random Balancing: A Law Of Large Numbers, And Comparison With DNS Round-Robin

Unlike DNS round-robin (which in theory provides “ideal” balancing), client-side random balancing relies on the statistical Law of Large Numbers to achieve flat distribution of clients between the servers. What the law basically says is that for independent measurements, the more experiments you’re performing – the more flat distribution you’ll get. [[TODO!: add stuff about binomial distribution, and an example]]

In practice, despite being “non-ideal” in theory, client-side random balancing achieves much more flat distribution than DNS round-robin. The reason for it is two-fold. First, as soon as the number of clients is large (hundreds and up), client-side random balancing becomes sufficiently flat for practical purposes (and if your system is provisioned for thousands of players, and only a few have came yet – the distribution won’t be too flat, but the inequality involved won’t be able to hurt, and the balance will improve as the number grows). On the positive side, however, client-side random balancing doesn’t suffer from DNS caching issue described above. Even if you’re using DNS to distribute IP lists (and this list gets cached) – with client-side balancing all the IP lists circulating in the system are identical by design, so caching (unlike with DNS round-robin) doesn’t change client distribution at all.

To summarize: personally, I would be very cautious to use DNS Round-Robin for production load balancing. On the other hand, I’ve seen Client-Side Random Balancing to work extremely well for a game which grew from a few hundreds of simultaneous players into hundreds of thousands; it worked without any problems whatsoever, providing almost-perfect balancing all the time. That is, if the average load across the board was 50%, you could find some servers at 48% and some at 52%, but not more than that.7

As for the second disadvantage mentioned above for DNS Round-Robin as applied to web browsers (which was inability of most of the browsers to provide fault tolerance in case when one of the servers crashes) – this evaporates as soon as we have the whole list on the client-side, can detect failure, and can select another item from the list.

7 this, of course, stands only when you have run your servers identically for sufficient time; if one of the servers has just entered service, it will take some hours until it reaches the same load level than the others. If really necessary, this effect can be mitigated, though mitigation is rather ugly and I’ve never seen it necessary in practice

 Server-Side Load Balancers

An approach which is very different from both round-robin DNS and client-side random balancing, is to use server-side load balancers. Load balancer is usually an additional box, sitting in front of your servers, and doing, as advertised, load balancing.


These additional balancing capabilities are usually completely unnecessary for games (where Law of Large Numbers tends to stand very firmly)

 Server-side load balancers do have significantly more balancing capabilities with regards to scenarios when different clients cause very different loads (so that server-side balancers can work even if the Law of Large Numbers doesn’t work anymore). However, on the one hand, these additional balancing capabilities are usually completely unnecessary for games (where Law of Large Numbers tends to stand very firmly), and on the other hand, such load balancer boxes tend to be damn expensive (double that if you want redundancy, and you certainly want it), they do not allow inter-datacenter balancing and fault tolerance (by design), and they introduce additional not-so-well-controlled latencies.8

Oh, and BTW – when speaking about redundancy and the cost of their boxes, quite a few hardware manufacturers will tell you “hey, you can use our balancer in active/active configuration, so you won’t waste anything!”. Well, while you can indeed use many server-side load balancers in active/active configuration, you still MUST have at least one redundant box to handle the load if one of those boxes fails. In other words, if all you have is two boxes in active/active configuration, when both are working, overall load on each of them MUST be well below 50%, there is no way around it if you want redundancy.

As a result of all the considerations above, for game load-balancing purposes I have never seen any practical uses for server-side load balancer boxes (as always, YMMV and batteries are not included). Even if you’re using Web-Based Deployment Architecture (in the way described above), you should be able to stay away from them (though YMMV even more).

8 most of load balancers are designed to balance web sites where anything below 100ms is pretty much nothing, so at the very least make sure to discuss and measure the latency (under your kind of load!) before buying such a box

Balancing Summary

From my experience, client-side random balancing (aimed towards front-end servers) worked really good, and I’ve never seen any reasons to use something different. Round-robin DNS is almost universally inferior to client-side balancing, and hardware-based server-side balancers are too complicated and expensive, usually without any real reason to use them in gaming environment. As note above, one exception when you MAY need server-side balancers, is if you’re using Web-Based Deployment Architecture.

CDN A content delivery network or content distribution network (CDN) is a globally distributed network of proxy servers deployed in multiple data centers— Wikipedia —One last word about load balancing: it is possible to use more than one of the methods listed here (and it might even work for you); however, implications of such combined use of more than one method of load balancing, are way too convoluted to discuss them in this book.

Front-End Servers As A CDN

It is possible to use Front-End Servers as a kind of CDN (or even use them to build your own CDN). Even if you’re running all your Game Servers from one single datacenter, for certain kinds of games it might be a good idea to have your Front-End Servers sitting in different datacenters (and acting as different “entry points” to your clients), as shown on Fig VI.9:



The idea here is pretty much like the one behind classical CDN: to reduce latencies for end-users. On the other hand, we need to note that

unlike classical CDN, the content with our game-sorta-CDN is not static, so gain in latencies is possible only because of better peering, with gains usually being in single-digit milliseconds

There is still a different reason to use such deployment architectures – in case if you want to protect yourself from Internet connectivity in your primary datacenter going down (provided that “Some Connectivity” survives); in practice, if you have a decent datacenter, it should never happen. More precisely – your datacenter WILL occasionally experience transient faults of around 1.5-2 minutes long (typical BGP convergence time), so if you’re looking for excuses to use this nice diagram on Fig VI.9 and your client can detect the fault and redirect to a different datacenter significantly faster than that, it MAY make some difference to your players.

Implementation-wise, there are several considerations for such CDN-like multi-datacenter Front-End Server configurations:

  • you MUST have very good connectivity between your data centers (“some connectivity” on Fig. VI.9). At the very least, you should have inter-ISP peering explicitly set by both of your ISPs (to each other) to ensure the best data flow for this critical path
    • strictly speaking, “some connectivity” does not necessarily need to be Internet-based; you often can save additional few milliseconds by getting something like “dedicated” Frame-Relay between your datacenters, but this will likely cost you in the range of tens of thousands per month :-( .
  • traffic on “some connectivity” can be an order (or even two) of magnitude lower than that going to the clients due to Front-End Servers acting as “concentrators”
  • you SHOULD account for secondary datacenter to go down (in particular, in case of inter-datacenter connectivity going down). The simplest way to deal with it is to have enough capacity in your primary datacenter (both traffic-wise and CPU-wise) to handleall of your clients, but this tends to be expensive. As an alternative, shutting down some activities in case of such a failure may be possible depending on specifics of your game.

Bottom line for CDN-like arrangements. CDN-like arrangements of Front-End Servers may save some of your players a few milliseconds in latency (that is, if you have a really good connection between datacenters), which in turn may allow to level the field a bit with regards to latency. From my experience, it was hardly worth the trouble (because you cannot really improve MUCH in terms of latency, as the packets still need to go all the way to the Game Server and back), but keep the possibility in mind. For example, it may come handy in some really strange scenarios when you’re legally required to keep your game servers in a strange location (hey casino guys!) where you simply don’t have enough bandwidth to serve your clients directly.

Front-End Servers + Game Servers As A Kinda-CDN

On the other hand, if you’re really concerned about latencies, it is usually much better to bring your Game World Servers closer to players (while leaving DB Server behind), as shown on Fig VI.10:


Here, we’re moving the most time-critical stuff (which is usually your Game World Servers) towards the end-user, providing significantly better latencies to those players who’re in the vicinity of corresponding datacenter. Maintaining such infrastructure is quite a Big Headache, but is doable, so if you’re really concerned about latencies – you may want to deploy in such a manner. A word of caution – if going this way, you will end up with “regional servers”, which have their own share of troubles (you’ll need to ensure that clients in the region go only to the relevant Front-End Servers, security on inter-datacenter connections becomes quite an issue, etc., etc.); once again – it is doable, but go this way only if youreally need it.

On Affinity

In some cases, you may decide that you need to have a kind of “affinity” so that some specific players (usually those playing in a specific game world) are coming to specific Front-End Servers.


The things will go smoothly as long as the number of the game worlds which use affinity is small.

 Note when we’re speaking about our Front-End Servers, “affinity” is quite different from classical affinity (usually referred to as “persistence” or “stickiness”) used on load balancers for web servers. In the web world persistence/stickiness is about having the same client coming to the same server (to deal with sessions and per-client caches). For our Front-End Servers, however, affinity has a very different motivation, and is usually about Front-End-Server-to-game-world affinity (for players or for players+observers) rather than client-to-server affinity (see “Front-End Servers: Latencies and Inter-Player Latency Differences” section above for one reason where you MIGHT need such affinity).

Technically, implementing Front-End-Server-to-game-world-affinity is not that difficult, but the real problems will start after you deploy your affinity. In short – the things will go smoothly as long as the number of the game worlds which use affinity is small. On the other hand, as soon as you have a significant chunk of your players connected using the affinity rules, you will find that achieving reasonable load balance between different Front-End Servers becomes difficult :-( . When there is no affinity, the balance is near-perfect just because of the Law of Large Numbers; as you’re introducing the affinity rules, you’re starting to skew this near-perfectly-flat distribution, and the more players are affected by affinity, the more you’re deviating from the ideal distribution, so managing those rules while achieving load balance can become a Big Fat Challenge.

Bottom line: avoid affinity as long as possible (and most likely you will be able to get away without it).

Front-End Servers: Implementation

Now let’s discuss ways how our Front-End Servers can be implemented. As mentioned above, the key property of our Front-End Servers is that they’re easily replaceable in case of failure. To achieve this behavior,

you MUST ensure that there is NO original game-world state on any of your Front-End Servers. In other words, Front-End Servers should have only a replica of the original game-world state, with the original game-world state kept by Game Servers

There is no need to worry too much about it if you’re using a generic subscriber/publisher (or state replication) kind of stuff, but be extremely careful if you’re introducing any custom logic to your Front-End Servers, because you may lose the all-important “easily replaceable” property above. See Chapter [[TODO]] for further discussion of this potential issue.

Front-End Servers: QnFSM Implementation

One implementation of the Front-End Server implemented under pure Queues-and-FSMs architecture (see Chapter V for details on QnFSM, state machines, and queues) is shown on Fig VI.10:



Here, we have TCP- and UDP-related threads similar to those described in “Implementing Game Servers under QnFSM architecture” section above with regards to Game Servers, and one or more of Routing&Data Threads (with at least one Routing&Data FSM each), which are responsible for routing of all the packets, and for caching the data (such as “game world” data). Let’s discuss these routing-related FSMs in a bit more detail.

Routing&Data FSMs. Each of Routing&Data FSMs has its own data that it handles (and updates if applicable). For example, one such Routing&Data FSM may contain a state of one game world. Other Routing&Data FSMs may handle routing of the point-to-point packets from players to (and from) one specific Game Server. Further details of the data types handled by Routing&Data FSMs will be discussed in Chapter [[TODO]], but generally there will be three different types of Routing&Data FSMs:

  • generic connection handlers (to handle point-to-point communications including player input and server-to-server connections)
  • generic publisher/subscriber handlers (to cache and handle generic but structured data such as a list of available games, if players are allowed to select the game)
  • specific game world handlers (to cache and handle game world data if the required functionality doesn’t fit into generic handler). In many cases you’ll be able to live without specific game world handlers, but if you want to implement some kind of server-side filtering, like server-side fog-of-war to avoid sending data to those players who shouldn’t see it (so no hack of the client can possibly lift fog-of-war) – specific game world handlers become a necessity.


It is possible (and often advisable) to have more than one Routing&Data FSM within single Routing&Data Thread

 It is possible (and often advisable) to have more than one Routing&Data FSM within single Routing&Data Thread to reduce unnecessary load due to an exceedingly high number of threads (and unnecessary thread context switches). How to combine those Routing&Data FSMs into specific threads – depends on your game significantly, but usually generic connection handlers are extremely fast and all of them can be combined in one thread. As for generic publisher/subscriber and specific game world handlers, their distribution into different threads should take into account typical load and allowed latencies. The rule of thumb is (as usual) the following: the more FSMs per thread – the more latency and the less thread-related overhead; unfortunately, the rest depends too much on specifics of your game to discuss it here.

Routing&Data Factory Thread. Routing&Data Factory Thread is responsible for creating Routing&Data Threads (and Routing&Dara FSMs), according to requests coming from TCP/UDP threads. A typical life cycle of Routing&Data FSM may look as follows:

  • One of TCP/UDP FSMs needs to route some message (or to provide synchronization to some state), and realizes that it has no data on Routing&Data FSM, which it needs to route the message to, in its own cache.
  • TCP/UDP FSM sends a request to Routing&Data Factory FSM
  • Factory FSM creates Routing&Data Thread (with an appropriate Routing&Data FSM)
  • Factory FSM reports ID of the Queue, where the messages towards appropriate Routing&Data FSM should be sent, back to the requesting TCP/UDP Thread
    • TCP/UDP FSM (the one mentioned above) sends the message to the appropriate Queue (using ID rather than pointer to enable deterministic “recording”/”replay”, see Chapter V for details).
  • Whenever the Routing&Data FSM is no longer necessary for its purposes, TCP/UDP FSM reports it to the Factory FSM
    • if it was the last TCP/UDP FSM which needs this Routing&Data FSM, Factory FSM may instruct appropriate Routing&Data Thread to destroy the Routing&Data FSM

Routing&Data FSMs In Game Servers And Clients

I need to confess that personally I am positively in love these Routing&Data FSMs. I lovethem so much that I usually have not only on Front-End Servers, but also on Game Servers, and on Clients too; while they’re not strictly necessary there (and are not shown on appropriate diagrams to avoid unnecessary clutter), they did help me to simplify things quite a bit, making all the communications very uniform. Still, it is pretty much your choice if you want to have Routing&Data stuff on your Game Servers and/or Clients.

Front-End Servers Summary


“As a rule of thumb, Front-End Servers are a Good Thing™.

To summarize the section on Front-End Servers:

  • As a rule of thumb, Front-End Servers are a Good Thing™. In particular:
    • they take the load off your Game Servers
      • which often makes the system cheaper (as Front-End Servers are cheap)
      • and also improves overall system reliability (as Front-End Servers are easily replaceable)
    • they facilitate single client connection (which is generally a good thing to have, see Chapter [[TODO]] for further discussion)
    • they facilitate client-side load balancing
    • they allow to handle 100’000+ observers for your Big Event easily (actually, the sky is the limit)
    • their drawbacks are pretty much limited to the additional latency, and this additional latency is firmly in sub-millisecond range
  • Client-side load balancing usually is the best one for games
    • one potential exception is Web-Based Deployment Architectures, where you MAY need server-side balancers
    • large-scale affinity is to be avoided
  • CDN-like arrangements are possible, but not without caveats
  • Front-End Servers can (and IMHO SHOULD) be implemented in QnFSM architecture, as described above


CloudI: Bringing Erlang’s Fault-Tolerance to Polyglot Development

Clouds must be efficient to provide useful fault-tolerance and scalability, but they also must be easy to use.

CloudI (pronounced “cloud-e” /klaʊdi/) is an open source cloud computing platform built in Erlang that is most closely related to the Platform as a Service (PaaS) clouds. CloudI differs in a few key ways, most importantly: software developers are not forced to use specific frameworks, slow hardware virtualization, or a particular operating system. By allowing cloud deployment to occur without virtualization, CloudI leaves development process and runtime performance unimpeded, while quality of service can be controlled with clear accountability.

What makes a cloud a cloud?

The word “cloud” has become ubiquitous over the past few years. And its true meaning has become somewhat lost. In the most basic technological sense, these are the properties that a cloud computing platform must have:

And these are the properties that we’d like a cloud to have:

  • Easy integration
  • Simple deployment

My goal in building CloudI was to bring together these four attributes.

It’s important to understand that few programming languages can provide real fault-tolerance with scalability. In fact, I’d say Erlang is roughly alone in this regard.

I began by looking at the Erlang programming language (on top of which CloudI is built). The Erlang virtual machine provides fault-tolerance features and a highly scalable architecture while the Erlang programming language keeps the required source code small and easy to follow.

It’s important to understand that few programming languages can provide real fault-tolerance with scalability. In fact, I’d say Erlang is roughly alone in this regard. Let me take a detour to explain why and how.


What is fault-tolerance in cloud computing?

Fault-tolerance is robustness to error. That is, fault-tolerant systems are able to continue operating relatively normally even in the event of (hopefully isolated) errors.

Here, we see service C sending requests to services A and B. Although service B crashes temporarily, the rest of the system continues, relatively unimpeded.

Erlang tutorial for beginners

Erlang is known for achieving 9x9s of reliability (99.9999999% uptime, so less than 31.536 milliseconds of downtime per year) with real production systems (within the telecommunications industry). Normal web development techniques only achieve 5x9s reliability (99.999% uptime, so about 5.256 minutes of downtime per year), if they are lucky, due to slow update procedures and complete system failures. How does Erlang provide this advantage?

The Erlang virtual machine implements what is called an “Actor model”, a mathematical model for concurrent computation. Within the Actor model, Erlang’s lightweight processes are a concurrency primitive of the language itself. That is, within Erlang, we assume that everything is an actor. By definition, actors perform their actions concurrently; so if everything is an actor, we get inherent concurrency. (For more on Erlang’s Actor model, there’s a longer discussion here.)

As a result, Erlang software is built with many lightweight processes that keep process state isolated while providing extreme scalability. When an Erlang process needs external state, a message is normally sent to another process, so that the message queuing can provide the Erlang processes with efficient scheduling. To keep the Erlang process state isolated, the Erlang virtual machine does garbage collection for each process individually so that other Erlang processes can continue running concurrently without being interrupted.

The Erlang virtual machine garbage collection is an important difference when compared with Java virtual machine garbage collection because Java depends on a single heap, which lacks the isolated state provided by Erlang. The difference between Erlang garbage collection and Java garbage collection means that Java is unable to provide basic fault-tolerance guarantees simply due to the virtual machine garbage collection, even if libraries or language support was developed on top of the Java virtual machine. There have been attempts to develop fault-tolerance features in Java and other Java virtual machine based languages, but they continue to be failures due to the Java virtual machine garbage collection.


Basically, building real-time fault-tolerance support on top of the JVM is by definition impossible, because the JVM itself is not fault-tolerant.

Erlang processes

At a low level, what happens when we get an error in an Erlang process? The language itself uses message passing between processes to ensure that any errors have a scope limited by a concurrent process. This works by storing data types as immutable objects; these objects are copied to limit the scope of the process state (large binaries are a special exception because they are reference counted to conserve memory).

In basic terms, that means that if we want to send variable X to another process P, we have to copy over X as its own immutable variable X’. We can’t modify X’ from our current process, and so even if we trigger some error, our second process P will be isolated from its effects. The end result is low-level control over the scope of any errors due to the isolation of state within Erlang processes. If we wanted to get even more technical, we’d mention that Erlang’s lack of mutability gives it referential transparency unlike, say, Java.

This type of fault-tolerance is deeper than just adding try-catch statements and exceptions. Here, fault-tolerance is about handling unexpected errors, and exceptions are expected. Here, you’re trying to keep your code running even when one of your variables unexpectedly explodes.

Erlang’s process scheduling provides extreme scalability for minimal source code, making the system simpler and easier to maintain. While it is true that other programming languages have been able to imitate thescalability found natively within Erlang by providing libraries with user-level threading (possibly combined with kernel-level threading) and data exchanging (similar to message passing) to implement their own Actor model for concurrent computation, the efforts have been unable to replicate the fault-tolerance provided within the Erlang virtual machine.

This leaves Erlang alone amongst programming languages as being both scalable and fault-tolerant, making it an ideal development platform for a cloud.

Taking advantage of Erlang

So, with all that said, I can make the claim that CloudI brings Erlang’s fault-tolerance and scalability to various other programming languages (currently C++/C, Erlang (of course), Java, Python, and Ruby), implementing services within a Service Oriented Architecture (SOA).

This simplicity makes CloudI a flexible framework for polyglot software development, providing Erlang’s strengths without requiring the programmer to write or even understand a line of Erlang code.

Every service executed within CloudI interacts with the CloudI API. All of the non-Erlang programming language services are handled using the same internal CloudI Erlang source code. Since the same minimal Erlang source code is used for all the non-Erlang programming languages, other programming language support can easily be added with an external programming language implementation of the CloudI API. Internally, the CloudI API is only doing basic serialization for requests and responses. This simplicity makes CloudI a flexible framework for polyglot software development, providing Erlang’s strengths without requiring the programmer to write or even understand a line of Erlang code.

The service configuration specifies startup parameters and fault-tolerance constraints so that service failures can occur in a controlled and isolated way. The startup parameters clearly define the executable and any arguments it needs, along with default timeouts used for service requests, the method to find a service (called the “destination refresh method”), both an allow and deny simple access control list (ACL) to block outgoing service requests and optional parameters to affect how service requests are handled. The fault-tolerance constraints are simply two integers (MaxR: maximum restarts, and MaxT: maximum time period in seconds) that control a service in the same way an Erlang supervisor behavior (an Erlang design pattern) controls Erlang processes. The service configuration provides explicit constraints for the lifetime of the service which helps to make the service execution easy to understand, even when errors occur.

To keep the service memory isolated during runtime, separate operating system processes are used for each non-Erlang service (referred to as “external” services) with an associated Erlang process (for each non-Erlang thread of execution) that is scheduled by the Erlang VM. The Erlang CloudI API creates “internal” services, which are also associated with an Erlang process, so both “external” services and “internal” services are processed in the same way within the Erlang VM.


In cloud computing, it’s also important that your fault-tolerance extends beyond a single computer (i.e., distributed system fault-tolerance). CloudI uses distributed Erlang communication to exchange service registration information, so that services can be utilized transparently on any instance of CloudI by specifying a single service name for a request made with the CloudI API. All service requests are load-balanced by the sending service and each service request is a distributed transaction, so the same service with separate instances on separate computers is able to provide system fault-tolerance within CloudI. If necessary, CloudI can be deployed within a virtualized operating system to provide the same system fault-tolerance while facilitating a stable development framework.

For example, if an HTTP request needs to store some account data in a database it could be made to the configured HTTP service (provided by CloudI) which would send the request to an account data service for processing (based on the HTTP URL which is used as the service name) and the account data would then be stored in a database. Every service request receives a Universally Unique IDentifier (UUID) upon creation, which can be used to track the completion of the service request, making each service request within CloudI a distributed transaction. So, in the account data service example, it is possible to make other service requests either synchronously or asynchronously and use the service request UUIDs to handle the response data before utilizing the database. Having the explicit tracking of each individual service request helps ensure that service requests are delivered within the timeout period of a request and also provides a way to uniquely identify the response data (the service request UUIDs are unique among all connected CloudI nodes).

With a typical deployment, each CloudI node could contain a configured instance of the account data service (which may utilize multiple operating system processes with multiple threads that each have a CloudI API object) and an instance of the HTTP service. An external load balancer would easily split the HTTP requests between the CloudI nodes and the HTTP service would route each request as a service request within CloudI, so that the account data service can easily scale within CloudI.

CloudI in action

CloudI lets you take unscalable legacy source code, wrap it with a thin CloudI service, and then execute the legacy source code with explicit fault-tolerance constraints. This particular development workflow is important for fully utilizing multicore machines while providing a distributed system that is fault-tolerant during the processing of real-time requests. Creating an “external” service in CloudI is simply instantiating the CloudI API object in all the threads that have been configured within the service configuration, so that each thread is able to handle CloudI requests concurrently. A simple service example can utilize the single main thread to create a single CloudI API object, like the following Python source code:

import sys
from cloudi_c import API

class Task(object):
    def __init__(self):
        self.__api = API(0) # first/only thread == 0

    def run(self):
        self.__api.subscribe('hello_world_python/get', self.__hello_world)

    def __hello_world(self, command, name, pattern, request_info, request,
                      timeout, priority, trans_id, pid):
        return 'Hello World!'

if __name__ == '__main__':
    assert API.thread_count() == 1 # simple example, without threads
    task = Task()

The example service simply returns a “Hello World!” to an HTTP GET request by first subscribing with a service name and a callback function. When the service starts processing incoming CloudI service bus requests within the CloudI API poll function, incoming requests from the “internal” service which provides a HTTP server are routed to the example service based on the service name, because of the subscription. The service could also have returned no data as a response, if the request needed to be similar to a publish message in a typical distributed messaging API that provides publish/subscribe functionality. The example service wants to provide a response so that the HTTP server can provide the response to the HTTP client, so the request is a typical request/reply transaction. With both possible responses from a service, either data or no data, the service callback function controls the messaging paradigm used, instead of the calling service, so the request is able to route through as many services as necessary to provide a response when necessary, within the timeout specified for the request to occur within. Timeouts are very important for real-time event processing, so each request specifies an integer timeout which follows the request as it routes though any number of services. The end result is that real-time constraints are enforced alongside fault-tolerance constraints, to provide a dependable service in the presence of any number of software errors.

The presence of source code bugs in any software should be understood as a clear fact that is only mitigated with fault-tolerance constraints. Software development can reduce the presence of bugs as software is maintained, but it can also add bugs as software features are added. CloudI provides cloud computing which is able to address these important fault-tolerance concerns within real-time distributed systems software development. As I have demonstrated in this CloudI and Erlang tutorial, the cloud computing that CloudI provides is minimal so that efficiency is not sacrificed for the benefits of cloud computing.


Web Crawling & Analytics Case Study – Database Vs Self Hosted Message Queuing Vs Cloud Message Queuing


The Business Problem:

 To build a repository of used car prices and identify trends based on data available from used car dealers. The solution to the problem necessarily involved building large scale crawlers to crawl & parse thousands of used car dealer websites everyday.

1st Solution: Database Driven Solution to Crawling & Parsing

Our initial Infrastructure consisted of a crawling, parsing and database insertion web services all written in Python. When the crawling web service finishes with crawling a web site it pushes the output data to the database & the parsing web service picks it from there & after parsing the data, pushes the structured data into the database.



 Problems with Database driven approach:

  • Bottlenecks: Writing the data into database and reading it back proved be a huge bottleneck and slowed down the entire process & left to high & low capacity issues in the crawling & parsing functions.

  • High Processing Cost: Due to the slow response time of many websites the parsing service would remain mostly idle which lead to a very high cost of servers & processing.

We tried to speed up the process by directly posting the data to the parsing service from crawling service but this resulted in loss of data when the parsing service was busy. Additionally, the approach presented a massive scaling challenge from read & write bottlenecks from the database.

2nd Solution: Self Hosted / Custom Deployment Using RabbitMQ


To overcome the above mentioned problems and to achieve the ability to scale we moved to a new architecture using RabbitMQ. In the new architecture crawlers and parsers were Amazon EC2 micro instances. We used Fabric to push commands to the scripts running in the instances. The crawling instance would pull the used car dealer website from the website queue, crawl the relevant pages and push output the data to a crawled pages queue.The parsing instance would pull the data from the crawled pages queue, parse them and push data into parsed data queue and a data base insertion script would transfer that data into Postgres.

 This approach speeded up the crawling and parsing cycle. Scaling was just a matter of adding more instances created from specialized AMIs.

Problems with RabbitMQ Approach:

  • Setting up, deploying & maintaining this infrastructure across hundreds of servers was a nightmare for a small team

  • We suffered data losses every time there was a deployment & maintenance issues. Due to the tradeoff we were forced to make between speed and persistence of data in RabbitMQ, there was a chance we lost some valuable data if the server hosting RabbitMQ crashed.

3rd Solution: Cloud Messaging Deployment Using IronMQ & IronWorker

The concept of having multiple queues and multiple crawlers and parsers pushing and pulling data from them gave us a chance to scale the infrastructure massively. We were looking for solutions which could help us overcome the above problems using a similar architecture but without the headache of deployment & maintenance management.

The architecture, business logic & processing methods of using & Ironworkers were similar to RabbitMQ but without the deployment & maintenance efforts. All our code is written in python and since supports python we could set up the crawl & parsing workers and queues within 24 hours with minimal deployment & maintenance efforts. Reading and writing data into IronMQ is fast and all the messages in IronMQ are persistent and the chance of losing data is very less.



Key Variables Database Driven Batch Processing Self Hosted – RabbitMQ Cloud Based – Iron MQ
Speed of processing a batch Slow Fast Fast
Data Loss from Server Crashes & Production Issues Low Risk Medium Risk Low Risk
Custom Programming for Queue Management High Effort Low Effort Low Effort
Set Up for Queue Management NA Medium Effort Low Effort
Deployment & Maintenance of Queues NA High Effort Low Effort


How And Why Swiftype Moved From EC2 To Real Hardware


This is a guest post by Oleksiy Kovyrin, Head of Technical Operations at Swiftype. Swiftype currently powers search on over 100,000 websites and serves more than 1 billion queries every month.

When Matt and Quin founded Swiftype in 2012, they chose to build the company’s infrastructure using Amazon Web Services. The cloud seemed like the best fit because it was easy to add new servers without managing hardware and there were no upfront costs.

Unfortunately, while some of the services (like Route53 and S3) ended up being really useful and incredibly stable for us, the decision to use EC2 created several major problems that plagued the team during our first year.

Swiftype’s customers demand exceptional performance and always-on availability and our ability to provide that is heavily dependent on how stable and reliable our basic infrastructure is. With Amazon we experienced networking issues, hanging VM instances, unpredictable performance degradation (probably due to noisy neighbors sharing our hardware, but there was no way to know) and numerous other problems. No matter what problems we experienced, Amazon always had the same solution: pay Amazon more money by purchasing redundant or higher-end services.

The more time we spent working around the problems with EC2, the less time we could spend developing new features for our customers. We knew it was possible to make our infrastructure work in the cloud, but the effort, time and resources it would take to do so was much greater than migrating away.

After a year of fighting the cloud, we made a decision to leave EC2 for real hardware. Fortunately, this no longer means buying your own servers and racking them up in a colo. Managed hosting providers facilitate a good balance of physical hardware, virtualized instances, and rapid provisioning. Given our previous experience with hosting providers, we made the decision to choose SoftLayer. Their excellent service and infrastructure quality, provisioning speed, and customer support made them the best choice for us.

After more than a month of hard work preparing the inter-data center migration, we were able to execute the transition with zero downtime and no negative impact on our customers.The migration to real hardware resulted in enormous improvements in service stability from day one, provided a huge (~2x) performance boost to all key infrastructure components, and reduced our monthly hosting bill by ~50%.

This article will explain how we planned for and implemented the migration process, detail the performance improvements we saw after the transition, and offer insight for younger companies about when it might make sense to do the same.

Preparing For The Switch

Before the migration, we had around 40 instances on Amazon EC2. We would experience a serious production issue (instance outage, networking issue, etc) at least 2-3 times a week, sometimes daily. Once we decided to move to real hardware, we knew we had our work cut out for us because we needed to switch data centers without bringing down the service. The preparation process involved two major steps, each of which has a dedicated explanation in their own sections below:

  1. Connecting EC2 and SoftLayer. First, we built a skeleton of our new infrastructure (the smallest subset of servers to be able to run all key production services with development-level load) in SoftLayer’s data center. Once the new data center was set up, we built a system of VPN tunnels between our old and our new data centers to ensure transparent network connectivity between components in both data centers.

  2. Architectural changes to our applications. Next, we needed to make changes to our applications to make them work both in the cloud and on our new infrastructure. Once the application could live in both data centers simultaneously, we built a data-replication pipeline to make sure both the cloud infrastructure and the SoftLayer deployment (databases, search indexes, etc) were always in-sync.

Step 1: Connecting EC2 And Softlayer

One of the first things we had to do to prepare for our migration was figure out how to connect our EC2 and our SoftLayer networks together. Unfortunately the “proper” way of connecting a set of EC2 servers to another private network – using the Virtual Private Cloud (VPC) feature of EC2 – was not an option for us since we could not convert our existing set of instances into a VPC without downtime. After some consideration and careful planning, we realized that the only servers that really needed to be able to connect to each other across the data center boundary were our MongoDB nodes. Everything else we could make data center-local (Redis clusters, search servers, application clusters, etc).


Since the number of instances we needed to interconnect was relatively small, we implemented a very simple solution that proved to be stable and effective for our needs:

  • Each data center had a dedicated OpenVPN server deployed in it that NAT’ed all client traffic to its private network address.

  • Each node that needed to be able to connect to another data center would set up a VPN channel there and set up local routing to properly forward all connections directed at the other DC into that tunnel.

Here are some features that made this configuration very convenient for us:

  • Since we did not control network infrastructure on either side, we could not really force all servers on either end to funnel their traffic through a central router connected to the other DC. In our solution, each VPN server decided (with the help of some automation) which traffic to route through the tunnel to ensure complete inter-DC connectivity for all of its clients.

  • Even if a VPN tunnel collapsed (surprisingly, this only happened a few times during the weeks of the project), it would only mean one server lost its outgoing connectivity to the other DC (one node dropped out of MongoDB cluster, some worker server would lose connectivity to the central Resque box, etc). None of those one-off connectivity losses would affect our infrastructure since all important infrastructure components had redundant servers on both sides.

Step 2: Architectural Changes To Our Applications

There were many small changes we had to make in our infrastructure in the weeks of preparation for the migration, but having deep understanding of each and every component of it helped us make appropriate decisions reducing a chance of a disaster during the transitional period. I would argue that infrastructure of almost any complexity could be migrated with enough time and engineering resources to carefully consider each and every network connection established between applications and backend services.


Here are the main steps we had to take to ensure smooth and transparent migration:

  • All stateless services (caches, application clusters, web layer) were independently deployed on each side.

  • For each stateful backend service (database, search cluster, async queues, etc) we had to consider if we wanted (or could afford to) replicate the data to the other side or if we had to incur inter-data center latency for all connections. Relying on the VPN was always considered the last resort option and eventually we were able to reduce the amount of traffic between data centers to a few small streams of replication (mostly MongoDB) and connections to primary/main copies of services that could not be replicated.

  • If a service could be replicated, we would do that and then make application servers always use or prefer the local copy of the service instead of going to the other side.

  • For services that we could not replicate with their internal replication capabilities (like our search backends) we made the changes in our application to implement replication between data centers where asynchronous workers on each side would pull the data from their respective queues and we would always write all asynchronous jobs into queues for both data centers.

Step 3: Flipping The Switch

When both sides were ready to serve 100% of our traffic, we prepared for the final switchover by reducing our DNS TTL down to a few seconds to ensure fast traffic change.

Finally, we switched traffic to the new data center. Requests switched to the new infrastructure with zero impact on our customers. Once traffic to EC2 had drained, we disabled the old data center and forwarded all remaining connections from the old infrastructure to the new one. DNS updates take time, so some residual traffic was visible on our old servers for at least a week after the cut-off time.

A Clear Improvement: Results After Moving From EC2 To Real Hardware

Stability improved. We went from 2-3 serious outages a week (most of these were not customer-visible, since we did our best to make the system resilient to failures, but many outages would wake someone up or force someone to abandon family time) down to 1-2 outages a month, which we were able to handle more thoroughly by spending engineering resources on increasing system resilience to failures and reducing a chance of them making any impact on our customer-visible availability.

Performance improved. Thanks to the modern hardware available from SoftLayer we have seen a consistent performance increase for all of our backend services (especially IO-bound ones like databases and search clusters, but for CPU-bound app servers as well) and, what is more important, the performance was much more predictable: no sudden dips or spikes unrelated to our own software’s activity. This allowed us to start working on real capacity planning instead of throwing more slow instances at all performance problems.

Costs decreased. Last, but certainly not least for a young startup, the monthly cost of our infrastructure dropped by at least 50%, which allowed us to over-provision some of the services to improve performance and stability even further, greatly benefiting our customers.

Provisioning flexibility improved, but provisioning time increased. We are now able to exactly specify servers to meet their workload (lots of disk doesn’t mean we need a powerful CPU). However, we can no longer start new servers in minutes with an API call. SoftLayer generally can add a new server to our fleet within 1-2 hours. This is a big trade-off for some companies, but it was one that works well for Swiftype.


Since switching to real hardware, we’ve grown considerably – our data and query volume is up 20x – but our API performance is better than ever. Knowing exactly how our servers will perform lets us plan for growth in a way we couldn’t before.

In our experience, the cloud may be a good idea when you need to rapidly spin up new hardware, but it only works well when you’re making a huge (Netflix-level) effort to survive in it. If your goal is to build a business from day one and you do not have spare engineering resources to spend on paying the “cloud tax”, using real hardware may be a much better idea.

AppLovin: Marketing To Mobile Consumers Worldwide By Processing 30 Billion Requests A Day


This is a guest post from AppLovin‘s VP of engineering, Basil Shikin, on the infrastructure of its mobile marketing platform. Major brands like Uber, Disney, Yelp and use AppLovin’s mobile marketing platform. It processes 30 billion requests a day and 60 terabytes of data a day.

AppLovin’s marketing platform provides marketing automation and analytics for brands who want to reach their consumers on mobile. The platform enables brands to use real-time data signals to make effective marketing decisions across one billion mobile consumers worldwide.

Core Stats

  • 30 Billion ad requests per day

  • 300,000 ad requests per second, peaking at 500,000 ad requests per second

  • 5ms average response latency

  • 3 Million events per second

  • 60TB of data processed daily

  • ~1000 servers

  • 9 data centers

  • ~40 reporting dimensions

  • 500,000 metrics data points per minute

  • 1 Pb Spark cluster

  • 15GB/s peak disk writes across all servers

  • 9GB/s peak disk reads across all servers

  • Founded in 2012, AppLovin is headquartered in Palo Alto, with offices in San Francisco, New York, London and Berlin.


Technology Stack


Third Party Services

Data Storage

  • Aerospike for user profile storage

  • Vertica for aggregated statistics and real-time reporting

  • Aggregating 350,000 rows per second and writing to Vertica at 34,000 rows per second

  • Peak 12,000 user profiles per second written to Aerospike

  • MySQL for ad data

  • Spark for offline processing and deep data analysis

  • Redis for basic caching

  • Thrift for all data storage and transfers

  • Each data point replicated in 4 data centers

  • Each service is replicated at least in 2 data centers (at most in 8)

  • Amazon Web Services used for long term data storage and backups

Core App And Services

  • Custom C/C++ Nginx module for high performance ad serving

  • Java for data processing and auxiliary services

  • PHP / Javascript for UI

  • Jenkins for continuous integration and deployment

  • Zookeeper for distributed locking

  • HAProxy and IPVS for high availability

  • Coverity for Java/C++ static code analysis

  • Checkstyle and PMD for PHP static code analysis

  • Syslog for DC-centralized log server

  • Hibernate for transaction-based services

Servers And Provisioning

  • Ubuntu

  • Cobbler for bare metal provisioning

  • Chef for configuring servers

  • Berkshelf for Chef dependencies

  • Docker with Test Kitchen for running infrastructure tests


Monitoring Stack


Server Monitoring

  • Icinga for all servers

  • ~100 custom Nagios plugins for deep server monitoring

  • 550 various probes per server

  • Graphite as data format

  • Grafana for displaying all graphs

  • PagerDuty for issue escalation

  • Smokeping for network mesh monitoring

Application Monitoring

  • VividCortex for MySQL monitoring

  • JSON /health endpoint on each service

  • Cross-DC database consistency monitoring

  • 9 4K 65” TVs for showing all graphs across the office

  • Statistical deviation monitoring

  • Fraudulent users monitoring

  • Third-party systems monitoring

  • Deployments are recorded in all graphs

Intelligent Monitoring

  • Intelligent alerting system with a feedback loop: a system that can introspect anything can learn anything

  • Third-party stats about AppLovin are also monitored

  • Alerting is a cross-team exercise: developers, ops, business, data scientists are involved


Architecture Overview


General Considerations

  • Store everything in RAM

  • If it does not fit, save it to SSD

  • L2 cache level optimizations matter

  • Use right tool for the right job

  • Architecture allows swapping any component

  • Upgrade only if an alternative is 10x better

  • Write your own components if there is nothing suitable out there

  • Replicate important data at least 3x

  • Make sure every message can be re-played without data corruption

  • Automate everything

  • Zero-copy message passing

Message Processing

  • Custom message processing system that guarantees message delivery

  • 3x replication for each message

  • Sending a message = writing to disk

  • Any service may fail, but no data are lost

  • Message dispatching system connects all components together, provides isolation and extensibility of the system

  • Cross-DC failure tolerance

Ad Serving

  • Nginx is really fast: can serve an ad in less than a millisecond

  • Keep all ad serving data in memory: read only

  • jemalloc gave a 30% speed improvement

  • Use Aerospike for user profiles: less than 1ms to fetch a profile

  • Pre-compute all ad serving data on one box and dispatch across all servers

  • Torrents are used to propagate serving data across all servers. Using Torrents resulted in 83% network load drop on the originating server compared to HTTP-based distribution.

  • mmap is used to share ad serving data across nginx processes

  • XXHash is the fastest hash function with a low collision rate. 75x faster than SHA-1 for computing checksums

  • 5% of real traffic goes to staging environment

  • Ability to run 3 A/B tests at once (20%/20%/10% of traffic for three separate tests, 50% for control)

  • A/B test results are available in regular reporting


Data Warehouse

  • All data are replicated

  • Running most reports takes under 2 seconds

  • Aggregation is key to allow fast reports on large amounts of data

  • Non-aggregated data for the last 48 hours is usually to resolve most issues

  • 7 days of raw logs is usually enough for debug

  • Some reports must be pre-computed

  • Always think multiple data centers: every data point goes to a multiple locations

  • Backup in S3 for catastrophic failures

  • All raw data are stored in Spark cluster





  • 70 full-time employees

  • 15 developers (platform, ad serving, frontend, mobile)

  • 4 data scientists

  • 5 dev. ops.

  • Engineers in Palo Alto, CA

  • Business in San Francisco, CA

  • Offices in New York, London and Berlin


  • HipChat to discuss most issues

  • Asana for project-based communication

  • All code is reviewed

  • Frequent group code reviews

  • Quarterly company outings

  • Regular town hall meetings with CEO

  • All engineers (junior to CTO) write code

  • Interviews are tough: offers are really rare

Development Cycle

  • Developers, business side or data science team comes up with an idea

  • Idea is reviewed and scheduled to be executed on a Monday meeting

  • Feature is implemented in a branch; development environment is used for basic testing

  • A pull request is created

  • Code is reviewed and iterated upon

  • For big features group code reviews are common

  • The feature gets merged to master

  • The feature gets deployed to staging with the next build

  • The feature gets tested on 5% real traffic

  • Statistics are examined

  • If the feature is successful it graduates to production

  • Feature is closely monitored for couple days

Avoiding Issues

  • The system is designed to handle failure of any component

  • No failure of a single component can harm ad serving or data consistency

  • Omniscient monitoring

  • Engineers watch and analyze key business reports

  • High quality of code is essential

  • Some features take multiple code reviews and iterations before graduating

  • Alarms are triggered when:

    • Stats for staging are different from production

    • FATAL errors on critical services

    • Error rate exceeds threshold

    • Any irregular activity is detected

  • data are never dropped

  • Most log lines can be easily parsed

  • Rolling back of any change is easy by design

  • After every failure: fix, make sure same thing does not happen in the future, and add monitoring


Lessons Learned


Product Development

  • Being able to swap any component easily is key to growth

  • Failures drive innovative solutions

  • Staging environment is essential: always be ready to loose 5%

  • A/B testing is essential

  • Monitor everything

  • Build intelligent alerting system

  • Engineers should be aware of business goals

  • Business people should be aware of limitations of engineering

  • Make builds and continuous integration fast. Jenkins run on a 2 bare metal servers with 32 CPU, 128G RAM and SSD drives


  • Monitoring all data points is critical

  • Automation is important

  • Every component should support HA by design

  • Kernel optimizations can have up to 25% performance improvement

  • Process and IRQ balancing lead to another 20% performance improvement

  • Power saving features impact performance

  • Use SSDs as much as possible

  • When optimizing, profile everything. Flame graphs are great!


The Architecture Of Algolia’s Distributed Search Network

Algolia started in 2012 as an offline search engine SDK for mobile. At this time we had no idea that within two years we would have built a worldwide distributed search network.

Today Algolia serves more than 2 billion user generated queries per month from 12 regions worldwide, our average server response time is 6.7ms and 90% of queries are answered in less than 15ms. Our unavailability rate on search is below 10-6 which represents less than 3 seconds per month.

The challenges we faced with the offline mobile SDK were technical limitations imposed by the nature of mobile. These challenges forced us to think differently when developing our algorithms because classic server-side approaches would not work.

Our product has evolved greatly since then. We would like to share our experiences with building and scaling our REST API built on top of those algorithms.

We will explain how we are using a distributed consensus for high-availability and synchronization of data in different regions around the world and how we are doing the routing of queries to the closest locations via an anycast DNS.

The data size misconception

Before designing the architecture, we first had to identify the major use cases we needed to support. This was especially true when considering our scaling needs. We had to know if our customers would need to index Gigabytes, Terabytes, or Petabytes of data. The architecture would be different depending on how many of those use cases we needed to handle.

When people think about search, most think about very big use cases like Google’s web page indexing or Facebook’s indexing of trillions of posts. If you stop and think about the search boxes you see every day, the majority of them do not search massively big datasets. Netflix searches approximately 10,000 titles and Amazon’s database in the US contains around 200,000,000 products. The data from both of these cases can be stored on a single machine! We are not saying that having a single machine is a good setup, but keeping in mind all that data can fit on one machine is really important since cross-machine synchronization is a big source of complexity and performance loss.

The road to high-availability

When building a SaaS API, high availability is a big concern as removing all single points of failure (SPOF) is extremely challenging. We spent weeks brainstorming the ideal search architecture for our service while keeping in mind our product would be geared towards user facing search.

Master-Slave Vs. Master-Master

By temporarily restricting the problem to each index being stored on a single machine, we simplified our high availability setup to several machines hosted in different data centers. With this setup, the first solution we thought of was to have a master-slave setup with one master machine receiving all indexing operations and then replicating them to one or more slave machines. With this approach, we could easily load balance search queries across all the machines.

The problem with this master-slave approach is that our high availability only works for search queries. All indexing operations need to go to the master. This architecture is too risky for a service company. All it takes is for the master to be down, which will happen, and clients will start having indexing errors.

We must implement a master-master architecture! The key element to enabling a master-master setup is to have a way of agreeing on a single result among a group of machines. We need to have shared knowledge between all machines which stays consistent under all circumstances, even when there is a network split between machines.

Introducing The Distributed Coherency

For a search engine, one of the best ways to introduce this shared knowledge is to treat the write operations as a unique stream of operations that must be applied in a certain order. When we have several operations coming at the exact same time, we need to assign them a sequence ID. This ID can then be used to ensure the sequence is applied exactly the same way on all replicas.

In order to assign a sequence ID (a number incremented by one after each job), we need to have a shared global state on the next sequence ID between machines. ZooKeeper opensource software is the de-facto solution for distributed knowledge in a cluster and we initially started to use ZooKeeper with the following sequence:

  1. When a machine receives a job, it copies the job to all replicas using a temporary name.

  2. That machine then takes the distributed lock.

  3. Reads the last sequence ID in ZooKeeper and sends an order to copy the temporary file as sequence ID + 1 on all machines. This is equivalent to a two phase commit.

  4. If we have a majority of positive answers from the machines (quorum), we save sequence ID + 1 in Zookeeper.

  5. The distributed lock is then released.

  6. Finally, the client sending the job is informed of the result. This would be success if there is a majority of commit.

Unfortunately this sequence is not right because if a machine that acquires the lock crashes or restarts between steps 3 and 4, we can end up in a state where the job is committed on some machines, a more complex sequence is needed.

The packaging of ZooKeeper as an external service via a TCP connection makes it really difficult to have it right and requires to use a big timeout (default timeout is set to 4 seconds, representing two ticks of two seconds each).

As a consequence, every failure event, either from hardware or software, would freeze our entire system for the duration of this timeout. It might seem acceptable, but in our case we wanted to test a failure very often in production (like the Monkey testing approach of Netflix).

The Raft Consensus Algorithm

Around the time we were running into these problems, the RAFT consensus algorithm was published. It was clear right away that this algorithm fit our use case perfectly. The state machine of RAFT is our index and the log is the list of index jobs to be executed. I already knew about the PAXOS protocol but did not have a strong enough understanding of it and all the variants to be confident enough to implement it myself. RAFT, on the other hand, was much clearer. If was a perfect match for what we needed and even without stable open source implementations at that time, I was confident enough in my understanding to implement it as the basis of our architecture.

The hardest part of implementing consensus algorithms is making sure there are no bugs in the system. To handle that, I opted for a monkey testing approach by randomly killing processes using a sleep before restarting. To test it even further, I simulated network drops and degradations via the firewall. This type of testing helped us find many bugs. Once we were operating for several days without any problems, I was very confident the implementation was done correctly.

Replicate At Application Or Filesystem Level?

We have chosen to distribute the write operations to all machines and execute them locally rather than replicating the final results on filesystem. We made this choice for two reasons:

  • It is faster. Indexing is done in parallel on all machines, it is faster than replicating the resulting binary files that can be big

  • It is compatible with multiple regions. If we replicate the files after indexing, we need to have a process that will rewrite the whole index. This means we could have huge amounts of data to transfer. The size of data to transfer is very inefficient if you need to transfer it to different geographic regions around the world (ex. New York to Singapore).

Each machine will receive all write operation jobs in the correct order and process them as soon as possible independently of other machines. This means all machines are assured to be at the same state but not necessarily at the same time. This is because the changes may not be committed on all machines at exactly the same moment.

The Compromise On Consistency

In distributed computing, the CAP Theorem states that it is impossible for a distributed computing system to simultaneously provide all three of the following:

  • Consistency: all nodes see the same data at the same time.

  • Availability: a guarantee that every request receives a response about whether it succeeded or failed.

  • Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system.

According to this theorem, we compromised on Consistency. We don’t guarantee that all nodes see exactly the same data at the same time but they will all receive the updates. In other words, we can have small cases where the machines are not synchronized. In reality, this is not a problem because when a customer performs a write operation we apply that job on all hosts. There is less than one second between the time of application on the first and last machine so it is normally not visible for end users. The only inconsistency possible is whether the last updated received is already applied or not, which is compatible with the use cases of our clients.

General Architecture

Definition Of A Cluster

Having a distributed consensus between machines is mandatory in order to have a high availability infrastructure but there is unfortunately a big drawback. This consensus requires several round trips between the machines, so the number of possible consensus per second is directly related to the latency between the different machines. They need to be close to have a high number of consensus per second. To be able to support several regions without sacrificing the number of possible write operations means that we need to have several clusters, each cluster will contains three machines that will act as perfect replicas.

Having one cluster per region is the minimum needed for consensus, but is still far from perfect:

  • We cannot make all customers fit on one machine.

  • The more customers we have, the less number of write operations per second each unique customer will be able to perform. This is because the maximum number of consensus per second is fixed.

In order to work around this problem, we decided to apply the same concept at the region level: each region will have several clusters of three machines. One cluster can host from one to several customers depending on the size of the data they have. This concept is close to what virtualization is doing on a physical machine. We are able to put several customers on a cluster except one customer can grow and change their usage dynamically. In order to do this, we need to develop and automate the following processes:

  • Migrate one customer to another cluster if the cluster has too much data or number of write operations.

  • Add a new machine to the cluster if the volume of queries is too big.

  • Change the number of shards or split one customer across several clusters if their volume of data is too big.

If we have these processes in place, a customer won’t be assigned to a cluster permanently. Assignment will change depending on their own usage as well as the cluster’s usage. This means we need a way to assign a customer to a cluster.

Assigning A Customer To A Cluster

The standard way to manage this assignment is to have one unique DNS entry per customer. This is similar to how Amazon Cloudfront works. Each customer is assigned a unique DNS entry of the form that can then target a different set of machines depending on the customer.

We chose to go with the same approach. Each customer is assigned a unique application ID which is linked to a DNS record of the form This DNS record targets a specific cluster with all machines in the cluster being part of the DNS record so there is load balancing done via DNS. We also use health check mechanisms to detect machine failures and remove them from the DNS resolution.

The health check mechanism is still not sufficient to provide a good SLA even with a very low TTL on the DNS records (TTL is the time the client is allowed to keep the DNS answer cached). The problem is that a host may go down but a user still has the host in cache. The user will continue to send queries to it until the cache expires. It gets even worse because TTL is not an exact science. There are cases where systems do not respect the TTL. We have seen DNS records with a TTL of one minute transformed into a TTL of 30 minutes by some DNS servers.

In order to further improve high availability and avoid a machine failure impacting users, we generate another set of DNS records for each customer of the form,, and The idea behind these DNS records is to allow our API clients to retry other records when a TCP connect timeout is reached (usually set to one second). Our standard implementation is to shuffle the list of DNS records and try them in sequential order.

Combined with carefully-controlled retry and timeout logic in our API clients, this proved to be a better and cheaper solution than using specialized load balancer.

Later, we discovered the trendy .IO TLD was not a good choice for performance. There are fewer DNS servers in the anycast network of .IO compared to .NET and the ones there were saturated. This resulted in a lot of timeouts that slowed down the name resolution. We have since solved these performance problems by switching to domains while keeping backwards compatibility by continuing to support

What about Scalability of a cluster?

Our choice of using several clusters allows us to add more customers without too much risk of impacting existing customers because of the isolation between clusters. But we still had concerns about the scalability of one cluster that needed to be addressed.

The first limiting factor in the scalability of a cluster is the number of write operations per second due to the consensus. In order to mitigate this factor, we introduced a batch method in our API that encapsulates a set of write operations in one operation from the consensus point of view. The problem is that some customers still perform write operations without batching which can have a negative impact on indexing speed for other customers of the cluster.

In order to reduce this performance impact, we have made two changes to our architecture:

  • We added a batching strategy when there is contention on the consensus by automatically aggregating all write operations of each customer inside a unique operation from the consensus point of view. In practice, this means that we are reordering the sequence of jobs but without an impact on the semantics of the operations. For example, if there are 1,000 jobs pending for consensus and 990 are from one customer, we will merge 990 write operations into one even if there are jobs of other customers interlaced with them.

  • We added a consensus scheduler that controls the number of write operations per second entering the consensus for each application ID. This avoids one customer being able to use all the bandwidth of the consensus.

Before we implemented these improvements, we tried a rate limit strategy by returning a 429 HTTP status code. It was apparent very quickly that this was too painful for our customers to have to watch for this response and implement a retry strategy. Today, our biggest customer performs more than one billion write operations per day on a single cluster of three machines which is an average of 11,500 operations per second with bursts of more than 150,000.

The second problem was to find the best hardware setup and avoid any potential bottlenecks such as CPU or I/O that could compromise the scalability of a cluster. Since the beginning we made the choice to use our own bare metal servers in order to fully control the performance of our service and avoid wasting any resources. Selecting the correct hardware proved to be a challenging task.

At the end of 2012, we started with a small setup consisting of: Intel Xeon E3 1245v2, 2x Intel SSD 320 series 120GB in raid 0, and 32GB of RAM. This hardware was reasonable in terms of price, more powerful than cloud platforms, and allowed us to start the service in Europe and US-East.

This setup allowed us to tune the kernel for I/O scheduling and virtual memory which was critical for us to take advantage of all available physical resources. Even so, we soon discovered our limits were the amount of RAM and I/O. We were using around 10GB of RAM for indexing which left only 20GB of RAM for caching of files used for performing search queries. Our goal had always been to have customer indices in memory in order to have a service optimized for millisecond response times. The current hardware setup was designed for 20GB of index data which was too small.

After this first setup, we tried different hardware machines with single and dual socket CPUs, 128GB and 256GB of RAM, and different models/sizes of SSD.

We finally found an optimal setup with a machine containing an Intel Xeon E5 1650v2, 128GB of RAM, and 2x400GB Intel S3700 SSD. The model of the SSD was very important for durability. We burned a lot of SSDs before finding the correct model that can operate in production for years.

In the end, the final architecture we built allowed us to scale well in all areas with only one condition: we needed to have free resources available at any moment. It might seem crazy in 2015 to deal with the pain of having to manage bare metal servers, but the gain we have in terms of quality of service and price for our customers is well worth it. We are able to offer a fully packaged search engine with replication to three different locations, in memory indices, and with excellent performance in more locations than AWS!

Is it complex to operate?

Limit The Number Of Processes

Each machine contains only three processes. The first is a nginx server with all our query interpretation code embedded inside as a module. To answer a query, we memory map the index files and directly execute the query inside the nginx worker without communicating to another process or machine. The only exception is when the customer data does not fit on one machine which is rare.

The second process is a redis key/value store that we use to check rates and limits as well as storing real time logs and counters for each application ID. These counters are used to build our real time dashboard which can be viewed when you connect to your account. This is useful for visualizing your last API calls and for debugging.

The last process is the builder. This is the process responsible for handling all write operations. When the nginx process receives a write operation, it forwards the operation to the builder to perform the consensus. It is also responsible for building the indices and contains a lot of monitoring code that checks for errors in our service such as crashes, slow indexing, indexing errors, etc. Depending on the severity of the problem, some are reported by SMS via Twilio’s API while others are reported directly to PagerDuty. Each time a new problem is detected in production and not reported we make sure to add a new probe to watch for this type of error in the future.

Ease Of Deployment

The simplicity of this stack makes deployments easy. Before we deploy any code we apply a bunch of unit tests and non regression tests. Once all those tests are passing, we gradually deploy to clusters.

Our deployments should never impact production nor be visible to end users. At the same time, we also want to generate a host failure in consensus in order to check everything is working as expected. In order to achieve both goals, we deploy each machine of a cluster independently and apply the following procedures:

  1. Fetch new nginx and builder binaries.

  2. Gracefully restart the nginx web server and relaunch nginx using the new binary without losing any user queries.

  3. Kill the builder and launch it using the new binary. This triggers a failure in RAFT on the deployment of each machine with allows us to make sure our failover is working as expected.

The simplicity of operating our system was an important goal in our architecture. We did not want nor believe deployment should be constrained by the architecture.

Achieving A Good Worldwide Coverage

Services are becoming more and more global. Serving search queries from only one worldwide region is far from optimal. For example, having search hosted in US-East will have a big difference in usability depending on where users are searching from. Latency will go from a few milliseconds for users in US-East to several hundred milliseconds for users in Asia without counting the bandwidth limitations of saturated oversea fibers.

We have seen some companies use a CDN on top of a search engine to address these issues. This ends up causing more problems than value for us because invalidating cache is a nightmare and it only improves the speed for a small percentage of queries that are frequently made. It was clear to us that in order to solve this problem we would need to replicate indices to different regions and have them loaded in memory in order to answer user queries efficiently.

What we need is an inter-region replication on top of our existing cluster replication. The replica can be stored on one machine since the replica will only be used for search queries. All write operations will still go to the original cluster of the customer.

Each customer can select the set of data centers they want to have as a replicate, so a replicate machine in a specific region can receive data from several clusters and a cluster can send data to several replicates.

The implementation of this architecture is modeled on our consensus based stream of operations. Each cluster transforms its own stream of write operations after consensus into a version for each replicate making sure to replace jobs that are not relevant for this replicate with no-op jobs. This stream of operations is then sent to all replicates as a batch of operations to avoid as much latency as possible. Sending jobs one by one would result in too many round trips with the replicates.

On the cluster, write operations are kept on the machines until they are acknowledged by all replicates.

The last part of DSN is to redirect the end user directly to the closest location. In order to do that we added another DNS record in the form of that takes care of the resolution to the closest data center. We first used the Route53 DNS service of Amazon but rapidly hit its limits.

  • The latency-based routing is limited to the AWS regions and we have locations not covered by AWS like India, Hong Kong, Canada and Russia.

  • The geo-based routing is horrible. You need to indicate for each country what the DNS resolution will be. This is a classic approach a lot of hosted DNS providers are taking but in our case it would be a nightmare to support and would not provide enough relevancy. For example, we have several data centers in the US.

After a lot of benchmarking and discussion, we decided upon using NSOne for several reasons:

  • Their Anycast network is very good and better balanced than AWS for us. For example, they have a POP in India and Africa.

  • Their filter logic is really good. For each customer we can specify the list of machines that are associated with them (including replicates) and use a geo filter to sort them by distance. We are then able to keep the best one.

  • They support EDNS client subnets. This is important for us in order to be more relevant. We use the IP of the final user instead of the IP of their DNS server for resolution.

In terms of performance, we have been able to reach global worldwide synchronization at the second level. You can try it out on Product Hunt’s search (hosted in US-East, US-West, India, Australia, and Europe) or on Hacker News’ search (hosted in US-East, US-West, India, and Europe).


We spent a lot of time building our distributed and scalable architecture and have faced a lot of different problems. I hope this article gives you a better understanding about how we resolved those problems and provides a useful guide on how to design your own services.

I’m seeing more and more services that are currently facing problems similar to us, having a worldwide audience with multi-region infrastructure but with some worldwide consistent information like login or content. Having a multi-region infrastructure today is mandatory to achieve an excellent user experience. This approach can be used for example to distribute read-only replicates of a database that will be consistent worldwide!

StackExchange’s Performance Dashboard

6148448748_ee8eedd346_mStackExchange created a very cool performance dashboard that looks to be updated from real system metrics. Wouldn’t it be fascinating if every site had a similar dashboard?

The dashboard contains information like there are 560 million page views per month, 260,000 sustained connections,  34 TB data transferred per month, 9 web servers with 48GB of RAM handling 185 req/s at 15% CPU usage. There are 4 SQL servers, 2 redis servers, 3 tag engine servers, 3 elasticsearch servers, and 2 HAProxy servers, along with stats on each.

There’s also an excellent discussion thread on reddit that goes into more interesting details, with questions being answered by folks from StackExchange.

StackExchange is still doing innovative work and is very much an example worth learning from. They’ve always danced to their own tune and it’s a catchy tune at that. More at StackOverflow Update: 560M Pageviews A Month, 25 Servers, And It’s All About Performance.

( via )