Simpler, Cheaper, Faster: Playtomic’s Move From .NET To Node And Heroku

 

This is a guest post by Ben Lowry, CEO of Playtomic. Playtomic is a game analytics service implemented in about 8000 mobile, web and downloadable games played by approximately 20 million people daily.

Here’s a good summary quote by Ben Lowry on Hacker News:

Just over 20,000,000 people hit my API yesterday 700,749,252 times, playing the ~8,000 games my analytics platform is integrated in for a bit under 600 years in total play time. That’s just yesterday. There are lots of different bottlenecks waiting for people operating at scale. Heroku and NodeJS, for my use case, eventually alleviated a whole bunch of them very cheaply.

Playtomic began with an almost exclusively Microsoft .NET and Windows architecture, which held up for 3 years before being replaced with a complete rewrite using NodeJS.  During its lifetime the platform grew from shared space on a single server to a full dedicated server, then spread to a second dedicated server, after which the API was offloaded to a VPS provider running 4 – 6 fairly large VPSs.  Eventually the API settled on 8 dedicated servers at Hivelocity, each a quad core with hyperthreading, 8 GB of RAM and dual 500 GB disks, running 3 or 4 instances of the API stack.

These servers routinely serviced 30,000 to 60,000 concurrent game players and received up to 1500 requests per second, with load balancing done via DNS round robin.

In July the entire fleet of servers was replaced with a NodeJS rewrite hosted at Heroku, at a significant saving.

 

Scaling Playtomic With NodeJS

There were two parts to the migration:

  1. Dedicated to PaaS:  Advantages include price, convenience, leveraging their load balancing and reducing overall complexity.  Disadvantages include no New Relic for NodeJS, very inelegant crashes, and a generally immature platform.
  2. .NET to NodeJS: Switching architecture from ASP.NET / C# with local MongoDB instances and a service preprocessing event data locally before sending it to a centralized server to be completed, to NodeJS on Heroku + Redis with preprocessing on SoftLayer (see Catalyst program).

Dedicated To PaaS

The reduction in complexity is significant; we had 8 dedicated servers each running 3 or 4 instances of the API at our hosting partner Hivelocity.  Each ran a small suite of software including:

  • MongoDB instance
  • log pre-processing service
  • monitoring service
  • IIS with api sites

Deploying was done via an FTP script that uploaded new api site versions to all servers.  Services were more annoying to deploy but changed infrequently.

MongoDB was a poor choice for temporarily holding log data before it was pre-processed and sent off.  It offered a huge speed advantage because writes initially went to memory, meaning write requests were “finished” almost instantly, which was far superior to the common message queues on Windows.  But it never reclaimed space left by deleted data, so the database would balloon to 100+ gigabytes if it wasn’t compacted regularly.

The advantages of PaaS providers are pretty well known.  They all seem quite similar, although it’s easiest to have confidence in Heroku and Salesforce since they seem the most mature and have broad technology support.

The main challenge in transitioning to PaaS was shaking the mentality that we could run assistive software alongside the website as we did on the dedicated servers.  Most platforms provide some sort of background worker threads you can leverage, but that means routing data and tasks from the web threads through a 3rd-party service or server, which seemed unnecessary.

We eventually settled on a large server at SoftLayer running a dozen purpose-specific Redis instances and some middleware rather than background workers.  Heroku doesn’t charge for outbound bandwidth and SoftLayer doesn’t charge for inbound, which neatly avoided paying for the significant bandwidth involved.

Switching From .NET To NodeJS

Working with JavaScript on the server side is a mixed experience.  On the one hand the lack of formality and boilerplate is liberating.  On the other hand there’s no New Relic and there are no compiler errors, which makes everything harder than it needs to be.

There are two main advantages that make NodeJS spectacularly useful for our API.

  1. Background workers in the same thread and memory as the web server
  2. Persistent, shared connections to Redis and MongoDB (etc)

Background Workers

NodeJS has the very useful ability to continue working independently of requests, allowing you to prefetch data and perform other operations that let you terminate a request very early and then finish processing it in the background.

It is particularly advantageous for us to replicate entire MongoDB collections in memory, periodically refreshed, so that entire classes of work have access to current data without having to go to an external database or a local/shared caching layer.

We collectively save 100s – 1000s of database queries per second using this in:

  • Game configuration data on our main api
  • API credentials on our data exporting api
  • GameVars which developers use to store configuration or other data to hotload into their games
  • Leaderboard score tables (excluding scores)

The basic model is:

var cache = {};

module.exports = function(request, response) {
    response.end(cache["x"]);
};

function refresh() {
    // fetch updated data from the database and store it in the cache object
    cache["x"] = "foo";

    // refresh again in 30 seconds
    setTimeout(refresh, 30000);
}

refresh();

The advantages of this are a single connection (per dyno or instance) to your backend databases instead of one per user, and a very fast local memory cache that always has fresh data.

The caveats are that your dataset must be small, and that this operates on the same thread as everything else, so you need to be conscious of blocking the thread or doing too-heavy CPU work.

Persistent Connections

The other massive benefit NodeJS offers over .NET for our API is persistent database connections.  The traditional method of connecting in .NET (etc) is to open your connection and do your operation, after which your connection is returned to a pool to be re-used shortly or expired if it’s no longer needed.

This is very common, and until you get to very high concurrency it will Just Work.  At high concurrency the connection pool can’t re-use connections fast enough, which means it generates new connections that your database servers have to scale to handle.

At Playtomic we typically have several hundred thousand concurrent game players sending event data that needs to be pushed back to our Redis instances in a different datacenter.  With .NET that would require a massive volume of connections, which is why we ran MongoDB locally on each of our old dedicated servers.

With NodeJS we have a single connection per dyno/instance which is responsible for pushing all the event data that particular dyno receives.  It lives outside of the request model, something like this:

var redis = require("redis");

// one persistent connection per dyno/instance
var redisclient = redis.createClient(/* ... */);

module.exports = function(request, response) {
    var eventdata = "etc";

    // push the event onto a Redis list to be processed elsewhere
    redisclient.lpush("events", eventdata);
};

The End Result

High load:

REQUESTS IN LAST MINUTE


_exceptions: 75 (0.01%)
_failures: 5 (0.00%)
_total: 537,151 (99.99%)
data.custommetric.success: 1,093 (0.20%)
data.levelaveragemetric.success: 2,466 (0.46%)
data.views.success: 105 (0.02%)
events.regular.invalid_or_deleted_game#2: 3,814 (0.71%)
events.regular.success: 527,837 (98.25%)
gamevars.load.success: 1,060 (0.20%)
geoip.lookup.success: 109 (0.02%)
leaderboards.list.success: 457 (0.09%)
leaderboards.save.missing_name_or_source#201: 3 (0.00%)
leaderboards.save.success: 30 (0.01%)
leaderboards.saveandlist.success: 102 (0.02%)
playerlevels.list.success: 62 (0.01%)
playerlevels.load.success: 13 (0.00%)


 

This data comes from some load monitoring that operates in the background on each instance, pushing counters to Redis where they’re then aggregated and stored in MongoDB.  You can see it in action at https://api.playtomic.com/load.html.
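Purely as illustration (not Playtomic’s actual code), a per-instance counter pattern like that might look something like the sketch below; the key names and flush interval are hypothetical:

var redis = require("redis");
var client = redis.createClient();

// local in-memory counters, one per metric name
var counters = {};

// called from request handlers, e.g. track("events.regular.success")
function track(metric) {
    counters[metric] = (counters[metric] || 0) + 1;
}

// every few seconds flush the local counters to Redis, where a separate
// process aggregates them and stores the results in MongoDB
setInterval(function() {
    Object.keys(counters).forEach(function(metric) {
        client.incrby("load:" + metric, counters[metric]);
        counters[metric] = 0;
    });
}, 5000);

module.exports = track;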

There are a few different classes of requests in that data:

  • Events that check the game configuration from MongoDB, perform a GeoIP lookup (an open-sourced, very fast implementation at https://github.com/benlowry/node-geoip-native), and then push to Redis
  • GameVars, Leaderboards, Player Levels all check game configuration from MongoDB and then whatever relevant MongoDB database
  • Data lookups are proxied to a Windows server because of poor NodeJS support for stored procedures

The result is 100,000s of concurrent users causing spectacularly light Redis loads of 500,000 – 700,000 lpushes per minute (and being pulled out on the other end):

1  [||                                                                                      1.3%]     Tasks: 83; 4 running
2  [|||||||||||||||||||                                                                    19.0%]     Load average: 1.28 1.20 1.19
3  [||||||||||                                                                              9.2%]     Uptime: 12 days, 21:48:33
4  [||||||||||||                                                                           11.8%]
5  [||||||||||                                                                              9.9%]
6  [|||||||||||||||||                                                                      17.7%]
7  [|||||||||||||||                                                                        14.6%]
8  [|||||||||||||||||||||                                                                  21.6%]
9  [||||||||||||||||||                                                                     18.2%]
10 [|                                                                                       0.6%]
11 [                                                                                        0.0%]
12 [||||||||||                                                                              9.8%]
13 [||||||||||                                                                              9.3%]
14 [||||||                                                                                  4.6%]
15 [||||||||||||||||                                                                       16.6%]
16 [|||||||||                                                                               8.0%]
Mem[|||||||||||||||                                                                 2009/24020MB]
Swp[                                                                                    0/1023MB]

PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
12518 redis     20   0 40048  7000   640 S  0.0  0.0  2:21.53  `- /usr/local/bin/redis-server /etc/redis/analytics.conf
12513 redis     20   0 72816 35776   736 S  3.0  0.1  4h06:40  `- /usr/local/bin/redis-server /etc/redis/log7.conf
12508 redis     20   0 72816 35776   736 S  2.0  0.1  4h07:31  `- /usr/local/bin/redis-server /etc/redis/log6.conf
12494 redis     20   0 72816 37824   736 S  1.0  0.2  4h06:08  `- /usr/local/bin/redis-server /etc/redis/log5.conf
12488 redis     20   0 72816 33728   736 S  2.0  0.1  4h09:36  `- /usr/local/bin/redis-server /etc/redis/log4.conf
12481 redis     20   0 72816 35776   736 S  2.0  0.1  4h02:17  `- /usr/local/bin/redis-server /etc/redis/log3.conf
12475 redis     20   0 72816 27588   736 S  2.0  0.1  4h03:07  `- /usr/local/bin/redis-server /etc/redis/log2.conf
12460 redis     20   0 72816 31680   736 S  2.0  0.1  4h10:23  `- /usr/local/bin/redis-server /etc/redis/log1.conf
12440 redis     20   0 72816 33236   736 S  3.0  0.1  4h09:57  `- /usr/local/bin/redis-server /etc/redis/log0.conf
12435 redis     20   0 40048  7044   684 S  0.0  0.0  2:21.71  `- /usr/local/bin/redis-server /etc/redis/redis-servicelog.conf
12429 redis     20   0  395M  115M   736 S 33.0  0.5 60h29:26  `- /usr/local/bin/redis-server /etc/redis/redis-pool.conf
12422 redis     20   0 40048  7096   728 S  0.0  0.0 26:17.38  `- /usr/local/bin/redis-server /etc/redis/redis-load.conf
12409 redis     20   0 40048  6912   560 S  0.0  0.0  2:21.50  `- /usr/local/bin/redis-server /etc/redis/redis-cache.conf

and very light MongoDB loads of 1,800 – 2,500 CRUD operations a minute:

insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
2      9      5      2       0       8       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k     7k   116   01:11:12
1      1      5      2       0       6       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     2k     3k   116   01:11:13
0      3      6      2       0       8       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     3k     6k   114   01:11:14
0      5      5      2       0      12       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k     5k   113   01:11:15
1      9      7      2       0      12       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     4k     6k   112   01:11:16
1     10      6      2       0      15       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     1|0     4k    22k   111   01:11:17
1      5      6      2       0      11       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     3k    19k   111   01:11:18
1      5      5      2       0      14       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k     3k   111   01:11:19
1      2      6      2       0       8       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     3k     2k   111   01:11:20
1      7      5      2       0       9       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k     2k   111   01:11:21
insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
2      9      8      2       0       8       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k     5k   111   01:11:22
3      8      7      2       0       9       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k     9k   110   01:11:23
2      6      6      2       0      10       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     3k     4k   110   01:11:24
2      8      6      2       0      21       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k    93k   112   01:11:25
1     10      7      2       3      16       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k     4m   112   01:11:26
3     15      7      2       3      24       0  6.67g  14.8g  1.23g      0      0.2          0       0|0     0|0     6k     1m   115   01:11:27
1      4      8      2       0      10       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k     2m   115   01:11:28
1      6      7      2       0      14       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k     3k   115   01:11:29
1      3      6      2       0      10       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k   103k   115   01:11:30
2      3      6      2       0       8       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     3k    12k   114   01:11:31
insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
0     12      6      2       0       9       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k    31k   113   01:11:32
2      4      6      2       0       8       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k     9k   111   01:11:33
2      9      6      2       0       7       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     3k    21k   111   01:11:34
0      8      7      2       0      14       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     4k     9k   111   01:11:35
1      4      7      2       0      11       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     3k     5k   109   01:11:36
1     15      6      2       0      19       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     5k    11k   111   01:11:37
2     17      6      2       0      19       1  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     6k   189k   111   01:11:38
1     13      7      2       0      15       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     1|0     5k    42k   110   01:11:39
2      7      5      2       0      77       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     2|0    10k    14k   111   01:11:40
2     10      5      2       0     181       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0    21k    14k   112   01:11:41
insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
1     11      5      2       0      12       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     0|0     4k    13k   116   01:11:42
1     11      5      2       1      33       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     3|0     6k     2m   119   01:11:43
0      9      5      2       0      17       0  6.67g  14.8g  1.22g      0      0.1          0       0|0     1|0     5k    42k   121   01:11:44
1      8      7      2       0      25       0  6.67g  14.8g  1.22g      0      0.2          0       0|0     0|0     6k    24k   125   01:11:45

(via Highscalability.com)

MongoDB Schema Design at Scale

I had the recent opportunity to present a talk at MongoDB Seattle on Schema Design at Scale. It’s basically a short case study on the steps the MongoDB Monitoring Service (MMS) folks took to evolve their schema, along with some quantitative performance comparisons between different schemas. Given that one of my most widely read blog posts is still MongoDB’s Write Lock, I decided that my blog readers would be interested in the quantitative comparison as well.

MongoDB Monitoring Service

First off, I should mention that I am not now, nor have I ever been, an employee of 10gen, the company behind MongoDB. I am, however, a longtime MongoDB user, and have seen a lot of presentations and use cases on the database. My knowledge of MMS’s internal design comes from watching publicly available talks. I don’t have any inside knowledge or precise performance numbers, so I decided to do some experiments on my own to see the impact of different schema designs they might have used to build MMS.

So what is MMS, anyway? The MongoDB Monitoring Service is a free service offered by 10gen to all MongoDB users to monitor several key performance indicators on their MongoDB installations. The way it works is this:

  • You download a small script that you run on your own servers that will periodically upload performance statistics to MMS.
  • You access reports through the MMS website. You can graph per-minute performance of any of the metrics as well as see historical trends.

Eating your own dogfood

When 10gen designed MMS, they decided that it would not only be a useful service for those who have deployed MongoDB, but that it would also be a showcase of MongoDB’s performance, keeping the performance graphs updated in real time across all customers and servers. To that end, they store all the performance metrics in MongoDB documents and get by on a modest (I don’t know exactly how modest) cluster of MongoDB servers.

It was therefore extremely important in the case of MMS to use the hardware they had allocated efficiently. Since this service is available for real-time reporting 24 hours per day, they had to design the system to be responsive even under “worst-case” conditions, avoiding anything in the design that would cause uneven performance during the day.

Building an MMS-like system

Since I don’t have access to the actual MMS software, I decided to build a system that’s similar to MMS. Basically, what I wanted was a MongoDB schema that would allow me to keep per-minute counters on a collection of different metrics (we could imagine something like a web page analytics system using such a schema, for example).

In order to keep everything compact, I decided to keep a day’s statistics inside a single MongoDB document. The basic schema is the following:

{
    _id: "20101010/metric-1",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        metric: "metric-1" },
    daily: 5468426,
    hourly: {
        "00": 227850,
        "01": 210231,
        ...
        "23": 20457 },
    minute: {
        "0000": 3612,
        "0001": 3241,
        ...
        "1439": 2819 }
}

Here, we keep the date and metric we’re storing in a “metadata” property so we can easily query it later. Note that the date and metric name are also embedded in the _id field (that will be important later). Actual metric data is stored in the daily, hourly, and minute properties.

Now if we want to update this document (say, to record a hit to a web page), we can use MongoDB’s in-place update operators to increment the appropriate daily, hourly, and per-minute counters. To further simplify things, we’ll use MongoDB’s “upsert” feature to create a document if it doesn’t already exist (this prevents us from having to allocate the documents ahead-of-time). The first version of our update method, then, looks like this:

from datetime import datetime, time

def record_hit(coll, dt, measure):
    sdate = dt.strftime('%Y%m%d')
    metadata = dict(
        date=datetime.combine(
            dt.date(),
            time.min),
        measure=measure)
    id='%s/%s' % (sdate, measure)
    minute = dt.hour * 60 + dt.minute
    coll.update(
        { '_id': id, 'metadata': metadata },
        { '$inc': {
                'daily': 1,
                'hourly.%.2d' % dt.hour: 1,
                'minute.%.4d' % minute: 1 } },
        upsert=True)

To use this to record a “hit” to our website, then, we would simply call it with our collection, the current date, and the measure being updated:

>>> record_hit(db.daily_hits, datetime.utcnow(), '/path/to/my/page.html')

Measuring performance

To measure the performance of this approach, I created a 2-server cluster on Amazon EC2: one server to run MongoDB and one to run my benchmark code to do a bunch of record_hit() calls, simulating different times of day to see the performance over multiple 24-hour periods. This is what I found:

Initial Schema Performance

Ouch! For some reason, we see the performance of our system steadily decrease from 3000-5000 writes per second to 200-300 writes per second as the day goes on. This, it turns out, happens because our “in-place” update was not, in fact, in-place.

Growing documents

MongoDB allows you great flexibility when updating your documents, even allowing you to add new fields and cause the documents to grow in size over time. And as long as your documents don’t grow too much, everything just kind of works. MongoDB will allocate some “padding” to your documents, assuming some growth, and as long as you don’t outgrow your padding, there’s really very little performance impact.

Once you do outgrow your padding, however, MongoDB has to move your document to another location. As your document gets bigger, this takes longer (more bytes to copy and all that). So documents that grow and grow and grow are a real performance-killer with MongoDB. And that’s exactly what we have here. Consider the first time we call record_hit during a day. After this, the document looks like the following:

{
    _id: ...,
    metadata: {...},
    daily: 1,
    hourly: { "00": 1 }, 
    minute: { ""0000": 1 }
}

Then we record a hit during the second minute of a day and our document grows:

{
    _id: ...,
    metadata: {...},
    daily: 2,
    hourly: { "00": 2 }, 
    minute: { ""0000": 1, "0001": 1 }
}

Now, even if we’re only recording a single hit per minute, our document had to grow 1439 times, and by the end of the day it takes up substantially more space than it did when we recorded our first hit just after midnight.
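As an aside that is not from the original talk: on MongoDB 2.x you can watch this effect from the mongo shell, since collection stats expose a padding factor that drifts upward as documents keep outgrowing their allocated slots (the collection name daily_hits matches the earlier example):

// mongo shell, MongoDB 2.x
var stats = db.daily_hits.stats();
print("average object size: " + stats.avgObjSize + " bytes");
print("padding factor:      " + stats.paddingFactor);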

Fixing with preallocation

The solution to the problem of growing documents is preallocation. However, we’d prefer not to preallocate all the documents at once (this would cause a large load on the server), and we’d prefer not to manually schedule documents for preallocation throughout the day (that’s just a pain). The solution that 10gen decided upon, then, was to randomly (with a small probability) preallocate tomorrow’s document each time we record a hit today.

In the system I designed, this preallocation is performed at the beginning of record_hit:

import random
from datetime import timedelta

def record_hit(coll, dt, measure):
    if PREALLOC and random.random() < (1.0/2000.0):
        preallocate(coll, dt + timedelta(days=1), measure)
    # ...

Our preallocate function isn’t that interesting, so I’ll just show the general idea here:

def preallocate(coll, dt, measure):
    metadata, id = # compute metadata and ID
    coll.update( 
       { '_id': id },
       { '$set': { 'metadata': metadata },
         '$inc': { 
             'daily': 0,
             'hourly.00': 0,
             # ...
             'hourly.23': 0,
             'minute.0000': 0,
             # ...
             'minute.1439': 0 } },
       upsert=True)

There are two important things to note here:

  • Our preallocate function is safe. If by some chance we call preallocate on a date/metric that already has a document, nothing changes.
  • Even if preallocate is never called, record_hit is still functionally correct, so we don’t have to worry about the small probability that we get through a whole day without preallocating a document.

Now with these changes in place, we see much better performance:

Performance with Preallocation

We’ve actually improved performance in two ways using this approach:

  • Preallocation means that our documents never grow, so they never get moved
  • By preallocating throughout the day, we don’t have a “midnight problem” where our upserts all end up inserting a new document and increasing load on the server.

We do, however, have a curious downward trend in performance throughout the day (though much less drastic than before). Where did that come from?

MongoDB’s storage format

To figure out the downward performance through the day, we need to take a brief detour into the actual format that MongoDB uses to store data on disk (and in memory), BSON. Normally, we don’t need to worry about it, since the pymongo driver converts everything so nicely into native Python types, but in this case BSON presents us with a performance problem.

Although MongoDB documents, such as our minute embedded document, are represented in Python as a dict (which is a constant-speed lookup hash table), BSON actually stores documents as an association list. So rather than having a nice hash table for minute, we actually have something that looks more like the following:

minute = [
    [ "0000", 3612 ],
    [ "0001", 3241 ],
    # ...
    [ "1439", 2819 ] ]

Now to actually update a particular minute, the MongoDB server performs something like the following operations (pseudocode, with lots of special cases ignored):

inc_value(minute, "1439", 1)

def inc_value(document, key, value):
    for entry in document:
        if entry[0] == key:
            entry[1] += value
            break

The performance of this algorithm, far from our nice O(1) hash table, is actually O(N) in the number of entries in the document. In the case of the minute document, MongoDB has to actually perform 1439 comparisons before it finds the appropriate slot to update.

Fixing the downward trend with hierarchy

To fix the problem, then, we need to reduce the number of comparisons MongoDB needs to do to find the right minute to increment. The way we can do this is by splitting up the minutes into hours. Our daily stats document now looks like the following:

{ _id: "20101010/metric-1",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    metric: "metric-1" },
  daily: 5468426,
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "00": {        
        "0000": 3612,
        "0100": 3241,
        ...
    }, ...,
    "23": { ..., "1439": 2819 }
}

Our record_hit and preallocate routines have to change a bit as well:

def record_hit_hier(coll, dt, measure):
    if PREALLOC and random.random() < (1.0/1500.0):
        preallocate_hier(coll, dt + timedelta(days=1), measure)
    sdate = dt.strftime('%Y%m%d')
    metadata = dict(
        date=datetime.combine(
            dt.date(),
            time.min),
        measure=measure)
    id='%s/%s' % (sdate, measure)
    coll.update(
        { '_id': id, 'metadata': metadata },
        { '$inc': {
                'daily': 1,
                'hourly.%.2d' % dt.hour: 1,
                ('minute.%.2d.%.2d' % (dt.hour, dt.minute)): 1 } },
        upsert=True)

def preallocate_hier(coll, dt, measure):
    '''Once again, simplified for explanatory purposes'''
    metadata, id = # compute metadata and ID
    coll.update( 
       { '_id': id },
       { '$set': { 'metadata': metadata },
         '$inc': { 
             'daily': 0,
             'hourly.00': 0,
             # ...
             'hourly.23': 0,
             'minute.00.00': 0,
             # ...
             'minute.00.59': 0,
             'minute.01.00': 0,
             # ...
             'minute.23.59': 0 } },
       upsert=True)

Once we’ve added the hierarchy and re-run our experiment, we get the nice, level performance we’d like to see:

Performance with Hierarchical Minutes

Conclusion

It’s always nice to see “tips and tricks” borne out through actual, quantitative results, so this was probably the most enjoyable talk I’ve ever put together. The things I got out of it were the following:

  • Growing documents is a very bad thing for performance. Avoid it if at all possible.
  • Awareness of the BSON specification and data representation can actually be quite useful when diagnosing performance problems.
  • To get the best performance out of your system, you need to actually run it (or a highly representative stand-in). Actually seeing the results of performance tweaking in graphical form is incredibly helpful in targeting your efforts.

The source code for all these updates is available in my mongodb-sdas Github repo, and I welcome any feedback either there, or here in the comments. In particular, I’d love to hear of any performance problems you’ve run into and how you got around them. And of course, if you’ve got a really perplexing problem, I’m always available for consulting by emailing me at Arborian.com.

(via blog.pythonisito.com)

Introducing Derby

Derby is an MVC framework that makes it easy to write realtime, collaborative applications running in both Node.js and browsers.

Introduction

Derby includes a powerful data synchronization engine called Racer. While it works differently, Racer is to Derby somewhat like ActiveRecord is to Rails. Racer automatically syncs data between browsers, servers, and a database. Models subscribe to changes on specific objects and queries, enabling granular control of data propagation without defining channels. Racer supports offline usage and conflict resolution out of the box, which greatly simplifies writing multi-user applications. Derby makes it simple to write applications that load as fast as a search engine, are as interactive as a document editor, and work offline.

Features

  • HTML templates: Handlebars-like templates are rendered into HTML on both the server and client. Because they render on the server, pages display immediately—even before any scripts are downloaded. Templates are mostly just HTML, so designers can understand and modify them.
  • View bindings: In addition to HTML rendering, templates specify live bindings between the view and model. When model data change, the view updates the properties, text, or HTML necessary to reflect the new data. When the user interacts with the page—such as editing the value of a text input—the model data updates.
  • Client and server routing: The same routes produce a single-page browser app and an Express server app. Links render instantly with push/pop state changes in modern browsers, while server rendering provides access to search engines and browsers without JavaScript.
  • Model syncing: Model changes are automatically synchronized with the server and all clients subscribed to the same data over Socket.IO.
  • Customizable persistence: Apps function fully with in-memory, dynamic models by default. Apps can also use the racer-db-mongo plugin to add MongoDB support with no change to the application code. Any changes to data made within the app are automatically persisted. Adding support for other databases is simple.
  • Conflict resolution: The server detects conflicts, enabling clients to respond instantly and work offline. Multiple powerful techniques for conflict resolution are included.

Why not use Rails and Backbone?

Derby represents a new breed of application frameworks, which we believe will replace currently popular libraries like Rails and Backbone.

Adding dynamic features to apps written with Rails, Django, and other server-side frameworks tends to produce a tangled mess. Server code renders various initial states while jQuery selectors and callbacks desperately attempt to make sense of the DOM and user events. Adding new features typically involves changing both server and client code, often in different languages.

Many developers now include a client MVC framework like Backbone to better structure client code. A few have started to use declarative model-view binding libraries, such as Knockout and Angular, to reduce boilerplate DOM manipulation and event bindings. These are great concepts, and adding some structure certainly improves client code. However, they still lead to duplicating rendering code and manually synchronizing changes in increasingly complex server and client code bases. Not only that, each of these pieces must be manually wired together and packaged for the client.

Derby radically simplifies this process of adding dynamic interactions. It runs the same code in servers and browsers, and it syncs data automatically. Derby takes care of template rendering, packaging, and model-view bindings out of the box. Since all features are designed to work together, no code duplication and glue code are needed. Derby equips developers for a future when all data in all apps are realtime.

Flexibility without the glue code

Derby eliminates the tedium of wiring together a server, server templating engine, CSS compiler, script packager, minifier, client MVC framework, client JavaScript library, client templating and/or bindings engine, client history library, realtime transport, ORM, and database. It eliminates the complexity of keeping state synchronized among models and views, clients and servers, multiple windows, multiple users, and models and databases.

At the same time, it plays well with others. Derby is built on top of popular libraries, including Node.js, Express, Socket.IO, Browserify, Stylus, LESS, UglifyJS, and MongoDB, with support for other popular databases and datastores coming soon. These libraries can also be used directly. The data synchronization layer, Racer, can be used separately. Other client libraries, such as jQuery, and other Node.js modules from npm work just as well along with Derby.

When following the default file structure, templates, styles, and scripts are automatically packaged and included in the appropriate pages. In addition, Derby can be used via a dynamic API.

Visit Derbyjs.com for more information

Hacker commandeers GitHub to prove Rails vulnerability

A Russian hacker dramatically demonstrated one of the most common security weaknesses in the Ruby on Rails web application framework. By doing so, he took full control of the repositories GitHub uses to distribute Linux and thousands of other open-source software packages.

Egor Homakov exploited what’s known as a mass assignment vulnerability in GitHub to gain administrator access to the Ruby on Rails repository hosted on the popular website. The weekend hack allowed him to post an entry in the framework’s bug tracker dated 1,001 years into the future. It also allowed him to gain write privileges to the code repository. He carried out the attack by replacing a cryptographic key of a known developer with one he created. While the hack was innocuous, it sparked alarm among open-source advocates because it could have been used to plant malicious code in repositories millions of people use to download trusted software.

Homakov launched the attack two days after he posted a vulnerability report to the Rails bug list warning that mass assignment in Rails made websites relying on the framework susceptible to compromise. A variety of developers replied with posts saying the vulnerability is already well known and that responsibility for preventing exploits rests with those who use the framework. Homakov responded by saying even developers for large sites such as GitHub, Posterous, Speakerdeck, and Scribd were failing to adequately protect against the vulnerability.

In the following hours, participants in the online discussion continued to debate the issue. The mass assignment vulnerability is to Rails what SQL injection weaknesses are to other web applications. It’s a bug so common that many users have grown impatient with warnings about it. Maintainers of Rails have largely argued that individual developers should single out and “blacklist” attributes that are too security-sensitive to be externally modified. Others, such as Homakov, have said Rails maintainers should turn on whitelist protection by default. Currently, applications must explicitly enable such protections.
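To make the vulnerability class concrete, here is a hedged, language-neutral sketch in JavaScript (not Rails code; the function names are made up) of the difference between blindly assigning submitted attributes and whitelisting them:

// "params" stands in for user-submitted form data.
function unsafeUpdate(user, params) {
    // Mass assignment: every submitted attribute is copied, so an attacker
    // can set fields like "admin" or a developer's public key that were
    // never meant to be user-editable.
    for (var key in params) {
        user[key] = params[key];
    }
}

function safeUpdate(user, params) {
    // Whitelisting: only explicitly allowed attributes can be assigned.
    var allowed = ["name", "email"];
    allowed.forEach(function(key) {
        if (key in params) {
            user[key] = params[key];
        }
    });
}

// A request carrying { name: "x", admin: true } elevates privileges through
// unsafeUpdate but is harmless through safeUpdate.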

A couple days into the debate, Homakov responded by exploiting mass assignment bugs in GitHub to take control of the site. Less than an hour after discovering the attack, GitHub administrators deployed a fix for the underlying vulnerability and initiated an investigation to see if other parts of the site suffered from similar weaknesses. The site also temporarily suspended Homakov, later reinstating him.

“Now that we’ve had a chance to review his activity, and have determined that no malicious intent was present, @homakov’s account has been reinstated,” a blog post published on Monday said. It went on to encourage developers to practice “responsible disclosure.”

Updated to differentiate between Ruby and Rails.

(via arstechnica.com)

The HipHop Virtual Machine

We’re always looking for ways to make our computing infrastructure more efficient, and in 2010 we deployed HipHop for PHP to help support the growing number of Facebook users. While HipHop has helped us make significant gains in the performance of our code, its reliance on static compilation makes optimizing our code time consuming. We were also compelled to develop a separate HipHop interpreter (hphpi) that requires a lot of effort to maintain. So, early last year, we put together a small team to experiment with dynamic translation of PHP code into native machine code. What resulted is a new PHP execution engine based on the HipHop language runtime that we call the HipHop Virtual Machine (hhvm). We’re excited to report that Facebook is now using hhvm as a faster replacement for hphpi, with plans to eventually use hhvm for all PHP execution.

 

Facebook uses hphpi (and now hhvm) for day-to-day software development, but uses the HipHop compiler (hphpc) to create optimized binaries that serve the Facebook website. hphpc is in essence a traditional static compiler that converts PHP→AST→C++→x64. We have long been keenly aware of the limitations to static analysis imposed by such a dynamic language as PHP, not to mention the risks inherent in developing software with hphpi and deploying with hphpc. Our experiences with hphpc led us to start experimenting with dynamic translation to native machine code, also known as just-in-time (JIT) compilation. A dynamic translator can observe data types as the program executes, and generate type-specialized machine code. Unfortunately we didn’t have a clean model of PHP language semantics built into HipHop, as hphpc and hphpi are based directly on two distinct abstract syntax tree (AST) implementations, rather than sharing a unified intermediate representation. Therefore we developed a high-level stack-based virtual machine specifically tailored to PHP that executes HipHop bytecode (HHBC). hhvm uses hphpc’s PHP→AST implementation and extends the pipeline to PHP→AST→HHBC. We iteratively codeveloped both an interpreter and a dynamic HHBC→x64 translator that seamlessly interoperate, with the primary goal of cleanly supporting translation.

 

Throughout the hhvm project we have tended toward simple solutions. This is nowhere more evident than in the core premise of the dynamic translator itself. Most existing systems use method-at-a-time JIT compilation (e.g., Java and C#), though trace-based translation has also been explored in recent systems (e.g., Tamarin and TraceMonkey). We decided to try a very simple form of tracing that limits each trace to a single basic block with known input types. This “tracelet” approach simplifies the trace cache management problem, because complex application control flow cannot create a combinatorial explosion of traces. Each tracelet has a simple three-part structure:

  • Type guard(s)
  • Body
  • Linkage to subsequent tracelet(s)

The type guards prevent execution for incompatible input types, and the remainder of the tracelet does the real work. Each tracelet has great freedom, the only requirement being that it restore the virtual machine to a consistent state any time execution escapes the tracelet and its helpers. The obvious disadvantage is that tracelet guards may repeat unnecessary work. Thus far we have not found this to be a problem, but we do have some solutions in mind should a need arise.
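As a loose conceptual sketch only (in JavaScript, nothing like HHVM’s actual generated code), a tracelet specialized for integer inputs can be pictured as a guarded, type-specialized function that falls back to the interpreter when its guards fail:

// Hypothetical names throughout; this only illustrates the three-part
// guard / body / linkage structure described above.
function interpretSlowPath(a, b) {
    // stand-in for falling back to the bytecode interpreter
    return Number(a) + Number(b);
}

function nextTracelet(result) {
    // stand-in for linking to the translation of the next basic block
    return result;
}

function traceletAddInts(a, b) {
    // 1. Type guards: bail out if the observed input types don't hold.
    if (typeof a !== "number" || typeof b !== "number") {
        return interpretSlowPath(a, b);
    }
    // 2. Body: type-specialized work with no further checks.
    var result = a + b;
    // 3. Linkage: continue into the next tracelet.
    return nextTracelet(result);
}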

 

For those who are interested in understanding HHBC in detail, the bytecode specification is available in the HipHop source tree. However, detailed knowledge should not be necessary for understanding the following example:

 

Example PHP program translated to HHBC

 

f() is executed twice, for which the translator creates three tracelets total as shown below. f($a, 42) causes creation of tracelets A and B, and f($a, "hello") causes creation of tracelet C; B is used by both invocations.

 

Tracelets induced by f($a, 42) and f($a, “hello”)

 

How well does hhvm work? As compared to hphpi, the hhvm bytecode interpreter is approximately 1.6X faster for a set of real-world Facebook-specific benchmarks. Right now there is a stability gap between the hhvm interpreter and translator, which precludes us reporting translator performance for the same set of benchmarks. However, we can infer from a set of benchmarks based on the Language Shootout that translator performance is closer to hphpc-compiled program performance than to interpreter performance, as indicated by the geometric mean of the benchmarks (rightmost column in the following figure). The interpreters are all roughly 0.2X as fast as hphpc, and the translator is approximately 0.6X as fast. For perspective on why this matters, consider that many Facebook engineers spend their days developing PHP code in an endless edit-reload-debug cycle. The difference between 8-second and 5-second reloads due to switching from hphpi to the hhvm interpreter makes a big difference to productivity, and this improvement will be even more dramatic once we enable the translator.

 

 

 

We expect hhvm to rapidly close the performance gap with hphpc-compiled binaries over the coming months as the dynamic translator stabilizes and matures. In fact, we predict that hhvm will eventually outperform statically compiled binaries in Facebook’s production environment, in part because we are already sharing enough infrastructure with the static compiler that we will soon be able to leverage static analysis results during tracelet creation.

 

Many challenges remain, as well as some uncertainty regarding the translator’s behavior when running the entirety of Facebook’s PHP codebase. In the near term we need to stabilize the translator and create an on-disk bytecode format (to reduce startup time and to store global static analysis results). Then we will need to optimize/tune both the translator and the interpreter as we observe how the system behaves under production workloads. Here are just a couple of the interesting problems we will soon face:

  • The x64 machine code that the translator generates consumes approximately ten times as much memory as the corresponding HHBC. CPU instruction cache misses are a limiting factor for the large PHP applications that Facebook runs, so a hybrid between interpretation and translation may outperform pure translation.
  • The translator currently makes no use of profile feedback, though we do have sample-based profiling infrastructure in place. Profile-guided optimization is an especially interesting problem for us because Facebook dynamically reconfigures its website between code pushes. Hot code paths mutate over time, and portions of the x64 translation cache effectively become garbage. One solution may be to repeatedly create a new translation cache based on the previous one, taking advantage of recent profile data. This approach has a lot in common with semi-space garbage collection.

The first 90% of the hhvm project is done; now we’re on to the second 90% as we make it really shine. The hhvm code is deeply integrated with the HipHop source code, and we will continue to share HipHop via the public GitHub site for HipHop, just as we have since HipHop’s initial release in early 2010. We hope that the PHP community will find hhvm useful as it matures and engage with us to broaden its usefulness through technical discussions, bug reports, and code contributions. We actively monitor the GitHub site and mailing list.

 

Jason Evans is a software engineer on the HipHop team at Facebook, one of nearly 20 people who have contributed to the hhvm project so far.

(via Facebook)

Node.js Now Runs Natively on Windows

Node.js can now run on Windows without Cygwin, with performance significantly improved on both Windows and Unix systems.

Ryan Dahl, the creator of Node.js, has announced Node.js 0.6, a new stable version of the server-side JavaScript environment, the most important feature being native Windows support using I/O Completion Ports for sockets. Previous versions of Node.js could be used on Windows only via Cygwin, but Cygwin builds are no longer supported. Besides Windows, Node.js is also supported on Linux, Mac OS X, webOS, and a number of Unixes: Solaris, FreeBSD, and OpenBSD.

Dahl said that porting to Windows involved major architectural changes, but most of the API has remained unchanged and performance on Unix systems was not hurt as initially feared. He gave some numbers as proof:

Benchmark                      Linux v0.4.12    Linux v0.6.0
http_simple.js /bytes/1024     5461 r/s         6263 r/s
io.js read                     19.75 MB/s       26.63 MB/s
io.js write                    21.60 MB/s       17.40 MB/s
startup.js                     74.7 ms          49.6 ms

The only regression seems to be writes, where Node.js 0.6 performs worse on Linux. But by dropping Cygwin, Node.js achieves much better results on Windows, as the following data shows:

Benchmark                      Windows v0.4.12    Windows v0.6.0
http_simple.js /bytes/1024     3858 r/s           5823 r/s
io.js read                     12.41 MB/s         26.51 MB/s
io.js write                    12.61 MB/s         33.58 MB/s
startup.js                     152.81 ms          52.04 ms

Other major improvements in Node.js 0.6 are integrated load balancing for running multiple Node processes in a cluster and a built-in binding to the zlib library. There is also better support for IPC between Node instances, an improved debugger, and Node now uses V8 3.6.
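For example, the new cluster module lets a single script fork one worker per CPU and share a listening socket. This sketch is adapted from the style of the 0.6-era documentation, so treat the exact event names as approximate:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // fork one worker per CPU; the master balances connections across them
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('death', function(worker) {
        console.log('worker ' + worker.pid + ' died');
    });
} else {
    // workers share the same server port
    http.createServer(function(req, res) {
        res.writeHead(200);
        res.end('hello world\n');
    }).listen(8000);
}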

The API changes are detailed on Node’s GitHub page.

Dahl also mentioned that they plan a faster release cycle, aiming to release a new Node version whenever Google ships a new V8 version with Chrome.

(via InfoQ.com)