Comparing the Performance of Various Web Frameworks

TechEmpower has been running benchmarks for the last year, attempting to measure and compare the performance of web frameworks. For these benchmarks the term “framework” is used loosely including platforms and micro-frameworks.

The results of the first benchmark were published in March 2013, and the tests were conducted on two hardware configurations: 2 Amazon EC2 m1.large instances and 2 Intel Sandy Bridge Core i7-2600K with 8GB of RAM, both setups using Gbit Ethernet. The frameworks had to reply with a simple JSON response {"message" : "Hello, World!"}. Following is an excerpt of the results, showing 10 of the most common web frameworks out of 24 benchmarked along with their results measured in Requests per second (RPS) and percentage of the best result.

image

Over the year, the benchmark evolved to cover 119 frameworks, including multiple types of queries and responses and running on one additional hardware configuration: 2 x Dell R720xd Dual-Xeon E5 v2 with 32GB of RAM each and 10 Gbit Ethernet.

Following are excerpts from the recently published 9th round of benchmarking, this time performed on three different hardware configurations:

image

Several observations:

While high-end hardware certainly performs better than lower end in terms of RPS, there is a major change in hierarchy for the top frameworks, as it is the case for Xeon E5 vs. Intel I7. This can be correlated with the fact that there are many at over 90% for Intel I7, TechEmpower attributing it to a bottleneck generated by the Ethernet 1Gbit used in the respective configuration:

Since the first round, we have known that the highest-performance frameworks and platforms have been network-limited by our gigabit Ethernet…. With ample network bandwidth, the highest-performing frameworks are further differentiated.

Google’s Go is the first on Intel I7, but has a major drop in performance on Xeon E5 in spite of using all cores, emphasizing the importance of performance tuning for certain types of hardware architectures, as noticed by TechEmpower:

The Peak [Xeon E5] test environment uses servers with 40 HT cores across two Xeon E5 processors, and the results illustrate how well each framework scales to high parallelism on a single server node.

Of course, other variables are at play, such as significantly more HT processor cores–40 versus 8 for our i7 tests–and a different system architecture, NUMA.

Amazon EC2 M1.large is severely underperforming compared with other configurations, making one wonder if the price/performance ratio is still attractive for cloud computing instances.

-*) Most frameworks are based on Linux. No Windows framework was benchmarked on EC2 or I7. The best Windows framework (plain-windows) performed at 18.6% of the leader on E5. Also, Facebook’s HHVM does not appear on EC2. It is also likely that HHVM is tuned differently when it runs on Facebook’s hardware.

Some of the well known frameworks are severely underperforming on all hardware configurations used, such as Grails, Spring, Django and Rails.

The performance of the recently released Dart 1.3 has doubled and should be on par with Node.js’ but it does not show in the benchmark because this round used Dart 1.2. This may be applicable to other frameworks that have been improved lately and the improvements will show up in future benchmarks.

As it is with other performance benchmarks, the results are debatable since they depend on the hardware, network and system configurations. TechEmpower invites those interested in bumping up the results for their favorite framework to fork the code on GitHub.

The benchmark includes detailed explanations on how tests are conducted, the hardware configurations, and the components of each framework: the language, web server, OS, etc.

(via InfoQ.com)

How To Scale A Web Application Using A Coffee Shop As An Example

This is a guest repost by Sriram Devadas, Engineer at Vistaprint, Web platform group. A fun and well written analogy of how to scale web applications using a familiar coffee shop as an example. No coffee was harmed during the making of this post.

I own a small coffee shop.

My expense is proportional to resources
100 square feet of built up area with utilities, 1 barista, 1 cup coffee maker.

My shop’s capacity
Serves 1 customer at a time, takes 3 minutes to brew a cup of coffee, a total of 5 minutes to serve a customer.

Since my barista works without breaks and the German made coffee maker never breaks down,
my shop’s maximum throughput = 12 customers per hour.

 

Web Server

Customers walk away during peak hours. We only serve one customer at a time. There is no space to wait.

I upgrade shop. My new setup is better!

Expenses
Same area and utilities, 3 baristas, 2 cup coffee maker, 2 chairs

Capacity
3 minutes to brew 2 cups of coffee, ~7 minutes to serve 3 concurrent customers, 2 additional customers can wait in queue on chairs.

Concurrent customers = 3, Customer capacity = 5

 

Scaling Vertically

Business is booming. Time to upgrade again. Bigger is better!

Expenses
200 square feet and utilities, 5 baristas, 4 cup coffee maker, 3 chairs

Capacity goes up proportionally. Things are good.

In summer, business drops. This is normal for coffee shops. I want to go back to a smaller setup for sometime. But my landlord wont let me scale that way.

Scaling vertically is expensive for my up and coming coffee shop. Bigger is not always better.

 

Scaling Horizontally With A Load Balancer

Landlord is willing to scale up and down in terms of smaller 3 barista setups. He can add a setup or take one away if I give advance notice.

If only I could manage multiple setups with the same storefront…

There is a special counter in the market which does just that!
It allows several customers to talk to a barista at once. Actually the customer facing employee need not be a barista, just someone who takes and delivers orders. And baristas making coffee do not have to deal directly with pesky customers.

Now I have an advantage. If I need to scale, I can lease another 3 barista setup (which happens to be a standard with coffee landlords), and hook it up to my special counter. And do the reverse to scale down.

While expenses go up, I can handle capacity better too. I can ramp up and down horizontally.

 

Resource Intensive Processing

My coffee maker is really a general purpose food maker. Many customers tell me they would love to buy freshly baked bread. I add it to the menu.

I have a problem. The 2 cup coffee maker I use, can make only 1 pound of bread at a time. Moreover it takes twice as long to make bread.

In terms of time,
1 pound of bread = 4 cups of coffee

Sometimes bread orders clog my system! Coffee ordering customers are not happy with the wait. Word spreads about my inefficiency.

I need a way of segmenting my customer orders by load, while using my resources optimally.

 

Asynchronous Queue Based Processing

I introduce a token based queue system.

Customers come in, place their order, get a token number and wait.
The order taker places the order on 2 input queues – bread and coffee.
Baristas look at the queues and all other units and decide if they should pick up the next coffee or the next bread order.
Once a coffee or bread is ready, it is placed on an output tray and the order taker shouts out the token number. The customer picks up the order.

- The inputs queues and output tray are new. Otherwise the resources are the same, only working in different ways.

- From a customers point of view, the interaction with the system is now quite different.

- The expense and capacity calculations are complicated. The system’s complexity went up too. If any problem crops up, debugging and fixing it could be problematic.

- If the customers accept this asynchronous system, and we can manage the complexity, it provides a way to scale capacity and product variety. I can intimidate my next door competitor.

 

Scaling Beyond

We are reaching the limits of our web server driven, load balanced, queue based asynchronous system. What next?

My already weak coffee shop analogy has broken down completely at this point.

Search for DNS round robin and other techniques of scaling if you are interested to know more.

If you are new to web application scaling, do the same for each of the topics mentioned in this page.

My coffee shop analogy was an over-simplification, to spark interest on the topic of scaling web applications.

If you really want to learn, tinker with these systems and discuss them with someone who has practical knowledge.

(via HighScalability.com)

 

Scale PHP on Ec2 to 30,000 Concurrent Users / Server

RockThePost.com is a LAMP stack hosted on Ec2. We’re preparing to be featured in an email which will be sent to ~1M investors… all at the same time. For our 2 person engineering department, that meant we had to do a quick sanity check to see just how many people we can support concurrently.

Our app uses PHP’s Zend Framework 2. Our web servers are two m1.medium Ec2 machines with an ELB in front of them setup to split the load. On the backend, we’re running a master/slave MySQL database configuration. Very typical for most of the small data shops I’ve worked at.

Here are my opinions and thoughts in random order.

  • Use PHP’s APC feature. I don’t understand why this isn’t on by default but having APC enabled is really a requirement in order for a website to have a chance at performing well.
  • Put everything that’s not a .php request on a CDN. No need to bog down your origin server with static file requests. Our deployer puts everything on S3 and we abs path everything to CloudFront. Disclaimer: CloudFront has had some issues lately and we’ve recently set the site to instead serve all static materials directly from S3 until the CloudFront issues are resolved.
  • Don’t make connections to other servers in your PHP code, such as the database and memcache servers, unless it’s mandatory and there’s really NO other way to do what you’re trying to do. I’d guess that the majority of PHP web apps out there use a MySQL database for the backend session management and a memcache pool for caching. Making connections to other servers during your execution flow is not efficient. It blocks, runs up the CPU, and tends to hold up the line, as it were. Instead, use the APC key/value store for storing data in PHP and Varnish for full page caching.
  • I repeat… Use Varnish. In all likelihood, most of the pages on your site do not change, or hardly ever change. Varnish is the Memcache/ModRewrite of web server caching. Putting Varnish on my web server made the single biggest difference to my web app when I was load testing it.
  • Use a c1.xlarge. The m1.medium has only 1 CPU to handle all of the requests. I found that upgrading to a c1.xlarge, which has 8 CPU’s, really paid off. During my load test, I did an apt-get install nmon and then ran nmon and monitored the CPU. You can literally watch the server use all 8 of its cores to churn out pages.

Google Analytics shows us how many seconds an average user spends on an average page. Using this information, we ran some load tests with Siege and were able to extrapolate that with the above recommendations, we should be able to handle 30,000 users concurrently per web server on the “exterior” of the site. For “interior” pages which are PHP powered, we’re expecting to see something more along the lines of 1,000 concurrent sessions per server. In advance of the email blast, we’ll launch a bunch of c1.xlarges and then kill them off slowly as we get more comfortable with the actual load we’re seeing on blast day.

Finally, we will do more work to make our code itself more scaleable… however, the code is only about 4 months old and this is the first time we’ve ever been challenged to actually make it scale.

(source: coderwall.com)