G-WAN web server benchmarks (vs. other web servers) – so fast

  1. Apache (IBM)
  2. Apache Traffic Server (Yahoo!)
  3. IIS (Microsoft)
  4. GlassFish (Oracle)
  5. Tuxedo (Oracle)
  6. TntNet (Deutsche Boerse)
  7. Rock (Accoria)
  8. Tomcat / JBoss (Red Hat)
  9. Jetty
  10. PlayFramework
  11. Lighttpd
  12. Nginx
  13. Varnish (the Web ‘accelerator’)
  16. OpenTracker
  15. Mongoose
  16. Cherokee
  17. Monkey
  18. Libevent
  19. Libev
  20. ULib
  21. Poco
  22. ACE
  23. Boost
  24. Snorkel
  25. AppWeb

Let’s have a closer look at Nginx and Lighttpd, the rising stars. (Varnish, “the Web accelerator”, is included in this weighttp test because a previous AB test by Nicolas Bonvin was criticized for not exploiting Varnish’s “heavily threaded” architecture.)

This test (detailed here) was done on localhost to remove the network from the equation (a 1 GbE LAN is the bottleneck). A 100-byte static file is served on a 6-Core CPU by 6 server workers (processes for Nginx/Lighty, threads for G-WAN) and requested by 6 weighttp client threads.
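As a hypothetical sketch of such a run (the port, file name and request count are my assumptions, not figures from the test), a matching weighttp invocation would look like this:

```shell
# 6 client threads (-t), keep-alive connections (-k),
# high concurrency (-c), many requests (-n) for a 100-byte file
weighttp -n 10000000 -c 1000 -t 6 -k "http://127.0.0.1:8080/100.html"
```

Running the client on the same host as the server is what removes the network from the measurement.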

G-WAN served 749,574 requests per second (Nginx: 207,558; Lighty: 215,614; Varnish: 126,047).


On average, G-WAN is 725% faster than Varnish, 415% faster than Nginx and 318% faster than Lighty.


Only G-WAN has a flat memory footprint: 45x less RAM than Varnish, 2.8x less than Nginx and 4.4x less than Lighttpd:

 Server   Total Min.   Total Avg.   Total Max.      RAM        User       Kernel
 ------   ----------   ----------   ----------   ---------   ---------   ---------
 Nginx    15,072,297   15,927,773   16,797,720    11.93 MB   1,000,270   1,910,443
 Lighty   21,273,484   21,631,876   21,897,404    20.12 MB   1,087,810   1,684,312
 Varnish   8,817,943    9,612,933   10,399,610   223.86 MB   2,699,356   1,543,446
 G-WAN    64,266,023   69,659,350   72,930,727     5.03 MB     243,166     572,618

And despite being the fastest, G-WAN uses 4x to 11x fewer user-mode jiffies and 3.3x fewer kernel-mode jiffies (CPU time).

Dynamic Contents: Scripts vs. Compiled code

The success of scripts comes from the instant gratification that interpreters provide: at the press of a key your code executes instantly. Compare this with the compilation and linkage cycles of Apache, Lighttpd or Nginx modules (which require server stops & restarts).

But since low performance comes at a cost, interpreters are now giving way to compilers (see Facebook’s HipHop, a PHP-to-C++ translator).

G-WAN C scripts offer the best of both worlds: the convenience of scripts and the speed of compiled code.

Comparing G-WAN C scripts / Apache + PHP / GlassFish + Java / IIS + C#

To measure the cost of this convenience, we invited the major scripting-language vendors to review our port of the loan.c script in their language – and then we used an ApacheBench wrapper to test them all:

Note that the vertical axis (requests per second) uses a logarithmic scale. This is a 100-year loan test with G-WAN v2.10 (you can also see many older tests).

One single request of loan(100) takes:

›   G-WAN + C script …………… 0.5 ms
›   Apache + PHP ……………… 12.6 ms
›   GlassFish + Java ……………42.3 ms
›   IIS + ASP.Net C# ………… 171.8 ms

Not all script engines (nor all servers) are equal in the light of a dynamic content benchmark – even without concurrency:

Note that G-WAN serves up to 322,000 100-year loans (131.4 KB) per second. This is faster than Nginx or Lighty merely serving a 100-byte static file (Nginx: 207,558 RPS; Lighty: 215,614 RPS – see the first chart of this page). G-WAN makes AJAX Web apps scale better than static files served by Nginx!

Note also how the user-mode CPU time is low and constant while the kernel-mode time follows the RPS curve.

And see how memory usage grows with RPS – but remains constant once the RPS curve is stabilized.

When solicited, multi-Cores are great!

This may be why publicly available comparative benchmarks are such a scarce resource – and why publishing anything relevant triggers the ire of the Censorship & F.U.D. departments. When was the last time you saw G-WAN’s feats discussed by the same press where each microsecond saved by Facebook makes the headlines?

To sum it up: using the least efficient technologies comes at a hefty (recurring) cost.

And the fact that 40-year-old technology outdoes more ‘modern’ (complex and patented) alternatives by several orders of magnitude may disturb those among us who still have an open mind (and those who have to sign big checks).

Comparing G-WAN to Nginx, Lighttpd, Varnish and Apache Traffic Server (ATS)

Using a low-end Intel Core i3 laptop and a 100-byte static file, an independent expert [1] at the EPFL’s Distributed Information Systems Laboratory evaluated the performance of the best Web servers (and “Web server accelerators” like Varnish and ATS) using this open-source ApacheBench wrapper:

Apache Traffic Server (Yahoo!) vs. G-WAN vs. Lighttpd vs. Nginx vs. Varnish (Facebook). The authors of Nginx and Varnish participated in this study (click the chart to read it).

They helped to tune their servers: Igor went as far as requesting new benchmarks for the latest version of Nginx and then, when Nginx v1.0 made no difference, rebuilding a feature-stripped Nginx in an attempt to catch up with G-WAN’s performance.

Varnish’s team provided several versions of its configuration file and, like Igor did for Nginx, requested several benchmark runs.

Their efforts did not make any difference: G-WAN (without tuning and with more features than all others) is just much faster.

Note that G-WAN v2.1, the version tested by Nicolas Bonvin, was twice as fast and used half the CPU resources of Nginx – but it used more memory. This pushed us to at least match Nginx’s feat: G-WAN v2.8+ uses less memory than Nginx (while offering many more features, like a ‘wait-free’ KV store – the first of its kind – and ANSI C scripts to generate dynamic contents).

Why are localhost tests relevant?

We have all read that “benchmarks on localhost do not reflect reality”.

Sure, there is no substitute for a 100 GbE network of tens of thousands of interconnected machines driven by human users, available to test your Web application whenever you need it. But not everybody can afford that kind of test.

And the most relevant substitute is localhost: without bandwidth limits, you test the server rather than the network (which “PR” benchmarks tune with OS kernel patches, multi-homed servers using arrays of 10 Gbps NICs with tuned drivers, high-end switches, etc.).

Everybody has access to the same (free and standard) localhost.

A test that everybody can duplicate certainly has some value – especially if your goal is merely to compare how different HTTP servers behave under heavy load (CPU, memory usage and performance: requests per second).

With 400 GbE networks in the works, the question of how fast servers can run becomes crucial.

Why does CPU load matter?

When we have 1,000-Core CPUs with address-bus saturation resolved, the Linux kernel will make a machine hundreds of times faster. And G-WAN will be much faster too, without requiring any modification. This is because G-WAN/Linux scales on multi-Core systems while using little CPU resources (see the first chart of this page).

For many parallelized Web servers, the limit that prevents them from scaling is their own user-mode code rather than the kernel or CPU cache probing issues.

A high CPU usage without performance just reveals how much fat a parallelized process is dragging. Reasons range from bloated implementations to inadequate designs, both being increasingly visible as client concurrency grows (but the latter bites even harder).

Researchers have used existing software like Apache (IBM) to experiment with diverse parallelization strategies, locality and cache updates. What limited the range of their experiments is the overhead of the user-mode code used to measure the more subtle interactions. Research will clearly need to focus on implementation to get relevant results on design issues.

Conclusion: 1,000-Core CPUs will make very few Web servers (and even fewer script engines) fly higher.

Why does memory footprint matter?

Whether you are hosting several Web sites, need to run several applications on a single machine, or just want the best possible performance, using as little RAM as possible helps.

This is because (a) accessing system memory is immensely slower than accessing the CPU caches and (b) those caches have limited sizes.

The less memory you use, the better the chance that your code and data stay loaded in the fast CPU caches – and the faster your code executes.

At 200 KB (script engine included), G-WAN leaves plenty of space for your code and data.

(via G-WAN)
