- Apache (IBM)
- Apache Traffic Server (Yahoo!)
- IIS (Microsoft)
- GlassFish (Oracle)
- Tuxedo (Oracle)
- TntNet (Deutsche Boerse)
- Rock (Accoria)
- Tomcat / JBoss (Red Hat)
- Jetty
- PlayFramework
- Lighttpd
- Nginx
- Varnish (the Web ‘accelerator’)
- OpenTracker
- Mongoose
- Cherokee
- Monkey
- Libevent
- Libev
- ULib
- Poco
- ACE
- Boost
- Snorkel
- AppWeb
Let’s have a closer look at Nginx and Lighttpd, the rising stars. (Varnish, “the Web accelerator”, is included in this weighttp test because a previous ApacheBench (AB) test run by Nicolas Bonvin was criticized for not using Varnish’s “heavily threaded” architecture.)
This test (detailed here) was run on localhost to remove the network from the equation (a 1 GbE LAN would be the bottleneck). A 100-byte static file is served on a 6-core CPU by 6 server workers (processes for Nginx/Lighty, threads for G-WAN) and requested by 6 weighttp threads.
G-WAN served 749,574 requests per second (Nginx: 207,558, Lighty: 215,614 and Varnish: 126,047).
G-WAN IS THE ONLY SERVER WHICH SCALES ON SMP (MULTI-CORE) SYSTEMS:
On average, G-WAN is 7.25 times as fast as Varnish, 4.15 times as fast as Nginx and 3.18 times as fast as Lighty.
G-WAN IS THE FASTEST WEB SERVER. GUESS WHAT WILL HAPPEN ON 1024-CORE CPUS.
G-WAN’s memory usage stays flat: it uses 45x less RAM than Varnish, 2.8x less than Nginx and 4.4x less than Lighttpd:
Server   Total Min.  Total Avg.  Total Max.        RAM  User (jiffies)  Kernel (jiffies)
-------  ----------  ----------  ----------  ---------  --------------  ----------------
Nginx    15,072,297  15,927,773  16,797,720   11.93 MB       1,000,270         1,910,443
Lighty   21,273,484  21,631,876  21,897,404   20.12 MB       1,087,810         1,684,312
Varnish   8,817,943   9,612,933  10,399,610  223.86 MB       2,699,356         1,543,446
G-WAN    64,266,023  69,659,350  72,930,727    5.03 MB         243,166           572,618
And despite being the fastest, G-WAN uses 4x to 11x less user-mode and 2.7x to 3.3x less kernel-mode jiffies (CPU time).
Dynamic Content: Scripts vs. Compiled Code
The success of scripts comes from the instant gratification that interpreters provide: at the press of a key, your code runs. Compare this with the compile-and-link cycles of Apache, Lighttpd or Nginx modules (which require stopping and restarting the server).
But because low performance has a cost, interpreters are giving way to compilers (e.g. Facebook’s HipHop, a PHP-to-C++ translator).
G-WAN C scripts offer the best of both worlds: the convenience of scripts and the speed of compiled code.
Comparing G-WAN C scripts / Apache + PHP / GlassFish + Java / IIS + C#
To evaluate the performance cost of poorly executed convenience, we invited the major scripting-language vendors to review our version of the loan.c script ported to their language, and then we used an ApacheBench wrapper to test them all:
Note that the vertical axis (requests per second) uses a logarithmic scale. This is a 100-year loan test with G-WAN v2.10 (you can also see many older tests).
One single request of loan(100) takes:
- G-WAN + C script: 0.5 ms
- Apache + PHP: 12.6 ms
- GlassFish + Java: 42.3 ms
- IIS + ASP.Net C#: 171.8 ms
Not all script engines (nor all servers) perform equally in a dynamic-content benchmark, even without concurrency:
Note that G-WAN serves up to 322,000 100-year loans (131.4 KB each) per second. That is more requests per second than Nginx or Lighty achieve merely serving a 100-byte static file (Nginx: 207,558 RPS, Lighty: 215,614 RPS; see the first chart of this page). G-WAN makes AJAX Web apps scale better than static files served by Nginx!
Note also how the user-mode CPU time stays low and constant while the kernel-mode time follows the RPS curve.
Memory usage grows with RPS but remains constant once the RPS curve stabilizes.
When fully used, multi-core CPUs are great!
This may be why publicly available comparative benchmarks are such a scarce resource, and why publishing anything relevant triggers the ire of the Censorship & F.U.D. departments. When did you last see G-WAN’s feats discussed by the same press in which every microsecond saved by Facebook makes the headlines?
To sum it up: using the least efficient technologies comes at a hefty (recurring) cost.
And the fact that 40-year-old technology outdoes more ‘modern’ (complex and patented) alternatives by several orders of magnitude may disturb those among us who still think for themselves (and those who have to sign big checks).
Comparing G-WAN to Nginx, Lighttpd, Varnish and Apache Traffic Server (ATS)
Using a low-end Intel Core i3 laptop and a 100-byte static file, an independent expert [1] from EPFL’s Distributed Information Systems Laboratory evaluated the performance of the best Web servers (and “Web server accelerators” like Varnish and ATS) with this open-source ApacheBench wrapper:
The authors of Nginx and Varnish participated in this study.
They helped tune their servers: Igor Sysoev, the author of Nginx, went as far as requesting new benchmarks for the latest version of Nginx and then, when Nginx v1.0 made no difference, building a feature-stripped Nginx in an attempt to catch up with G-WAN’s performance.
The Varnish team provided several versions of its configuration file and, like Igor for Nginx, requested several benchmark runs.
Their efforts did not make any difference: G-WAN (without tuning and with more features than all others) is just much faster.
Note that G-WAN v2.1, the version tested by Nicolas Bonvin, was twice as fast as Nginx and used half the CPU resources, but it used more memory. This pushed us to at least match Nginx on that front: G-WAN v2.8+ uses less memory than Nginx while offering many more features, such as a ‘wait-free’ KV store (the first of its kind) and ANSI C scripts to generate dynamic content.
Why are localhost tests relevant?
We have all read that “benchmarks on localhost do not reflect reality”.
Sure, there is no substitute for a 100 GbE network of tens of thousands of interconnected machines, driven by human users, available to test your Web application whenever you need it. But not everybody can afford this kind of test.
The most relevant substitute is localhost: without bandwidth limits, you test the server rather than the network (which “PR” benchmarks tune with OS kernel patches, multi-homed servers using arrays of 10 Gbps NICs and tuned drivers, high-end switches, etc.).
Everybody has access to the same (free and standard) localhost.
A test that everybody can reproduce certainly has some value, especially if your goal is merely to compare how different HTTP servers behave under heavy load (CPU and memory usage, and performance in requests per second).
With 400 GbE networks in the works, the question of how fast servers can run becomes crucial.
Why does CPU load matter?
When 1,000-core CPUs arrive with address-bus saturation resolved, the Linux kernel will make a machine hundreds of times faster, and G-WAN will be much faster without any modification. This is because G-WAN/Linux scales on multi-core systems while using few CPU resources (see the first chart of this page).
For many parallelized Web servers, the limit that prevents them from scaling is their own user-mode code rather than the kernel or CPU cache probing issues.
High CPU usage without matching performance just reveals how much fat a parallelized process is dragging along. Reasons range from bloated implementations to inadequate designs, both increasingly visible as client concurrency grows (the latter bites even harder).
Researchers have used existing software like IBM Apache to experiment with various parallelization strategies, locality and cache updates. What limited the range of their experiments was the overhead of the user-mode code they relied on, which masks more subtle interactions. Research will clearly need to focus on implementation to get relevant results on design issues.
Conclusion: 1,000-core CPUs will make very few Web servers (and even fewer script engines) fly higher.
Why does memory footprint matter?
Whether you host several Web sites, need to run several applications on a single machine, or just want the best possible performance, using as little RAM as possible helps.
This is because (a) accessing system memory is immensely slower than accessing the CPU caches and (b) those caches have limited sizes.
The less memory you use, the more likely your code and data are to stay in the fast CPU caches, and the faster your code executes.
At 200 KB (script engine included), G-WAN leaves plenty of space for your code and data.
(via G-WAN)