On Tuesday, February 2, 2010, Facebook announced an ambitious project called “HipHop for PHP”; if you missed it, the general opinion is that you have been coding PHP in a cave. As I write this review no code has been posted yet, but Facebook has made a great move in open sourcing the project so we can all get our hands on it, use it and contribute to it. Since the code is not out there yet, this is literally a first-impression article based on the presentation made by Facebook and on various posts from core PHP developers who got a first look at the technology before the release.
What is it?
To be blunt, it’s a PHP to C++ code transformer (compiler). But that does not do it justice, so let’s look deeper. Those of you who know PHP intimately will recognize the process behind running it:
PHP code --parser/compiler--> opcodes --Zend Engine--> execution
Caching solutions generally store the opcodes and reuse them instead of parsing and compiling the script on every request. What HipHop does is completely different, and it surprised quite a few people who had tried to guess what Facebook was working on. Broadly, this is the process (simplified):
PHP code --parser--> C++ code --g++--> compiled binary
Historically, PHP has always been executed on the Zend Engine, the heart of PHP since PHP 4. In this solution the Zend Engine has been recoded into the HipHop runtime, and instead of opcodes the input is C++ code generated from the original PHP code and compiled against that runtime.
It’s a well-known fact that code written in C runs faster than PHP code, for obvious reasons. It is very common for large PHP applications to port part of their codebase to C and package it as an extension; Yahoo and even PHP projects like Doctrine have done so. The performance of simple operations can increase by hundreds of percent, depending on load and usage.
This is the premise for Facebook’s project. They have long contributed to APC and PHP to get more performance out of their code, but with the increased load of billions of pages served it was not enough, so they decided to tackle the problem head on. One of the options on the table was to move to another language altogether, but this is where PHP shines: Facebook declared that PHP is simply a great solution because new programmers can easily and rapidly get up to speed and start developing in it, thanks to its simplicity. That, and the fact that their code base consists of more than a million lines of code, made them decide that switching was not an option, and thus HipHop started.
How does it work?
The idea is that PHP code can be divided into “mundane” and “magic” code. Mundane code consists of basic operations that map directly to C++ constructs. Converted to C++, this code can be executed with much higher performance, while the magic code, which is the really complex code to convert, would run at equal or slightly lower speeds. This is the point that determines whether your application can benefit from HipHop: is it more mundane than magic?
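To make the “mundane” category concrete, here is a hand-written sketch of how a simple PHP function might look once translated to C++. This is not actual HipHop output; the function name `sum_to` and the choice of `std::int64_t` for PHP integers are my own assumptions for illustration.

```cpp
#include <cstdint>

// PHP original (mundane: plain integer arithmetic and a loop):
//
//   function sum_to($n) {
//       $total = 0;
//       for ($i = 1; $i <= $n; $i++) {
//           $total += $i;
//       }
//       return $total;
//   }
//
// Hand-written C++ counterpart (hypothetical, not generated by HipHop).
// Once every variable is known to be an integer, each PHP statement
// maps onto one C++ statement. Simplification: PHP promotes an
// overflowing integer to float, which this sketch does not model.
std::int64_t sum_to(std::int64_t n) {
    std::int64_t total = 0;
    for (std::int64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Code like this is where a compiler can do its best work, because nothing about the types or the control flow has to be decided at run time.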
If your answer is yes, then you may want to look into it. The converter does a lot of processing, identifying dependencies, doing static analysis and other operations to get at the basic code; it then has to take care of the problematic issue: typing. PHP is a weakly typed language, meaning variables can juggle between various types. Behind the scenes the Zend Engine implements this with the zval type, which basically stores anything. In the C++ code the new variables are typed, so the converter needs to work all this out in its type inference step. The project’s lead engineer, Haiping Zhao, stated that one of the solutions was to map zvals to a C++ Variant type whenever it is impossible to determine a specific type (failed type inference), or when typecasting occurs in the course of the script. After all this analysis, code is finally generated.
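The difference between an inferred type and the Variant fallback can be sketched in a few lines of C++. HipHop’s own Variant class is not public as I write this, so the sketch uses the standard library’s `std::variant` (C++17) as a stand-in; the names `Value`, `add_typed`, `to_number` and `add_variant` are hypothetical, and only three PHP types are modelled.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Stand-in for a "stores anything" value. Assumption: only int, float
// and string are modelled; a real runtime would cover far more types.
using Value = std::variant<std::int64_t, double, std::string>;

// Case 1: inference succeeded, both operands are known to be integers.
// The addition is a single native operation.
std::int64_t add_typed(std::int64_t a, std::int64_t b) {
    return a + b;
}

// Helper for case 2. Simplification: strings are treated as 0 here,
// whereas PHP would try to parse a numeric string.
double to_number(const Value& v) {
    if (auto i = std::get_if<std::int64_t>(&v)) return static_cast<double>(*i);
    if (auto d = std::get_if<double>(&v)) return *d;
    return 0.0;
}

// Case 2: inference failed, so the held type is inspected at run time
// before the addition can happen.
Value add_variant(const Value& a, const Value& b) {
    // The result stays an integer only when both sides hold integers.
    if (std::holds_alternative<std::int64_t>(a) &&
        std::holds_alternative<std::int64_t>(b)) {
        return std::get<std::int64_t>(a) + std::get<std::int64_t>(b);
    }
    return to_number(a) + to_number(b);
}
```

The second version does the same job but pays for type checks on every call, which is why code that defeats type inference gains little from being compiled.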
This code is then compiled against the HipHop runtime, which, as I said, plays the role of the Zend Engine but works with specialized types instead of the Zend Engine’s abstract type. Binary in hand, it can be run straight from the command line or hooked up to a web server, as it is compatible with the libevent library. Facebook has also written a very simple web server to interface with its compiled code, replacing Apache for calls to this code (as far as the available information goes, they proxy PHP traffic to this server and leave other resources going through Apache).
The good and the bad
Good: Programmers can continue coding in PHP with no slowdowns. They still have PHP’s ease of operation (code, run, see, fix, run, see) with no need to recompile and such. Compilation only happens for production code, and unfortunately it is a slow process. The final result is one large binary, a true binary that can be executed, and it maps to one process with multiple threads, which is interesting for other scaling topics: for instance, this means you have one DB connection and not multiple.
Bad: It’s compatible only up to PHP 5.2, existing PHP extensions need to be converted to be compatible, and compilation is slow. With the market’s overwhelming move to 5.3 and the incredible features present in it, having to fall back on 5.2 (earlier 5.2 versions, not the latest) can really be a downside to the whole thing. Also, PHP extensions written in C that are not thread safe need to be rewritten in C++ to be compatible. Facebook has converted a few, but there are lots of extensions out there and we might need to use more than a few. The compilation process is long, so fixing a bug on a live production app is not as simple as fix, test, deploy, works; the code must be recompiled and deployed, which is just fine if your QA processes are spotless, but in most cases you will run into delays due to compilation.
Not supported? Some pieces of PHP are not supported and probably never will be, like eval(), create_function(), and preg_replace() with the /e modifier. These won’t be missed if you like clean, quality code, but templating systems like Smarty rely on them, so that’s not good news for them.
Result? Well, Facebook has one advantage here: this is not an “experiment” or a theoretical project. It’s currently being used massively on their code base, so it works. Facebook reported a 50% reduction in CPU usage on their servers, which is the equivalent of doubling your pool of servers; really impressive results.
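The “doubling” comparison is simple arithmetic, assuming CPU is the limiting resource (my assumption, not a statement from Facebook). Let c be the CPU time one request costs, C the CPU time a server has available per second, and R the requests per second that server can handle:

```latex
R = \frac{C}{c}, \qquad c' = 0.5\,c \;\Rightarrow\; R' = \frac{C}{c'} = \frac{C}{0.5\,c} = 2R
```

So the same hardware serves twice the requests, or the same traffic needs half the servers. If something other than CPU is the bottleneck (memory, database, network), the gain will be smaller.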
What’s coming down the pipe? Current plans include PHP 5.2.12 support, followed by PHP 5.3, and support for running this inside Apache (mod_hiphop?). Timeframes for these are still undeclared.
Is HipHop for you?
Of the various articles around the web, Terry Chay’s does a great job of helping you decide whether this is something you need to look into. In general, I must say that if you can run your application on two servers or fewer, keep going; this is not for you. If you host or code apps that will live on hosted services, then this is still not for you; even though some providers like ServerGrove have already said they intend to look into supporting it, it’s still shaky ground. Also, if your application is more “magic” code than “mundane” code, you are still better off with PHP.
HipHop is an amazing concept, and its complexity is enough to leave you in awe of the team responsible for it. It is definitely not a solution for most of the PHP-related market, apps and developers; most reviews I have seen state it’s not for 99.9% of the code out there. I do think it will grow and evolve quite a bit once it is open to the community. Its open source nature will be a generous boost, and this has been by far one of the greatest moves by Facebook and something I really respect in their work.
I was quite refreshed to see a move of total innovation when all the outside media had placed their bets on a JIT compiler or a rewrite of the language. It’s a solution that holds on to one of PHP’s advantages, its simplicity, and still opens up a new source of performance gains to be explored by the community. It also aims to generate performant code that can be compared to the likes of Java and C#. In short, it takes a scripting language and promotes it to machine code.
You can get HipHop’s source from GitHub.
(from Rafael Dohms)