Async Fragments: Rediscovering Progressive HTML Rendering with Marko

At eBay, we take site speed very seriously and are always looking for ways to allow developers to create faster-loading web apps. This involves fully understanding and controlling how web pages are delivered to web browsers. Progressive HTML rendering is a relatively old technique that can be used to improve the performance of websites, but it has been lost in a whole new class of web applications. The idea is simple: give the web browser a head start in downloading and rendering the page by flushing out early and multiple times. Browsers have always had the helpful feature of parsing and responding to the HTML as it is being streamed down from the server (even before the response is ended). This feature allows the HTML and external resources to be downloaded earlier, and for parts of the page to be rendered earlier. As a result, both the actual load time and the perceived load time improve.

In this blog post, we will take an in-depth look at a technique we call “Async Fragments” that takes advantage of progressive HTML rendering to improve site speed in ways that do not drastically complicate how web applications are built. For concrete examples we will be using Node.js,Express.js and the Marko templating engine (a JavaScript templating engine that supports streaming, flushing, and asynchronous rendering). Even if you are not using these technologies, this post can give you insight into how your stack of choice could be further optimized.

To see the techniques discussed in this post in action, please take a look at the accompanying sample application.

Background

Progressive HTML rendering is discussed in the post The Lost Art of Progressive HTML Rendering by Jeff Atwood, which was published back in 2005. In addition, the “Flush the Buffer Early” rule is described by the Yahoo! Performance team in their Best Practices for Speeding Up Your Web Site guide. Stoyan Stefanov provides an in-depth look at progressive HTML rendering in his Progressive rendering via multiple flushes post. Facebook discussed how they use a technique they call “BigPipe” to improve page load times and perceived performance by dividing up a page into “pagelets.” Those articles and techniques inspired many of the ideas discussed in this post.

In the Node.js world, its most popular web framework, Express.js, unfortunately recommends a view rendering engine that does not allow streaming and thus prevents progressive HTML rendering. In a recent post, Bypassing Express View Rendering for Speed and Modularity, I described how streaming can be achieved with Express.js; this post is largely a follow-up to discuss how progressive HTML rendering can be achieved with Node.js (with or without Express.js).

Without progressive HTML rendering

A page that does not utilize progressive HTML rendering will have a slower load time because the bytes will not be flushed out until the complete HTML response is built. In addition, after the client finally receives the complete HTML it will then see that it needs to download additional external resources (such as CSS, JavaScript, fonts, and images), and downloading these external resources will require additional round trips. In addition, pages that do not utilize progressive HTML rendering will also have a slower perceived load time, since the screen will not update until the complete HTML is downloaded and the CSS and fonts referenced in the <head> section are downloaded. Without progressive HTML rendering, a server/client waterfall chart might be similar to the following:

ps1

The corresponding page controller might look something like this:

function controller(req, res) {
    async.parallel([
            function loadSearchResults(callback) {
                ...
            },
            function loadFilters(callback) {
                ...
            },
            function loadAds(callback) {
                ...
            }
        ],
        function() {
            ...
            var viewModel = { ... };
            res.render('search', viewModel);
        })
}

As you can see in the above code, the page HTML is not rendered until all of the data is asynchronously loaded.

Because the HTML is not flushed until all back-end services are completed, the user will be staring at a blank screen for a large portion of the time. This will result in a sub-par user experience (especially with a poor network connection or with slow back-end services). We can do much better if we flush part of the HTML earlier.

Flushing the head early

A simple trick to improve the responsiveness of a website is to flush the head section immediately. The head section will typically include the links to the external CSS resources (i.e. the <link>tags), as well as the page header and navigation. With this approach the external CSS will be downloaded sooner and the initial page will be painted much sooner as shown in the following waterfall chart:

ps2

As you can see in the chart above, flushing the head early reduces the time to render the initial page. This technique improves the responsiveness of the page, but it does not significantly reduce the total time it takes to make the page fully functional. With this approach, the server is still waiting for all back-end services to complete before flushing the final HTML. In addition, downloading of external JavaScript resources will be delayed since <script> tags are placed at the end of the page (assuming you are following best practices) and don’t get sent out until the second and final flush.

Multiple flushes

Instead of flushing only the head early, it is often beneficial to flush multiple times before ending the response. Typically, a page can be divided into multiple fragments where some of the fragments may depend on data asynchronously loaded from various back-end services while others may not depend on any asynchronously loaded data. The fragments that depend on asynchronously loaded data should be rendered asynchronously and flushed as soon as possible.

For now, we will assume that these fragments need to be flushed in the proper HTML order (versus the order that the data asynchronously loads), but we will also show how out-of-order flushing can be used to further improve both page load times and perceived performance. When using “in-order” flushing, fragments that complete out of order will need to be buffered until they are ready to be flushed in the proper order.

In-order flushing of async fragments

As an example, let’s assume we have divided a complex page into the following fragments:

ps3

Each fragment is assigned a number based on the order that it appears in the HTML document. In code, our output HTML for the page might look like the following:

<html>
<head>
    <title>Clothing Store</title>
    <!-- 1a) Head <link> tags -->
</head>
<body>
    <header>
        <!-- 1b) Header -->
    </header>
    <div class="body">
        <main>
            <!-- 2) Search Results -->
        </main>
        <section class="filters">
            <!-- 3) Search filters -->
        </section>
        <section class="ads">
            <!-- 4) Ads -->
        </section>
    </div>
    <footer>
        <!-- 5a) Footer -->
    </footer>
    <!-- 5b) Body <script> tags -->
</body>
</html>

The Marko templating engine provides a way to declaratively bind template fragments to asynchronous data provider functions (or Promises). An asynchronous fragment is rendered when the asynchronous data provider function invokes the provided callback with the data. If the asynchronous fragment is ready to be flushed, then it is immediately flushed to the output stream. Otherwise, if the asynchronous fragment completed out of order then the rendered HTML is buffered in memory until it is ready to be flushed. The Marko templating engine ensures that fragments are flushed in the proper order.

Continuing with the previous example, our HTML page template with asynchronous fragments defined will be similar to the following:

<html>
<head>
    <title>Clothing Store</title>
    <!-- Head <link> tags -->
</head>
<body>
    <header>
        <!-- Header -->
    </header>
    <div class="body">
        <main>
            <!-- Search Results -->
            <async-fragment data-provider="data.searchResultsProvider"
                var="searchResults">
 
                <!-- Do something with the search results data... -->
                <ul>
                    <li for="item in searchResults.items">
                        $item.title
                    </li>
                </ul>
 
            </async-fragment>
        </main>
        <section class="filters">
 
            <!-- Search filters -->
            <async-fragment data-provider="data.filtersProvider"
                var="filters">
                <!-- Do something with the filters data... -->
            </async-fragment>
 
        </section>
        <section class="ads">
 
            <!-- Ads -->
            <async-fragment data-provider="data.adsProvider"
                var="ads">
                <!-- Do something with the ads data... -->
            </async-fragment>
 
        </section>
    </div>
    <footer>
        <!-- Footer -->
    </footer>
    <!-- Body <script> tags -->
</body>
</html>

The data provider functions should be passed to the template as part of the view model as shown in the following code for a sample page controller:

function controller(req, res) {
    template.render({
            searchResultsProvider: function(callback) {
                performSearch(req.params.category, callback);
            },
 
            filtersProvider: function(callback) {
                ...
            },
 
            adsProvider: function(callback) {
                ...
            }
        },
        res /* Render directly to the output HTTP response stream */);
}

In this particular example, the “search results” async fragment appears first in the HTML template, and it happens to take the longest time to complete. As a result, all of the subsequent fragments will need to be buffered on the server. The resulting waterfall with in-order flushing of async fragments is shown below:

ps4

While the performance of this approach might be fine, we can enable out-of-order flushing for further performance gains as described in the next section.

Out-of-order flushing of async fragments

Marko achieves out-of-order flushing of async fragments by doing the following:

Instead of waiting for an async fragment to finish, a placeholder HTML element with an assigned id is written to the output stream. Out-of-order async fragments are rendered before the ending <body> tag in the order that they complete. Each out-of-order async fragment is rendered into a hidden <div> element. Immediately after the out-of-order fragment, a <script> block is rendered to replace the placeholder DOM node with the DOM nodes of the corresponding out-of-order fragment. When all of the out-of-order async fragments complete, the remaining HTML (e.g. </body></html>) will be flushed and the response ended.

To clarify, here is what the output HTML might look like for a page with out-of-order flushing enabled:

<html>
<head>
    <title>Clothing Store</title>
    <!-- 1a) Head <link> tags -->
</head>
<body>
    <header>
        <!-- 1b) Header -->
    </header>
    <div class="body">
        <main>
            <!-- 2) Search Results -->
            <span id="asyncFragment0Placeholder"></span>
        </main>
        <section class="filters">
            <!-- 3) Search filters -->
            <span id="asyncFragment1Placeholder"></span>
        </section>
        <section class="ads">
            <!-- 4) Ads -->
            <span id="asyncFragment2Placeholder"></span>
        </section>
    </div>
    <footer>
        <!-- 5a) Footer -->
    </footer>
 
    <!-- 5b) Body <script> tags -->
 
    <script>
    window.$af=function(){
    // Small amount of code to support rearranging DOM nodes
    // Unminified:
    // https://github.com/raptorjs/marko-async/blob/master/client-reorder-runtime.js
    };
    </script>
 
    <div id="asyncFragment1" style="display:none">
        <!-- 4) Ads content -->
    </div>
    <script>$af(1)</script>
 
    <div id="asyncFragment2" style="display:none">
        <!-- 3) Search filters content -->
    </div>
    <script>$af(2)</script>
 
    <div id="asyncFragment0" style="display:none">
        <!-- 2) Search results content -->
    </div>
    <script>$af(0)</script>
 
</body>
</html>

One caveat with out-of-order flushing is that it requires JavaScript running on the client to move each out-of-order fragment into its proper place in the DOM. Thus, you would only want to enable out-of-order flushing if you know that the client’s web browser has JavaScript enabled. Also, moving DOM nodes may cause the page to be reflowed, which can be visually jarring to the user and result in more client-side CPU usage. If reflow is an issue then there are tricks that can be used to avoid a reflow (e.g., reserving space as part of the initial wireframe). Marko also allows alternative content to be shown while waiting for an out-of-order async fragment.

To enable out-of-order flushing with Marko, the client-reorder="true" attribute must be added to each <async-fragment> tag, and the <async-fragments> tag must be added to the end of the page to serve as the container for rendered out-of-order fragments. Here is the updated<async-fragment> tag for the search results fragment:

<async-fragment data-provider="data.searchResultsProvider"
    var="searchResults"
    client-reorder="true">
    ...
</async-fragment>

The updated HTML page template with the new <async-fragments> tag is shown below:

<html>
<head>
    <title>Clothing Store</title>
    <!-- Head <link> tags -->
</head>
<body>
    ...
 
    <!-- Body <script> tags -->
 
    <async-fragments/>
</body>
</html>

In combination with out-of-order flushing, it may be beneficial to move <script> tags that link to external resources to the end of the first chunk (before all of the out-of-order chunks). While the server is busy preparing the rest of the page, the client can start downloading the external JavaScript required to make the page functional. As a result, the user will be able to start interacting with the page sooner.

Our final waterfall with out-of-order flushing will now be similar to the following:

ps5

The final waterfall shows that the strategy of out-of-order flushing of asynchronous fragments can significantly improve the load time and perceived load time of a page. The user will be met with a progressive loading of a page that is ready to be interacted with sooner.

Additional considerations

HTTP transport and HTML compression

To allow HTML to be served in parts, chunked transfer encoding should be used for the HTTP response. Chunked transfer encoding uses delimiters to break up the response, and each flush results in a new chunk. If gzip compression is enabled (and it should be) then flushing the pending data to the gzip stream will result in a gzip data frame being written to the response as part of each chunk. Flushing too often will negatively impact the effectiveness of the compression algorithm, but without flushing periodically then progressive HTML rendering will not be available. By default, Marko will flush at the beginning of an <async-fragment> block (in order to send everything that has already completed), as well as when an async fragment completes. This default strategy results in efficient progressive loading of an HTML page as long as there are not too many async fragments.

Binding behavior

For improved usability and responsiveness, there should not be a long delay between rendered HTML being displayed to the user in the web browser and behavior being attached to the associated DOM. At eBay, we use the marko-widgets module to bind behavior to DOM nodes. Marko Widgets supports binding behavior to rendered widgets immediately after each independent async fragment, as illustrated in the accompanying sample app. For immediate binding to work, the required JavaScript code must be included earlier in the page. For more details, please see the marko-widgets module documentation.

Error handling

It is important to note that as soon as a byte is flushed for the HTTP body, then the response is committed; no additional HTTP headers can be sent (e.g., no server-side redirects or cookie-setting), and the HTML that has been sent cannot be “unsent”. Therefore, if an asynchronous data provider errors or times out, then the app must be prepared to show alternative content for thatparticular async fragment. Please see the documentation for the marko-async module for additional information on how to show alternative content in case of an error.

Summary

The Async Fragments technique allows web developers to maximize the benefits of progressive HTML rendering to produce web pages that have improved actual and perceived load times. Developers at eBay have found the concept of binding HTML template fragments to asynchronous data providers easy to grasp and utilize. In addition, the flexibility to support both in-order and out-of-order flushing of async fragments makes this technique applicable for all web browsers and user agents.

The Marko templating engine is being used as part of eBay’s latest Node.js stack to improve performance while also simplifying how pages are constructed on both the server and the client. Marko is one of a few templating engines for Node.js and web browsers that support streaming, flushing, and asynchronous rendering. Marko has a simple HTML-based syntax, and the Marko compiler produces small and efficient JavaScript modules as output. We encourage you to try Marko online and in your next Node.js project. Because Marko is a key component of eBay’s internal Node.js stack, and given that it is heavily documented and tested, you can be confident that it will be well supported.

(Via Calendar.perfplanet.com)

Advertisements

Does WebKit face a troubled future now that Google is gone?

chromium-webkit-hands

Now that Google is going its own way and developing its rendering engine independently of the WebKit project, both sides of the split are starting the work of removing all the things they don’t actually need.

This is already causing some tensions among WebKit users and Web developers, as it could lead to the removal of technology that they use or technology that is in the process of being standardized. This is leading some to question whether Apple is willing or able to fill in the gaps that Google has left.

Since Google first released Chrome in 2008, WebCore, the part of WebKit that does the actual CSS and HTML processing, has had to serve two masters. The major contributors to the project, and the contributors with the most widely used browsers, were Apple and Google.

While both used WebCore, the two companies did a lot of things very differently. They used different JavaScript engines (JavaScriptCore [JSC] for Apple, V8 for Google). They adopted different approaches to handling multiple processes and sandboxing. They used different options when compiling the software, too, so their browsers actually had different HTML and CSS features.

The WebCore codebase had to accommodate all this complexity. JavaScript, for example, couldn’t be integrated too tightly with the code that handles the DOM (the standard API for manipulating HTML pages from JavaScript), because there was an intermediary layer to ensure that JSC and V8 could be swapped in and out.

Google said that the decision to fork was driven by engineering concerns and that forking would enable faster development by both sides. That work is already under way, and both teams are now preparing to rip all these unnecessary bits out.

Right now, it looks like Google has it easier. So far, only Google and Opera are planning to use Blink, and Opera intends to track Chromium (the open source project that contains the bulk of Chrome’s code) and Blink anyway, so it won’t diverge too substantially from either. This means that Google has a fairly free hand to turn features that were optional in WebCore into ones that are permanent in Blink if Chrome uses them, or eliminate them entirely if it doesn’t.

Apple’s position is much trickier, because many other projects use WebKit, and no one person knows which features are demanded by which projects. Apple also wants to remove the JavaScript layers and just bake in the use of JSC, but some WebKit-derived projects may depend on them.

Samsung, for example, is using WebKit with V8. But with Google’s fork decision, there’s now nobody maintaining the code that glues V8 to WebCore. The route that Apple wants to take is to purge this stuff and leave it up to third-party projects to maintain their variants themselves. This task is likely to become harder as Cupertino increases the integration between JSC and WebCore.

Oracle is working on a similar project: a version of WebKit with its own JavaScript engine, “Nashorn,” that’s based on the Java virtual machine. This takes advantage of the current JavaScript abstractions, so it’s likely to be made more complicated as Apple removes them.

One plausible outcome for this is further consolidation among the WebKit variants. For those dead set on using V8, switching to Blink may be the best option. If sticking with WebKit is most important, reverting to JSC may be the only practical long-term solution.

Google was an important part of the WebKit project, and it was responsible for a significant part of the codebase’s maintenance. The company’s departure has left various parts of WebKit without any developers to look after them. Some of these, such as some parts of the integrated developer tools, are probably too important for Apple to abandon—even Safari uses them.

Others, however, may be culled—even if they’re on track to become Web standards. For example, Google developed code to provide preliminary support for CSS Custom Properties (formerly known as CSS Variables). It was integrated into WebKit but only enabled in Chromium. That code now has nobody to maintain it, so Apple wants to remove it.

This move was immediately criticized by Web developer Jon Rimmer, who pointed out that the standard was being actively developed by the World Wide Web Consortium (W3C), was being implemented by Mozilla, and was fundamentally useful. The developer suggested that Apple had two options for dealing with Google’s departure from the project: either by “cutting out [Google-developed] features and continuing at a reduced pace, or by stepping up yourselves to fill the gap.”

Discussion of Apple’s ability to fill the Google-sized gap in WebKit was swiftly shut down, but Rimmer’s concern remains the elephant in the room. Removing the JavaScript layer is one thing; this was a piece of code that existed only to support Google’s use of V8, and with JavaScriptCore now the sole WebKit engine, streamlining the code makes sense. Google, after all, is doing the same thing in Blink. But removing features headed toward standardization is another thing entirely.

If Apple doesn’t address Rimmer’s concerns, and if Blink appears to have stronger corporate backing and more development investment, one could see a future in which more projects switch to using Blink rather than WebKit. Similarly, Web developers could switch to Blink—with a substantial share of desktop usage and a growing share of mobile usage—and leave WebKit as second-best.

(via arstechnica.com)