Google Launches Cloud CDN Alpha

Earlier this month, Google announced an Alpha Cloud Content Delivery Network (CDN) offering. The service aims to serve static content closer to end users by caching it in Google’s globally distributed edge caches. Google provides many more edge caches than data centers, so content can be delivered more quickly than by making a full round trip to a Google data center. In total, Google operates over 70 edge points of presence, which will help address customer CDN needs.

In order to use Cloud CDN, you must use Google’s Compute Engine HTTP(S) load balancers with your instances. Enabling an HTTP(S) load balancer is achieved through a simple command.

In a recent post, Google explains the mechanics of the service in the following way: “When a user requests content from your site, that request passes through network locations at the edges of Google’s network, usually far closer to the user than your actual instances. The first time that content is requested, the edge cache sees that it can’t fulfill the request and forwards the request on to your instances. Your instances respond back to the edge cache, and the cache immediately forwards the content to the user while also storing it for future requests. For subsequent requests for the same content that pass through the same edge cache, the cache responds directly to the user, shortening the round trip time and saving your instances the overhead of processing the request.”
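The request flow Google describes can be sketched as a simple cache-fill loop. This is a minimal illustration only; the `EdgeCache` class and `fetch_from_origin` callback are hypothetical names, not part of any Google API.

```python
# Minimal sketch of the edge-cache fill behavior described above.
# EdgeCache and fetch_from_origin are hypothetical illustrations.

class EdgeCache:
    def __init__(self, fetch_from_origin):
        self._store = {}                  # cached content, keyed by URL
        self._fetch = fetch_from_origin   # callback to the origin instances

    def get(self, url):
        if url in self._store:            # cache hit: respond directly
            return self._store[url], "HIT"
        body = self._fetch(url)           # cache miss: forward to origin
        self._store[url] = body           # store for future requests
        return body, "MISS"

origin_calls = []

def fetch_from_origin(url):
    origin_calls.append(url)              # track how often the origin is hit
    return f"<content of {url}>"

cache = EdgeCache(fetch_from_origin)
print(cache.get("/logo.png"))  # first request misses and contacts the origin
print(cache.get("/logo.png"))  # repeat request is served from the edge
```

The second request never reaches the origin, which is exactly the round-trip saving the post describes.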

The following image illustrates how Google leverages edge point-of-presence caches to improve responsiveness.



Once the CDN service has been enabled, caching occurs automatically for all cacheable content. Cacheable content is typically content requested via an HTTP GET request. The service respects explicit Cache-Control headers, taking expiration and max-age directives into account. Some responses will not be cached, including those that include Set-Cookie headers, message bodies that exceed 4 MB in size, or responses where caching has been explicitly disabled through no-cache directives. A complete list of caching rules can be found in Google’s documentation.
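The rules above can be condensed into a single predicate. This is a simplified sketch based only on the rules listed in this article; the real rule set in Google's documentation is longer, and the function name is our own.

```python
# Simplified sketch of the cacheability rules listed in the article.
MAX_CACHEABLE_BYTES = 4 * 1024 * 1024  # the 4 MB limit mentioned above

def is_cacheable(method, headers, body_size):
    """Return True if a response would be cached under the article's rules."""
    if method != "GET":                       # only GET responses are cached
        return False
    if "Set-Cookie" in headers:               # cookie-setting responses skipped
        return False
    if body_size > MAX_CACHEABLE_BYTES:       # bodies over 4 MB skipped
        return False
    cache_control = headers.get("Cache-Control", "")
    if "no-cache" in cache_control or "no-store" in cache_control:
        return False                          # caching explicitly disabled
    return True

print(is_cacheable("GET", {"Cache-Control": "max-age=3600"}, 1024))  # True
print(is_cacheable("GET", {"Set-Cookie": "id=1"}, 1024))             # False
```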

Google has traditionally partnered with third parties to speed up the delivery of content to consumers. These partnerships include Akamai, Cloudflare, Fastly, Level 3 Communications, and Highwinds.

Other cloud providers also have CDN offerings, including Amazon’s CloudFront and Microsoft’s Azure CDN. Google will also see competition from Akamai, one of the aforementioned partners, which holds approximately 16.3% CDN market share among the Alexa top 1 million sites.

Why unikernels might kill containers in five years

Sinclair Schuller is the CEO and cofounder of Apprenda, a leader in enterprise Platform as a Service.

Container technologies have received explosive attention in the past year – and rightfully so. Projects like Docker and CoreOS have done a fantastic job of popularizing operating system features that have existed for years by making those features more accessible.

Containers make it easy to package and distribute applications, which has become especially important in cloud-based infrastructure models. Being slimmer than their virtual machine predecessors, containers also offer faster start times and maintain reasonable isolation, ensuring that one application shares infrastructure with another application safely. Containers are also optimized for running many applications on single operating system instances in a safe and compatible way.

So what’s the problem?
Traditional operating systems are monolithic and bulky, even when slimmed down. If you look at the size of a container instance – hundreds of megabytes, if not gigabytes – it becomes obvious there is much more in the instance than just the application being hosted. Having a copy of the OS means that all of that OS’s services and subsystems, whether they are necessary or not, come along for the ride. This massive bulk conflicts with trends in the broader cloud market, namely the trend toward microservices, the need for improved security, and the requirement that everything operate as fast as possible.

Containers’ dependence on traditional OSes could be their demise, leading to the rise of unikernels. Rather than needing an OS to host an application, the unikernel approach allows developers to select just the OS services from a set of libraries that their application needs in order to function. Those libraries are then compiled directly into the application, and the result is the unikernel itself.

The unikernel model removes the need for an OS altogether, allowing the application to run directly on a hypervisor or server hardware. It’s a model where there is no software stack at all. Just the app.

There are a number of extremely important advantages for unikernels:

  1. Size – Unlike virtual machines or containers, a unikernel carries with it only what it needs to run that single application. While containers are smaller than VMs, they’re still sizeable, especially if one doesn’t take care of the underlying OS image. Applications that may have had an 800MB image size could easily come in under 50MB. This means moving application payloads across networks becomes very practical. In an era where clouds charge for data ingress and egress, this could not only save time, but also real money.
  2. Speed – Unikernels boot fast. Recent implementations have unikernel instances booting in under 20 milliseconds, meaning a unikernel instance can be started inline to a network request and serve the request immediately. MirageOS, a project led by Anil Madhavapeddy, is working on a new tool named Jitsu that allows clouds to quickly spin unikernels up and down.
  3. Security – A big factor in system security is reducing surface area and complexity, ensuring there aren’t too many ways to attack and compromise the system. Given that unikernels compile only what is necessary into the application, the surface area is very small. Additionally, unikernels tend to be “immutable,” meaning that once one is built, the only way to change it is to rebuild it. No patches or untrackable changes.
  4. Compatibility – Although most unikernel designs have focused on new applications or code written for specific stacks capable of compiling to this model, technologies such as Rump Kernels offer the ability to run existing applications as a unikernel. Rump kernels work by componentizing various subsystems and drivers of an OS and allowing them to be compiled into the app itself.

These four qualities align nicely with the development trend toward microservices, making discrete, portable application instances with breakneck performance a reality. Technologies like Docker and CoreOS have done fantastic work to modernize how we consume infrastructure so microservices can become a reality. However, these services will need to change and evolve to survive the rise of unikernels.

The power and simplicity of unikernels will have a profound impact during the next five years, which at a minimum will complement what we currently call a container, and at a maximum, replace containers altogether. I hope the container industry is ready.

New Windows Server containers and Azure support for Docker

In June, Microsoft Azure added support for Docker containers on Linux VMs, enabling the broad ecosystem of Dockerized Linux applications to run within Azure’s industry-leading cloud. Today, Microsoft and Docker Inc. are jointly announcing we are bringing the Windows Server ecosystem to the Docker community, through 1) investments in the next wave of Windows Server, 2) open-source development of the Docker Engine for Windows Server, 3) Azure support for the Docker Open Orchestration APIs and 4) federation of Docker Hub images into the Azure Gallery and Portal.

Many customers are running a mix of Windows Server and Linux workloads, and Microsoft Azure offers customers the most choice of any cloud provider. By supporting Docker containers on the next wave of Windows Server, we are excited to make Docker open solutions available across both Windows Server and Linux. Applications can themselves be mixed, bringing together the best technologies from the Linux ecosystem and the Windows Server ecosystem. Windows Server containers will run in your datacenter, your hosted datacenter, or any public cloud provider – and of course, Microsoft Azure.




Windows Server Containers

Windows Server containers provide applications an isolated, portable and resource controlled operating environment. This isolation enables containerized applications to run without risk of dependencies and environmental configuration affecting the application. By sharing the same kernel and other key system components, containers exhibit rapid startup times and reduced resource overhead. Rapid startup helps in development and testing scenarios and continuous integration environments, while the reduced resource overhead makes them ideal for service-oriented architectures.

The Windows Server container infrastructure allows for sharing, publishing and shipping of containers to anywhere the next wave of Windows Server is running. With this new technology millions of Windows developers familiar with technologies such as .NET, ASP.NET, PowerShell, and more will be able to leverage container technology. No longer will developers have to choose between the advantages of containers and using Windows Server technologies.






Windows Server containers in the Docker ecosystem

Docker has done a fantastic job of building a vibrant open source ecosystem based on Linux container technologies, providing an easy user experience to manage the lifecycle of containers drawn from a huge collection of open and curated applications in Docker Hub. We will bring Windows Server containers to the Docker ecosystem to expand the reach of both developer communities.

As part of this, Docker Engine for Windows Server containers will be developed under the aegis of the Docker open source project, where Microsoft will participate as an active community member. Windows Server container images will also be available in the Docker Hub alongside the 45,000 and growing Docker images for Linux already available.

Finally, we are working on supporting Docker client natively on Windows Server. As a result, Windows customers will be able to use the same standard Docker client and interface on multiple development environments.



You can find more about Microsoft’s work with the Docker open source project on the MS Open Tech blog here.

Docker on Microsoft Azure

Earlier this year, Microsoft released Docker containers for Linux on Azure, offering the first enterprise-ready version of the Docker open platform on Linux Virtual Machines on Microsoft Azure, leveraging the Azure extension model and Azure Cross Platform CLI to deploy the latest and greatest Docker Engine on each requested VM. We have seen lots of excitement from customers deploying Docker containers in Azure as part of our Linux support.

As part of the announcement today, we will be contributing support for multi-container Docker applications on Azure through the Docker Open Orchestration APIs. This will enable users to deploy Docker applications to Azure directly from the Docker client. This results in a dramatically simpler user experience for Azure customers; we are looking forward to demonstrating this new joint capability at Docker’s Global Hack Day as well as at the upcoming Microsoft TechEd Europe conference in Barcelona.

Furthermore, we hope to energize Windows Server and Linux customers by integrating Docker Hub into the Azure Gallery and Management Portal experience. This means that Azure customers will be able to interact directly with repositories and images on Docker Hub, enabling rich composition of content both from the Azure Gallery and Docker Hub.

In summary, today we announced a partnership with Docker Inc. to bring Windows Server to the Docker ecosystem and improve Azure’s support for the Docker Engine and Orchestration APIs and to integrate Docker Hub with the Azure Gallery and Management Portal.

Azure is placing a high priority on developer choice and flexibility including first-class support for Linux and Windows Server. This expanded partnership builds on the Azure’s current support for Docker on Linux and will bring the richness of the Windows Server and .NET ecosystem to the Docker community. It is an exciting time to be in the Azure cloud!



A Sneak Peek at the Next-Gen Exascale Operating System

There are several scattered pieces in the exascale software stack being developed and clicked together worldwide. Central to that jigsaw effort is the eventual operating system to power such machines.

This week the Department of Energy snapped in a $9.75 million investment to help round out the picture of what such an OS will look like. The grant went to Argonne National Lab for a multi-institutional project (including Pacific Northwest and Lawrence Livermore labs, as well as other universities) aimed at developing a prototype exascale operating system and associated runtime software.

To better understand “Argo”, the exascale OS effort, we spoke with Pete Beckman, Director of the Exascale Technology and Computing Institute and chief architect of the Argo project. Beckman says that as we look forward to the features of these ultra-scale machines, power management, massive concurrency and heterogeneity, as well as overall resiliency, can all be addressed at the OS level.

These are not unfamiliar concerns, but attacking them at the operating system lends certain benefits, argues Beckman. For instance, fine-tuning power control and management at the core operational and workload level becomes possible with a pared-down, purpose-built and HPC-optimized OS.

Outside of power, the team describes the “allowance for massive concurrency, [met by] a hierarchical framework for power and fault management,” as well as a “beacon” mechanism that allows resource managers and optimizers to communicate with and control the platform.

Beckman and team describe this hierarchy as an “enclave” – in this model, the OS is more hierarchical than we traditionally think of it. In other words, it’s easy to think of a node-level OS; with Argo, there is a global OS that runs across the machine. This, combined with the platform-neutral design of Argo, will make it flexible enough to change with architectures and manageable at both a system and workload level – all packaged in familiar Linux wrappings.

These “enclaves” are defined as a set of resources dedicated to a particular service and capable of introspection and autonomic response. As Argonne describes, “They can shape-shift the system configuration of nodes and the allocation of power to different nodes or to migrate data or computations from one node to another.” On the reliability front, the enclaves that tackle failure can do so “by means of global restart and other enclaves supporting finer-level recovery.”

The recognizable Linux core of Argo will have been enhanced and modified to meet the needs of more dynamic, next-gen applications. While development on those prototype applications is ongoing, Beckman and the distributed team plan to test Argo’s ability to dive into a host of common HPC applications. Again, all of this will be Linux-flavored, but with an HPC shell that narrows the focus to the problems at hand.

As a side note, leveraging the positive elements of Linux and building into it a robustness and eye on taking on critical power management, concurrency and resiliency features seems like a good idea. If the trend holds, Linux itself will continue to enjoy the lion’s share (by far–96% according to reporting from yesterday) of the OS market on the Top500.

It’s more about refining the role of Linux versus rebuilding it, Beckman explains. While Linux currently is tasked with managing a multi-user, multi-program balancing act with its resources, doling them out fairly, the Argo approach would home in on the parts of code that need to blaze – wicking away some of the resource-balancing functions. “We can rewrite some of those pieces and design runtime systems that are specifically adapted to run those bits of code fast and not try to deal with the balancing of many users and many programs.”

The idea is to have part of the chip be capable of running the Linux kernel for the basics: things like control systems, booting, command and interface functions, debugging, and the like. But as Beckman says, “for the HPC part, we can specialize and have a special component that lives in the chip.”

In that case, there are some middleware pieces that are hidden inside Argo. Beckman said that that software will move closer to the OS. And just as that happens, more software will be tied to the chips–whatever those happen to look like in the far-flung future (let’s say 2020 to be fair). This is all Argonne’s domain–they’ve been one of the leading labs that have worked on the marriage between processor and file systems, runtimes, message passing and other software workings. Beckman expects a merging between many lines–middleware and OS, and of course, both of those with the processor.

“Bringing together these multiple views and the corresponding software components through a whole-system approach distinguishes our strategy from existing designs,” said Beckman. “We believe it is essential for addressing the key exascale challenges of power, parallelism, memory hierarchy, and resilience.”

As of now, numerous groups are working on various pieces of the power puzzle in particular. This is an especially important issue going forward. Although power consumption has always been a concern, Beckman says that the approach now is to optimize systems in advance, “turn the power on, and accept that they will draw what they’re going to draw.” In addition to the other work being done inside the stack to create efficient supers, there is a role for the operating system to play  in orchestrating “smart” use of power for certain parts of the computation or parts of the machine.



Does WebKit face a troubled future now that Google is gone?


Now that Google is going its own way and developing its rendering engine independently of the WebKit project, both sides of the split are starting the work of removing all the things they don’t actually need.

This is already causing some tensions among WebKit users and Web developers, as it could lead to the removal of technology that they use or technology that is in the process of being standardized. This is leading some to question whether Apple is willing or able to fill in the gaps that Google has left.

Since Google first released Chrome in 2008, WebCore, the part of WebKit that does the actual CSS and HTML processing, has had to serve two masters. The major contributors to the project, and the contributors with the most widely used browsers, were Apple and Google.

While both used WebCore, the two companies did a lot of things very differently. They used different JavaScript engines (JavaScriptCore [JSC] for Apple, V8 for Google). They adopted different approaches to handling multiple processes and sandboxing. They used different options when compiling the software, too, so their browsers actually had different HTML and CSS features.

The WebCore codebase had to accommodate all this complexity. JavaScript, for example, couldn’t be integrated too tightly with the code that handles the DOM (the standard API for manipulating HTML pages from JavaScript), because there was an intermediary layer to ensure that JSC and V8 could be swapped in and out.
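That intermediary layer is essentially the adapter pattern. The sketch below is a hypothetical illustration in Python of the design idea only; WebKit's real bindings are C++, and these class names are invented for the example.

```python
# Hypothetical sketch of the engine-swapping abstraction described above.
# WebKit's actual bindings are C++; these names are invented.

class JSEngine:
    """Interface that every JavaScript engine binding must satisfy."""
    def evaluate(self, script, dom):
        raise NotImplementedError

class JSCBinding(JSEngine):
    def evaluate(self, script, dom):
        return f"JSC ran {script!r} against {dom}"

class V8Binding(JSEngine):
    def evaluate(self, script, dom):
        return f"V8 ran {script!r} against {dom}"

class WebCore:
    # WebCore touches the engine only through the JSEngine interface,
    # so either binding can be swapped in at build time.
    def __init__(self, engine):
        self.engine = engine

    def run_script(self, script):
        return self.engine.evaluate(script, dom="document")

print(WebCore(JSCBinding()).run_script("alert(1)"))
print(WebCore(V8Binding()).run_script("alert(1)"))
```

The cost of this flexibility is the indirection itself, which is exactly what both Blink and WebKit now want to strip out once each side needs only one engine.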

Google said that the decision to fork was driven by engineering concerns and that forking would enable faster development by both sides. That work is already under way, and both teams are now preparing to rip all these unnecessary bits out.

Right now, it looks like Google has it easier. So far, only Google and Opera are planning to use Blink, and Opera intends to track Chromium (the open source project that contains the bulk of Chrome’s code) and Blink anyway, so it won’t diverge too substantially from either. This means that Google has a fairly free hand to turn features that were optional in WebCore into ones that are permanent in Blink if Chrome uses them, or eliminate them entirely if it doesn’t.

Apple’s position is much trickier, because many other projects use WebKit, and no one person knows which features are demanded by which projects. Apple also wants to remove the JavaScript layers and just bake in the use of JSC, but some WebKit-derived projects may depend on them.

Samsung, for example, is using WebKit with V8. But with Google’s fork decision, there’s now nobody maintaining the code that glues V8 to WebCore. The route that Apple wants to take is to purge this stuff and leave it up to third-party projects to maintain their variants themselves. This task is likely to become harder as Cupertino increases the integration between JSC and WebCore.

Oracle is working on a similar project: a version of WebKit with its own JavaScript engine, “Nashorn,” that’s based on the Java virtual machine. This takes advantage of the current JavaScript abstractions, so it’s likely to be made more complicated as Apple removes them.

One plausible outcome for this is further consolidation among the WebKit variants. For those dead set on using V8, switching to Blink may be the best option. If sticking with WebKit is most important, reverting to JSC may be the only practical long-term solution.

Google was an important part of the WebKit project, and it was responsible for a significant part of the codebase’s maintenance. The company’s departure has left various parts of WebKit without any developers to look after them. Some of these, such as some parts of the integrated developer tools, are probably too important for Apple to abandon—even Safari uses them.

Others, however, may be culled—even if they’re on track to become Web standards. For example, Google developed code to provide preliminary support for CSS Custom Properties (formerly known as CSS Variables). It was integrated into WebKit but only enabled in Chromium. That code now has nobody to maintain it, so Apple wants to remove it.

This move was immediately criticized by Web developer Jon Rimmer, who pointed out that the standard was being actively developed by the World Wide Web Consortium (W3C), was being implemented by Mozilla, and was fundamentally useful. The developer suggested that Apple had two options for dealing with Google’s departure from the project: either by “cutting out [Google-developed] features and continuing at a reduced pace, or by stepping up yourselves to fill the gap.”

Discussion of Apple’s ability to fill the Google-sized gap in WebKit was swiftly shut down, but Rimmer’s concern remains the elephant in the room. Removing the JavaScript layer is one thing; this was a piece of code that existed only to support Google’s use of V8, and with JavaScriptCore now the sole WebKit engine, streamlining the code makes sense. Google, after all, is doing the same thing in Blink. But removing features headed toward standardization is another thing entirely.

If Apple doesn’t address Rimmer’s concerns, and if Blink appears to have stronger corporate backing and more development investment, one could see a future in which more projects switch to using Blink rather than WebKit. Similarly, Web developers could switch to Blink—with a substantial share of desktop usage and a growing share of mobile usage—and leave WebKit as second-best.


DDoS attacks on major US banks are no Stuxnet—here’s why

The attacks that recently disrupted website operations at Bank of America and at least five other major US banks used compromised Web servers to flood their targets with above-average amounts of Internet traffic, according to five experts from leading firms that worked to mitigate the attacks.

The distributed denial-of-service (DDoS) attacks—which over the past two weeks also caused disruptions at JP Morgan Chase, Wells Fargo, US Bancorp, Citigroup, and PNC Bank—were waged by hundreds of compromised servers. Some were hijacked to run a relatively new attack tool known as “itsoknoproblembro.” When combined, the above-average bandwidth possessed by each server created peak floods exceeding 60 gigabits per second.

More unusually, the attacks also employed a rapidly changing array of methods to maximize the effects of this torrent of data. The uncommon ability of the attackers to simultaneously saturate routers, bank servers, and the applications they run—and to then recalibrate their attack traffic depending on the results achieved—had the effect of temporarily overwhelming the targets.

“This very well could be a kid sitting in his mom’s basement in Ohio launching these attacks.”


“It used to be DDoS attackers would try one method and they were kind of one-trick ponies,” Matthew Prince, CEO and founder of CloudFlare, told Ars. “What these attacks appear to have shown is there are some attackers that have a full suite of DDoS methods, and they’re trying all kinds of different things and continually shifting until they find something that works. It’s still cavemen using clubs, but they have a whole toolbox full of different clubs they can use depending on what the situation calls for.”

The compromised servers were outfitted with itsoknoproblembro (pronounced “it’s OK, no problem, bro”) and other DDoS tools that allowed the attackers to unleash network packets based on the UDP, TCP, HTTP, and HTTPS protocols. These flooded the banks’ routers, servers, and server applications – layers 3, 4, and 7 of the networking stack – with junk traffic. Even when defenders successfully repelled attacks against two of those layers, the targets would still fall over if their defenses didn’t adequately protect against the third.

“It’s not that we have not seen this style of attacks or even some of these holes before,” said Dan Holden, the director of research for the security engineering and response team at Arbor Networks. “Where I give them credit is the blending of the threats and the effort they’ve done. In other words, it was a focused attack.”

Adding to its effectiveness was the fact that banks are mandated to provide Web encryption, protected login systems, and other defenses for most online services. These “logic” applications are naturally prone to bottlenecks—and bottlenecks are particularly vulnerable to DDoS techniques. Regulations that prevent certain types of bank traffic from running over third-party proxy servers often deployed to mitigate attacks may also have reduced the mitigation options available once the disruptions started.

No “root” needed

A key ingredient in the success of the attacks was the use of compromised Web servers. These typically have the capacity to send 100 megabits of data every second, about a 100-fold increase over PCs in homes and small offices, which are more commonly seen in DDoS attacks.

In addition to overwhelming targets with more data than their equipment was designed to handle, the ample supply of bandwidth allowed the attackers to work with fewer attack nodes. That made it possible to more quickly start, stop, and recalibrate the attacks. The nimbleness that itsoknoproblembro and other such tools make possible is rarely achievable when DDoS attackers wield tens of thousands, or even hundreds of thousands, of infected home and small-office computers scattered all over the world.
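The arithmetic behind the smaller-but-nimbler botnet is straightforward. This back-of-the-envelope sketch uses only the figures quoted in this article (60 Gbps peak, ~100 Mbps per server, roughly 100x less per home PC):

```python
# Back-of-the-envelope comparison using the figures quoted in the article.
peak_flood_bps = 60e9    # 60 Gbps peak observed in the bank attacks
server_uplink  = 100e6   # ~100 Mbps per compromised web server
home_pc_uplink = 1e6     # ~100-fold less for a home/small-office PC

servers_needed = peak_flood_bps / server_uplink
pcs_needed     = peak_flood_bps / home_pc_uplink

print(f"compromised servers needed: {servers_needed:,.0f}")  # 600
print(f"home PCs needed:            {pcs_needed:,.0f}")      # 60,000
```

Six hundred servers versus sixty thousand PCs is the difference between a fleet an operator can retask in minutes and one that takes far longer to steer, which is why the attacks could be recalibrated so quickly.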

“This one appears to exhibit a fair bit of knowledge about how people would go about mitigating this attack,” Neal Quinn, the chief operating officer of Prolexic, said, referring to itsoknoproblembro. “Also, it adapts very quickly over time. We’ve been tracking it for a long time now and it evolves in response to defenses that are erected against it.”

Another benefit of itsoknoproblembro is that it runs on compromised Linux and Windows servers even when attackers have gained only limited privileges. Because the DDoS tool doesn’t require the almost unfettered power of “root” or “administrator” access to run, attackers can use it on a larger number of machines, since lower-privileged access is usually much easier for hackers to acquire.

Use of Web servers to disrupt online banking operations also underscores the damage that can result when administrators fail to adequately lock down their machines.

“You’re talking about a server that has a very lackadaisical security process,” said Holden, the Arbor Networks researcher. “Whoever’s servers and bandwidth are being used obviously don’t realize and understand that they’ve got unpatched servers and appliances. [The attackers] have compromised them and are taking advantage of that.”

State sponsored—or a kid in his basement?

Almost all of the attacks were preceded by online posts in which the writer named the specific target and the day its website operations would be attacked. The posts demonstrate that the writer had foreknowledge of the attacks, but there’s little evidence to support claims made in those posts that members of Izz ad-Din al-Qassam Brigades, the military wing of the Hamas organization in the Palestinian Territories, were responsible.

In addition, none of the five experts interviewed for this article had any evidence to support claims the attacks were sponsored or carried out by Iran, as recently claimed by US Senator Joseph Lieberman.

“I don’t think there’s anything about these attacks that’s so large or so sophisticated that it would have to be state sponsored,” said Prince, the CloudFlare CEO. “This very well could be a kid sitting in his mom’s basement in Ohio launching these attacks. I think it’s dangerous to start speculating that this is actually state sponsored.”

“Those are big attacks, but they’re not so unprecedented that it’s worth a press release.”

Indeed, the assaults seen to date lack most of the characteristics found in other so-called “hacktivist” attacks, in which attackers motivated by nationalist, political, or ideological leanings strike out at people or groups they view as adversaries. Typical hacktivist DDoS attacks wield bandwidth in the range of 1Gbps to 4Gbps, far less than the 60Gbps torrents seen in these attacks, said Michael Smith, senior security evangelist for Akamai Technologies. Also missing from these attacks is what he called “primary recruitment,” in which organizers seek grass-roots supporters and provide those supporters with the tools to carry out the attacks.

“Hacktivists will use many different tools,” he explained. “You will see various signatures of tools hitting you. To us, the traffic is homogeneous.”

Based on the nimbleness of the attacks, Smith speculated that a disciplined group, possibly tied to an organized crime outfit, may be responsible. Organized crime groups sometimes wage DDoS attacks on banks at the same time they siphon large amounts of money from customers’ accounts. The disruption caused by attacks is intended to distract banking officials until the stolen funds are safely in the control of the online thieves.

The cavemen get better clubs

When websites in the late 1990s began buckling under mysterious circumstances, many observers ascribed almost super-human abilities to the people behind the disruptions. In hindsight, we know the DDoS attacks that brought these sites down were, as Prince puts it, no more sophisticated than a caveman wielding a club. It’s fair to say that the groups responsible for a string of attacks over the past year or so – including the recent attacks on banks – have identified a technical innovation that allows those clubs to pierce current defenses in a way that hadn’t been seen before. Such breakthroughs are common in the security world, but more often than not, they’re quickly rebuffed by countermeasures assembled by defenders.

More importantly, it’s grossly premature to compare these attacks to Stuxnet, the highly sophisticated malware the US and Israel designed to disrupt Iran’s nuclear program, or to declare the spate of attacks “Financial Armageddon.”

It’s also important to remember that DDoS attacks aren’t breaches of a bank’s internal security. No customer data is ever accessed, and no funds are affected. And while torrents of 60Gbps are impressive, they are by no means historic; CloudFlare’s Prince said that he sees attacks of that magnitude about once a month.

“Those are big attacks,” he said, “but they’re not so unprecedented that it’s worth a press release.”


Changing Architectures: New Datacenter Networks Will Set Your Code And Data Free

One consequence of IT standardization and commodification has been Google’s “datacenter is the computer” view of the world. In that view, all compute resources (memory, CPU, storage) are fungible: they are interchangeable and location independent, and individual computers lose their identity and become just part of a service.

Thwarting that nirvana has been the abysmal performance of commodity datacenter networks, which has driven a preference for architectures that collocate state and behaviour on the same box. MapReduce famously ships code over to storage nodes for just this reason.

Change the network and you change the fundamental assumption driving collocation-based software architectures. You are then free to store data anywhere and move compute anywhere you wish. The datacenter becomes the computer.

On the host side, with an x8 slot running at PCI-Express 3.0 speeds able to push 8 GB/sec (that’s bytes) of bandwidth in each direction, we have enough IO to feed Moore’s progeny, wild packs of hungry hungry cores. And in the future, System on a Chip architectures will integrate the NIC into the CPU, making even faster speeds possible. Why we are still using TCP and shoving data through OS stacks in the datacenter is a completely separate question.
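As a rough sanity check on that figure, here is a back-of-envelope sketch (illustrative only) of PCI-Express 3.0 x8 bandwidth, assuming 8 GT/s per lane and 128b/130b line encoding:

```python
# Back-of-envelope check of the PCI-Express 3.0 x8 figure quoted above.
# PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding.
GT_PER_LANE = 8e9          # transfers/sec per lane
ENCODING = 128 / 130       # 128b/130b line-coding efficiency
LANES = 8

bits_per_sec = GT_PER_LANE * ENCODING * LANES
gbytes_per_sec = bits_per_sec / 8 / 1e9
print(f"~{gbytes_per_sec:.2f} GB/s per direction")  # ~7.88 GB/s, close to the 8 GB/s quoted
```

The usable number comes out just under 8 GB/s per direction, which is where the article's round figure comes from.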

The next challenge is how to make the network work. The key to bandwidth nirvana is explained by Microsoft in MinuteSort with Flat Datacenter Storage, which shows how, in a network with enough bisectional bandwidth, every computer can send data at full speed to every other computer. That allows data to be stored remotely, which means data doesn’t have to be stored locally anymore.

What the heck is bisectional bandwidth? If you draw a line somewhere in a network, bisectional bandwidth is the rate at which servers on one side of the line can communicate with servers on the other side. With enough bisectional bandwidth, any server can communicate with any other server at full network speed.
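The definition can be made concrete with a toy sketch. Here the network is a handful of made-up links with capacities, and the “line” is a partition of the servers; all names and numbers are hypothetical:

```python
# Toy illustration of bisection bandwidth: model the network as undirected
# links with capacities (Gbps), draw a "line" (a partition of the servers),
# and sum the capacity of links that cross it. All values are made up.
links = [
    ("a", "b", 10), ("b", "c", 10),   # links on the left side
    ("c", "d", 1),                    # the skinny oversubscribed uplink
    ("d", "e", 10), ("e", "f", 10),   # links on the right side
]
left = {"a", "b", "c"}

def bisection_bandwidth(links, left):
    """Total capacity of links crossing the cut between `left` and everyone else."""
    return sum(cap for u, v, cap in links if (u in left) != (v in left))

print(bisection_bandwidth(links, left))  # 1 — the cut is only as fat as the uplink
```

Even though every server has a 10 Gbps link, the bisection here is only 1 Gbps: servers on opposite sides of the cut can never collectively talk faster than that.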

Wait, don’t we have high bisectional bandwidth in datacenters now? Why no, no we don’t. We have typically had networks optimized for sending traffic North-South rather than East-West. North-South means your server is talking to a client somewhere out on the Internet. East-West means you are talking to another server within the datacenter. Pre-cloud, software architectures communicated mostly North-South, to clients located out on the Internet. Post-cloud, most software functionality is implemented by large clusters that talk mostly to each other, that is East-West, with only a few tendrils of communication shooting North-South. Recall how Google has pioneered large fanout architectures, where creating a single web page can take 1,000 requests. Large fanout architectures are the new normal.

Datacenter networks have not kept up with the change in software architectures. But it’s even worse than that. To support mostly North-South traffic with a little East-West traffic, datacenters used a tree topology with core, aggregation, and access layers. The idea was that the top routing layer of the network would have enough bandwidth to handle all the traffic from all the machines lower down in the tree. Economics made it highly attractive to oversubscribe the top layer of the network, sometimes as much as 240:1. So if you want to talk to a machine in some other part of the datacenter, you are in for a bad experience: traffic has to traverse highly oversubscribed links. Packets go drop drop fizz fizz.
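To see how a figure like 240:1 can arise, here is an illustrative sketch with made-up but plausible per-layer numbers, showing how oversubscription compounds as traffic climbs the tree:

```python
# Illustrative sketch (hypothetical numbers) of how oversubscription
# compounds layer by layer in a classic core/aggregation/access tree.
def ratio(down_gbps, up_gbps):
    """Oversubscription: total downlink capacity vs. uplink capacity."""
    return down_gbps / up_gbps

access      = ratio(48 * 1, 10)    # 48 servers at 1 Gbps, one 10 Gbps uplink -> 4.8:1
aggregation = ratio(20 * 10, 40)   # 20 access uplinks into 40 Gbps upward    -> 5:1
core        = ratio(10 * 40, 40)   # 10 aggregation uplinks share 40 Gbps     -> 10:1

# East-West traffic between distant racks crosses every oversubscribed layer,
# so the ratios multiply.
end_to_end = access * aggregation * core
print(f"{end_to_end:.0f}:1")  # 240:1
```

North-South traffic only pays part of this tax, but rack-to-rack East-West traffic pays all of it, which is exactly why fanout-heavy architectures suffer on tree networks.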

Creating an affordable high bisectional bandwidth network requires a more thoughtful approach. The basic options seem to be to change the protocols, change the routers, or change the hosts. The approach Microsoft came up with was to change the host and add a layer of centralized control.

Their creation is fully described in VL2: A Scalable and Flexible Data Center Network:

A practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics. VL2 uses (1) flat addressing to allow service instances to be placed anywhere in the network, (2) Valiant Load Balancing to spread traffic uniformly across network paths, and (3) end-system based address resolution to scale to large server pools, without introducing complexity to the network control plane.

The general idea is to create a flat L2 network using a CLOS topology. VMs keep their IP addresses forever and can move anywhere in the datacenter. L2 ARP-related broadcast problems are sidestepped by changing ARP to use a centralized registration service to resolve addresses. No more broadcast storms.
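The centralized-resolution idea can be sketched in a few lines. This is a hypothetical toy, not VL2’s actual directory protocol: a registry maps a VM’s permanent IP to its current location, so hosts look addresses up instead of broadcasting ARP:

```python
# Hypothetical sketch of a centralized address-resolution service
# (illustrative only; VL2's real directory system is more involved).
class DirectoryService:
    """Central registry mapping a VM's application IP to its current location."""
    def __init__(self):
        self._table = {}

    def register(self, vm_ip, locator):
        # Called when a VM boots or migrates; no broadcast required.
        self._table[vm_ip] = locator

    def resolve(self, vm_ip):
        # Hosts query the directory instead of ARPing the whole L2 domain.
        return self._table.get(vm_ip)

directory = DirectoryService()
directory.register("10.0.0.5", "ToR-17:port-3")  # VM comes up under one switch
directory.register("10.0.0.5", "ToR-42:port-9")  # ...then migrates; its IP never changes
print(directory.resolve("10.0.0.5"))             # ToR-42:port-9
```

The point is that the VM’s IP is stable while its locator changes underneath, and resolution is a unicast lookup rather than a broadcast storm.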

This seems strange, but I attended a talk on VL2 at Hot Interconnects, and the whole approach is quite clever and seems sensible. The result delivers the low-cost, high-bandwidth, low-latency East-West flows needed by modern software architectures, a characteristic that seems to be missing in Route Anywhere vSwitch type approaches. You can’t just overlay in performance when the underlying topology isn’t supportive.

Now that you have this super cool datacenter topology, what do you do with it? Microsoft implemented a version of the MinuteSort benchmark that was three times faster than Hadoop, sorting nearly three times the amount of data with about one-sixth the hardware resources (1,033 disks across 250 machines vs. 5,624 disks across 1,406 machines).

Microsoft built the benchmark code on top of the Flat Datacenter Storage (FDS) system, which is a distributed blob storage system:

Notably, no compute node in our system uses local storage for data; we believe FDS is the first system with competitive sort performance that uses remote storage. Because files are all remote, our 1,470 GB runs actually transmitted 4.4 TB over the network in under a minute.

FDS always sends data over the network. FDS mitigates the cost of data transport in two ways. First, we give each storage node network bandwidth that matches its storage bandwidth. SAS disks have read performance of about 120MByte/sec, or about 1 gigabit/sec, so in our FDS cluster a storage node is always provisioned with at least as many gigabits of network bandwidth as it has disks. Second, we connect the storage nodes to compute nodes using a full bisection bandwidth network—specifically, a CLOS network topology, as used in projects such as Monsoon. The combination of these two factors produces an uncongested path from remote disks to CPUs, giving the system an aggregate I/O bandwidth essentially equivalent to a system such as MapReduce that uses local storage. There is, of course, a latency cost. However, FDS by its nature allows any compute node to access any data with equal throughput.
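The provisioning rule in that quote, at least one gigabit of network per disk, reduces to simple arithmetic; a small sketch, using the paper’s 120 MByte/sec SAS figure:

```python
# Back-of-envelope version of the FDS provisioning rule quoted above:
# give each storage node at least one gigabit of network per disk.
SAS_DISK_MBYTES_PER_SEC = 120                      # sequential read, per the paper
disk_gbits = SAS_DISK_MBYTES_PER_SEC * 8 / 1000    # ~0.96 Gbps per disk

def min_nic_gbps(num_disks):
    """Network bandwidth a storage node needs so disks, not the NIC, are the bottleneck."""
    return num_disks * max(disk_gbits, 1.0)        # round up to a full gigabit per disk

print(min_nic_gbps(10))  # a 10-disk storage node wants at least a 10 Gbps NIC
```

Pair that per-node rule with a full bisection bandwidth fabric and the path from any remote disk to any CPU is uncongested end to end, which is the whole trick.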

Details are in the paper, but as distributed file systems have become key architectural components it’s important for bootstrapping purposes to have one that takes advantage of this new datacenter topology.

With 10/100 Gbps networks on the way, and technologies like VL2 and FDS, we’ve made good progress at turning CPU, RAM, and storage into fungible pools of resources within a datacenter. Networks still aren’t fungible, though I’m not sure what that would even mean. Software Defined Networking will help networks become first-class objects, which seems close, but for performance reasons networks can never really be disentangled from their underlying topology.

What can we expect from these developments? Since fungibility is really a deeper level of commoditization, we should expect to see the destruction of approaches based on resource asymmetry, even higher levels of organization, greater levels of consumption, and the development of new best practices. Even greater levels of automation should drive even more competition in the ecosystem space.