A Sneak Peek at the Next-Gen Exascale Operating System

The exascale software stack is being developed in scattered pieces and clicked together worldwide. Central to that jigsaw effort is the eventual operating system that will power such machines.

This week the Department of Energy snapped in a $9.75 million investment to help round out the picture of what such an OS will look like. The grant went to Argonne National Lab for a multi-institutional project (including Pacific Northwest and Lawrence Livermore labs, as well as other universities) aimed at developing a prototype exascale operating system and associated runtime software.

To better understand “Argo,” the exascale OS effort, we spoke with Pete Beckman, Director of the Exascale Technology and Computing Institute and chief architect of the Argo project. Beckman says that as we look toward these ultra-scale machines, their defining challenges (power management, massive concurrency and heterogeneity, as well as overall resiliency) can all be addressed at the OS level.

These are not unfamiliar concerns, but attacking them at the operating system level offers certain benefits, argues Beckman. For instance, fine-tuning power control and management at the core operational and workload level becomes possible with a pared-down, purpose-built and HPC-optimized OS.

Outside of power, the team describes an allowance for massive concurrency, a hierarchical framework for power and fault management, and a “beacon” mechanism that allows resource managers and optimizers to communicate with and control the platform.

Beckman and team describe this hierarchy in terms of “enclaves”; in this model the OS is more hierarchical in nature than we traditionally think of it. In other words, it’s easy to think of a node-level OS; with Argo, there is also a global OS that runs across the machine. This, combined with the platform-neutral design of Argo, will make it flexible enough to change with architectures and manageable at both a system and workload level, all packaged in familiar Linux wrappings.

These “enclaves” are defined as sets of resources dedicated to a particular service, and capable of introspection and autonomic response. As Argonne describes, “They can shape-shift the system configuration of nodes and the allocation of power to different nodes or to migrate data or computations from one node to another.” On the reliability front, the enclaves that tackle failure can do so “by means of global restart and other enclaves supporting finer-level recovery.”
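
To make the enclave idea concrete, here is a minimal sketch of the hierarchy described above. It is not Argo code; every type, function and number below is invented for illustration. It only shows the general shape: a global enclave hands a power budget down to sub-enclaves, and failures are handled at whatever level of the tree is appropriate.

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of an Argo-style "enclave": a group of resources that
// can introspect its children and respond autonomically. Illustrative only.
struct Enclave {
    std::string name;
    double power_budget_watts = 0.0;               // budget assigned by the parent
    std::vector<std::unique_ptr<Enclave>> children;

    // Split this enclave's budget evenly among its children
    // (a real system would weight the split by workload priority).
    void distribute_power() {
        if (children.empty()) return;
        double share = power_budget_watts / children.size();
        for (auto& child : children) {
            child->power_budget_watts = share;
            child->distribute_power();
        }
    }

    // Handle a node failure at this level: leaf enclaves attempt fine-grained
    // recovery, while higher levels fall back to coarser restarts.
    void report_failure(const std::string& node) {
        std::cout << name << ": recovering from failure of " << node
                  << (children.empty() ? " locally\n" : " via sub-enclave restart\n");
    }
};

int main() {
    Enclave machine{"global-os", 20'000'000.0};     // e.g. a 20 MW machine-wide budget
    machine.children.push_back(std::make_unique<Enclave>(Enclave{"enclave-job-A"}));
    machine.children.push_back(std::make_unique<Enclave>(Enclave{"enclave-job-B"}));
    machine.distribute_power();                     // each job enclave now holds 10 MW
    machine.children[0]->report_failure("node-1234");
}
```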

The recognizable Linux core of Argo will be enhanced and modified to meet the needs of more dynamic, next-gen applications. While development of those prototype applications is ongoing, Beckman and the distributed team plan to test Argo against a host of common HPC applications. Again, all of this will be Linux-flavored, but with an HPC shell that narrows the focus to the problems at hand.

As a side note, leveraging the positive elements of Linux while building in robustness and first-class support for power management, concurrency and resiliency seems like a good idea. If the trend holds, Linux itself will continue to enjoy the lion’s share of the OS market on the Top500 (96%, by far, according to reporting from yesterday).

It’s more about refining the role of Linux than rebuilding it, Beckman explains. While Linux is currently tasked with managing a multi-user, multi-program balancing act, doling out its resources fairly, the Argo approach would home in on the parts of the code that need to blaze, wicking away some of the resource-balancing functions. “We can rewrite some of those pieces and design runtime systems that are specifically adapted to run those bits of code fast and not try to deal with the balancing of many users and many programs.”

The idea is to have part of the chip be capable of running the Linux kernel for the basics: things like control systems, booting, command and interface functions, debugging and the like. But as Beckman says, “for the HPC part, we can specialize and have a special component that lives in the chip.”
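
A rough illustration of that split, using nothing Argo-specific, only stock Linux threading calls: the sketch below pins one thread to core 0 to stand in for the general-purpose "Linux side," and reserves the remaining cores for compute threads that a specialized HPC runtime would manage. It is a sketch of the general core-specialization idea, not Beckman's actual design.

```cpp
// Build on Linux (glibc) with: g++ -std=c++17 -pthread core_split.cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <thread>
#include <vector>

// Pin the calling thread to a single CPU core (Linux-specific call).
static void pin_to_core(unsigned core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    unsigned ncores = std::thread::hardware_concurrency();

    // "Linux side": one core reserved for control, booting, I/O and debugging.
    std::thread service([] {
        pin_to_core(0);
        std::puts("service core: running control and management tasks");
    });

    // "HPC side": the remaining cores are left to the compute runtime,
    // shielded from general-purpose OS housekeeping.
    std::vector<std::thread> compute;
    for (unsigned c = 1; c < ncores; ++c)
        compute.emplace_back([c] {
            pin_to_core(c);
            // ... a tight, runtime-managed compute kernel would live here ...
        });

    service.join();
    for (auto& t : compute) t.join();
}
```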

There are also some middleware pieces that will be folded into Argo; Beckman said that software will move closer to the OS. As that happens, more software will be tied to the chips, whatever those happen to look like in the far-flung future (let’s say 2020, to be fair). This is familiar territory for Argonne, which has been one of the leading labs working on the marriage between the processor and file systems, runtimes, message passing and other software layers. Beckman expects a merging along many lines: middleware with the OS and, of course, both of those with the processor.

“Bringing together these multiple views and the corresponding software components through a whole-system approach distinguishes our strategy from existing designs,” said Beckman. “We believe it is essential for addressing the key exascale challenges of power, parallelism, memory hierarchy, and resilience.”

As of now, numerous groups are working on various pieces of the power puzzle in particular. This is an especially important issue going forward. Although power consumption has always been a concern, Beckman says that the approach now is to optimize systems in advance, “turn the power on, and accept that they will draw what they’re going to draw.” In addition to the other work being done inside the stack to create efficient supers, there is a role for the operating system to play in orchestrating “smart” use of power for certain parts of the computation or parts of the machine.
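
As one hedged example of what that orchestration might look like at the node level, the sketch below samples a package energy counter once per second and compares the measured draw against a budget. It assumes a Linux node exposing Intel RAPL counters through the powercap sysfs interface; the path and the 100 W budget are illustrative, and a real runtime would act on the result (for instance by adjusting frequency caps or shifting budget between enclaves) rather than just printing it.

```cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

// Read the cumulative package energy in microjoules from the (assumed)
// Intel RAPL powercap interface; returns 0 if the file is unavailable.
static long long read_energy_uj() {
    std::ifstream f("/sys/class/powercap/intel-rapl:0/energy_uj");
    long long uj = 0;
    f >> uj;
    return uj;
}

int main() {
    const double budget_watts = 100.0;      // hypothetical per-socket budget
    long long prev = read_energy_uj();
    for (int sample = 0; sample < 10; ++sample) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        long long now = read_energy_uj();
        double watts = (now - prev) / 1e6;  // microjoules over one second -> watts
        prev = now;
        std::cout << "package power: " << watts << " W\n";
        if (watts > budget_watts) {
            // Over budget: an enclave manager might lower the DVFS ceiling here,
            // throttle the runtime, or hand the excess budget to another enclave.
            std::cout << "over budget; a power manager would intervene here\n";
        }
    }
}
```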

(Source: HPCwire.com)

Does WebKit face a troubled future now that Google is gone?

Now that Google is going its own way and developing its rendering engine independently of the WebKit project, both sides of the split are starting the work of removing all the things they don’t actually need.

This is already causing some tensions among WebKit users and Web developers, as it could lead to the removal of technology that they use or technology that is in the process of being standardized. This is leading some to question whether Apple is willing or able to fill in the gaps that Google has left.

Since Google first released Chrome in 2008, WebCore, the part of WebKit that does the actual CSS and HTML processing, has had to serve two masters. The major contributors to the project, and the contributors with the most widely used browsers, were Apple and Google.

While both used WebCore, the two companies did a lot of things very differently. They used different JavaScript engines (JavaScriptCore [JSC] for Apple, V8 for Google). They adopted different approaches to handling multiple processes and sandboxing. They used different options when compiling the software, too, so their browsers actually had different HTML and CSS features.

The WebCore codebase had to accommodate all this complexity. JavaScript, for example, couldn’t be integrated too tightly with the code that handles the DOM (the standard API for manipulating HTML pages from JavaScript), because there was an intermediary layer to ensure that JSC and V8 could be swapped in and out.
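
The shape of that indirection is easy to sketch. The code below is not WebKit's actual binding layer; it is a generic illustration of the pattern: DOM-side code calls through an abstract engine interface so that either JSC or V8 can sit behind it, and once only one engine remains, the virtual layer can be collapsed into direct, engine-specific calls.

```cpp
#include <iostream>
#include <string>

// Illustrative only -- not WebKit's real classes. With two JavaScript engines
// in play, DOM code had to go through an abstraction such as this.
class ScriptEngineBinding {
public:
    virtual ~ScriptEngineBinding() = default;
    virtual void wrapDomNode(const std::string& nodeName) = 0;
    virtual void callFunction(const std::string& fn) = 0;
};

class JSCBinding : public ScriptEngineBinding {
public:
    void wrapDomNode(const std::string& n) override { std::cout << "JSC: wrap " << n << "\n"; }
    void callFunction(const std::string& fn) override { std::cout << "JSC: call " << fn << "\n"; }
};

class V8Binding : public ScriptEngineBinding {
public:
    void wrapDomNode(const std::string& n) override { std::cout << "V8: wrap " << n << "\n"; }
    void callFunction(const std::string& fn) override { std::cout << "V8: call " << fn << "\n"; }
};

// DOM-side code only sees the abstract interface, so either engine can be
// selected at build time. With a single engine, this indirection can be
// replaced by direct calls into that engine.
void dispatchClick(ScriptEngineBinding& js) {
    js.wrapDomNode("HTMLButtonElement");
    js.callFunction("onclick");
}

int main() {
    JSCBinding jsc;
    dispatchClick(jsc);   // swap in V8Binding here and the DOM code is unchanged
}
```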

Google said that the decision to fork was driven by engineering concerns and that forking would enable faster development by both sides. That work is already under way, and both teams are now preparing to rip all these unnecessary bits out.

Right now, it looks like Google has it easier. So far, only Google and Opera are planning to use Blink, and Opera intends to track Chromium (the open source project that contains the bulk of Chrome’s code) and Blink anyway, so it won’t diverge too substantially from either. This means that Google has a fairly free hand to turn features that were optional in WebCore into ones that are permanent in Blink if Chrome uses them, or eliminate them entirely if it doesn’t.

Apple’s position is much trickier, because many other projects use WebKit, and no one person knows which features are demanded by which projects. Apple also wants to remove the JavaScript layers and just bake in the use of JSC, but some WebKit-derived projects may depend on them.

Samsung, for example, is using WebKit with V8. But with Google’s fork decision, there’s now nobody maintaining the code that glues V8 to WebCore. The route that Apple wants to take is to purge this stuff and leave it up to third-party projects to maintain their variants themselves. This task is likely to become harder as Cupertino increases the integration between JSC and WebCore.

Oracle is working on a similar project: a version of WebKit with its own JavaScript engine, “Nashorn,” that’s based on the Java virtual machine. This takes advantage of the current JavaScript abstractions, so it’s likely to be made more complicated as Apple removes them.

One plausible outcome for this is further consolidation among the WebKit variants. For those dead set on using V8, switching to Blink may be the best option. If sticking with WebKit is most important, reverting to JSC may be the only practical long-term solution.

Google was an important part of the WebKit project, and it was responsible for a significant part of the codebase’s maintenance. The company’s departure has left various parts of WebKit without any developers to look after them. Some of these, such as some parts of the integrated developer tools, are probably too important for Apple to abandon—even Safari uses them.

Others, however, may be culled—even if they’re on track to become Web standards. For example, Google developed code to provide preliminary support for CSS Custom Properties (formerly known as CSS Variables). It was integrated into WebKit but only enabled in Chromium. That code now has nobody to maintain it, so Apple wants to remove it.

This move was immediately criticized by Web developer Jon Rimmer, who pointed out that the standard was being actively developed by the World Wide Web Consortium (W3C), was being implemented by Mozilla, and was fundamentally useful. The developer suggested that Apple had two options for dealing with Google’s departure from the project: either by “cutting out [Google-developed] features and continuing at a reduced pace, or by stepping up yourselves to fill the gap.”

Discussion of Apple’s ability to fill the Google-sized gap in WebKit was swiftly shut down, but Rimmer’s concern remains the elephant in the room. Removing the JavaScript layer is one thing; this was a piece of code that existed only to support Google’s use of V8, and with JavaScriptCore now the sole WebKit engine, streamlining the code makes sense. Google, after all, is doing the same thing in Blink. But removing features headed toward standardization is another thing entirely.

If Apple doesn’t address Rimmer’s concerns, and if Blink appears to have stronger corporate backing and more development investment, one could see a future in which more projects switch to using Blink rather than WebKit. Similarly, Web developers could switch to Blink—with a substantial share of desktop usage and a growing share of mobile usage—and leave WebKit as second-best.

(via arstechnica.com)

DDoS attacks on major US banks are no Stuxnet—here’s why

The attacks that recently disrupted website operations at Bank of America and at least five other major US banks used compromised Web servers to flood their targets with above-average amounts of Internet traffic, according to five experts from leading firms that worked to mitigate the attacks.

The distributed denial-of-service (DDoS) attacks—which over the past two weeks also caused disruptions at JP Morgan Chase, Wells Fargo, US Bancorp, Citigroup, and PNC Bank—were waged by hundreds of compromised servers. Some were hijacked to run a relatively new attack tool known as “itsoknoproblembro.” When combined, the above-average bandwidth possessed by each server created peak floods exceeding 60 gigabits per second.

More unusually, the attacks also employed a rapidly changing array of methods to maximize the effects of this torrent of data. The uncommon ability of the attackers to simultaneously saturate routers, bank servers, and the applications they run—and to then recalibrate their attack traffic depending on the results achieved—had the effect of temporarily overwhelming the targets.

“It used to be DDoS attackers would try one method and they were kind of one-trick ponies,” Matthew Prince, CEO and founder of CloudFlare, told Ars. “What these attacks appear to have shown is there are some attackers that have a full suite of DDoS methods, and they’re trying all kinds of different things and continually shifting until they find something that works. It’s still cavemen using clubs, but they have a whole toolbox full of different clubs they can use depending on what the situation calls for.”

The compromised servers were outfitted with itsoknoproblembro (pronounced “it’s OK, no problem, bro”) and other DDoS tools that allowed the attackers to unleash network packets based on the UDP, TCP, HTTP, and HTTPS protocols. These flooded the banks’ routers, servers, and server applications—layers 3, 4, and 7 of the networking stack—with junk traffic. Even when targets successfully repelled attacks against two of those layers, they would still fall over if their defenses didn’t adequately protect against the third.

“It’s not that we have not seen this style of attacks or even some of these holes before,” said Dan Holden, the director of research for the security engineering and response team at Arbor Networks. “Where I give them credit is the blending of the threats and the effort they’ve done. In other words, it was a focused attack.”

Adding to its effectiveness was the fact that banks are mandated to provide Web encryption, protected login systems, and other defenses for most online services. These “logic” applications are naturally prone to bottlenecks—and bottlenecks are particularly vulnerable to DDoS techniques. Regulations that prevent certain types of bank traffic from running over third-party proxy servers often deployed to mitigate attacks may also have reduced the mitigation options available once the disruptions started.

No “root” needed

A key ingredient in the success of the attacks was the use of compromised Web servers. These typically have the capacity to send 100 megabits of data every second, about a 100-fold increase over PCs in homes and small offices, which are more commonly seen in DDoS attacks.

In addition to overwhelming targets with more data than their equipment was designed to handle, the ample supply of bandwidth allowed the attackers to work with fewer attack nodes. That made it possible for attackers to more quickly start, stop, and recalibrate the attacks. The nimbleness that itsoknoproblembro and other tools make possible is rarely available when DDoS attackers must instead wield tens of thousands, or even hundreds of thousands, of infected home and small-office computers scattered all over the world.
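
The arithmetic behind that trade-off is simple. Taking the figures above at face value (a 60 Gbps peak, roughly 100 Mbps per hijacked server, and home or small-office machines with about one-hundredth of that bandwidth, an assumed ~1 Mbps each), the node counts work out as follows:

```cpp
#include <iostream>

int main() {
    // Figures from the article: 60 Gbps peak floods and ~100 Mbps per hijacked
    // server; home/small-office PCs are assumed to have ~1/100th of that.
    const double peak_gbps    = 60.0;
    const double server_mbps  = 100.0;
    const double home_pc_mbps = server_mbps / 100.0;   // ~1 Mbps assumed

    std::cout << "compromised servers needed: ~" << (peak_gbps * 1000.0) / server_mbps  << "\n";  // ~600
    std::cout << "home PCs needed:            ~" << (peak_gbps * 1000.0) / home_pc_mbps << "\n";  // ~60000
}
```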

“This one appears to exhibit a fair bit of knowledge about how people would go about mediating this attack,” Neal Quinn, the chief operating officer of Prolexic, said, referring to itsoknoproblembro. “Also, it adapts very quickly over time. We’ve been tracking it for a long time now and it evolves in response to defenses that are erected against it.”

Another benefit of itsoknoproblembro is that it runs on compromised Linux and Windows servers even when attackers have gained only limited privileges. Because the DDoS tool doesn’t require the almost unfettered power of “root” or “administrator” access to run, attackers can use it on a larger number of machines, since lower-privileged access is usually much easier for hackers to acquire.

Use of Web servers to disrupt online banking operations also underscores the damage that can result when administrators fail to adequately lock down their machines.

“You’re talking about a server that has a very lackadaisical security process,” said Holden, the Arbor Networks researcher. “Whoever’s servers and bandwidth are being used obviously don’t realize and understand that they’ve got unpatched servers and appliances. [The attackers] have compromised them and are taking advantage of that.”

State sponsored—or a kid in his basement?

Almost all of the attacks were preceded by online posts in which the writer named the specific target and the day its website operations would be attacked. The posts demonstrate that the writer had foreknowledge of the attacks, but there’s little evidence to support claims made in those posts that members of Izz ad-Din al-Qassam Brigades, the military wing of the Hamas organization in the Palestinian Territories, were responsible.

In addition, none of the five experts interviewed for this article had any evidence to support claims the attacks were sponsored or carried out by Iran, as recently claimed by US Senator Joseph Lieberman.

“I don’t think there’s anything about these attacks that’s so large or so sophisticated that it would have to be state sponsored,” said Prince, the CloudFlare CEO. “This very well could be a kid sitting in his mom’s basement in Ohio launching these attacks. I think it’s dangerous to start speculating that this is actually state sponsored.”

Indeed, the assaults seen to date lack most of the characteristics found in other so-called “hacktivist” attacks, in which attackers motivated by nationalist, political, or ideological leanings strike out at people or groups they view as adversaries. Typical hacktivist DDoS attacks wield bandwidth in the range of 1Gbps to 4Gbps, far less than the 60Gbps torrents seen in these attacks, said Michael Smith, senior security evangelist for Akamai Technologies. Also missing from these attacks is what he called “primary recruitment,” in which organizers seek grass-roots supporters and provide those supporters with the tools to carry out the attacks.

“Hacktivists will use many different tools,” he explained. “You will see various signatures of tools hitting you. To us, the traffic is homogeneous.”

Based on the nimbleness of the attacks, Smith speculated that a disciplined group, possibly tied to an organized crime outfit, may be responsible. Organized crime groups sometimes wage DDoS attacks on banks at the same time they siphon large amounts of money from customers’ accounts. The disruption caused by attacks is intended to distract banking officials until the stolen funds are safely in the control of the online thieves.

The cavemen get better clubs

When websites in the late 1990s began buckling under mysterious circumstances, many observers attributed almost super-human abilities to the people behind the disruptions. In hindsight, we know the DDoS attacks that brought these sites down were, as Prince puts it, no more sophisticated than a caveman wielding a club. It’s fair to say that the groups responsible for a string of attacks over the past year or so—including the recent attacks on banks—have identified a technical innovation that allows those clubs to pierce current defenses in a way that hadn’t been seen before. Such breakthroughs are common in the security world, but more often than not, they’re quickly rebuffed by countermeasures assembled by defenders.

More importantly, it’s grossly premature to compare these attacks to Stuxnet, the highly sophisticated malware the US and Israel designed to disrupt Iran’s nuclear program, or to declare the spate of attacks “Financial Armageddon.”

It’s also important to remember that DDoS attacks aren’t breaches of a bank’s internal security. No customer data is ever accessed, and no funds are affected. And while torrents of 60Gbps are impressive, they are by no means historic; CloudFlare’s Prince said that he sees attacks of that magnitude about once a month.

“Those are big attacks,” he said, “but they’re not so unprecedented that it’s worth a press release.”

(via Arstechnica.com)