Google Launches Cloud CDN Alpha

Earlier this month, Google announced an Alpha Cloud Content Delivery Network (CDN) offering. The service aims to provide static content closer to end users by caching content in Google’s globally distributed edge caches.  Google provides many more edge caches than it does data centers and as a result content can be provided quicker than making full round trip requests to a Google data center. In total, Google provides over 70 Edge points of presence which will help address customer CDN needs.

In order to use Cloud CDN, you must use Google’s Compute Engine HTTP(s) load balancers on your instances. Enabling an HTTP(S) load balancer is achieved through a simple command.

In a recent post, Google explains the mechanics of the service in the following way: “When a user requests content from your site, that request passes through network locations at the edges of Google’s network, usually far closer to the user than your actual instances. The first time that content is requested, the edge cache sees that it can’t fulfill the request and forwards the request on to your instances. Your instances respond back to the edge cache, and the cache immediately forwards the content to the user while also storing it for future requests. For subsequent requests for the same content that pass through the same edge cache, the cache responds directly to the user, shortening the round trip time and saving your instances the overhead of processing the request.”

The following image illustrates how Google leverages Edge point of presence caches to improve responsiveness.

Architecture

Image Source: https://cloud.google.com/compute/docs/load-balancing/http/cdn

Once the CDN service has been enabled, caching will automatically occur for all cacheable content. Cacheable content is typically defined by requests made through an HTTP GET request.  The service will respect explicit Cache-Control headers taking into account for expiration or make age headers.  Some responses will not be cached including ones that include Set-Cookie headers, message bodies that exceed 4 mb in size or where caching has been explicitly disabled through no-cache directives.  A complete list of cached rules can be found in Google’s documentation.

Google has traditionally partnered with 3rd parties in order to speed up the delivery of content to consumers.  These partnerships include Akamai, Cloudflare, Fastly, Level 3 Communications and Highwinds.

Other cloud providers also have CDN offerings including Amazon’s CloudFront and Microsoft’s Azure CDN.  Google will also see competition from Akamai, one of the aforementioned partners, who has approximately 16.3% CDN market share of the Alexa top 1 million sites.
(via InfoQ.com)

Google’s Cloud Pub/Sub Real-Time Messaging Service Is Now In Public Beta

2856173673_27b7e5e521_o

Google is launching the first public beta of Cloud Pub/Sub today, its backend messaging service that makes it easier for developers to pass messages between machines and to gather data from smart devices. It’s basically a scalable messaging middleware service in the cloud that allows developers to quickly pass information between applications, no matter where they’re hosted. Snapchat is already using it for its Discover feature and Google itself is using it in applications like its Cloud Monitoring service.

Pub/Sub was in alpha for quite a while. Google first (quietly) introduced it at its I/O developer conference last year, it never made a big deal about the service. Until now, the service was in private alpha, but starting today, all developers can use the service.

cps_integration

 

Using the Pub/Sub API, developers can create up to 10,000 topics (that’s the entity the application sends its messages to) and send up to 10,000 messages per second. Google says notifications should go out in under a second “even when tested at over 1 million messages per second.”

The typical use cases for this service, Google says, include balancing workloads in network clusters, implementing asynchronous workflows, logging to multiple systems, and data streaming from various devices.

During the beta period, the service is available for free. Once it comes out of beta, developers will have to pay $0.40 per million for the first 100 million API calls each month. Users who need to send more messages will pay $0.25 per million for the next 2.4 billion operations (that’s about 1,000 messages per second) and $0.05 per million for messages above that.

Now that Pub/Sub has hit beta — and Google even announced the pricing for the final release — chances are we will see a full launch around Google I/O this summer.

many-to-many

(via Techcrunch.com)

The Stunning Scale Of AWS And What It Means For The Future Of The Cloud

16238755496_4a3014ebbb_mJames Hamilton, VP and Distinguished Engineer at Amazon, and long time blogger of interesting stuff, gave an enthusiastic talk at AWS re:Invent 2014 on AWS Innovation at Scale. He’s clearly proud of the work they are doing and it shows.

James shared a few eye popping stats about AWS:

  • 1 million active customers
  • All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner)
  • 449 new services and major features released in 2014
  • Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004).
  • S3 has 132% year-over-year growth in data transfer
  • 102Tbps network capacity into a datacenter.

The major theme of the talk is the cloud is a different world. It’s a special environment that allows AWS to do great things at scale, things you can’t do, which is why the transition from on premise x86 servers to the public cloud is happening at a blistering pace. With so many scale driven benefits to the public cloud, it’s a transition that can’t be stopped. The cloud will keep getting more reliable, more functional, and cheaper at a rate that you can’t begin to match with your limited resources, generalist gear, bloated software stacks, slow supply chains, and outdated innovation paradigms.

That’s the PR message at least. But one thing you can say about Amazon is they are living it. They are making it real. So a healthy doubt is healthy, but extrapolating out the lines of fate would also be wise.

One of the fickle finger of fate advantages AWS has is resources. At one million customers they have the scale to keep the engine of expansion and improvement going. Profits aren’t being taken out, money is being reinvested. This is perhaps the most important advantage of scale.

But money without smarts is simply waste. Amazon wants you to know they have the smarts. We’ve heard how Google and Facebook build their own gear, Amazon does too. They build their own networking gear, networking software, racks, and they work with Intel to get faster processor versions of processors than are available on the market. The key is they know everything and control everything about their environment, so they can build simpler gear that does exactly what they want, which turns out to be cheaper and more reliable in the end.

Complete control allows quality metrics to be built into everything. Metrics drive a constant quality increase in all parts of the system, which is why against all odds AWS is getting more reliable as the pace of innovation quickens. Great pools of actionable data turned into knowledge is another huge advantage of scale.

Another thing AWS can do that you can’t is the Availability Zone architecture itself. Each AZ is its own datacenter and AZs within a region are located very close together. This reduces messaging latencies, which means state can be synchronously replicated between AZs, which greatly improves availability compared to the typical approach where redundant datacenters are very far apart.

It’s a talk rich with information and…well, spunk. The real meta-theme of the talk is how Amazon consciously uses scale to their competitive advantage. For Amazon scale isn’t just an expense to be dealt with, scale is a resource to exploit, if you know how.

Here’s my gloss of James Hamilton’s incredible talk…

Everything In The Talk Has A Foundation In Scale

  • Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004).

  • 365 days a year component manufacturers have to get gear to server and storage manufacturers, the server and storage manufacturers have to produce the gear and push it into the logistics channel, it has to get from the logistics channel over to the correct datacenter, it has to arrive at the loading dock, people have be there to wheel the racks to the right place in the DC, there has to be power, cooling, networking, the app stack has to be loaded up, it has to be tested, it has to be released to customers.

  • S3 usage: 132% year-over-year growth in data transfer; EC2 usage: 99% YoY usage growth; AWS overall business: over 1 million active customers.

  • All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner)

  • With over a million customers it means you are in a rich ecosystem. You have your pick of software vendors, if you have a problem someone has likely had it before, it’s easier to get your job done fast.

  • Such high growth means Amazon has the resources to keep reinvesting and innovating by increasing breadth and depth of services they offer.

  • Big Transitions generally occur when the economics are far superior, like mainframes to UNIX servers and then UNIX servers to x86 servers. These transitions usually take 10+ years. What’s different about the x86 on premise transition to the cloud is the speed at which it is happening.  The speed of the cloud transition is a function of great economic value along with the low friction for adoption. You don’t need software, you don’t hardware, you can just do it.

There Are Big Cost Problems In Networking

  • Networking is a red alert situation across the industry. It’s the perfect storm.

  • Problem #1: The cost of networking is escalating relative the cost of all other equipment. It’s anti-Moore’s law. All other gear is going down in cost, networking is getting relatively more expensive over time. Relative monthly costs: servers: 57%; networking equipment: 8%; power distribution and cooling: 18%; power: 13%; other: 4%.

  • Problem #2: At the same time networking is getting more expensive, the ratio of networking to compute is going up. That’s partly because Moore’s law is working (still) with servers and compute density is going up. Partly it’s because as the cost of compute falls the amount of advanced analytics performed goes up and analytics are network intensive. Solving big problems using a large number of servers requires a lot of networking. Network traffic has moved east-west rather than the traditional north-south direction.

  • Amazon’s solution 5 years ago was data driven and radical: they built to their own networking designs. Special routers were built. A team was hired to build the protocol stack all the way to the top. And they deployed all this themselves in their network. All services worldwide run on this gear.

    • This strategy turned out to be a lot cheaper. Just the support contract for networking gear was running 10s of millions of dollars.

    • Availability went up. The source of the improvement was simplicity. The problem AWS was trying to solve was simpler than the problem enterprise gear tries to solve. Enterprise gear must adhere to a lot of complicated specs that go unused and only make the system more fragile. By implementing just the functionality that was required meant a much simpler system which lead to higher availability. Any way to win is a good way to win.

    • A cornucopia of metrics. They measure everything. The rule is if a customer has a bad experience using their system their metrics must show it. This means metrics are getting better all the time because customer problems drive better metrics. Once you have metrics that accurately reflect customer experience then goals can be set on making the system better. Every week improvements are made to make things better. If the code didn’t start off better, it gets better over time.

    • Testability. Their gear was better because they tested it better. Enterprise gear is hard to test at scale. They created a $40 million test bed of 8000 servers (3 megawatts). But since this was the cloud they effectively rented the servers for a few months, so it was relatively cheap.

Networking Explained Layer By Layer, From The Very Top To The Network Interface Card

AWS Worldwide Network Backbone

  • 11 AWS regions worldwide. Choose which ones to use by nearness to customers or required jurisdictional boundaries.

  • Private fiber links interconnect most of the major regions. This avoids all the capacity planning problems (Amazon can do better capacity planning), peering issues, and buffering problems that occur on public links. So it’s faster to run their own network, it’s more reliable, cheaper, and lower latency.

Example AWS Region (US East ((Northern Virginia))

  • All regions have at least two availability zones. US East has five AZs.

  • Redundant paths run to transit centers.

  • Each region has redundant transit centers. A transit center connects private links to other AWS regions, private links to AWS Direct Connect customers, and to the internet through peering and paid transit.

  • If one AZ fails all the other AZs keep working.

  • Metro-area DWDM links between AZs

  • 82,864 fiber strands in a regions

  • AZs are less than 2ms apart and usually less than 1ms apart. From a latency perspective they are fairly close, within a few kilometers. Far enough apart for safety, close enough for good latency.

  • 25Tbps peak traffic between AZs

  • AWS offers AZs because:

    • With a single hardened datacenter the best reliability you’ll get is around 99.9% over a mix of applications over a large period of time. High reliability requires running in two datacenters. Traditionally datacenter diversity is from two datacenters that are very far apart because it’s not cost effective to keep datacenters close together. This means longer latencies. LA to NEW is 74ms round trip. Committing to an SSD is 1 to 2ms. You can’t wait 70+ milliseconds for a transaction to commit. Which means applications commit locally and then replicate to the second datacenter. This design in a failure case loses data during the failover. While a true failure is rare, like a building burning down, transient failures are more common, like a load balancer problem for example. So would you failover your connection was down for 3 minutes? No, because data would be lost and it would take a long time to recover that data from other sources. So you lose availability for common events.

    • AZs are milliseconds apart so you can commit to both at the same time. That means if you failover a customer won’t be able to tell because the data replication was synchronous. It’s invisible. It’s hard to write code for this model so you won’t do it for everything. And for some apps a concern for multi-AZ failures might also prevent you from using multiple AZs, but for the rest of apps this is a very powerful model. It’s more costly, but it gives AWS certain advantages.

Example AWS Availability Zone

  • An Availability Zone is always a datacenter in a completely independent building.

  • Amazon has 28+ datacenters. The plus means there are more datacenters in an AZ as a way of extending capacity for an AZ. More datacenters are added within an AZ to extend the capacity of an AZ. Otherwise you would be forced to split your app across AZs, even if you didn’t want to.

  • Some AZs have size fairly substantial datacenters.

  • DCs in an AZ are less than ¼ms apart.

Example Datacenter

  • AWS datacenters are purposely not gigantic. A single datacenter is 25 – 30 megawatts, with between 50,000 – 80,000 servers

  • The return on datacenter largeness diminishes. The advantage of datacenter scale as you build bigger and bigger goes down. Early advantages are huge. Later advantages are small. Going from 2000 to 2500 racks is a little better. A tiny datacenter is too expensive. A really large datacenter is only marginally more expensive per rack than a medium datacenter.

  • Risk increases with larger datacenters. The blast radius if something goes wrong and the datacenter is destroyed, the loss is too big.

  • 102Tbps network capacity into a datacenter.

Example Rack, Server & NIC

  • The only thing that matters for latency is the software latency at either end of a connection. Latency between two servers when sending a message:

    • Your app -> guest OS -> hypervisor -> NIC : the latency is milliseconds

    • through the NIC : the latency is microseconds

    • over the fiber : the latency is nanoseconds

  • SR-IOV (Single Root I/O Virtualization) allows a NIC to provide virtualize in hardware network cards. Each guest gets its own network card. The benefit is > 2x average latency reduction and > 10x latency jitter improvement. This means outliers a down to a 1/10th of what they were. SR-IOV is being deployed now on newer instances types and will eventually be everywhere. The hard part wasn’t adding SR-IOV, it was adding isolation, metering, DDoS protection, and the capacity limits that make SR-IOV useful in a cloud environment.

AWS Custom Server & Storage Designs

  • Cost of a negative situation is not high so expensive unneeded protection can be removed. Servers are designed for what they do, not a general population of users. Amazon knows exactly in what environment the server will run in, they’ll know exactly when something goes wrong, so the servers can be designed with less engineering headroom. The cost of server failure is not that big for them. They are already on site and are very good at replacing harddisks, etc. So a lot of the carefulness in enterprise equipment is not necessary.

  • Processors can be pushed harder. They know the cooling requirements, they influence the mechanical design, they just design good servers, so they can get more performance out of a server. Though a partnership with Intel Amazon has processors that run faster than can be bought on the open market.

  • An example is the design for their own storage rack. It weighs over a ton, 19” wide, and holds 864 disk drives. For some workloads this a wonderful game changing design that helps them get better prices in some areas.

Power Infrastructure

  • Amazon has designed and built their own power substations. It only saves a little money, but they can build them much faster. Utility companies are not used to dealing with the rate AWS is growing at, so they had to build their own.

  • 3 100% carbon neutral regions: US West (Oregon), AWS GovCloud (US), EU (Frankfurt)

Rapid Pace Of Innovation

  • 449 new services and major features released in 2014. 24 in 2008, 48 in 2009, 61 in 2010, 82 in 2011, 159 in 2012, 280 in 2013.

  • AWS is getting more reliable as the pace of innovation quickens. Their goal is to make available to customers the same tools that use to achieve this rate of innovation and high quality.

( via HighScalability.com )

Spark Sets New Record in Sort Performance

Databricks, a company founded by the creators of Apache Spark, has recently announced a new record in the Daytona GraySort contest using the Spark processing engine. The Daytona GraySort contest is a 3rd party benchmark measuring how fast a system can sort 100 Terabytes of data.Databricks posted a throughput of 4.27 TB/min over a cluster of 206 machines for their official run which constitutes a 3x performance improvement, using 10x fewer machines when compared to the previous record submitted by Yahoo! running Hadoop MapReduce.

In a blog post announcing their submission to the Daytona GraySort contest, Databricks explained some of the technological improvements recently introduced to Spark that allowed it to sustain such a large throughput.

Spark 1.1 introduced a new shuffle implementation called sort-based shuffle. The previous shuffle implementation required an in-memory buffer for each partition in the shuffle which lead to notable memory overhead. The new sort-based shuffle requires only one in-memory buffer at a time. This significantly reduced the memory usage and allowed for considerably more tasks to be run concurrently on the same hardware.

In addition to the new shuffle algorithm, the network module was revamped based on Netty’s native Epoll socket transport which maintains its on pool of memory, bypassing the JVM’s memory allocator and reducing the impact of garbage collection. The new network module was then used to build an external shuffle service to allow shuffled files to be served even during garbage collection pauses in the main Spark executor.

Finally, Spark 1.1 included TimSort as its new default sorting algorithm. TimSort is derived from merge sort and insertion sort and performs better than quicksort in most real-world datasets, especially for datasets that are partially ordered.

All of these improvements allowed the Spark cluster to sustain 3GB/s/node I/O activity during the map phase, and 1.1 GB/s/node network activity during the reduce phase which saturated the 10Gbps ethernet link.

Spark is an advanced execution engine born out of research done at the AMPLab at UC Berkley. It allows programs to run up to 10x faster than Hadoop MapReduce and when data is on disk, and up to 100x faster when data resides in memory. Spark supports programs written in Java, Scala or Python and uses familiar functional programming constructs to build data processing flows.

Spark has garnered significant attention as a next generation execution platform for Hadoop and is seen by some as a replacement for MapReduce. It graduated to a top level Apache project in February and since then has been included in the Cloudera, Hortonworks and MapR’s Hadoop distributions. More recently, Hortonworks announced they will support running Hive on Spark as part of their Stinger.next initiative.

Databricks was founded in 2013 as a commercial entity supporting Spark and its associated projects. Those projects include Spark Streaming for stream processing, Spark SQL for querying Hive data and MLlib for machine learning.

(Via InfoQ.com)

Amazon Announces EC2 Container Service For Managing Docker Containers On AWS

Earlier this year I wrote about container computing and enumerated some of the benefits that you get when you use it as the basis for a distributed application platform: consistency & fidelity, development efficiency, and operational efficiency. Because containers are lighter in weight and have less memory and computational overhead than virtual machines, they make it easy to support applications that consist of hundreds or thousands of small, isolated “moving parts.” A properly containerized application is easy to scale and maintain, and makes efficient use of available system resources.

Introducing Amazon EC2 Container Service

meter80_1
In order to help you to realize these benefits, we are announcing a preview of our new container management service, EC2 Container Service (or ECS for short). This service will make it easy for you to run any number of Docker containers across a managed cluster of Amazon Elastic Compute Cloud (EC2) instances using powerful APIs and other tools. You do not have to install cluster management software, purchase and maintain the cluster hardware, or match your hardware inventory to your software needs when you use ECS. You simply launch some instances in a cluster, define some tasks, and start them. ECS is built around a scalable, fault-tolerant, multi-tenant base that takes care of all of the details of cluster management on your behalf.

By the way, don’t let the word “cluster” scare you off! A cluster is simply a pool of compute, storage, and networking resources that serves as a host for one or more containerized applications. In fact, your cluster can even consist of a single t2.micro instance. In general, a single mid-sized EC2 instance has sufficient resources to be used productively as a starter cluster.

EC2 Container Service Benefits
Here’s how this service will help you to build, run, and scale Docker-based applications:

  • Easy Cluster Management – ECS sets up and manages clusters made up of Docker containers. It launches and terminates the containers and maintains complete information about the state of your cluster. It can scale to clusters that encompass tens of thousands of containers across multiple Availability Zones.
  • High Performance – You can use the containers as application building blocks. You can start, stop, and manage thousands of containers in seconds.
  • Flexible Scheduling – ECS includes a built-in scheduler that strives to spread your containers out across your cluster to balance availability and utilization. Because ECS provides you with access to complete state information, you can also build your own scheduler or adapt an existing open source scheduler to use the service’s APIs.
  • Extensible & Portable – ECS runs the same Docker daemon that you would run on-premises. You can easily move your on-premises workloads to the AWS cloud, and back.
  • Resource Efficiency – A containerized application can make very efficient use of resources. You can choose to run multiple, unrelated containers on the same EC2 instance in order to make good use of all available resources. You could, for example, decide to run a mix of short-term image processing jobs and long-running web services on the same instance.
  • AWS Integration – Your applications can make use of AWS features such as Elastic IP addresses, resource tags, and Virtual Private Cloud (VPC). The containers are, in effect, a new base-level building block in the same vein as EC2 and S3.
  • Secure – Your tasks run on EC2 instances within an Amazon Virtual Private Cloud. The tasks can take advantage of IAM roles, security groups, and other AWS security features. Containers run in a multi-tenant environment and can communicate with each other only across defined interfaces. The containers are launched on EC2 instances that you own and control.

Using EC2 Container Service
ECS was designed to be easy to set up and to use!

You can launch an ECS-enabled AMI and your instances will be automatically checked into your default cluster. If you want to launch into a different cluster you can specify it by modifying the configuration file in the image, or passing in User Data on launch. To ECS-enable a Linux AMI, you simply install the ECS Agent and the Docker daemon.

ECS will add the newly launched instance to its capacity pool and run containers on it as directed by the scheduler. In other words, you can add capacity to any of your clusters by simply launching additional EC2 instances in them!

The ECS Agent will be available in open source form under an Apache license. You can install it on any of your existing Linux AMIs and call registerContainerInstances to add them to your cluster.

grid80_1

Here are a few vocabulary items to help you to get familiar with the terminology used by ECS:
  • Cluster – A cluster is a pool of EC2 instances in a particular AWS Region, all managed by ECS. One cluster can contain multiple instance types and sizes, and can reside within one or more Availability Zones.
  • Scheduler – A scheduler is associated with each cluster. The scheduler is responsible for making good use of the resources in the cluster by assigning containers to instances in a way that respects any placement constraints and simultaneously drives as much parallelism as possible, while also aiming for high availability.
  • Container – A container is a packaged (or “Dockerized,” as the cool kids like to say) application component. Each EC2 instance in a cluster can serve as a host to one or more containers.
  • Task Definition – A JSON file that defines a Task as a set of containers. Fields in the file define the image for each container, convey memory and CPU requirements, and also specify the port mappings that are needed for the containers in the task to communicate with each other.
  • Task – A task is an instantiation of a Task Definition consisting of one or more containers, defined by the work that they do and their relationship to each other.
  • ECS-Enabled AMI – An Amazon Machine Image (AMI) that runs the ECS Agent and dockerd. We plan to ECS-enable the Amazon Linux AMI and are working with our partners to similarly enable their AMIs.

EC2 Container Service includes a set of APIs that are both simple and powerful. You can create, describe, and destroy clusters and you can register EC2 instances therein. You can create task definitions and initiate and manage tasks.

Here is the basic set of steps that you will follow in order to run your application on ECS. I am making the assumption that you have already Dockerized your application by breaking it down in to fine-grained components, each described by a Dockerfile and each running nicely on your existing infrastructure. There are plenty of good resources online to help you with this process. Many popular applications application components have already been Dockerized and can be found on Docker Hub. You can use ECS with any public or private Docker repository that you can access. Ok, so here are the steps:

  1. Create a cluster, or decide to use the default one for your account in the target Region.
  2. Create your task definitions and register them with the cluster.
  3. Launch some EC2 instances and register them with the cluster.
  4. Start the desired number of copies of each task.
  5. Monitor the overall utilization of the cluster and the overall throughput of your application, and make adjustments as desired. For example, you can launch and then register additional EC2 instances in order to expand the cluster’s pool of available resources.

EC2 Container Service Pricing and Availability
The service is launch today in preview form. If you are interested in signing up, click here to join the waiting list.

There is no extra charge for ECS. As usual, you pay only for the resources that you use.

( Via Amazon blog)

Google Launches Managed Service For Running Docker-Based Applications On Its Cloud Platform

docker-google

Google today announced the alpha launch of Google Container Engine, a new managed service for building and running Docker container-based applications on its cloud platform.

Docker is probably the hottest technology in developer circles these days — it’s almost impossible to have a discussion with a developer without it coming up — and Google’s Cloud Platform team has decided to go all in on this technology that makes it easier for developers to run distributed applications.

In essence, this new service is a “Cluster-as-a-Service” platform based on Google’s open source Kubernetes project. Kubernetes, which helps developers manage their container clusters, is based on Google’s own work with containers in its massive data centers. In this new service, Kubernetes dynamically manages the different Docker containers that make up an application for the user.

Google says the combination of “fast booting, efficient VM hosts and seamless virtualized network integration” will make its cloud computing service “the best place to run container-based applications.” The company’s competitors would likely argue with that, but none of them offer a similar service at this point.

Google initially launched support for Docker images in May as part of its new Managed VM service. These managed VMs are coming out of Google’s limited alpha — with the addition of auto-scaling support — and with that, developers can now use Docker containers on Google’s platform without having to jump through any hoops. Managed VMs will remain in beta for now, though, which in Google’s new language means there is no access control and that charges may be waived, but there is no SLA or technical support obligation either.

And remember, because this new service is officially in alpha, it isn’t feature-complete and the whole infrastructure could melt down at any minute.

(Via TechCrunch.com)

Is Red Hat the new Oracle?

4728543452_bc75fbe9fd_b
Locking in? Trying to replicate its Linux success in the cloud, Red Hat said it will not support Red Hat Linux customers who run a non-Red Hat OpenStack distribution.

Everyone knows that Red Hat, the king of enterprise Linux, is banking on OpenStack as its next big opportunity. And most figured it would be aggressive in competing with rival OpenStack distributions — from Canonical, HP, Suse, and others.

What we didn’t necessarily know until the Wall Street Journal  (registration required) reported it Tuesday night is that Red Hat — which makes its money selling support and maintenance for its open-source products – would refuse to support of users of Red Hat Enterprise Linux who also run non-Red Hat versions of OpenStack.

Since Red Hat accounts for more than 60 percent of the paid enterprise Linux market, that policy could stem adoption of rival OpenStack distributions. It could also irritate customers — many of whom don’t like the specter of vendor lock-in.

In this policy — which the company confirmed to the Journal — Red Hat seems to have ripped a page out of Oracle’s playbook. The database giant, as it expanded into other software areas, decided that it would not support customers running  non-Oracle virtualization, non-Oracle Linux etc., unless the customer could prove that its issue originated in the Oracle part of the stack. That went over like a lead balloon with users. Since Oracle announced its own OpenStack distribution this week — and it also fields its own Linux distribution — the stage is set for dueling single-source OpenStack implementations going forward. Needless to say, that could ding OpenStack’s promise of no-vendor-lock-in.

The Journal also reported that Red Hat employees were told to stop working with Mirantis, an OpenStack systems integrator that late last year started offering its own OpenStack distribution. Red Hat’s president of products and technologies (pictured above) Paul Cormier told the paper that Red Hat would not bring a competitor into its accounts.

Reached late Tuesday for comment, a Red Hat spokeswoman noted that OpenStack “is not simply a layered product on top of Linux — [RHEL] is tightly integrated into and part of OpenStack. It is much more complex and intertwined than, say, Microsoft choosing to run PowerPoint on iOS.” I will update this story with additional Red Hat comment when it becomes available. Mirantis could not be reached for comment.

This news, which came out of the OpenStack Summit in Atlanta, may unsettle corporate customers wary of committing too much of their IT budget to any one vendor. But it’s hardly surprising. All of these vendors, while pledging open-source goodness of OpenStack, also want to expand their own reach in customers’ shops. OpenStack competitors have been wary of Red Hat for quite some time, expecting it to try to replicate its dominance in the enterprise Linux realm with cloud with OpenStack.

To be sure this problem is sort of theoretical now, as IDC Analyst Al Gillen pointed out. “From a practical standpoint, this is a non-issue today since there are precious few OpenStack clouds in production use,” he said via email. He agreed that this smacks of Oracle’s support policies but attributed Red Hat’s action more to the “immaturity and fast release of OpenStack more than anything else.”

Mark Shuttleworth, founder of Canonical, which competes with Red Hat both in Linux and OpenStack, recently acknowledged that the battle front has moved from the single-node Linux server realm — where Red Hat won — to multi-node cloud deployments where Red Hat’s enterprise software licensing mentality could put off customers. This new policy will test how compliant big customers will be to such enterprise sales tactics in the age of cloud.

(Via Gigaom.com)