976RA Cloud IT Services: Designed to Fail?
What is Happening?
Apple’s Siri outage highlights the increasing reliance on Cloud for even the simplest tasks, such as getting directions or finding a telephone number. It’s a terrific example of how the simplicity of a natural user interface can mask the tremendous complexity required to deliver that simplicity. It’s also a terrific example of business users’ steadily rising expectations of Cloud use, and of their complacency toward Cloud resiliency.
It’s almost paradoxical: Cloud service outages such as those experienced by users of Siri, RIM, and AWS highlight the challenges inherent in relying on Cloud IT for business, yet they do little to deter most business users (and consumers) from migrating more and more of their data, applications, and operations (and lives) into Cloud-based IT.
Saugatuck sees this as an acceptance of, and even an increasing expectation of, occasional failure. That approach may work adequately for most individual users, but cannot be acceptable for business applications and operations management.
Unfortunately, most Cloud providers are architecting their systems in ways that satisfy the “adequate” expectations of individual users, while trying to woo enterprise and SMB IT organizations into placing ever-more-critical capabilities and operations into Cloud-based services. Until these approaches change, we are likely to see more, and more significant, Cloud service outages caused by server failure. The result will be slower migration of enterprise-level business applications and management to the Cloud.
Why is it Happening?
As we have said many times before, we see Cloud-based infrastructures as among the most capable, reliable, and secure IT infrastructures available. They tend to be designed and built from the ground to the Cloud to support immensely scalable demand, while delivering performance, security, and reliability beyond the capabilities of all but the largest and most technologically sophisticated enterprise IT organizations.
Even so, most Cloud infrastructures that we have seen are still built to fail. The low cost of server hardware and the efficiencies available via virtualization mean that clusters of servers with virtualized images and failover functionality are the norm. The systems are managed and monitored so that failover takes place as planned or as needed.
A key challenge with such (very traditional) infrastructure approaches is that server failure, for whatever reason (e.g., power outages, overheating, power spikes, virus infections), tends to be responded to rather than prevented or avoided. Most Cloud SLAs are built around an actuarial approach that combines average, predictable uptime expectations with downtime recovery periods.
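To make the arithmetic behind such uptime expectations concrete, here is a minimal sketch using the standard availability formula (mean time between failures divided by the sum of mean time between failures and mean time to repair). This is an illustration only, not Saugatuck's model or any provider's actual SLA calculation. The function names and sample figures are assumptions chosen for the example, and the cluster formula assumes that servers fail independently and that failover is instant, which is optimistic.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored for simplicity


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(avail: float,
                            period_hours: float = HOURS_PER_YEAR) -> float:
    """Average downtime implied by an availability figure over a period."""
    return (1.0 - avail) * period_hours


def cluster_availability(server_avail: float, servers: int) -> float:
    """Availability of a failover cluster that is up while at least one
    server is up. Assumes independent failures and instant failover
    (hypothetical simplifications; correlated failures break this)."""
    return 1.0 - (1.0 - server_avail) ** servers


if __name__ == "__main__":
    a = availability(mtbf_hours=999, mttr_hours=1)
    print(f"availability: {a:.4f}")
    print(f"expected downtime per year: {expected_downtime_hours(a):.2f} hours")
    print(f"two-server cluster at 99% each: {cluster_availability(0.99, 2):.4f}")
```

Under these assumptions, a "three nines" (99.9%) service averages roughly 8.76 hours of downtime per year. The figure is an average: it says nothing about when an outage occurs or how long any single outage lasts, which is exactly what matters to a business transaction in flight.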
This approach is more than adequate for many business uses, especially the types of Cloud use we see by individuals. And downtime can be mitigated for many enterprise workloads that might be driven to the Cloud. Figure 1 summarizes five key types of factors that tend to cause server downtime, along with simple mitigation approaches and alternatives.
Figure 1: Server Downtime Drivers vs. Mitigation Approaches
Source: Saugatuck Technology Inc.
But failure during a transaction means lost data, lost revenue, and/or lost customer satisfaction, and the response to failure takes a variable and largely unpredictable amount of time. Immensely scalable Cloud server capabilities lead to immensely scalable failures of indefinite duration. This is magnified by the growing use of increasingly complex systems that are required to enable a series of relatively common business tasks for millions of users simultaneously (e.g., Apple’s Siri).
Market Impact
Cloud-based infrastructures are among the most bullet-proof available, and their reliability far exceeds that of most enterprise and SMB IT. Cloud-based services are thus among the most reliable forms of IT available.
But whether on-premises or in the Cloud, IT components and infrastructures are not impervious to failure (Saugatuck Lens 360 Blog, The Last Word: Clouds Fail, So Plan and Manage Accordingly). This is particularly true for something as complex as the infrastructure underlying any Cloud IT offering.
As long as servers and virtualization are relatively inexpensive, and as long as customers are willing to accept a certain level of outages and downtime, we see little to compel Cloud providers to change their architectural approaches. Failure and failover will be built in, as will their consequences. Experienced IT management teams inherently understand how complexity simultaneously mitigates and amplifies availability challenges. And yet they, and their associated business executives and leaders, appear to be maintaining a dangerous complacency and/or unrealistic expectations about the availability of Cloud-based workloads (929RA, Cloud IT Failures Emphasize Need for Expectation Management, 10 August 2011).
While we do not yet see any strong movement away from using Cloud for business IT, we can see the factors and ...

