Reliability and Availability of Cloud Computing


RELIABILITY AND AVAILABILITY OF CLOUD COMPUTING

IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board 2012
John Anderson, Editor in Chief

Ramesh Abhari
George W. Arnold
Flavio Canavero
Dmitry Goldgof
Bernhard M. Haemmerli
David Jacobson
Mary Lanzerotti
Om P. Malik
Saeid Nahavandi
Tariq Samad
George Zobrist

Kenneth Moore, Director of IEEE Book and Information Services (BIS)

Technical Reviewers

Xuemei Zhang
Principal Member of Technical Staff
Network Design and Performance Analysis
AT&T Labs

Rocky Heckman, CISSP
Architect Advisor
Microsoft

RELIABILITY AND AVAILABILITY OF CLOUD COMPUTING

Eric Bauer
Randee Adams

IEEE PRESS
A JOHN WILEY & SONS, INC., PUBLICATION

cover image: iStockphoto
cover design: Michael Rutkowski

ITIL is a Registered Trademark of the Cabinet Office in the United Kingdom and other countries.

Copyright 2012 by the Institute of Electrical and Electronics Engineers. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Bauer, Eric.
Reliability and availability of cloud computing / Eric Bauer, Randee Adams.
p. cm.
ISBN 978-1-118-17701-3 (hardback)
1. Cloud computing. 2. Computer software–Reliability. 3. Computer software–Quality control. 4. Computer security. I. Adams, Randee. II. Title.
QA76.585.B394 2012
004.6782–dc23
2011052839

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

To our families and friends for their continued encouragement and support.

CONTENTS

Introduction

I BASICS

1 CLOUD COMPUTING
1.1 Essential Cloud Characteristics
1.1.1 On-Demand Self-Service
1.1.2 Broad Network Access
1.1.3 Resource Pooling
1.1.4 Rapid Elasticity
1.1.5 Measured Service
1.2 Common Cloud Characteristics
1.3 But What, Exactly, Is Cloud Computing?
1.3.1 What Is a Data Center?
1.3.2 How Does Cloud Computing Differ from Traditional Data Centers?
1.4 Service Models
1.5 Cloud Deployment Models
1.6 Roles in Cloud Computing
1.7 Benefits of Cloud Computing
1.8 Risks of Cloud Computing

2 VIRTUALIZATION
2.1 Background
2.2 What Is Virtualization?
2.2.1 Types of Hypervisors
2.2.2 Virtualization and Emulation
2.3 Server Virtualization
2.3.1 Full Virtualization
2.3.2 Paravirtualization
2.3.3 OS Virtualization
2.3.4 Discussion
2.4 VM Lifecycle
2.4.1 VM Snapshot
2.4.2 Cloning VMs
2.4.3 High Availability Mechanisms
2.5 Reliability and Availability Risks of C…

3 SERVICE RELIABILITY AND SERVICE AVAILABILITY
3.1 Errors and Failures
3.2 Eight-Ingredient Framework
3.3 Service Availability
3.3.1 Service Availability Metric
3.3.2 MTBF and MTTR
3.3.3 Service and Network Element Impact Outages
3.3.4 Partial Outages
3.3.5 Availability Ratings
3.3.6 Outage Attributability
3.3.7 Planned or Scheduled Downtime
3.4 Service Reliability
3.4.1 Service Reliability Metrics
3.4.2 Defective Transactions
3.5 Service Latency
3.6 Redundancy and High Availability
3.6.1 Redundancy
3.6.2 High Availability
3.7 High Availability and Disaster Recovery
3.8 Streaming Services
3.8.1 Control and Data Planes
3.8.2 Service Quality Metrics
3.8.3 Isochronal Data
3.8.4 Latency Expectations
3.8.5 Streaming Quality Impairments
3.9 Reliability and Availability Risks of Cloud…

II ANALYSIS

4 ANALYZING CLOUD RELIABILITY AND AVAILABILITY
4.1 Expectations for Service Reliability and Availability
4.2 Risks of Essential Cloud Characteristics
4.2.1 On-Demand Self-Service
4.2.2 Broad Network Access
4.2.3 Resource Pooling
4.2.4 Rapid Elasticity
4.2.5 Measured Service
4.3 Impacts of Common Cloud Characteristics
4.3.1 Virtualization
4.3.2 Geographic Distribution
4.3.3 Resilient Computing
4.3.4 Advanced Security
4.3.5 Massive Scale
4.3.6 Homogeneity
4.4 Risks of Service Models
4.4.1 Traditional Accountability
4.4.2 Cloud-Based Application Accountability
4.5 IT Service Management and Availability Risks
4.5.1 ITIL Overview
4.5.2 Service Strategy
4.5.3 Service Design
4.5.4 Service Transition
4.5.5 Service Operation
4.5.6 Continual Service Improvement
4.5.7 IT Service Management Summary
4.5.8 Risks of Service Orchestration
4.5.9 IT Service Management Risks
4.6 Outage Risks by Process Area
4.6.1 Validating Outage Attributability
4.7 Failure Detection Considerations
4.7.1 Hardware Failures
4.7.2 Programming Errors
4.7.3 Data Inconsistency and Errors
4.7.4 Redundancy Errors
4.7.5 System Power Failures
4.7.6 Network Errors
4.7.7 Application Protocol Errors
4.8 Risks of Deployment Models
4.9 Expectations of IaaS Data…

5 RELIABILITY ANALYSIS OF VIRTUALIZATION
5.1 Reliability Analysis Techniques
5.1.1 Reliability Block Diagrams
5.1.2 Single Point of Failure Analysis
5.1.3 Failure Mode Effects Analysis
5.2 Reliability Analysis of Virtualization Techniques
5.2.1 Analysis of Full Virtualization
5.2.2 Analysis of OS Virtualization
5.2.3 Analysis of Paravirtualization
5.2.4 Analysis of VM Coresidency
5.2.5 Discussion
5.3 Software Failure Rate Analysis
5.3.1 Virtualization and Software Failure Rate
5.3.2 Hypervisor Failure Rate
5.3.3 Miscellaneous Software Risks of Virtualization and Cloud
5.4 Recovery Models
5.4.1 Traditional Recovery Options
5.4.2 Virtualized Recovery Options
5.4.3 Discussion
5.5 Application Architecture Strategies
5.5.1 On-Demand Single-User Model
5.5.2 Single-User Daemon Model
5.5.3 Multiuser Server Model
5.5.4 Consolidated Server Model
5.6 Availability Modeling of Virtualized Recovery Options
5.6.1 Availability of Virtualized Simplex Architecture
5.6.2 Availability of Virtualized Redundant Architecture
5.6.3 Critical Failure Rate
5.6.4 Failure Coverage
5.6.5 Failure Detection Latency
5.6.6 Switchover Latency
5.6.7 Switchover Success Probability
5.6.8 Modeling and “Fast Failure”
5.6.9 Comparison of Native and Virtualized Deployments

6 HARDWARE RELIABILITY, VIRTUALIZATION, AND SERVICE AVAILABILITY
6.1 Hardware Downtime Expectations
6.2 Hardware Failures
6.3 Hardware Failure…

6.4 Hardware Failure Detection
6.5 Hardware Failure Containment
6.6 Hardware Failure Mitigation
6.7 Mitigating Hardware Failures via Virtualization
6.7.1 Virtual CPU
6.7.2 Virtual Memory
6.7.3 Virtual Storage
6.8 Virtualized Networks
6.8.1 Virtual Network Interface Cards
6.8.2 Virtual Local Area Networks
6.8.3 Virtual IP Addresses
6.8.4 Virtual Private Networks
6.9 MTTR of Virtualized…

7 CAPACITY AND ELASTICITY
7.1 System Load Basics
7.1.1 Extraordinary Event Considerations
7.1.2 Slashdot Effect
7.2 Overload, Service Reliability, and Service Availability
7.3 Traditional Capacity Planning
7.4 Cloud and Capacity
7.4.1 Nominal Cloud Capacity Model
7.4.2 Elasticity Expectations
7.5 Managing Online Capacity
7.5.1 Capacity Planning Assumptions of Cloud Computing
7.6 Capacity-Related Service Risks
7.6.1 Elasticity and Elasticity Failure
7.6.2 Partial Capacity Failure
7.6.3 Service Latency Risk
7.6.4 Capacity Impairments and Service Reliability
7.7 Capacity Management Risks
7.7.1 Brittle Application Architecture
7.7.2 Faulty or Inadequate Monitoring Data
7.7.3 Faulty Capacity Decisions
7.7.4 Unreliable Capacity Growth
7.7.5 Unreliable Capacity Degrowth
7.7.6 Inadequate Slew Rate
7.7.7 Tardy Capacity Management Decisions
7.7.8 Resource Stock Out Not…

7.7.9 Cloud Burst Fails
7.7.10 Policy Constraints
7.8 Security and Service Availability
7.8.1 Security Risk to Service Availability
7.8.2 Denial of Service Attacks
7.8.3 Defending against DoS Attacks
7.8.4 Quantifying Service Availability Impact of Security Attacks
7.8.5 Recommendations
7.9 Architecting for Elastic Growth and Degrowth

8 SERVICE ORCHESTRATION ANALYSIS
8.1 Service Orchestration Definition
8.2 Policy-Based Management
8.2.1 The Role of SLRs
8.2.2 Service Reliability and Availability Measurements
8.3 Cloud Management
8.3.1 Role of Rapid Elasticity in Cloud Management
8.3.2 Role of Cloud Bursting in Cloud Management
8.4 Service Orchestration’s Role in Risk…
…ory
8.4.4 Security
8.5 Summary

9 GEOGRAPHIC DISTRIBUTION, GEOREDUNDANCY, AND DISASTER RECOVERY
9.1 Geographic Distribution versus Georedundancy
9.2 Traditional Disaster Recovery
9.3 Virtualization and Disaster Recovery
9.4 Cloud Computing and Disaster Recovery
9.5 Georedundancy Recovery Models
9.6 Cloud and Traditional Collateral Benefits of Georedundancy
9.6.1 Reduced Planned Downtime
9.6.2 Mitigate Catastrophic Network Element Failures
9.6.3 Mitigate Extended Uncovered and Duplex Failure…

III RECOMMENDATIONS

10 APPLICATIONS, SOLUTIONS, AND…
10.1 …ation Configuration Scenarios
10.2 Application Deployment Scenario
10.3 System Downtime Budgets
10.3.1 Traditional System Downtime Budget
10.3.2 Virtualized Application Downtime Budget
10.3.3 IaaS Hardware Downtime Expectations
10.3.4 Cloud-Based Application Downtime Budget
10.3.5 Summary
10.4 End-to-End Solutions Considerations
10.4.1 What is an End-to-End Solution?
10.4.2 Consumer-Specific Architectures
10.4.3 Data Center Redundancy
10.5 Attributability for Service Impairments
10.6 Solution Service Measurement
10.6.1 Service Availability Measurement Points
10.7 Managing Reliability and Service of Cloud Computing

11 RECOMMENDATIONS FOR ARCHITECTING A RELIABLE SYSTEM
11.1 Architecting for Virtualization and Cloud
11.1.1 Mapping Software into VMs
11.1.2 Service Load Distribution
11.1.3 Data Management
11.1.4 Software Redundancy and High Availability Mechanisms
11.1.5 Rapid Elasticity
11.1.6 Overload Control
11.1.7 Coresidency
11.1.8 Multitenancy
11.1.9 Isochronal Applications
11.2 Disaster Recovery
11.3 IT Service Management Considerations
11.3.1 Software Upgrade and Patch
11.3.2 Service Transition Activity Effect Analysis
11.3.3 Mitigating Service Transition Activity Effects via VM Migration
11.3.4 Testing Service Transition…

11.3.5 Minimizing Procedural Errors
11.3.6 Service Orchestration Considerations
11.4 Many Distributed Clouds versus Fewer Huge Clouds
11.5 Minimizing Hardware-Attributed Downtime
11.5.1 Hardware Downtime in Traditional High Availability Configurations
11.6 Architectural Optimizations
11.6.1 Reliability and Availability Criteria
11.6.2 Optimizing Accessibility
11.6.3 Optimizing High Availability, Retainability, Reliability, and Quality
11.6.4 Optimizing Disaster Recovery
11.6.5 Operational Considerations
11.6.6 Case Study
11.6.7 Theoretically Optimal Application…

12 DESIGN FOR RELIABILITY OF VIRTUALIZED APPLICATIONS
12.1 Design for…
12.2 Tailoring DfR for Virtualized Applications
12.2.1 Hardware Independence Usage Scenario
12.2.2 Server Consolidation Usage Scenario
12.2.3 Multitenant Usage Scenario
12.2.4 Virtual Appliance Usage Scenario
12.2.5 Cloud Deployment Usage Scenario
12.3 Reliability Requirements
12.3.1 General Availability Requirements
12.3.2 Service Reliability and Latency Requirements
12.3.3 Overload Requirements
12.3.4 Online Capacity Growth and Degrowth
12.3.5 (Virtualization) Live Migration Requirements
12.3.6 System Transition Activity Requirements
12.3.7 Georedundancy and Service Continuity Requirements
12.4 Qualitative Reliability Analysis
12.4.1 SPOF Analysis for Virtualized Applications
12.4.2 Failure Mode Effects Analysis for Virtualized Applications
12.4.3 Capacity Growth and Degrowth Analysis
12.5 Quantitative Reliability Budgeting and Modeling
12.5.1 Availability (Downtime) Modeling
12.5.2 Converging Downtime Budgets and Targets
12.5.3 Managing Maintenance Budget…

12.6 Robustness Testing
12.6.1 Baseline Robustness Testing
12.6.2 Advanced Topic: Can Virtualization Enable Better Robustness Testing?
12.7 Stability Testing
12.8 Field Performance Analysis
12.9 Reliability Roadmap
12.10 Hardware Reliability

13 DESIGN FOR RELIABILITY OF CLOUD SOLUTIONS
13.1 Solution Design for Reliability
13.2 Solution Scope and Expectations
13.3 Reliability Requirements
13.3.1 Solution Availability Requirements
13.3.2 Solution Reliability Requirements
13.3.3 Disaster Recovery Requirements
13.3.4 Elasticity Requirements
13.3.5 Specifying Configuration Parameters
13.4 Solution Modeling and Analysis
13.4.1 Reliability Block Diagram of Cloud Data Center Deployment
13.4.2 Solution Failure Mode Effects Analysis
13.4.3 Solution Service Transition Activity Effects Analysis
13.4.4 Cloud Data Center Service Availability (MP 2) Analysis
13.4.5 Aggregate Service Availability (MP 3) Modeling
13.4.6 Recovery Point Objective Analysis
13.5 Element Reliability Diligence
13.6 Solution Testing and Validation
13.6.1 Robustness Testing
13.6.2 Service Reliability Testing
13.6.3 Georedundancy Testing
13.6.4 Elasticity and Orchestration Testing
13.6.5 Stability Testing
13.6.6 In Service Testing
13.7 Track and Analyze Field Performance
13.7.1 Cloud Service Measurements
13.7.2 Solution Reliability Roadmapping
13.8 Other Solution Reliability Diligence Topics
13.8.1 Service-Level Agreements
13.8.2 Cloud Service Provider Selection
13.8.3 Written Reliability…

14 SUMMARY
14.1 Service Reliability and Service Availability
14.2 Failure Accountability and Cloud Computing
14.3 Factoring Service Downtime
14.4 Service Availability Measurement Points
14.5 Cloud Capacity and Elasticity Considerations
14.6 Maximizing Service Availability
14.6.1 Reducing Product Attributable Downtime
14.6.2 Reducing Data Center Attributable Downtime
14.6.3 Reducing IT Service Management Downtime
14.6.4 Reducing Disaster Recovery Downtime
14.6.5 Optimal Cloud Service Availability
14.7 Reliability Diligence
14.8 Concluding…

Abbreviations
References
About the Authors
Index

FIGURES

Figure 1.1 Service Models
Figure 1.2 OpenCrowd’s Cloud Taxonomy
Figure 1.3 Roles in Cloud Computing
Figure 2.1 Virtualizing Resources
Figure 2.2 Type 1 and Type 2 Hypervisors
Figure 2.3 Full Virtualization
Figure 2.4 Paravirtualization
Figure 2.5 Operating System Virtualization
Figure 2.6 Virtualized Machine Lifecycle State Transitions
Figure 3.1 Fault Activation and Failures
Figure 3.2 Minimum Chargeable Service Disruption
Figure 3.3 Eight-Ingredient (“8i”) Framework
Figure 3.4 Eight-Ingredient plus Data plus Disaster (8i 2d) Model
Figure 3.5 MTBF and MTTR
Figure 3.6 Service and Network Element Impact Outages of Redundant Systems
Figure 3.7 Sample DSL Solution
Figure 3.8 Transaction Latency Distribution for Sample Service
Figure 3.9 Requirements Overlaid on Service Latency Distribution for Sample Solution
Figure 3.10 Maximum Acceptable Service Latency
Figure 3.11 Downtime of Simplex Systems
Figure 3.12 Downtime of Redundant Systems
Figure 3.13 Simplified View of High Availability
Figure 3.14 High Availability Example
Figure 3.15 Disaster Recovery Objectives
Figure 3.16 ITU-T G.114 Bearer Delay Guideline
Figure 4.1 TL 9000 Outage Attributability Overlaid on Augmented 8i 2d Framework
Figure 4.2 Outage Responsibilities Overlaid on Cloud 8i 2d…

Figure 4.3 ITIL Service Management Visualization
Figure 4.4 IT Service Management Activities to Minimize Service Availability Risk
Figure 4.5 8i 2d Attributability by Process or Best Practice Areas
Figure 4.6 Traditional Error Vectors
Figure 4.7 IaaS Provider Responsibilities for Traditional Error Vectors
Figure 4.8 Software Supplier (and SaaS) Responsibilities for Traditional Error Vectors
Figure 5.1 Sample Reliability Block Diagram
Figure 5.2 Traversal of Sample Reliability Block Diagram
Figure 5.3 Nominal System Reliability Block Diagram
Figure 5.4 Reliability Block Diagram of Full Virtualization
Figure 5.5 Reliability Block Diagram of OS Virtualization
Figure 5.6 Reliability Block Diagram of Paravirtualization
Figure 5.7 Reliability Block Diagram of Coresident Application Deployment
Figure 5.8 Canonical Virtualization RBD
Figure 5.9 Latency of Traditional Recovery Options
Figure 5.10 Traditional Active-Standby Redundancy via Active VM Virtualization
Figure 5.11 Reboot of a Virtual Machine
Figure 5.12 Reset of a Virtual Machine
Figure 5.13 Redundancy via Paused VM Virtualization
Figure 5.14 Redundancy via Suspended VM Virtualization
Figure 5.15 Nominal Recovery Latency of Virtualized and Traditional Options
Figure 5.16 Server Consolidation Using Virtualization
Figure 5.17 Simplified Simplex State Diagram
Figure 5.18 Downtime Drivers for Redundancy Pairs
Figure 6.1 Hardware Failure Rate Questions
Figure 6.2 Application Reliability Block Diagram with Virtual Devices
Figure 6.3 Virtual CPU
Figure 6.4 Virtual NIC
Figure 7.1 Sample Application Resource Utilization by Time of Day
Figure 7.2 Example of Extraordinary Event Traffic Spike
Figure 7.3 The Slashdot Effect: Traffic Load Over Time (in Hours)
Figure 7.4 Offered Load, Service Reliability, and Service Availability of a Traditional…

Figure 7.5 Visualizing VM Growth Scenarios
Figure 7.6 Nominal Capacity Model
Figure 7.7 Implementation Architecture of Compute Capacity Model
Figure 7.8 Orderly Reconfiguration of the Capacity Model
Figure 7.9 Slew Rate of Square Wave Amplification
Figure 7.10 Slew Rate of Rapid Elasticity
Figure 7.11 Elasticity Timeline by ODCA SLA Level
Figure 7.12 Capacity Management Process
Figure 7.13 Successful Cloud Elasticity
Figure 7.14 Elasticity Failure Model
Figure 7.15 Virtualized Application Instance Failure Model
Figure 7.16 Canonical Capacity Management Failure Scenarios
Figure 7.17 ITU X.805 Security Dimensions, Planes, and Layers
Figure 7.18 Leveraging Security and Network Infrastructure to Mitigate Overload Risk
Figure 8.1 Service Orchestration
Figure 8.2 Example of Cloud Bursting
Figure 10.1 Canonical Single Data Center Application Deployment Architecture
Figure 10.2 RBD of Sample Application on Blade-Based Server Hardware
Figure 10.3 RBD of Sample Application on IaaS Platform
Figure 10.4 Sample End-to-End Solution
Figure 10.5 Sample Distributed Cloud Architecture
Figure 10.6 Sample Recovery Scenario in Distributed Cloud Architecture
Figure 10.7 Simplified Responsibilities for a Canonical Cloud Application
Figure 10.8 Recommended Cloud-Related Service Availability Measurement Points
Figure 10.9 Canonical Example of MP 1 and MP 2
Figure 10.10 End-to-End Service Availability Key Quality Indicators
Figure 11.1 Virtual Machine Live Migration
Figure 11.2 Active–Standby Markov Model
Figure 11.3 Pie Chart of Canonical Hardware Downtime Prediction
Figure 11.4 RBD for the Hypothetical Web Server Application
Figure 11.5 Horizontal Growth of Hypothetical Application

Figure 11.6 Outgrowth of Hypothetical Application
Figure 11.7 Aggressive Protocol Retry Strategy
Figure 11.8 Data Replication of Hypothetical Application
Figure 11.9 Disaster Recovery of Hypothetical Application
Figure 11.10 Optimal Availability Architecture of Hypothetical Application
Figure 12.1 Traditional Design for Reliability Process
Figure 12.2 Mapping Virtual Machines across Hypervisors
Figure 12.3 A Virtualized Server Failure Scenario
Figure 12.4 Robustness Testing Vectors for Virtualized Applications
Figure 12.5 System Design for Reliability as a Deming Cycle
Figure 13.1 Solution Design for Reliability
Figure 13.2 Sample Solution Scope and KQI Expectations
Figure 13.3 Sample Cloud Data Center RBD
Figure 13.4 Estimating MP 2
Figure 13.5 Modeling Cloud-Based Solution with Client-Initiated Recovery Model
Figure 13.6 Client-Initiated Recovery Model
Figure 14.1 Failure Impact Duration and High Availability Goals
Figure 14.2 Eight-Ingredient Plus Data Plus Disaster (8i 2d) Model
Figure 14.3 Traditional Outage Attributability
Figure 14.4 Sample Outage Accountability Model for Cloud Computing
Figure 14.5 Outage Responsibilities of Cloud by Process
Figure 14.6 Measurement Points (MPs) 1, 2, 3, and 4
Figure 14.7 Design for Reliability of Cloud-Based…

TABLES

Table 2.1 Comparison of Server Virtualization Technologies
Table 2.2 Virtual Machine Lifecycle Transitions
Table 3.1 Service Availability and Downtime Ratings
Table 3.2 Mean Opinion Scores
Table 4.1 ODCA’s Data Center Classification
Table 4.2 ODCA’s Data Center Service Availability Expectations by Classification
Table 5.1 Example Failure Mode Effects Analysis
Table 5.2 Failure Mode Effect Analysis Figure for Coresident Applications
Table 5.3 Comparison of Nominal Software Availability Parameters
Table 6.1 Example of Hardware Availability as a Function of MTTR/MTTRS
Table 7.1 ODCA IaaS Elasticity Objectives
Table 9.1 ODCA IaaS Recoverability Objectives
Table 10.1 Sample Traditional Five 9’s Downtime Budget
Table 10.2 Sample Basic Virtualized Five 9’s Downtime Budget
Table 10.3 Canonical Application-Attributable Cloud-Based Five 9’s Downtime Budget
Table 10.4 Evolution of Sample Downtime Budgets
Table 11.1 Example Service Transition Activity Failure Mode Effect Analysis
Table 11.2 Canonical Hardware Downtime Prediction
Table 11.3 Summary of Hardware Downtime Mitigation Techniques for Cloud Computing
Table 12.1 Sample Service Latency and Reliability Requirements at MP 2
Table 13.1 Sample Solution Latency and Reliability Requirements
Table 13.2 Modeling Input Parameters
Table 14.1 Evolution of Sample Downtime…

EQUATIONS

Equation 3.1 Basic Availability Formula
Equation 3.2 Practical System Availability Formula
Equation 3.3 Standard Availability Formula
Equation 3.4 Estimation of System Availability from MTBF and MTTR
Equation 3.5 Recommended Service Availability Formula
Equation 3.6 Sample Partial Outage Calculation
Equation 3.7 Service Reliability Formula
Equation 3.8 DPM Formula
Equation 3.9 Converting DPM to Service Reliability
Equation 3.10 Converting Service Reliability to DPM
Equation 3.11 Sample DPM Calculation
Equation 6.1 Availability as a Function of MTBF/MTTR
Equation 11.1 Maximum Theoretical Availability across Redundant Elements
Equation 11.2 Maximum Theoretical Service Availability
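The exact forms of the equations listed above are not reproduced in this excerpt. As a hedged illustration only, the following Python sketch assumes the common textbook forms: availability estimated as MTBF / (MTBF + MTTR), annual downtime derived from an availability ratio, and DPM counted as defective transactions per million attempts. All function names and sample inputs are hypothetical, and the book's own formulas (for example, its treatment of partial outages) may add terms not shown here.

```python
"""Standard availability and service-reliability arithmetic (sketch).

Assumption: common textbook forms are used here; the book's own
equations may include further terms (for example, partial outages).
All function names are hypothetical.
"""

# Average minutes per year, counting leap years (365.25 days).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability_from_mtbf_mttr(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate steady-state availability as MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def dpm_from_counts(defective: int, attempted: int) -> float:
    """Defects per million: defective transactions per million attempted."""
    return defective * 1_000_000 / attempted


def reliability_from_dpm(dpm: float) -> float:
    """Service reliability (probability a transaction succeeds) from DPM."""
    return 1.0 - dpm / 1_000_000


def dpm_from_reliability(reliability: float) -> float:
    """DPM implied by a service reliability ratio."""
    return (1.0 - reliability) * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the arithmetic.
    a = availability_from_mtbf_mttr(mtbf_hours=10_000, mttr_hours=1)
    print(f"availability: {a:.6f}")
    print(f"five 9's downtime: {annual_downtime_minutes(0.99999):.2f} min/year")
    dpm = dpm_from_counts(defective=25, attempted=1_000_000)
    print(f"DPM: {dpm:.1f}, service reliability: {reliability_from_dpm(dpm):.6f}")
```

Run as a script, it reports an availability of about 0.9999 for the hypothetical 10,000-hour MTBF and 1-hour MTTR, roughly 5.26 minutes of annual downtime at five 9's, and a service reliability of 0.999975 for 25 DPM. Averaging leap years (365.25 days) is a design choice; a 365-day year gives about 5.256 minutes instead.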

INTRODUCTION

Cloud computing is a new paradigm for delivering information services to end users, offering distinct advantages over traditional IS/IT deployment models, including lower cost and shorter time to market. Cloud computing is defined by a handful of essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Cloud providers offer a variety of service models, including infrastructure as a service, platform as a service, and software as a service; cloud deployment options include private cloud, community cloud, public cloud, and hybrid cloud. End users naturally expect services offered via cloud computing to deliver at least the same service reliability and service availability as traditional service implementation models. This book analyzes the risks that could prevent cloud-based application deployments from achieving the same service reliability and availability as traditional deployments, as well as opportunities to improve service reliability and availability via cloud deployment. We consider the service reliability and service availability risks from the fundamental definition of cloud computing (the essential characteristics) rather than focusing on any particular virtualization hypervisor software or cloud service offering. Thus, the insights and recommendations of this higher-level analysis should apply to all cloud service offerings and application deployments. This book also offers recommendations on architecture, testing, and engineering diligence to assure that cloud-deployed applications meet users’ expectations for service reliability and service availability.

Virtualization technology enables enterprises to move their existing applications from traditional deployment scenarios, in which applications are installed directly on native hardware, to more evolved scenarios that include hardware independence and server consolidation.
Use of virtualization technology is a common characteristic of cloud computing that enables cloud service providers to better manage usage of their resource pools by multiple cloud consumers. This book also considers the reliability and availability risks along this evolutionary path to guide enterprises planning the evolution of their applications to virtualization and on to full cloud computing enablement over several releases.

AUDIENCE

The book is intended for IS/IT system and solution architects, developers, and engineers, as well as technical sales, product management, and quality management professionals.

ORGANIZATION

The book is organized into three parts: Part I, “Basics”; Part II, “Analysis”; and Part III, “Recommendations.” Part I, “Basics,” defines key terms and concepts of cloud computing, virtualization, service reliability, and service availability. Part I contains three chapters:

Chapter 1, “Cloud Computing.” This book uses the cloud terminology and taxonomy defined by the U.S. National Institute of Standards and Technology. This chapter defines cloud computing and reviews the essential and common characteristics of cloud computing. Standard service and deployment models of cloud computing are reviewed, as well as roles of key cloud-related actors. Key benefits and risks of cloud computing are summarized.

Chapter 2, “Virtualization.” Virtualization is a common characteristic of cloud computing. This chapter reviews virtualization technology, offers architectural models for virtualization that will be analyzed, and compares and contrasts “virtualized” applications to “native” applications.

Chapter 3, “Service Reliability and Service Availability.” This chapter defines service reliability and availability concepts, reviews how those metrics are measured in traditional deployments, and explains how they apply to virtualized and cloud-based deployments. Because the telecommunications industry has very precise standards for quantifying service availability and service reliability, concepts and terminology from the telecom industry are presented in this chapter and used in Part II, “Analysis,” and Part III, “Recommendations.”

Part II, “Analysis,” methodically analyzes the service reliability and availability risks inherent in application deployments on cloud computing and virtualization technology, based on the essential and common characteristics given in Part I.
Chapter 4, “Analyzing Cloud Reliability and Availability.” Considers the service reliability and service availability risks that are inherent to the essential and common characteristics, service model, and deployment model of cloud computing. This includes implications of service transition activities, elasticity, and service orchestration. Identified risks are analyzed in detail in subsequent chapters in Part II.

Chapter 5, “Reliability Analysis of Virtualization.” Analyzes full virtualization, OS virtualization, paravirtualization, and server virtualization and coresidency using standard reliability analysis methodologies. This chapter also analyzes the software reliability risks of virtualization and cloud computing.

Chapter 6, “Hardware Reliability, Virtualization, and Service Availability.” This chapter considers how hardware reliability risks and responsibilities shift as applications migrate to virtualized and cloud-based hardware platforms, and how hardware-attributed service downtime is determined.

Chapter 7, “Capacity and Elasticity.” The essential cloud characteristic of rapid elasticity enables cloud consumers to dispense with the business risk of locking in resources weeks or months ahead of demand. Rapid elasticity does, however, introduce new risks to service quality, reliability, and availability that must be carefully managed.

Chapter 8, “Service Orchestration Analysis.” Service orchestration automates various aspects of IT service management, espe…
