Designing For Understandability: The Raft Consensus Algorithm

2y ago

25 Views

2 Downloads

774.61 KB

29 Pages

Last View : 6d ago

Last Download : 3m ago

Upload by : Ronan Garica

Report this link

Download PDF

Transcription

Designing for Understandability:the Raft Consensus AlgorithmDiego OngaroJohn OusterhoutStanford University

Algorithms Should Be Designed For ity!August 29, 2016The Raft Consensus AlgorithmSlide 2

Overview Consensus: Allows collection of machines to work as coherent group Continuous service, even if some machines fail Paxos has dominated discussion for 25 years Hard to understand Not complete enough for real implementations New consensus algorithm: Raft Primary design goal: understandability (intuition, ease of explanation) Complete foundation for implementation Different problem decomposition Results: User study shows Raft more understandable than Paxos Widespread adoptionAugust 29, 2016The Raft Consensus AlgorithmSlide 3

State Machine Responds to external stimuli Manages internal staterequest Examples: many storagesystems, services Clients resultMemcachedRAMCloudHDFS name node.August 29, 2016StateMachineThe Raft Consensus AlgorithmSlide 4

Replicated State MachineClientsz chineConsensusModuleStateMachineServersx 1Logy 3 x 4z xx 1Logy 3 x 4z xx 1Logy 3 x 4z x Replicated log ensures state machines execute same commands in same order Consensus module ensures proper log replication System makes progress as long as any majority of servers are up Failure model: delayed/lost messages, fail-stop (not Byzantine)August 29, 2016The Raft Consensus AlgorithmSlide 5

Paxos (Single Decree)ProposersAcceptorsChoose unique proposal #proposal # any previous?Majority? Select value forhighest proposal # returned;if none, choose own valueproposal # any previous?Majority? Value chosenAugust 29, 2016The Raft Consensus AlgorithmSlide 6

Paxos Problems Impenetrable: hard to develop intuitions Why does it work? What is the purpose of each phase? Incomplete Only agrees on single value Doesn’t address liveness Choosing proposal values? Cluster membership management? Inefficient Two rounds of messages to choose one value“The dirty little secret of the NSDIcommunity is that at most fivepeople really, truly understand everypart of Paxos :-)”— NSDI reviewer“There are significant gaps betweenthe description of the Paxosalgorithm and the needs of a realworld system . the final system willbe based on an unproven protocol”— Chubby authors No agreement on the detailsNot a good foundation for practical implementationsAugust 29, 2016The Raft Consensus AlgorithmSlide 7

Raft Challenge Is there a different consensus algorithm that’s easier tounderstand? Make design decisions based on understandability: Which approach is easier to explain? Techniques: Problem decomposition Minimize state space Handle multiple problems with a single mechanism Eliminate special cases Maximize coherence Minimize nondeterminismAugust 29, 2016The Raft Consensus AlgorithmSlide 8

Raft Decomposition1. Leader election: Select one server to act as leader Detect crashes, choose new leader2. Log replication (normal operation) Leader accepts commands from clients, appends to its log Leader replicates its log to other servers (overwrites inconsistencies)3. Safety Keep logs consistent Only servers with up-to-date logs can become leaderAugust 29, 2016The Raft Consensus AlgorithmSlide 9

Server States and RPCsstartFollowerdiscoverhighertermPassive (but expectsregular heartbeats)noheartbeatCandidateIssues RequestVote RPCsto get elected as leaderwinelectionLeaderAugust 29, 2016Issues AppendEntries RPCs: Replicate its log Heartbeats to maintain leadershipThe Raft Consensus AlgorithmSlide 10

TermsTerm 1Term 2Term 3Term 4Term 5timeElectionsNormalOperation At most 1 leader per termSplitVote Some terms have no leader (failed election) Each server maintains current term value (no global view) Exchanged in every RPC Peer has later term? Update term, revert to follower Incoming RPC has obsolete term? Reply with errorTerms identify obsolete informationAugust 29, 2016The Raft Consensus AlgorithmSlide 11

Leader ElectionBecome candidatetimeoutcurrentTerm ,vote for selfSend RequestVote RPCsto other serversvotes from majorityBecome leader,send heartbeatsAugust 29, 2016The Raft Consensus AlgorithmRPC from leaderBecomefollowerSlide 12

Election Correctness Safety: allow at most one winner per term Each server gives only one vote per term (persist on disk) Majority required to win electionB can’t alsoget majorityVoted forcandidate AServers Liveness: some candidate must eventually win Choose election timeouts randomly in [T, 2T] (e.g. 150-300 ms) One server usually times out and wins election before others time out Works well if T broadcast time Randomized approach simpler than rankingAugust 29, 2016The Raft Consensus AlgorithmSlide 13

Normal Operation Client sends command to leader Leader appends command to its log Leader sends AppendEntries RPCs to all followers Once new entry committed: Leader executes command in its state machine, returns result to client Leader notifies followers of committed entries in subsequent AppendEntries RPCs Followers execute committed commands in their state machines Crashed/slow followers? Leader retries AppendEntries RPCs until they succeed Optimal performance in common case: One successful RPC to any majority of serversAugust 29, 2016The Raft Consensus AlgorithmSlide 14

Log Structureterm123456789101112233333x 3 q 8 j 2 x q z 5 y 1 y 3 q j x 4 z 6commandlog indexleader for term 31112233x 3 q 8 j 2 x q z 5 y 1 y 31112233333x 3 q 8 j 2 x q z 5 y 1 y 3 q j x 4 z 611x 3 q 8followers11122333x 3 q 8 j 2 x q z 5 y 1 y 3 q jcommitted entries Must survive crashes (store on disk) Entry committed if safe to execute in state machines Replicated on majority of servers by leader of its termAugust 29, 2016The Raft Consensus AlgorithmSlide 15

Log InconsistenciesCrashes can result in log inconsistencies:123456789s111122333x 3 q 8 j 2 x q z 5 y 1 y 3 q js21112233x 3 q 8 j 2 x q z 5 y 1 y 3s3111223333x 3 q 8 j 2 x q z 5 y 1 y 3 q j x 4s411x 3 q 8s5111222222x 3 q 8 j 2 x q z 5 y 3 q j x 8 x 410log indexleader for term 4followersRaft minimizes special code for repairing inconsistencies: Leader assumes its log is correct Normal operation will repair all inconsistenciesAugust 29, 2016The Raft Consensus AlgorithmSlide 16

Log Matching PropertyGoal: high level of consistency between logs If log entries on different servers have same index and term: They store the same command The logs are identical in all preceding entries123456789101112233333x 3 q 8 j 2 x q z 5 y 1 y 3 q j x 4 z 611122344x 3 q 8 j 2 x q z 5 y 1 x z y 7 If a given entry is committed, all preceding entries are alsocommittedAugust 29, 2016The Raft Consensus AlgorithmSlide 17

AppendEntries Consistency Check AppendEntries RPCs include index, term of entry preceding new one(s) Follower must contain matching entry; otherwise it rejects request Leader retries with lower log index Implements an induction step, ensures Log Matching Property1leader:follower before:follower after:August 29, 201623412345123451123x 3 q 8 x q y 11123x 3 q 8 x q y 11123x 3 q 8 x q y 1112x 3 q 8 x q11111x 3 q 8 j 2 y 6 a x11111x 3 q 8 j 2 y 6 a x1123x 3 q 8 x q y 111111x 3 q 8 j 2 y 6 y 61123x 3 q 8 x q y 1Example #1: successExample #2: mismatchExample #3: successThe Raft Consensus AlgorithmSlide 18

Safety: Leader Completeness Once log entry committed, all futureleaders must store that entryLeader election for term 4: Servers with incomplete logs must notget elected: Candidates include index and term of lastlog entry in RequestVote RPCs Voting server denies vote if its log is moreup-to-date Logs ranked by lastTerm, lastIndex August 29, 2016The Raft Consensus Algorithm1 2 3 4 5 6 7 8 9s1 1 1 1 2 2 3 3 3s2 1 1 1 2 2 3 3s3 1 1 1 2 2 3 3 3 3s4 1 1 1 2 2 3 3 3s5 1 1 1 2 2 2 2 2 2Slide 19

Raft Evaluation Formal proof of safety Ongaro dissertation UW mechanically checked proof (50 klines) C implementation (2000 lines) 100’s of clusters deployed by Scale Computing Performance analysis of leader election Converges quickly even with 12-24 ms timeouts User study of understandabilityAugust 29, 2016The Raft Consensus AlgorithmSlide 20

User Study: Is Raft Simpler than Paxos? 43 students in 2 graduate OS classes (Berkeley and Stanford) Group 1: Raft video, Raft quiz, then Paxos video, Paxos quiz Group 2: Paxos video, Paxos quiz, then Raft video, Raft quiz Instructional videos: Same instructor (Ousterhout) Covered same functionality: consensus, replicated log, cluster reconfiguration Fleshed out missing pieces for Paxos Videos available on YouTube Quizzes: Questions in 3 general categories Same weightings for both tests Experiment favored Paxos slightly: 15 students had prior experience with PaxosAugust 29, 2016The Raft Consensus AlgorithmSlide 21

User Study ResultsAugust 29, 2016The Raft Consensus AlgorithmSlide 22

ImpactHard to publish:Widely adopted: Rejected 3 times at major 25 implementations before paperconferencespublished Finally published in USENIX ATC2014 Challenges: PCs uncomfortable withunderstandability as metric Hard to evaluate Complexity impresses PCsAugust 29, 2016 83 implementations currently listed onRaft home page 10 versions in production Taught in graduate OS classes MIT, Stanford, Washington, Harvard, Duke,Brown, Colorado, .The Raft Consensus AlgorithmSlide 23

Additional Information Other aspects of Raft (see paper or Ongaro dissertation): Communication with clients (linearizability) Cluster liveness Log truncation Other consensus algorithms: Viewstamped Replication (Oki & Liskov, MIT) ZooKeeper (Hunt, Konar, Junqueira, Read, Yahoo!)August 29, 2016The Raft Consensus AlgorithmSlide 24

Conclusions Understandability deserves more emphasis in algorithm design Decompose the problem Minimize state space Making a system simpler can have high impact Raft better than Paxos for teaching and implementation: Easier to understand More completeAugust 29, 2016The Raft Consensus AlgorithmSlide 25

Why “Raft”?ReplicatedAndFaultTolerantPaxosAugust 29, 2016The Raft Consensus AlgorithmSlide 26

Extra Slides

Raft Properties Election Safety: at most one leader can be elected in a given term Leader Append-Only: a leader never modifies or deletes entries in itslog Log Matching: if two logs contain an entry with the same index andterm, then the logs are identical in all entries up through the given index Leader Completeness: if a log entry is committed, then that entry willbe present in the logs of all future leaders State Machine Safety: if a server has applied a log entry at a givenindex to its state machine, no other server will ever apply a different logentry for the same indexAugust 29, 2016The Raft Consensus AlgorithmSlide 28

Leader Changes Logs may be inconsistentafter leader changelog indexleader forterm 8 No special steps by new Leader’s log is “the truth”1 1 1 4 4 5 5 6 6 6f1 1 1 1 4 4 5 5 6 6leader: Start normal operation Followers’ logs will eventuallymatch leader1 2 3 4 5 6 7 8 9 10 11 12f2 1 1 1 4possiblefollowersMissingEntriesf3 1 1 1 4 4 5 5 6 6 6 6f4 1 1 1 4 4 5 5 6 6 6 7 7f5 1 1 1 4 4 4 4ExtraneousEntriesf6 1 1 1 2 2 2 3 3 3 3August 29, 2016The Raft Consensus AlgorithmSlide 29

Log Consensus Module State Machine x 1 y 3 x 4 August 29, 2016 The Raft Consensus Algorithm Slide 5 Replicated State Machine Replicated log ensures state machines execute same commands in same order Consensus module ensures proper log replication System makes progress as long as any majority of servers are up Failure model: delayed/lost messages, fail-stop (not Byzantine)

Related Documents:

Nonprofit Self-Assessment Checklist

May 02, 2018 · D. Program Evaluation ͟The organization has provided a description of the framework for how each program will be evaluated. The framework should include all the elements below: ͟The evaluation methods are cost-effective for the organization ͟Quantitative and qualitative data is being collected (at Basics tier, data collection must have begun)

1.4K Views

2y ago

Name of thé élément in thé language and script of thé ... - UNESCO

Silat is a combative art of self-defense and survival rooted from Matay archipelago. It was traced at thé early of Langkasuka Kingdom (2nd century CE) till thé reign of Melaka (Malaysia) Sultanate era (13th century). Silat has now evolved to become part of social culture and tradition with thé appearance of a fine physical and spiritual .

105 Views

9m ago

Analysis of Raft & Pile Raft Foundation using Safe Software

analyses of un-piled and piled raft foundations with sandy soil. For the un-piled raft, the normalized settlement parameter (IR) for the raft sizes of 8mx8m and 15mx15m ranged as 1.03-1.17mm and 0.66- 0.83mm respectively. In the case of the piled raft with raft thickness of 0.25, 0.40, 0.80, 1.50, 3.0m, the corresponding maximum settlements

15 Views

1y ago

[Kl - Mauritius

On an exceptional basis, Member States may request UNESCO to provide thé candidates with access to thé platform so they can complète thé form by themselves. Thèse requests must be addressed to esd rize unesco. or by 15 A ril 2021 UNESCO will provide thé nomineewith accessto thé platform via their émail address.

463 Views

1y ago

Employee Benefits Event - Schneider Downs Tax Services

̶The leading indicator of employee engagement is based on the quality of the relationship between employee and supervisor Empower your managers! ̶Help them understand the impact on the organization ̶Share important changes, plan options, tasks, and deadlines ̶Provide key messages and talking points ̶Prepare them to answer employee questions

322 Views

1y ago

Study Investigating thè Effect of E- Service Quality on Customer's ...

Dr. Sunita Bharatwal** Dr. Pawan Garga*** Abstract Customer satisfaction is derived from thè functionalities and values, a product or Service can provide. The current study aims to segregate thè dimensions of ordine Service quality and gather insights on its impact on web shopping. The trends of purchases have

116 Views

9m ago

Structural Behavior of Tall Building Raft Foundations in Earthquake Zones

, respectively. Raft foundations fail when the resistance of the raft is less than the action caused by the applied load. The raft resistance is measured using the design moment strength M c while the raft action is measured by the external bending moments M e as shown in Figure 1. The raft limit state function is given by the following equation:

18 Views

1y ago

ANALYSIS AND DESIGN OF RAFT FONDATION

analysis and design. For foundations of such high rise building, normally raft foundation, pile foundation or piled-raft foundation are used. The raft will be used for economical consideration. The raft foundation is a kind of

110 Views

2y ago

Recent Views

MANAGERIAL FINANCE - GBV

of Managerial Finance page 2 Introduction to Managerial Finance 1 Starbucks—A Taste for Growth page 3 1.1 Finance and Business What Is Finance? 4 Major Areas and Opportunities in Finance 4 Legal Forms of Business Organization 5 Why Study Managerial Finance? Review Questions 9 1.2 The Managerial Finance Function 9 Organization of the Finance

3y ago

6.8K Views

Chapter 1 The roles of finance function in organisations

The roles of the finance function in organisations 4. The role of ethics in the role of the finance function Ethics is the system of moral principles that examines the concept of right and wrong. Ethics underpins an organisation’s sustained value creation. The roles that the finance function performs should be carried out in an .File Size: 888KBPage Count: 10Explore furtherRole of the Finance Function in the Financial Management .www.managementstudyguide.c Roles and Responsibilities of a Finance Department in a .www.pharmapproach.comRoles and Responsibilities of a Finance Department .www.smythecpa.comTop 10 – Functions of Business Finance in an om23 Functions and Duties of Accounting and Finance nded to you b

1y ago

335 Views

2017-2018 GRANDE ÉCOLE MSc in MANAGEMENT

Descriptif des cours Course Outlines 10 Catalogue des cours/ Course Catalog 2017-2018 FIN: Finance/Finance A : Actuariat/Actuarial, Insurance E : Finance d’entreprise/Corporate Finance The course liste tables and the course outlines G : Finance générale/General Finance M : Finance de marché/Market Finance S : Synthèse/Synthesis IDS: Systèmes d’Information, Sciences de la Décision et .

3y ago

312 Views

Behavioral Finance and Wealth L Management

Introduction to Behavioral Finance CHAPTER1 What Is Behavioral Finance? Behavioral Finance: The Big Picture Standard Finance versus Behavioral Finance The Role of Behavioral Finance with Private Clients How Practical Application of Behavioral Finance Can Create a Successful Advisory Rel

2y ago

377 Views

Catalogue des Cours Course Catalog - ESSEC Business School

10 Catalogue des cours/Course Catalog 2021-2022 FIN: Finance/Finance E : Finance d'entreprise/Corporate Finance G : Finance générale/General Finance M : Finance de marché/Market Finance S : Synthèse/Synthesis IDS: Systèmes d'Information, Sciences de la Décision et Statistiques/ Information Systems, Decision Sciences and Statistics

1y ago

222 Views

SINGAPORE - Kelly Services

FINANCE Chief Financial Officer Degree/Master 15 20,000 25,000 Finance Assistant Diploma 1-3 2,800 3,400 Finance Controller Degree 10-15 10,000 18,000 Finance Director Degree 15 15,000 20,000 Finance Executive/ Senior Finance Executive Degree 2-5 3,000 6,000 Finance Manager/ Assistan

2y ago

527 Views

Ministries of Finance and Nationally Determined Contributions

Rodrigo Rojo, IDB Sr. Consultant and advisor to Ministry of Finance of Chile. Colombia German Romero Otalora and Laura Marcela Ruiz Daza — Office of the Vice-Minister — Ministry of Finance. Ireland Paul Ryan — International Finance Division — Ministry of Finance Sean Judge — Department of Finance — Ministry of Finance

1y ago

232 Views

Trade Finance & Supply Chain Finance Awards 2022

In February 2022, Global Finance will publish its annual selections for the World's Best Trade Finance and Supply Chain Finance Providers. Global Finance will name the best trade finance providers in more than 100 countries and territories, eight global regions and

1y ago

215 Views

McKinsey on Finance

finance and strategy 23 How M&A practitioners enable their success Perspectives on Corporate Finance and Strategy Number 56, Autumn 2015 Finance McKinsey on. McKinsey on Finance. is a quarterly publication written by corporate-finance experts and practitioners at McKinsey & Company. This publication offers readers insights into value-creating .

3y ago

272 Views

SAP Simple Finance - tutorialspoint

SAP Simple Finance is only known as S/4 HANA Finance and this will be the only name of other releases of SAP Simple Finance. During the installation of SAP S/4 HANA Finance, various front-end and back-end components get installed. 2. SAP Simple Finance Introduction

2y ago

252 Views

pwc Finance Function Transformation

PwC’s finance effectiveness framework looks at 3 core areas within finance, to frame a programme of work that makes the finance function more effective, and to increase its interaction with the business: Finance efficiency Risk, Compliance and Control Finance Insights (the key lever in

2y ago

285 Views

Sustainable Finance: A Primer and Recent Developments

Social (impact) finance RBC Wealth Management Green finance Resonance Fund Impact finance Bridges Fund Management Socially responsible finance Nutmeg . Source: Author's own research. Despite this variety of definitions, some consistency of terminology has coalesced around the construct of "sustainable finance" in terms of a range of

1y ago

151 Views

The International Finance Corporation's Blended Finance Operations

The International Finance Corporation's Blended Finance Operations . 1. Context. Blended finance is a risk mitigation tool applied to investments for which it is difficult to attract commercial funding. Blended finance refers to the combination of concessional and commercial funding in private sector-led projects. Its rationale is

1y ago

187 Views

Agile Finance Reimagined Reimagining Finance for the New Normal

6 Agile Finance Reimagined: Reimagining finance for the new normal While the global impact of COVID-19 is still evolving, this much is clear: finance functions have been forced to deliver more value to the business, beyond simply driving down costs. "We are seeing that shift from finance being focused on efficiency to effectiveness," said the

1y ago

130 Views

Oracle Banking Supply Chain Finance User Guide

Oracle Banking Supply Chain Finance User Guide 7 2. Supply Chain Finance - An Overview 2.1 Supply Chain Finance Supply Chain Finance commonly known as (SCF) is a type of supplier finance which enables the supplier to cash his receivables early than the actual payment date, thereby freeing up its working capital.

1y ago

132 Views

Designing For Understandability: The Raft Consensus Algorithm

It looks like you're using an ad-blocker