Systems Research
Barbara Liskov
October 2007

Replication
- Goal: provide reliability and availability by storing information at several nodes

Single Server
[Figure: a server and its clients]

Single Server
[Figure: a server and its clients, with a failure marked X]

Replicated Servers
[Figure: several servers and their clients, with a failure marked X]

Replication Issues
- Semantics
- What is being replicated
- Failure assumptions

Issue 1: Semantics
- One-copy consistency
- Or weaker
[Figure: servers and clients]

Issue 2: Type of Operations
- Only reads and writes
- General operations, e.g. acct.deposit($); acct.withdraw($);

Replication protocols
- Data replication
  - Quorums and voting
- Operations
  - State machine replication
  - System performs a sequence of operations

Issue 3: Failure Assumptions
- Network is asynchronous
  - Eventual delivery
- Network is malicious
  - Corruption
  - Replay
  - Spoofing
  - Handled via cryptography
- Nodes are failstop or Byzantine

Failstop Failures
- Nodes fail by crashing
- A machine is either working correctly or it is doing nothing!
- The assumption made in the 1980s

Failstop failures
- Requires 2f+1 replicas
- Operations must intersect in at least one replica
- In general we want availability for both reads and writes
  - f+1 nodes are sufficient
  - Read and write quorums

Quorums
[Figure: clients send "write A" to three servers; one is marked X]

Quorums
[Figure: two servers hold state A; one is marked X]

Quorums
[Figure: clients send "write B" to three servers; a server holding state A is marked X]

Data Replication
- R.H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM TODS, 1979
- D.K. Gifford, Weighted voting for replicated data, SOSP 1979
- H. Attiya, A. Bar-Noy, and D. Dolev, Sharing memory robustly in message-passing systems, JACM, Jan. 1995

Quorum Consensus
- Each data item has a version number
  - A sequence of values
- write(d, val, v#)
  - Waits for f+1 oks
- read(d) returns (val, v#)
  - Waits for f+1 matching v#s
  - Else does a write-back

State Machine Replication
- Replicas must execute operations in the same order
- Implies replicas will have the same state, assuming
  - replicas start in the same state
  - operations are deterministic

Failstop Replication
- B. Oki and B. Liskov, Viewstamped replication: a new primary copy method to support highly available distributed systems, PODC 1988; thesis, May 1988
- S. Ghemawat et al., Replication in the Harp file system, SOSP 1991
- L. Lamport, The part-time parliament, TOCS 1998
- L. Lamport, Paxos made simple, Nov. 2001

Approach
- Use a primary
  - It orders the operations
  - Other replicas obey this order

Views
- System moves through a sequence of views
- Primary runs the protocol
- Replicas watch the primary and do a view change if it fails

Normal Case
- Client sends request to primary
- Primary sends prepare message to all
- Replicas receive prepare
  - Send prepare-ok message to the primary
- Primary waits for f prepare-oks
  - Sends response to client

Normal Case
- A 2-phase protocol: prepare; commit
- Only 3 message delays

Byzantine Failures
- Nodes fail arbitrarily
  - They lie, they collude
- Causes
  - Malicious attacks
  - Non-deterministic software errors

Quorums
- 3f+1 replicas are needed to survive f failures
- 2f+1 replicas form a quorum
  - Ensures intersection
- The minimum in an asynchronous network

Quorums
[Figure: clients send "write A" to four servers; three hold state A; one is marked X]

Quorums
[Figure: clients send "write B" to four servers; one holds state A, three hold state B; one is marked X]

BFT
- M. Castro and B. Liskov, Practical Byzantine fault tolerance and proactive recovery, ACM TOCS, 2002

Strategy
- Primary runs the protocol in the normal case
- Replicas watch the primary and do a view change if it fails
- Key difference: replicas might lie
- Solution: add a pre-prepare phase

Normal Case
- Client sends request to primary
- Primary sends pre-prepare message to all
  - Why not a prepare message? Because the primary might be malicious
- Replicas check the pre-prepare and, if it is ok:
  - Send prepare messages to all
- Replicas wait for 2f+1 matching prepares
  - Send commit message to all
- Replicas wait for 2f+1 matching commits
  - Execute operation and send result to client

Follow-on Work
- R. Rodrigues et al., BASE: using abstraction to improve fault tolerance, SOSP 2001
- R. Kotla and M. Dahlin, High throughput Byzantine fault tolerance, DSN 2004
- J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07
- Abd-El-Malek et al., Fault-scalable Byzantine fault-tolerant services, SOSP 05
- HQ replication: a hybrid quorum protocol for Byzantine fault tolerance, OSDI 06

Papers in SOSP 07
- Monday 1:30-3:30
  - Zyzzyva: Speculative Byzantine fault tolerance
  - Tolerating Byzantine faults in database systems using commit barrier scheduling
  - Low-overhead Byzantine fault-tolerant storage
  - Attested append-only memory: making adversaries stick to their word
- Tuesday 11:00-12:00
  - PeerReview: practical accountability for distributed systems
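
The Quorum Consensus slide (write waits for f+1 oks; read waits for f+1 matching v#s, else does a write-back) can be illustrated with a small in-memory sketch. This is not the protocol from any of the cited papers: the class names `Replica` and `QuorumClient`, the `up` flag used to simulate a crash, the single data item, and the caller-supplied version number are all assumptions made for illustration, and real message passing is replaced by direct method calls.

```python
import random


class Replica:
    """Hypothetical failstop replica holding one data item and its version."""

    def __init__(self):
        self.val, self.version = None, 0
        self.up = True  # False simulates a crashed (failstop) node

    def write(self, val, version):
        # Keep only the newest version; acknowledge either way.
        if version > self.version:
            self.val, self.version = val, version
        return "ok"

    def read(self):
        return self.val, self.version


class QuorumClient:
    """Hypothetical client for 2f+1 replicas using quorums of f+1."""

    def __init__(self, replicas, f):
        assert len(replicas) == 2 * f + 1, "failstop case needs 2f+1 replicas"
        self.replicas, self.f = replicas, f

    def _quorum(self):
        # Any f+1 live replicas form a quorum; two quorums always share a node.
        live = [r for r in self.replicas if r.up]
        if len(live) < self.f + 1:
            raise RuntimeError("fewer than f+1 replicas reachable")
        return random.sample(live, self.f + 1)

    def write(self, val, version):
        oks = [r.write(val, version) for r in self._quorum()]
        return len(oks)  # number of acknowledgements collected (f+1)

    def read(self):
        quorum = self._quorum()
        replies = [r.read() for r in quorum]
        val, version = max(replies, key=lambda reply: reply[1])
        if any(v != version for _, v in replies):
            # Version numbers do not match: write the newest value back.
            for r in quorum:
                r.write(val, version)
        return val, version


if __name__ == "__main__":
    f = 1
    servers = [Replica() for _ in range(2 * f + 1)]
    client = QuorumClient(servers, f)
    client.write("A", 1)
    servers[0].up = False  # one replica crashes
    print(client.read())
```

Because a write reaches f+1 of the 2f+1 replicas and a read contacts f+1, the two sets overlap in at least one replica, so the read sees the latest version even after one crash.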