Disconnected Operation In The Coda File System


JAMES J. KISTLER and M. SATYANARAYANAN
Carnegie Mellon University

Disconnected operation is a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. An important, though not exclusive, application of disconnected operation is in supporting portable computers. In this paper, we show that disconnected operation is feasible, efficient and usable by describing its design and implementation in the Coda File System. The central idea behind our work is that caching of data, now widely used for performance, can also be exploited to improve availability.

Categories and Subject Descriptors: D.4.3 [Operating Systems]: File Systems Management - distributed file systems; D.4.5 [Operating Systems]: Reliability - fault tolerance; D.4.8 [Operating Systems]: Performance - measurements

General Terms: Design, Experimentation, Measurement, Performance, Reliability

Additional Key Words and Phrases: Disconnected operation, hoarding, optimistic replication, reintegration, second-class replication, server emulation

This work was supported by the Defense Advanced Research Projects Agency (Avionics Lab, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio, 45433-6543 under Contract F33615-90-C-1465, ARPA Order 7597), National Science Foundation (PYI Award and Grant ECD 8907068), IBM Corporation (Faculty Development Award, Graduate Fellowship, and Research Initiation Grant), Digital Equipment Corporation (External Research Project Grant), and Bellcore (Information Networking Research Grant).

Authors' address: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1992 ACM 0734-2071/92/0200-0003 $01.50

ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992, Pages 3-25.

1. INTRODUCTION

Every serious user of a distributed system has faced situations where critical work has been impeded by a remote failure. His frustration is particularly acute when his workstation is powerful enough to be used standalone, but has been configured to be dependent on remote resources. An important instance of such dependence is the use of data from a distributed file system.

Placing data in a distributed file system simplifies collaboration between users, and allows them to delegate the administration of that data. The growing popularity of distributed file systems such as NFS [16] and AFS [19] attests to the compelling nature of these considerations. Unfortunately, the users of these systems have to accept the fact that a remote failure at a critical juncture may seriously inconvenience them.

How can we improve this state of affairs? Ideally, we would like to enjoy the benefits of a shared data repository, but be able to continue critical work when that repository is inaccessible. We call the latter mode of operation disconnected operation, because it represents a temporary deviation from normal operation as a client of a shared repository.

In this paper we show that disconnected operation in a file system is indeed feasible, efficient and usable. The central idea behind our work is that caching of data, now widely used to improve performance, can also be exploited to enhance availability. We have implemented disconnected operation in the Coda File System at Carnegie Mellon University.

Our initial experience with Coda confirms the viability of disconnected operation. We have successfully operated disconnected for periods lasting one to two days. For a disconnection of this duration, the process of reconnecting and propagating changes typically takes about a minute. A local disk of 100MB has been adequate for us during these periods of disconnection. Trace-driven simulations indicate that a disk of about half that size should be adequate for disconnections lasting a typical workday.

2. DESIGN OVERVIEW

Coda is designed for an environment consisting of a large collection of untrusted Unix¹ clients and a much smaller number of trusted Unix file servers. The design is optimized for the access and sharing patterns typical of academic and research environments. It is specifically not intended for applications that exhibit highly concurrent, fine granularity data access.

Each Coda client has a local disk and can communicate with the servers over a high bandwidth network. At certain times, a client may be temporarily unable to communicate with some or all of the servers.
This may be due to a server or network failure, or due to the detachment of a portable client from the network.

Clients view Coda as a single, location-transparent shared Unix file system. The Coda namespace is mapped to individual file servers at the granularity of subtrees called volumes. At each client, a cache manager (Venus) dynamically obtains and caches volume mappings.

Coda uses two distinct, but complementary, mechanisms to achieve high availability. The first mechanism, server replication, allows volumes to have read-write replicas at more than one server. The set of replication sites for a volume is its volume storage group (VSG). The subset of a VSG that is currently accessible is a client's accessible VSG (AVSG). The performance cost of server replication is kept low by caching on disks at clients and through the use of parallel access protocols. Venus uses a cache coherence protocol based on callbacks [9] to guarantee that an open file yields its latest copy in the AVSG. This guarantee is provided by servers notifying clients when their cached copies are no longer valid, each notification being referred to as a 'callback break'. Modifications in Coda are propagated in parallel to all AVSG sites, and eventually to missing VSG sites.

Disconnected operation, the second high availability mechanism used by Coda, takes effect when the AVSG becomes empty. While disconnected, Venus services file system requests by relying solely on the contents of its cache. Since cache misses cannot be serviced or masked, they appear as failures to application programs and users. When disconnection ends, Venus propagates modifications and reverts to server replication. Figure 1 depicts a typical scenario involving transitions between server replication and disconnected operation.

Earlier Coda papers [18, 19] have described server replication in depth. In contrast, this paper restricts its attention to disconnected operation. We discuss server replication only in those areas where its presence has significantly influenced our design for disconnected operation.

¹ Unix is a trademark of AT&T Bell Telephone Labs.

3. DESIGN RATIONALE

At a high level, two factors influenced our strategy for high availability. First, we wanted to use conventional, off-the-shelf hardware throughout our system. Second, we wished to preserve transparency by seamlessly integrating the high availability mechanisms of Coda into a normal Unix environment.

At a more detailed level, other considerations influenced our design. These include the need to scale gracefully, the advent of portable workstations, the very different resource, integrity, and security assumptions made about clients and servers, and the need to strike a balance between availability and consistency. We examine each of these issues in the following sections.

3.1 Scalability

Successful distributed systems tend to grow in size. Our experience with Coda's ancestor, AFS, had impressed upon us the need to prepare for growth a priori, rather than treating it as an afterthought [17]. We brought this experience to bear upon Coda in two ways.
First, we adopted certain mechanisms that enhance scalability. Second, we drew upon a set of general principles to guide our design choices.

An example of a mechanism we adopted for scalability is callback-based cache coherence. Another such mechanism, whole-file caching, offers the added advantage of a much simpler failure model: a cache miss can only occur on an open, never on a read, write, seek, or close. This, in turn, substantially simplifies the implementation of disconnected operation. A partial-file caching scheme such as that of AFS-4 [22], Echo [8] or MFS [1] would have complicated our implementation and made disconnected operation less transparent.

A scalability principle that has had considerable influence on our design is the placing of functionality on clients rather than servers. Only if integrity or security would have been compromised have we violated this principle. Another scalability principle we have adopted is the avoidance of system-wide rapid change. Consequently, we have rejected strategies that require election or agreement by large numbers of nodes. For example, we have avoided algorithms such as that used in Locus [23] that depend on nodes achieving consensus on the current partition state of the network.

3.2 Portable Workstations

Powerful, lightweight and compact laptop computers are commonplace today. It is instructive to observe how a person with data in a shared file system uses such a machine. Typically, he identifies files of interest and downloads them from the shared file system into the local name space for use while isolated. When he returns, he copies modified files back into the shared file system. Such a user is effectively performing manual caching, with write-back upon reconnection!

Early in the design of Coda we realized that disconnected operation could substantially simplify the use of portable clients. Users would not have to use a different name space while isolated, nor would they have to manually propagate changes upon reconnection. Thus portable machines are a champion application for disconnected operation.

The use of portable machines also gave us another insight. The fact that people are able to operate for extended periods in isolation indicates that they are quite good at predicting their future file access needs. This, in turn, suggests that it is reasonable to seek user assistance in augmenting the cache management policy for disconnected operation.

Functionally, involuntary disconnections caused by failures are no different from voluntary disconnections caused by unplugging portable computers. Hence Coda provides a single mechanism to cope with all disconnections. Of course, there may be qualitative differences: user expectations as well as the extent of user cooperation are likely to be different in the two cases.

3.3 First- vs. Second-Class Replication

If disconnected operation is feasible, why is server replication needed at all? The answer to this question depends critically on the very different assumptions made about clients and servers in Coda.

Clients are like appliances: they can be turned off at will and may be unattended for long periods of time.
They have limited disk storage capacity, their software and hardware may be tampered with, and their owners may not be diligent about backing up the local disks. Servers are like public utilities: they have much greater disk capacity, they are physically secure, and they are carefully monitored and administered by professional staff.

It is therefore appropriate to distinguish between first-class replicas on servers, and second-class replicas (i.e., cache copies) on clients. First-class replicas are of higher quality: they are more persistent, widely known, secure, available, complete and accurate. Second-class replicas, in contrast, are inferior along all these dimensions. Only by periodic revalidation with respect to a first-class replica can a second-class replica be useful.
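
The relationship between first-class and second-class replicas can be illustrated with a minimal sketch of callback-based revalidation. Everything in it is hypothetical: the `Server` and `Client` classes, their method names, and the integer version counter are assumptions made for illustration and are not Coda's actual interfaces. The sketch shows a server breaking callbacks when a file is updated, and a client treating its cached copy as usable only while it still holds a callback, refetching the whole file otherwise.

```python
class Server:
    """Holds first-class replicas and tracks which clients hold callbacks.

    Hypothetical illustration only; not Coda's real server interface.
    """

    def __init__(self):
        self.files = {}      # name -> (version, data)
        self.callbacks = {}  # name -> set of clients promised a notification

    def fetch(self, client, name):
        """Return the latest copy and register a callback for the client."""
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        """Install a new version and break callbacks held by other clients."""
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        for client in self.callbacks.get(name, set()) - {writer}:
            client.callback_break(name)
        self.callbacks[name] = {writer}
        return version


class Client:
    """Holds second-class replicas (cache copies) of whole files."""

    def __init__(self, server):
        self.server = server
        self.cache = {}            # name -> (version, data)
        self.has_callback = set()  # names whose cached copy is known current

    def callback_break(self, name):
        """Server notification: the cached copy may no longer be valid."""
        self.has_callback.discard(name)

    def open(self, name):
        """Use the cached copy only if a callback is held; else refetch."""
        if name not in self.cache or name not in self.has_callback:
            self.cache[name] = self.server.fetch(self, name)
            self.has_callback.add(name)
        return self.cache[name][1]

    def store(self, name, data):
        """Write through to the server and keep the new copy cached."""
        version = self.server.store(self, name, data)
        self.cache[name] = (version, data)
        self.has_callback.add(name)
```

In this sketch a stale copy remains in the client's cache after a callback break; it simply stops being trusted until it has been revalidated against the server, which is the sense in which the second-class replica depends on the first-class one.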

The function of a cache coherence protocol is to combine the performance and scalability advantages of a second-class replica with the quality of a first-class replica. When disconnected, the quality of the second-class replica may be degraded because the first-class replica upon which it is contingent is inaccessible. The longer the duration of disconnection, the greater the potential for degradation. Whereas server replication preserves the quality of data in the face of failures, disconnected operation forsakes quality for availability. Hence server replication is important because it reduces the frequency and duration of disconnected operation, which is properly viewed as a measure of last resort.

Server replication is expensive because it requires additional hardware. Disconnected operation, in contrast, costs little. Whether to use server replication or not is thus a trade-off between quality and cost. Coda does permit a volume to have a sole server replica. Therefore, an installation can rely exclusively on disconnected operation if it so chooses.

3.4 Optimistic vs. Pessimistic Replica Control

By definition, a network partition exists between a disconnected second-class replica and all its first-class associates. The choice between the two families of replica control strategies, pessimistic and optimistic [5], is therefore central to the design of disconnected operation. A pessimistic strategy avoids conflicting operations by disallowing all partitioned writes or by restricting reads and writes to a single partition. An optimistic strategy provides much higher availability by permitting reads and writes everywhere, and deals with the attendant danger of conflicts by detecting and resolving them after their occurrence.

A pessimistic approach towards disconnected operation would require a client to acquire shared or exclusive control of a cached object prior to disconnection, and to retain such control until reconnection. Possession of exclusive control by a disconnected client would preclude reading or writing at all other replicas. Possession of shared control would allow reading at other replicas, but writes would still be forbidden everywhere.

Acquiring control prior to voluntary disconnection is relatively simple.
It is more difficult when disconnection is involuntary, because the system may have to arbitrate among multiple requesters. Unfortunately, the information needed to make a wise decision is not readily available. For example, the system cannot predict which requesters would actually use the object, when they would release control, or what the relative costs of denying them access would be.

Retaining control until reconnection is acceptable in the case of brief disconnections. But it is unacceptable in the case of extended disconnections. A disconnected client with shared control of an object would force the rest of the system to defer all updates until it reconnected. With exclusive control, it would even prevent other users from making a copy of the object. Coercing the client to reconnect may not be feasible, since its whereabouts may not be known. Thus, an entire user community could be at the mercy of a single errant client for an unbounded amount of time.

Placing a time bound on exclusive or shared control, as done in the case of leases [7], avoids this problem but introduces others. Once a lease expires, a disconnected client loses the ability to access a cached object, even if no one else in the system is interested in it. This, in turn, defeats the purpose of disconnected operation which is to provide high availability. Worse, updates already made while disconnected have to be discarded.

An optimistic approach has its own disadvantages. An update made at one disconnected client may conflict with an update at another disconnected or connected client. For optimistic replication to be viable, the system has to be more sophisticated. There needs to be machinery in the system for detecting conflicts, for automating resolution when possible, and for confining damage and preserving evidence for manual repair. Having to repair conflicts manually violates transparency, is an annoyance to users, and reduces the usability of the system.

We chose optimistic replication because we felt that its strengths and weaknesses better matched our design goals. The dominant influence on our choice was the low degree of write-sharing typical of Unix. This implied that an optimistic strategy was likely to lead to relatively few conflicts. An optimistic strategy was also consistent with our overall goal of providing the highest possible availability of data.

In principle, we could have chosen a pessimistic strategy for server replication even after choosing an optimistic strategy for disconnected operation. But that would have reduced transparency, because a user would have faced the anomaly of being able to update data when disconnected, but being unable to do so when connected to a subset of the servers. Further, many of the previous arguments in favor of an optimistic strategy also apply to server replication.

An optimistic strategy throughout presents a uniform model of the system from the user's perspective. At any time, he is able to read the latest data in his accessible universe and his updates are immediately visible to everyone else in that universe. His accessible universe is usually the entire set of servers and clients. When failures occur, his accessible universe shrinks to the set of servers he can contact, and the set of clients that they, in turn, can contact. In the limit, when he is operating disconnected, his accessible universe consists of just his machine. Upon reconnection, his updates become visible throughout his now-enlarged accessible universe.

4. DETAILED DESIGN AND IMPLEMENTATION

In describing our implementation of disconnected operation, we focus on the client since this is where much of the complexity lies. Section 4.1 describes the physical structure of a client, Section 4.2 introduces the major states of Venus, and Sections 4.3 to 4.5 discuss these states in detail. A description of the server support needed for disconnected operation is contained in Section 4.5.
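
The conflict-detection machinery that Section 3.4 calls for can be sketched in a few lines. This is an illustrative assumption and not a description of Coda's actual mechanism: the per-file version counter, the `ServerCopy` and `LoggedUpdate` types, and the `replay_updates` function are hypothetical names. The idea shown is that a disconnected client records the server version on which each update was based; upon reconnection an update is applied only if the server copy is still at that version, and is otherwise set aside as evidence for manual repair. For brevity the sketch assumes at most one logged update per file.

```python
from dataclasses import dataclass


@dataclass
class ServerCopy:
    """A first-class replica, tagged with a hypothetical version counter."""
    version: int = 0
    data: str = ""


@dataclass
class LoggedUpdate:
    """An update made while disconnected (hypothetical log record)."""
    name: str
    base_version: int  # server version the client had cached when it wrote
    data: str


def replay_updates(server, log):
    """Apply logged updates to `server` (a dict of name -> ServerCopy).

    Returns (applied_names, conflicting_updates). An update conflicts when
    the server copy changed after the client cached it, i.e. someone else
    wrote the same file during the disconnection.
    """
    applied, conflicts = [], []
    for update in log:
        copy = server.setdefault(update.name, ServerCopy())
        if copy.version == update.base_version:
            copy.version += 1
            copy.data = update.data
            applied.append(update.name)
        else:
            # Leave the server copy untouched and keep the client's update
            # so that it can be examined and repaired by hand.
            conflicts.append(update)
    return applied, conflicts
```

With a low degree of write-sharing, almost every logged update would take the first branch, which is the intuition behind preferring the optimistic strategy.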

4.1 Client Structure

Because of the complexity of Venus, we made it a user-level process rather than part of the kernel. The latter approach may have yielded better performance, but would have been less portable and considerably more difficult to debug. Figure 2 illustrates the high-level structure of a Coda client.

Fig. 2. Structure of a Coda client.

Venus intercepts Unix file system calls via the widely used Sun Vnode interface [10]. Since this interface imposes a heavy performance overhead on user-level cache managers, we use a tiny in-kernel MiniCache to filter out many kernel-Venus interactions. The MiniCache contains no support for remote access, disconnected operation or server replication; these functions are handled entirely by Venus.

A system call on a Coda object is forwarded by the Vnode interface to the MiniCache. If possible, the call is serviced by the MiniCache and control is returned to the application. Otherwise, the MiniCache contacts Venus to service the call. This, in turn, may involve contacting Coda servers. Control returns from Venus via the MiniCache to the application program, updating MiniCache state as a side effect. MiniCache state changes may also be initiated by Venus on events such as callback breaks from Coda servers. Measurements from our implementation confirm that the MiniCache is critical for good performance [21].

4.2 Venus States

Logically, Venus operates in one of three states: hoarding, emulation, and reintegration. Figure 3 depicts these states and the transitions between them. Venus is normally in the hoarding state, relying on server replication but always on the alert for possible disconnection. Upon disconnection, it enters the emulation state and remains there for the duration of disconnection. Upon reconnection, Venus enters the reintegration state, resynchronizes its cache with its AVSG, and then reverts to the hoarding state. Since all volumes may not be replicated across the same set of servers, Venus can be in different states with respect to different volumes, depending on failure conditions in the system.

Fig. 3. Venus states and transitions. When disconnected, Venus is in the emulation state. It transits to reintegration upon successful reconnection to an AVSG member, and thence to hoarding, where it resumes connected operation.

4.3 Hoarding

The hoarding state is so named because a key responsibility of Venus in this state is to hoard useful data in anticipation of disconnection. However, this is not its only responsibility. Rather, Venus must manage its cache in a manner that balances the needs of connected and disconnected operation. For instance, a user may have indicated that a certain set of files is critical
