
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

Self-Organization and Resilience for Networked Systems: Design Principles and Open Research Issues

By SIMON DOBSON, Senior Member IEEE, DAVID HUTCHISON, ANDREAS MAUTHE, ALBERTO SCHAEFFER-FILHO, Member IEEE, PAUL SMITH, AND JAMES P. G. STERBENZ

ABSTRACT Networked systems form the backbone of modern society, underpinning critical infrastructures such as electricity, water, transport and commerce, and other essential services (e.g., information, entertainment, and social networks). It is almost inconceivable to contemplate a future without even more dependence on them. Indeed, any unavailability of such critical systems, even for short periods, is a rather bleak prospect. However, due to their increasing size and complexity, they also require some means of autonomic formation and self-organization. This paper identifies the design principles and open research issues in the twin fields of self-organization and resilience for networked systems. In combination, these fields offer the prospect of combating threats and allowing essential services that run on networked systems to continue operating satisfactorily. This will be achieved, on the one hand, through the (self-)adaptation of networked systems and, on the other hand, through structural and operational resilience techniques that ensure these systems can detect, defend against, and ultimately withstand challenges.

KEYWORDS Autonomic communications; network resilience; programmable networks; self-organization; system resilience

Manuscript received July 30, 2018; revised January 11, 2019; accepted January 11, 2019. (Corresponding author: Andreas Mauthe.)

S. Dobson is with the School of Computer Science, University of St Andrews, St Andrews KY16 9AJ, U.K. (e-mail: simon.dobson@andrews.ac.uk).
D. Hutchison is with the School of Computing and Communications, Lancaster University, Lancaster LA1 4YW, U.K. (e-mail: d.hutchison@lancaster.ac.uk).
A. Mauthe is with the Institut für Wirtschafts- und Verwaltungsinformatik, Universität Koblenz, 55118 Koblenz, Germany, and also with the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, 64289 Darmstadt, Germany (e-mail: tadt.de).
A. Schaeffer-Filho is with the Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre 90040-060, Brazil (e-mail: alberto@inf.ufrgs.br).
P. Smith is with the Center for Digital Safety and Security, AIT Austrian Institute of Technology, 2444 Vienna, Austria (e-mail: paul.smith@ait.ac.at).
J. P. G. Sterbenz is with the Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045 USA (e-mail: jpgs@ittc.ku.edu).

Digital Object Identifier 10.1109/JPROC.2019.2894512

0018-9219 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

PROCEEDINGS OF THE IEEE

I. INTRODUCTION
Today’s world has become strongly dependent on networked computer systems, more through evolution and opportunism than through foresight and planning. The rapid adoption of Internet technologies has been nothing short of astounding [1]. The advent of the World Wide Web accelerated this process, building on the ubiquitous spread of the Internet’s network infrastructure. The resulting networked systems are now so common that many people, especially in the younger generation, forget (or do not know) what life was like before them. Prominent examples of networked systems include utility networks (e.g., Smart Grid), industrial control systems (ICSs), the emerging Internet of Things (IoT), Industry 4.0, Cloud Computing, 5G, and Smart Cities. In these systems, the architecture is essentially a set of distributed services operating over a communications network. The nature of the application or enterprise (whether IoT, ICS, 5G, etc.) characterizes the services.

As networked systems have become more ubiquitous, their speed and complexity have grown, and so has society's reliance on them for many social processes. Human management is no longer adequate to keep these systems working. This has driven the imperative toward self-organizing systems (and also self-managing, self-protecting, and other so-called “self-*” properties [2]) that allow the network (in the broadest sense) to adapt its own organization and behavior in pursuit of service-level goals. Self-organization can operate in pursuit of many different goals, which may themselves be stated at a number of levels. One may seek to improve (or maintain) performance in the face of changing network conditions, to integrate new devices or access points, or to include new service variants. However, from a user’s perspective, these technical issues can be subsumed under a goal of resilience: the system continues to work according to the user’s expectations regardless of changes, which may themselves be hidden anyway [3], [4].

This paper addresses two intertwined concepts: self-organization in pursuit of resilience. We define self-organization as the techniques a system may use to change its detailed structure and behavior in response to external stimuli (extrinsic challenges) or changes in requirements (intrinsic challenges), in order to maintain service levels. These service levels may themselves be modified as part of the adaptive process. The ability to change service levels is crucial, because some challenges are simply insurmountable and lead inevitably to user-visible degradation. We define resilience as the ability of the system to avoid this extremity and continue to provide acceptable service. The goal of this paper is to identify the design principles and open research issues in the combined fields of self-organization and resilience for networked systems.

The challenges to self-organization and resilience are enormous. Critical services can be subject to natural disasters, third-party failures such as power outages, misconfiguration, and other failures.
The rise of cyberattacks adds a directed dimension to these challenges. Recent attacks, including Stuxnet [5], the Mirai botnet [6], and WannaCry [7], have led to a burgeoning cybersecurity industry. This industry supports a huge scientific and engineering effort to prevent attacks, to develop mechanisms that ameliorate their effects, and to provide forensic support for later investigation [8]. Many networked systems are critical, and they increasingly affect the livelihoods (and indeed lives) of a growing number of people. As a result, users are beginning to mandate specific levels of resilience for certain applications [9], [10].

The remainder of this paper is organized as follows. Section II presents the necessary background on the independent research areas of self-organization and systems resilience. Section III discusses the interplay between self-organization and resilience in the design of networked systems architectures. Section IV outlines a number of open research challenges. These cover technical aspects, such as the role of intent-based networking (IBN) and network function virtualization, and wider considerations, such as the human element and the role of people in guaranteeing systems resilience. Finally, Section V presents our concluding remarks.

II. BACKGROUND
Research on resilient self-organized networked systems draws on several domains. We organize our brief background review around two themes: the modeling and provision of self-organization within networks, and the additional technology used to promote resilience.

A. Self-Organization
Self-organizing systems have long been a subject of research interest in computer science. The earliest example is possibly due to Dijkstra [11], who studied self-stabilizing systems that return to a predictable state after a perturbation.
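Dijkstra's notion can be made concrete with his K-state token ring from [11]. The following is a minimal Python sketch for illustration. It assumes a central daemon, meaning one privileged machine moves at a time, chosen at random here. It also assumes K = N + 1 states per machine. The function names are ours. Machine 0 holds a privilege when its state equals that of machine N - 1. Any other machine holds a privilege when its state differs from its left neighbor's. Each rule is purely local. Even so, from an arbitrary, possibly corrupted, start the ring converges to a state with exactly one privilege and then stays there.

```python
import random


def privileged(states: list[int]) -> list[int]:
    """Indices of machines that currently hold a privilege (the 'token')."""
    n = len(states)
    holders = [0] if states[0] == states[n - 1] else []  # distinguished bottom machine
    holders += [i for i in range(1, n) if states[i] != states[i - 1]]
    return holders


def fire(states: list[int], i: int, k: int) -> None:
    """Apply machine i's move; it reads only its own and its left neighbor's state."""
    if i == 0:
        states[0] = (states[0] + 1) % k
    else:
        states[i] = states[i - 1]


def stabilize(states: list[int], k: int, rng: random.Random, max_moves: int = 100_000) -> int:
    """Fire randomly chosen privileged machines (a central daemon) until exactly one
    privilege remains; return the number of moves taken."""
    for moves in range(max_moves):
        holders = privileged(states)  # never empty: some machine is always privileged
        if len(holders) == 1:
            return moves
        fire(states, rng.choice(holders), k)
    raise RuntimeError("did not stabilize within max_moves")


rng = random.Random(1)
n = 8
k = n + 1
states = [rng.randrange(k) for _ in range(n)]  # arbitrary (possibly corrupted) start
moves = stabilize(states, k, rng)
print(f"legitimate after {moves} moves: {states}")

# Closure: once legitimate, passing the single privilege around keeps exactly one.
for _ in range(3 * n):
    (holder,) = privileged(states)
    fire(states, holder, k)
    assert len(privileged(states)) == 1
```

No machine can tell from its own rule whether the ring as a whole is legitimate. "Exactly one privilege" is a property of the entire ring.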
The title of Dijkstra’s paper, “Self-stabilizing systems in spite of distributed control” [our emphasis], foreshadowed the difficulties facing anyone attempting to construct such systems.

The core challenge of self-stabilization, and of self-organization in general, is that it is a global property. No single component can determine whether the system as a whole is in an acceptable state, especially when components or communication links fail. Similar issues occur throughout science. One example is the way murmurations of starlings form and evolve as the integral of a large number of individually local decision processes. The structure (the flock) remains stable even as its components (the birds) move or drop out. It has proven possible [12] to develop surprisingly simple algorithmic descriptions of these processes, and there is now a significant class of biologically inspired approaches to self-organizing systems.

Self-organization becomes more important as network complexity increases. The clearest examples come from domains with frequent, spontaneous changes in device population, topology, services, and so on. Mobile ad hoc networks (MANETs) were once limited to tactical military systems but are now essential in enabling IoT systems and wireless sensor networks. They are also found in home mesh networks and vehicular networks (VANETs). We repeatedly see niche techniques become mainstream, leading to an overall architecture that is more complex but also more independent and self-organizing. The key observation is that management is treated as a local (or at least nonglobal) task. It is the responsibility of the nodes themselves rather than being imposed from outside.

An important phenomenon for our current purposes concerns the behavior of networks under attack.
Such behavior is often studied within the framework of percolation theory. Given a network and an attack that removes some proportion of nodes and/or edges, is there still a “giant component” that keeps a large fraction of the nodes connected? The simplest such attacks remove nodes or edges at random, and might more accurately be described as modeling typical failures. More structured attacks target nodes with particular properties, such as high-degree hubs or the low-degree bridges between otherwise sparsely connected components. The topology of Internet core routers [13] exhibits a power-law degree distribution. This gives the network a degree of “natural” or topological resilience to random failures, but that resilience does not necessarily extend to informed, targeted attacks.
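A small percolation experiment can reproduce the contrast between random failures and targeted attacks. The sketch below illustrates the idea under stated assumptions and is not the analysis of [13]. It uses a synthetic preferential-attachment (Barabási–Albert-style) graph as a stand-in for a power-law topology. The graph size, the attachment parameter m = 2, and the 5% removal fraction are arbitrary choices, and the function names are illustrative.

```python
import random
from collections import deque


def preferential_attachment(n: int, m: int, rng: random.Random) -> dict[int, set[int]]:
    """Grow an undirected graph in which each new node links to m distinct existing
    nodes chosen with probability proportional to their degree (Barabasi-Albert style)."""
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    endpoints: list[int] = []  # node u appears here once per incident edge
    for u in range(m + 1):  # seed: a small complete graph on m + 1 nodes
        for v in range(u):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets: set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional sampling
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def giant_fraction(adj: dict[int, set[int]], removed: set[int]) -> float:
    """Largest connected component after deleting `removed`, as a fraction of all nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search of one component
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)


rng = random.Random(42)
adj = preferential_attachment(2000, 2, rng)
k = len(adj) // 20  # remove 5% of the nodes in each scenario

random_failures = set(rng.sample(list(adj), k))
hubs = set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])

print(f"no removal:      {giant_fraction(adj, set()):.2f}")
print(f"random failures: {giant_fraction(adj, random_failures):.2f}")
print(f"targeted (hubs): {giant_fraction(adj, hubs):.2f}")
```

Random failures mostly hit low-degree nodes, because those are the majority, so the giant component stays close to its maximum. Removing the same number of hubs typically shrinks it considerably more, which is the asymmetry described above. The exact figures depend on the seed and the parameters.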

Topological techniques can be regarded as providing “spatial” resilience, in the sense that it is available statically whenever a challenge arises. This is perhaps the most stable form of resilience, since it depends only on the system’s overall features. It is also possible to apply more dynamic techniques. One can adapt the network (perhaps activating additional links) or adapt its behavior (perhaps providing new service instances, changed protocols, or changed security stances). This is a more “temporal” form of resilience that can use fewer resources under normal circumstances and provide a more flexible array of responses [14].

Within the systems community, the most influential intellectual currents have undoubtedly come from research into autonomic computing. Kephart and Chess [15] define it as “computing systems that can manage themselves given high-level objectives from administrators.” Autonomic computing research has explored systems that are self-managing, self-configuring, self-optimizing, and possess a range of other self-* properties. The agenda has been structured around two parallel strands. One aims at creating proper ab initio architectures for self-* behaviors. The other aims at adding self-* management behaviors to collections of existing services. Alongside autonomic computing, a parallel effort in autonomic communications [2], [16] has looked explicitly at adding self-* properties to networks.
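The "temporal" form of resilience mentioned above activates additional links only when a challenge arises. A minimal sketch can illustrate it. The sketch assumes a hypothetical network that keeps a pool of dormant standby links. The ring topology, the chords, and the function name `links_to_activate` are ours, not taken from [14]. Connectivity is tracked with a union-find structure.

```python
def find(parent: dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def links_to_activate(nodes: list[str], active: set[tuple[str, str]],
                      standby: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the dormant standby links to switch on so the surviving network is
    reconnected. A standby link is activated only if it joins two parts that the active
    links (plus links already chosen) leave disconnected; candidates are tried in order."""
    parent = {u: u for u in nodes}
    for u, v in active:
        parent[find(parent, u)] = find(parent, v)
    chosen = []
    for u, v in standby:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            chosen.append((u, v))
    return chosen


nodes = list("ABCDEF")
ring = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")}
standby = [("A", "D"), ("B", "E"), ("C", "F")]  # chords kept dormant in normal operation

print(links_to_activate(nodes, ring, standby))      # intact ring: nothing to activate → []
degraded = ring - {("B", "C"), ("E", "F")}          # two failures split the ring in two
print(links_to_activate(nodes, degraded, standby))  # → [('A', 'D')]
```

In normal operation no standby link is used, which is the resource saving that temporal resilience offers over permanently provisioned redundancy. The selection is greedy in list order and ignores link cost. A real controller would also weigh cost, capacity, and whether a candidate shares fate with the failed links.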
For both computing and communications, much of the work has centered on a closed-loop control architecture. It is variously referred to as Monitor–Analyze–Plan–Execute over shared Knowledge (MAPE-K) or Collect–Analyze–Decide–Act. This architecture generalizes standard control-theoretic approaches by using various other mathematical formalisms, often adding flexibility at the expense of decidability.

In autonomic networks, the underlying idea is that network structures can form autonomously. Network nodes parameterize and operate themselves independently, using sensing, environmental awareness, and adaptation capabilities within the nodes and their protocols. Essentially, autonomous networks and their components should require little or no direct intervention, either during set-up or at runtime. They learn and adapt to changes in the environment while providing a stable, reliable, and secure communication infrastructure.

The autonomic approach arose in response to the concerns of systems developers as well as network operators. It explicitly targets adding self-* properties to existing systems, rather than forcing the development of new systems purely to address self-* questions. Legacy software is, after all, just software that has worked well for a long time. This may limit the functionality that can be developed. However, the ability to add management functions (for example, by mining server logs to trigger reconfigurations) greatly reduces the startup cost of self-organization.

B. Resilience
Resilience should now be considered a vital property of systems and networks. In [17], resilience is defined as a concept associated with telecommunications systems and their supporting resources. It describes their ability to resist the loss of capacity due to failures or foreseen overload. The goal is to optimize the availability and quality of service of a system, and to enable it to return to a previous normal condition after a challenge has subsided.
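The MAPE-K loop described above, combined with this definition of resilience, suggests how an autonomic manager can ride out a challenge and then return to normal. The following is a minimal sketch under stated assumptions. It uses a hypothetical service made of identical replicas, each with a fixed capacity. A utilization objective of 0.7 stands in for the administrator's high-level goal, and the load values are illustrative. The four functions simply mirror the Monitor, Analyze, Plan, and Execute stages over a shared Knowledge object. None of the names come from a particular framework.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge (the K in MAPE-K): objective, model parameters, history."""
    target_utilization: float = 0.7      # high-level objective (assumed value)
    capacity_per_replica: float = 100.0  # requests/s one replica can serve (assumed)
    min_replicas: int = 2                # the "normal" configuration
    history: list[float] = field(default_factory=list)


@dataclass
class Service:
    replicas: int = 2


def monitor(load: float, k: Knowledge) -> float:
    """Monitor: record the observed load in the knowledge base."""
    k.history.append(load)
    return load


def analyze(load: float, svc: Service, k: Knowledge) -> bool:
    """Analyze: flag a symptom if the service is overloaded or clearly over-provisioned."""
    utilization = load / (svc.replicas * k.capacity_per_replica)
    # The lower threshold (half the target) adds hysteresis against flapping.
    return utilization > k.target_utilization or utilization < k.target_utilization / 2


def plan(load: float, k: Knowledge) -> int:
    """Plan: smallest replica count that meets the utilization target."""
    needed = math.ceil(load / (k.target_utilization * k.capacity_per_replica))
    return max(k.min_replicas, needed)


def execute(svc: Service, replicas: int) -> None:
    """Execute: apply the plan (here, just record the new replica count)."""
    svc.replicas = replicas


def mape_k_step(load: float, svc: Service, k: Knowledge) -> int:
    observed = monitor(load, k)
    if analyze(observed, svc, k):
        execute(svc, plan(observed, k))
    return svc.replicas


k = Knowledge()
svc = Service()
loads = [120, 130, 400, 410, 130, 120]  # normal load, a spike (challenge), then recovery
trajectory = [mape_k_step(load, svc, k) for load in loads]
print(trajectory)  # [2, 2, 6, 6, 2, 2]
```

Because of the hysteresis in `analyze()`, the manager returns to its normal two replicas only once the spike has clearly subsided. It does not oscillate around the target.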
The emphasis in [18] is on fault management and recovery methods, and on how Quality of Resilience can be achieved through resilience-differentiated services. We define resilience as the ability of a system or network to maintain acceptable levels of operation in the face of challenges. These include malicious attacks, operational overload, misconfiguration, and equipment failures. Resilience management therefore encompasses the traditional fault, configuration, accounting, performance, and security (FCAPS) functionalities [19]. It also comprises structural as well as wider context-related considerations [18].

Concerns about the dependability and resilience of computer systems date back to the earliest days of computing. In networking, for instance, such concerns motivated the early Advanced Research Projects Agency Network (ARPANET) design. They led to the decision to use connectionless paths so that the network could recover more easily from a failed router [20]. However, basic fault tolerance was recognized to be insufficient for correlated failures (e.g., those caused by an attack). Network survivability therefore became an important discipline for modern networks [21], along with architectural strategies to achieve it [21], [22]. Fault tolerance requires only redundancy in components and paths. Survivability, by contrast, requires diversity [4], so that the redundant part does not share the same fate as the failed component. Further factors are active management and protection elements. These detect the onset of challenges and combat them through appropriate defense and mitigation actions. Challenge taxonomies [23] and a resilience ontology [17] are needed to capture threats and challenges in appropriate threat models. Resilient system and network design can then be based on these models.

A more formalized and comprehensive view of network resilience is presented in [4]. Cholda et al.
[18] present a detailed survey of resilience differentiation on the Internet that also discusses resilience assessment frameworks in detail. In essence, these discussions capture the relationship between resilience-related concepts and disciplines:

- challenge tolerance (including fault tolerance, survivability, disruption tolerance, and traffic tolerance);
- trustworthiness (including dependability, performability, and security);
- robustness;
- complexity.

Furthermore, a set of principles grouped as prerequisites, enablers, tradeoffs, and behaviors is defined (Fig. 1).

Fig. 1. Relationship between prerequisites, enablers, tradeoffs, and behaviors of resilience.

Among the key resilience enablers are redundancy, diversity, and connectivity and association [24]. Redundancy refers to adding resources to provide fault tolerance. This principle can be employed across all system levels:

- hardware redundancy: additional hardware added to a system or its components to maintain availability even when a component fails;
- path redundancy: multiple alternative network paths between source and destination;
- application redundancy: multiple application instances that can carry out the same task.

Diversity, in addition to redundancy, provides survivability in cases where redundant components share the same fate. This includes geographically diverse paths that survive large-scale disasters. It also includes avoiding “monocultures” in hardware and software, e.g., to improve resistance to zero-day attacks. Connectivity and association refer to disruption tolerance in challenged communication environments with intermittent and episodic connectivity. Such environments arise, for example, from wireless links, mobility, unpredictable delay, and energy-constrained nodes. A key aspect is the ability to communicate even when stable end-to-end paths are not available.

Resilience in networks and systems can be viewed from different perspectives. In [18], structural and guaranteed resilience differentiation are distinguished. The former is concerned with structural arrangements (specifically, the recovery of different connections). The latter provides guarantees on the level of resilience. We distinguish between two forms:

- Structural Resilience expresses the resilience of the network and system infrastructures (i.e., the structural arrangements) and the assessment of the resilience level they offer (e.g., [24]).
- Operational Resilience specifies the level of active resilience management capabilities within a system or network, which allow it to actively detect, defend against, and mitigate threats.

An early application of adaptation for structural resilience, in the context of optical network restoration, is the self-healing network [25]. In it, a distributed algorithm restores traffic after a fiber cut in an optical mesh network. Dynamic routing, as is typical on the Internet, adapts to link and router failures. The ability to provide and exploit geographic diversity across redundant paths enables resilience to correlated failures from large-scale disasters [26].

Well-defined resilience targets can be used to provide more clear and […] case of Operational Resilience, a challenge analysis in conjunction with a resilience estimator helps to determine whether specified resilience targets are being met. If they are not, appropriate resilience mechanisms must be invoked to counter or adapt to challenges in order to maintain a high level of resilience.
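Elementary availability arithmetic can illustrate the redundancy and diversity enablers discussed above, together with the kind of comparison a resilience estimator makes against a target. Components in series multiply their availabilities, A = a_1 a_2 ... a_n. Redundant components in parallel give A = 1 - (1 - a_1)(1 - a_2)...(1 - a_n). The sketch below is a toy estimator built on stated assumptions: independent failures, illustrative availabilities, and a hypothetical target of 0.9995. It is not the estimator the text refers to.

```python
from math import prod


def series(*avail: float) -> float:
    """Availability when every component must be up (independent failures assumed)."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Availability when at least one redundant component must be up (independent failures)."""
    return 1 - prod(1 - a for a in avail)


def meets_target(estimate: float, target: float) -> bool:
    """Toy resilience check: is the estimated availability at or above the target?"""
    return estimate >= target


PATH = 0.99             # availability of each individual path (illustrative)
SHARED_CONDUIT = 0.999  # a component both "redundant" paths traverse (illustrative)
TARGET = 0.9995         # hypothetical resilience target

# Redundant but not diverse: the shared conduit sits in series with the path pair.
shared = series(SHARED_CONDUIT, parallel(PATH, PATH))
# Redundant and diverse: geographically separate paths with no common component.
diverse = parallel(PATH, PATH)

for name, estimate in [("redundant, shared fate", shared), ("redundant and diverse", diverse)]:
    verdict = "meets" if meets_target(estimate, TARGET) else "misses"
    print(f"{name}: A = {estimate:.6f} ({verdict} target {TARGET})")
```

The conduit sits in series with both paths, so the shared-fate configuration can never exceed 0.999. It misses the target however many paths are added through the same conduit. An operational-resilience mechanism would instead respond by provisioning a diverse path.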
