Survivable Network Design - University Of Pittsburgh

1y ago
12 Views
2 Downloads
1.21 MB
32 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Ryan Jay
Transcription

Survivable Network DesigngDavid TipperGraduate Telecommunications and Networking ProgramUniversity of PittsburghTelcom 2110 Slides 12Motivation Communications networks need to besurvivable? Communication Networks are CriticalI f t tInfrastructure(CI) (PCCIP 1996) theth systems,tassets and services upon which society and theeconomy depend Communication infrastructure often consideredmost important CI due to reliance on it by otherinfrastructures– banking and finance, government services– power grid SCADA, etc. Increasing Impact and Rate of Failures– Increased bandwidth of links (WDM technology infiber optic network)– Increased societal dependence– Multiple network operators and vendor equipment1

Causes of Network Outages According to Sprint a link outage in IP backbone every 30 minon average Accidents– cable cutscuts, car wreckwreck, etcetc.– According to AT&T 4.39 Cable cuts / year / 1000 km Human errors– incorrect maintenance, installation Environmental hazards– fire, flood, etc. Sabotage– physical, electronic Operational disruptions– schedule upgrades, maintenance, power outage Hardware/Software failures– Line card failure, faulty laser, software crash, etc.Backbone FailuresOther Unknown9%36%Link Failure Time to Recoverfrom Layer1 failureCongestion32%Router Operations Software UpgradeHardware UpgradeConfiguration Errors23%Router Failures Source: University of Michigan, 2000 Software failuresHardware failuresDOS Attacks2

Network Survivability Definition– Ability of the network to support the committed Quality ofServices (QoS) continuously in the presence of variousfailure scenarios– Includes performance as well as availability Survivability Components– Analysis: understand failures and system functionality after failures– Design: adopt network procedures and architecture to prevent andminimize the impact of failures/attacks on network services.– Goal: maintain service for certain scenarios at reasonable cost Self – Healing networkSurvivable Network Design Three steps towards a survivable network1. Prevention:– Robust equipment and architecture (e.g., backup power supplies)– Security (physical, electronic), Intrusion detection, etc.2. Topology Design and Capacity Allocation Design network with enough resources in appropriate topologySpare capacity allocation – to recover from failure3. Network Management and traffic restoration procedures DetectDt t theth failure,f ilandd reroutet traffict ffi aroundd ffailureilusingi ththeredundant capacity3

Survivability – Basic Concepts Working path and Backup path (recovery path): Working path: carry traffic under normaloperation Backup path: an alternate path to carry the trafficin case of failures3W orking route4BackuprouteBackup route1XAD CSCustom er2BSurvivability – Basic Concepts– To survive against a network failure– working path and backup path must be disjoint– So that both paths are not lost at the same time Disjoint ? (depending on a failure scenario)– Link disjoint– Node disjoint– (Shared Risk Link Group) SRLG ceNode-disjointDestination4

Shared Risk Link Group(SRLG)CLogical intentActual routingAPhysical CablesB Two fiber cables share the same duct or other commonphysical structure (such as a bridge crossing). Two cables can fail simultaneouslyClassification ofSurvivability Techniques Path-based (Global) versus Link-based (Local)F ilFailureDependentDd t vs. FailureF ilIndependentI dd tProtection versus RestorationDedicated-Backup versus Shared- Backup CapacityRing versus Mesh topologyDual homingP cycle5

Path-based versus Link-based Path-based Scheme (Global)– Disjoint alternate routes are provided between sourceand destination node23Working path164Backup path5Path-based versus Link-based Link-based Scheme (Local)– Alternate routes are pprovided between end nodes of thefailed link– Can have backhaul situation which wastes bandwidth23Working path164Backup path56

Partial Path Scheme Partial Path Scheme– Alternate routes are from the upstream node todestination node or from the downstream node tosource node231264315645Working pathBackup pathPath-based versus mplerFasterrecovery speed 7

What Does Survivability Get You?BPWPSource (S)Destination (D) Ai is an availability of link i Availability of a connection between S-D:Ano protection Aii WPAprotection Ai Ai i WPi BP Given Ai 0.998297,- Ano-protection 0.996597, Aii WP BPAprotection 0.999983Failure Dependent vs. Failure Independent Failure Dependent – the backup path depends on whichdevice fails – need a set of paths one for each failure case FailureF ilIndependentI dd t – backupb k pathth linkli k andd noded disjointdi j i twith working path - one backup path per working path Example:Failure Dependent backup pathfor link 2-3 failure7913123W ki pathWorkingth612Failure Dependent backuppath for link 1-2 failure1011458Failure Independent backup path8

Protection versus Restoration When to establish the backup paths? Protection– Backup paths are fully setup before a failure occurs.occurs– When failure occurs, no additional signaling is needed to establishthe backup path– Faster recovery timeWP Restoration– Backup paths are established after a failure occurs– More flexible with regard to the failure scenariosBP backupb k pathsh are setup afterf theh llocationi off failuref ilisi knownk– More capacity efficient due to its shared-backup nature, Utilize any spare capacity available in the network– But cannot guarantee 100% restorability after failuresProtection Protection Variants– 1 1 Protection (dedicated protection) Traffic is duplicated and transmitted over both working and backuppaths– Fastest recovery speed, but not bandwidth efficient– 1:1 Protection (dedicated protection with extra traffic) During normal operation (failure free), traffic is transmitted onlyover working path; backup path can be used to transmit extra traffic((low ppriorityy traffic)) better bandwidth utilization When the working path fails, extra traffic is preempted, and trafficis switched to the backup pathBPSourceWPDestination9

Protection– 1:N Protection (shared recovery with extratraffic) One protection entity for N working entitiesAPSWorking Channel 1Working Channel 2APSProtection ChannelWorking Channel nNode 1Node 2– M:N Protection (M N) M protection entities for N working entities– Self Healing Rings are a form of ProtectionLink Redundancy Link BundlingSimultaneous PhysicalConnections– Link failure does notaffect forwarding– Load redistributed amongother members of bundle Parallel Link Technologies– MLPPP – T1/E1/ Link aggregationagg egat o– 802.3ad – Ethernet aggregation– SONET/SDH aggregation– Multi-Link Frame Relay10

Types of Self-healing RingsWorking ringWorking ringProtection ringPProtectioni ringiADMADMADMADMADMADMADMADM1:1 Uni-directional self-healing ring(USHR)1:1 Bi-directional self-healing ring(BSHR)Dedicated versus Shared - Backup Dedicated-Backup Capacity– Backup resource can be used only by a particular working path Shared-BackupShared Backup Capacity– Backup resource between several working paths can be shared– Rule: backup resource can be shared only when correspondingworking paths are not expected to fail at the same time– More capacity efficientWP1 (traffic 5 units)64BP12Working path5Backup path71Link 5-7:dedicated spare capacity 15 unitsshared spare capacity 10 unitsBP238WP2 (traffic 10 units)11

Dedicated versus Shared - Backup Dedicated-Backup Capacity– Backup resource can be used only by a particular working path Shared-BackupShared Backup Capacity– Backup resource between several working paths can be shared– Rule: backup resource can be shared only when correspondingworking paths are not expected to fail at the same time– More capacity efficientWP1 (traffic 5 units)6BP142Working pathBackup path571BP2Link 5-7:dedicated spare capacity 15 unitsshared spare capacity 10 units38WP2 (traffic 10 units)Ring vs Mesh ArchitecturesAdvantages of Rings: More cost efficient at low traffic volumes Fast protection switching, some capacitysharingAdvantages of Mesh: More cost efficient at high traffic volumes Facilitates capacity and cost efficient meshrestoration More flexible channel re-configuration12

P CyclesProtection (P) Cycle– Closed cycles are formulated in the mesh network.– Affected traffic is rerouted along these cyclescycles.– For a large network will have a number of p-cycles((a)) A pre-configurepgcycley(c) A link not on the cycle fails(b) A link on the cycle fails(d) Another link not on the cycle failsP-Cycles: Basics For meshed networksPre-reserved protection paths (before failure)Based on cycles, like ringsAlso protects straddling failures, unlike ringsLocal protection action, adjacent to failure (in theorder of some 10 milliseconds) Shared capacity(c) A link not on the cycle fails “pre-configured protection cycles” p-cycles Developed at13

P-Cycles: Basics A single p-cycle in a network:p-Cycles: Basics Protected spans: 9 „on-cycle“(1 protection path)14

p-Cycles: Basics Protected spans: 9 on-cycle’’y((1 pprotection ppath)) 8 straddling’’ (2 protection paths)Restoration using p-cyclesA. Form the sparecapacity into aparticular set ofpre-connectedcycles !ijIf span i fails,p-cycle j providesone unit ofrestorationcapacityA span on the cycle fails - 1 Restoration Path, BLSR-likejA p-cyclei" xi , j 1 " caseIf span i fails,p-cycle j providestwo units ofrestorationcapacityA span off the p-cycle fails - 2 Restoration Paths, Mesh-like" xi , j 2 " case15

Mesh Survivability TechniquesMesh Survivability TechniquesProtectionRestorationDedicated-backup ProtectionPath-basedPath-based RestorationLink-based RestorationLink-basedShared-backup ProtectionPath-basedLink-basedP-cycleSurvivability Technique Metrics Scope of failure coverage– single link failure, single node/link failure, multiple failures, etc. Recoveryy time– 50ms in SONET Ring Backup capacity requirement (redundancy, Rr amount of spare capacity )amount of working capacity Guaranteed bandwidth vs. non guaranteed Reordering and duplication– switching between WP and BP Additive latency and jitter– quality of backup path, backup path length, congestion on backup path State overheadScalabilitySignaling requirementsNotion of resilence classes (QoR)– Different level of connection availability, restorability and recovery time16

Transport Survivability Number of techniques exist––––––APSMulti-homing (with or without trunk diversity)Link restorationPath restorationSelf healing ringsp-cycles See a mixture of techniques in real networks Usually little or no survivability at the far edge (CPE – last mile) Edges are multi-homed to MAN or WANAccessCoreAccessDual/Multi-homing Topologies Dual-homing– Customer host is connected totwo switched-hubs.– Traffic may be split betweenprimary and secondary pathsconnecting to the hubs.– Each path serves as a backup foranother. Multi-homing– Customer host is connected to morethan two switched hubs.– Greater protection against a failure.switchcustomer hostDual-homing topologyMulti-homing topology17

Dual-homing in TelephoneNetworkTransmission NetworkSDH/SONET FacilityProtectionXClass 4 TollNetworkDiverseSwitchLocationsXTransmission NetworkMultiple RoutesBetween OfficesSmallRadius ofDamageSmall Radiusof ServiceLossClass 5LocalNetworkResilient Edge Connectivity Multi-Homing for resilient Internet and IP-VPNconnectivity Solves link failure and ISP node failure problems What about failure of customer edge router?CustomerISP18

Virtual Router Redundancy Protocol Redundant default gateways:VRRP (RFC 2338)MasterBack-upMultiple routers onthe subnet negotiatewho will be “Master”and own the VirtualRouter IP Address.All other routersare backups. Backuppriority is configurable.GEGEMaster sendsperiodic hellosto communicatealive state.EthernetSwitchHostHostHosts arepreconfigured with VirtualRouter IP address asdefault fortraffic exiting the LAN.HostDual-homing in Data NetworkCustomer Edge(CE) RouterProvider Edge(PE) Router19

Implementation Multi-layered:– Demand Topology– Logical Transport Topology– Fiber/Optical Topology Can implement survivabilitytechniques at each layer Need to consider–––––––Failure propagationAlarm SettingSpeed of recoveryCostManagementTraffic GroomingEtc.Traffic Restoration Capabilities A survivability scheme and spare capacity doesn’t accomplishrestoration by itself, must be used in conjunction with dynamicrestoration techniques.techniques Need to detect failure and do path rearrangement given thatthere is enough spare capacity in the networks. For example a dual-homing approach guarantees survivingconnectivity, but it doesn’t restore the circuits/connections initself.itself Need network management procedures to perform pathrearrangement.20

Steps in Traffic RecoveryDetectionReconfigurationprocessRepairp pprocessNotificationFault IsolationIdentificationPath selectionRepairReroutingNormalizationIP Survivability Options Several techniques to improve survivability inIP networks IP layer –– adjust link weights and timers for faster failurerecovery– prestore second shortest paths, etc, Adopt Optical Transport techniques from Telcooperators (survivable rings, APS, pathrestoration, etc.) MPLS logical layer restoration21

IP Dynamic RoutingLink failure floodedNew YorkSanFranciscoNew Path Computed OSPF or IS-IS computespppath If link or node fails, New path is computed Response times: Typically a few seconds– Can be tuned to 1000’s milliseconds– According to Sprint data – usually 7secs to recoverBackup Label Switched PathsError signaledNew YorkSanFranciscoPrimary LSPBackup LSP Primary (working) LSP & backup LSPs established a priori If primary fails– Signal to head end, Use backup Faster response, requires wide area signaling22

MPLS Fast Reroute Increasing demand for“APS-like” redundancy– MPLS resilienceilitto lilink/nodek/ d failuresf il– Control-plane protection required– Avoid cost of SONET APS protectionD tDetourPrimaryLSR Solution: MPLS Fast-reroute– RSVP Extensions define Fast Reroute– LSPs can be set up, a priori, to backup: One LSP across a link and optionally next node, or All LSPs across a particular link1:1 Protection For each LSP, for each node– Set up one LSP as backup– Merge into primary LSP further downstream– Backs up link and downstream node23

1:1 LSP ProtectionTraffic uses detour LSPLink FailsMerged Downstream1:N Link Protection For each link, for each neighbor– Set up one detour LSP to backup the link as a whole– Uses LSP Hierarchy to backup all LSPs which wereusing failed linkMultiple PrimaryLSPs on same linkOne detour LSP for link24

1:N Link ProtectionLink FailsPrimary LSPs multiplexedover one detour LSPLSPs demultiplexedat next node1:N Link and Node Protection For each link– For each node 2 hops away Detour LSP backs up link & intermediate node Uses LSP Hierarchy to backup all LSPs to that node If there are two 2-hop paths to that node, setup twodetour LSPs– For each node 1 hop away Detour LSP backs up LSPs ending at that node25

MPLS Fast Reroute Provides fast recovery for LSP failure– Based on a ppriori backupp of detour LSPs– (eg, 5 millisecond for tens of LSPs with 1:1) There are significant tradeoffs between theapproaches––––Number of LSPs requiredWhether node failures are protectedAbility to reserve resources for backup LSPsOptimality of routesSummary of MPLS Methods End-to-End disjoint backup LSP – one per workingLSP in the network MPLS FFastt RRe-RouteR t– 1:1 LSP link or link node protection– 1:N Link protection– 1:N Link plus node protection All of these are interoperable based on IETF standards Sink Trees are under study Does MPLS solve all the problems?– Can’t recover from IP Layer Failure– Doesn’t provide protection of layer 1 customers– Fault Propagation Issue26

Multilayer Networks WAN networks have multiple technology layers Converging toward IP/MPLS/WDMMultiple Layers present several survivability challenges Coordination of recovery actions at different layers–Which layer is responsible for fault recovery? Spare Capacity Allocation (SCA) Failure Propagation––How to prevent over allocation, when each layer provides spare resources?Lower layer failure can affect multiple higher layer links!3MPLS connections1WDM Physical Path523145Optimization Based Design In implementing the chosen survivabilitytechnique (e.g., link protection, p-cycles) at aparticular layer (e.g., optical)- optimizationtechniques are usually adopted. First design working network and working/activepaths Then determine survivability design (often calledspare capacity network design) Examples in ITU Planning document Consider shared backup path protection27

Spare Capacity Allocation Single Layer Spare Capacity Allocation (SCA) Problem– given working paths and network (or virtual network) topology– provision spare capacity and find backup routes for fault tolerance– Goal: minimum spare capacity or cost Matrix based formulation*––––P path link incident matrix, Q backup link incident matrixRelate to spare provision matrix G, and spare capacity reservation sAssume path restoration with disjoint backup routesShared backup path protection for any single link failure14l2356Working Path*Backup PathY.Liu, D.Tipper, and P. Siripongwutikorn, “Approximating Optimal Spare Capacity Allocation by SuccessiveSurvivable Routing,'' ACM/IEEE Transactions on Networking, Vol. 13., No. 1, pp. 198-211, Feb., 2005 .Matrix model for SCAWorking and backup pathmatrices related to spareprovision matrix G Link i1gij spare capacity neededon link i when link j failsFrom G find sparecapacity allocations max(G)b3c465e7d123456723 4567Backup path linkincident matrixGs2211212110201101P1a2QT PWorking path linkincident 010101021000100000010101000Flowssrc dst1 a b2 a c3 a d4 a e5 b c6 b d7 b e8 c d9 c e10 d 001100000001101 2 3 4 5 6 7 8 9 10An Example:1. Link 2 fails2. Flow 3,4 affected3. Backup paths up4. Spare BW 2 on Li28

Optimization model for linkfailuresminS eT sTotal spare capacityQ,sEnough spare capacity on each links GG QT M PCalculation of spare provision matrixP Q 1Link-disjointed backup pathsQ BT D (mod 2)Flow conservation of backupQ is a binary matrixInteger programmingDecision variable: Q, ss.t.Given: M – traffic demand matrixP – working path link incidence matrixB and D – node-link & flow-node incidence matricesMixed Integer Programming problem NP HardHeuristic Solution Algorithm Successive survivable routing algorithm*– Decompose multi-commodity flow multiple single flows– Goal: Each flow seeks a backup path with minimaladditionalddl spare capacity– Using shortest path algorithm for each flow to route link-disjointed backup paths using spare provision matrix G to calculatelink cost – incremental spare reservation vr ; Flows successively update their backup paths termed: successive survivable routingg ((SSR)) Randomly order flows for successively updating. Fast computation find near optimal solution *Apparatus and Method for Spare Capacity Allocation, Y. Liu and D. Tipper, U.S. Patent 6,744,727 B2, June 1, 2004 Presented in ACM/IEEE Trans. On Networking Feb.,200529

SSR flowchart of flow r On source node of flow r:1. Given pr–2. Periodically update G–3. Calculate vr–4. Update qr using vr5. Update s, and G pr , qr : working andbackup path vectorsG, s: spare provisionmatrix and sparereservation vectorvr : incremental sparereservations used as linkreservations,costStop after no backup pathupdate on the networkNumerical comparison Compare different algorithms and bounds–––––––RAFT:: Resourceesou ce aggregationagg egat o faultau t toleranceto e a ceSPI: Sharing with partial informationSR: Survivable routing (SSR without iteration)SSR : Successive survivable routingSA: Simulated annealingBB: Branch and bound on a path-flow model – optimalLP: Linear programming lower bound Metrics:– % Redundancy spare capacity/working capacity,– execution time30

Experiment 63639137294132423630332134353137Network node degree ranges from 2.31 to 4.4Consider balanced mesh load caseRedundancy versus Timeon Network 375Network 370Fast responseWorsesolutions,fast65Redundancy (%) SSR, SR, SPI have64 randomdcases withithdifferent flow orders Range of solutions Time is the sum oftime to compute all64 cases60RAFTNear optimalsolutions, fastSPI5550Better solutions,slow, 410Time (second)31

State of the Art Survivable Network Design– Important in WAN Backbones Basic approachpp– Given particular technology (e.g., WDM, MPLS, etc)assume Traffic restoration scheme (e.g., failure independent path restoration) Failure scenario (any single link failure)– Determine least cost survivable network design usingoptimization formulations with heuristic solutions Manyy tradeoffsdeo s identifieddeed andd studieds ud ed––––––Protection vs. RestorationReactive vs. ProactiveShared vs. DedicatedLink vs. Path vs. Rings, etc.Failure Dependent vs. FIDEtc.,WPBP32

Design network with enough resources in appropriate topology Spare capacity allocation - to recover from failure 3. Network Management and traffic restoration procedures . Solves link failure and ISP node failure problemsSolves link failure and ISP node failure problems What about failure of customer edge router? Customer ISP. 19

Related Documents:

Sep 06, 2018 · SURVIVABLE FIRE ALARM CIRCUITS 40 Advancing the Science of Safety What Circuits Need Survivability? Signaling Line Circuit (SLC) When used to trigger remote NAC power supplies or booster power supplies. When used to for control modules that trigger circuits or amplifiers SURVIVABLE FIRE ALARM CIRCUITS 41 Advancing the Science of Safety

Survivable Vehicles for the Warfighters Mine Resistant Ambush Protected (MRAP) Vehicle Program Logistics Overview November 2012 . Unclassified - For Official Use Only 185 - May 06 MNF-W Commander . August 2009Requirement increases to 7,774 MRAP Task Force M-ATVs arrived in Afghanistan and LRIP 16 ADM increased to 22,882 per JROC for 1,400 M .

network.edgecount Return the Number of Edges in a Network Object network.edgelabel Plots a label corresponding to an edge in a network plot. network.extraction Extraction and Replacement Operators for Network Objects network.indicators Indicator Functions for Network Properties network.initialize Initialize a Network Class Object

design problem. Key words: Network design, LP relaxations, worst-case analysis, heuristics. 1. Introduction In this paper, we study from several perspectives the linear programming (LP) relaxations of a class of network design problems. This class includes a number of

It is important in any design project that network designers carefully analyze and evalu-ate the scope of the design before starting to gather information and plan network design. Therefore, it is critical to determine whether the design task is for a green field (new) network or for a current production network (if the network already exists, the

Professional Engineering 6X9 / Microwave Transmission Networks / Lehpamer / 122-2 / Chapter 5 5Chapter Microwave Network Design 5.1 Introduction After the preliminary microwave network plan has been approved, detailed microwave network design has to be completed. Site acquisi-tion, microwave network design, RF design (in case of wireless network

Certified Network Defense (CND) Outline . Module 01: Computer Network and Defense Fundamentals Network Fundamentals Computer Network Types of Network Major Network Topologies Network Components Network Interface Card

RM0008 Contents Doc ID 13902 Rev 9 3/995 4.3.1 Slowing down system clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57