TECHNICAL REPORT ATIS-0100028

NETWORK RESILIENCY PLANNING FOR ENTERPRISE CUSTOMERS

ATIS is the leading technical planning and standards development organization committed to the rapid development of global, market-driven standards for the information, entertainment and communications industry. More
than 200 companies actively formulate standards in ATIS Committees, covering issues including: IPTV, Cloud Services, Energy Efficiency, IP-Based and Wireless Technologies, Quality of Service, Billing and Operational Support, Emergency Services, Architectural Platforms and Emerging Networks. In addition, numerous Incubators, Focus and Exploratory Groups address evolving industry priorities including Smart Grid, Machine-to-Machine, Networked Car, IP Downloadable Security, Policy Management and Network Optimization. ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications Sectors, and a member of the Inter-American Telecommunication Commission (CITEL). ATIS is accredited by the American National Standards Institute (ANSI). For more
information, please visit .

Notice of Disclaimer

[...] however they may not be as stringent as those for the regional or national centers. The Enterprise Customer is thus expected to have a range of site availability requirements depending on the number and type of site locations. The service provider needs to create a design process that meets these site availability requirements for the Enterprise Customer.

Large service providers design their core backbone networks with high levels of redundancy built in for network elements (e.g., routers, cross-connects, etc.) as well as the transport facilities linking these elements. Hence, core backbone network failures typically do not result in extended service outages. The exception may be large-scale disasters such as earthquakes and hurricanes, or terrorist attacks, that may cause substantial damage to a network over a large region (see footnote 4). Bottlenecks occur at the edges of the network under normal conditions and hence, the main issue is to ensure accessibility from all the Enterprise Customer sites to the edge of the core backbone of the service provider network.

Clauses 6.1 through 6.3 describe various methodologies and processes by which networks in general, and core backbone networks in particular, can achieve high levels of reliability and resiliency. Clause 6.4 defines a key availability metric that is utilized for designing suitable network access to the core backbone network for all customer sites. Clause 6.5 describes the design process for connecting customer sites to the core backbone network.

6.1 Service Offerings and Reliability Options

In general, services for Enterprise Customers must provide a high degree of resiliency. These services are designed to meet the desired level of service availability (e.g.,
the “5 nines” availability guarantees the industry is famous for). Resiliency in service design relies on a number of fundamental principles. Chief among these are the notions of “Single Point of Failure” (SPoF) avoidance and of self-healing networks. A service SPoF is defined in terms of support for any given service. A SPoF is a network component (either software or hardware) whose failure will disrupt that service until the failure is automatically restored or physically repaired. Good examples of this are the ingress and egress nodes for a given service path. In the rare event that one of
these nodes is lost, the self-healing capabilities of the network would not be able to restore those service paths (see Figure 1).

If the SPoFs introduce more than acceptable risk, the risk may be mitigated by using diversity-based service options. Such services allow for the complete physical separation of groups of circuits so that no single failure will disrupt more than one of the group. Services based on routing diversity generally imply that the customer contracts for twice as many paths as are required by the traffic load. In the ideal case, one set of paths is fully diverse (see footnote 5) from the other. In some instances, one set of paths serves as a backup while the other carries all traffic, and the path switching is under customer control. In other cases, traffic may be load shared across both sets of paths. Diversity arrangements can be designed to avoid most SPoFs, except the customer
location (see Figure 1), but they are highly resource intensive, requiring considerable excess capacity. Diversity options are useful for designing access from the customer site to the core backbone network.

Footnote 4: Re-routing traffic seamlessly with automated recovery mechanisms is not feasible in such cases and the only mitigation is to install disaster recovery techniques (see Clause 6.4).
Footnote 5: In some cases, depending on the available topology (e.g., physical spur), complete physical diversity may not be possible.

[Figure: Diversity Arrangements. Elements shown: CPE A, CPE Z, Network Node B, Network Node C.]
1. If Customer Premise locations A or Z fail, then physical repair is the only way to resume operations.
2. If Network Nodes B or C or their inter-connecting transport links fail, then automated restoration mechanisms can restore service over alternate paths.
Legend: CPE = Customer Premise Equipment
Figure 1 - Illustrative Example of Diversity Arrangements

Self-healing services are designed to react to a failure by automatically avoiding the failed elements. These mechanisms exist at multiple network layers and are based on automatic restoration techniques. Restoration mechanisms widely deployed to support self-healing services are based on self-healing ring architectures, or on mesh networks with fast re-routing capabilities around failures. The notion of re-routing is applicable to transport as well as voice and data networks. SONET Ring architectures are useful for network access, while mesh restoration methods are typically deployed in core backbone networks.

Resilient service options need to be carefully designed to afford customers the appropriate degree of service continuity. In the transport services world, this means offering services that are guaranteed to be quickly restored in the event of a
failure (e.g., SONET Ring re-routing around a cable cut/intrusion). Services that ride on top of the transport layer may add further layers of service-specific resilience.

Ensuring end-to-end service resilience may be fairly complex, as the end-to-end service path includes multiple network segments (see Figure 2). The end-to-end path can be viewed as two access segments on either side of a core backbone (service) network. Note that the access segment may include a backhaul segment to reach the provider network edge. The access segments provide connectivity from the customer premise
to the edge of the core backbone network. Resiliency within the core backbone can be provided by a variety of mechanisms including redundancy in elements and fast rerouting. Resiliency in the connectivity to the service network can be achieved via diverse or resilient access coupled with dual homing
options, thus avoiding the creation of a SPoF at edge elements. Different dual-homing options allow the customer's traffic to be homed to separate ports on the same switch, to separate switches in the same location, or to switches in separate locations, depending on the customer requirements.
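
As a rough illustration of why removing a SPoF at the edge matters, the following Python sketch estimates end-to-end availability for an access/core/access path, with and without dual homing. The per-segment availability figures, the function names, and the assumption that failures are statistically independent are all illustrative assumptions and are not taken from this report. Shared rights-of-way can violate the independence assumption in practice.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def series(*availabilities: float) -> float:
    """Availability of elements that must all work (each one is a SPoF)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Availability of redundant elements where any single one suffices.

    Assumes independent failures (an idealization).
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical per-segment availabilities (illustrative only).
ACCESS = 0.999   # one access segment, single-homed
CORE = 0.99999   # resilient core backbone

# Access -> core -> access, every segment a SPoF.
single_homed = series(ACCESS, CORE, ACCESS)

# Same path, but each access segment is dual-homed over two diverse links.
dual_homed = series(parallel(ACCESS, ACCESS), CORE, parallel(ACCESS, ACCESS))

if __name__ == "__main__":
    for label, a in (("single-homed", single_homed), ("dual-homed", dual_homed)):
        print(f"{label}: {a:.6f} ({downtime_minutes_per_year(a):.1f} min/year)")
```

Under these assumed figures the access segments dominate the single-homed result, which is consistent with the observation that the main issue lies at the network edge rather than in the core backbone.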
Maximum resiliency is achieved when each network segment is in and of itself resilient, and there are no SPoFs between them. As service SPoFs are removed by combining dual homing with resilient access and resilient core backbone networks, customers are provided with the highest possible availability.

[Figure: Generalized Customer Connectivity Model. Labels shown: Enterprise Customer Data/Call Center; Customer Access; Data Access; “Backhaul”; LSO/SWC; First SP POP; Transport Network; Voice Network; Data Network; Voice Data Switch POP; L2, L3 or Both.]
Each service segment may use specific resilience options, such as:
- Access rings
- Backhaul rings/mesh
- Edge nodes dual homing
- Core network mesh and service (layer) specific mechanisms
Resiliency options are often combined to provide end-to-end service resiliency.
Legend: SP = Service Provider; POP = Point of Presence; LSO = Local Serving Office; SWC = Switch
Figure 2 - Generalized Enterprise Customer Connectivity Segments

6.2 Business Processes

The realization of an enterprise network that meets customer needs can only be achieved when there is a partnership between the customer and the telecommunications services provider. Key to this partnership is the sharing of information between carrier and customer. The customer shares information about its business process designs, key sensitivities, and customer site information: the number and type of customer sites/offices and the availability requirements for each site.
The service provider shares data about its services, service options and their cost, and service capabilities specific to geographic location. It then creates a network design that seeks to optimize the cost to the Enterprise Customer such that the site availability requirements of the customer are
satisfactorily met. To do this, the service provider takes into consideration a range of service reliability options for the critical segments of the customer's network. Some of the high-resiliency options can be resource intensive, and as such carry a higher cost for the Enterprise Customer.

Customers must be active participants in developing end-to-end solutions that will meet their business needs. This entails the design and execution of an appropriate set of business processes. The customer sites that house critical business processes can be important SPoFs. Customers must ensure that critical sites are not SPoFs by providing alternate locations and putting into place all the processes needed to ensure that multiple sites have updated databases, are continually ready, and are properly staffed. Disaster recovery plans are needed to allow the transfer of business operations from one site to the other.

Ideally, the selection of sites for critical operations centers includes telecommunications resiliency considerations. This requires a partnership with the service provider to identify the service and resiliency options available, and access alternatives. Often, LATA boundaries, tariffs and the availability of fiber are factors in assessing a location from a telecommunications reliability perspective.

In some instances, Enterprise Customers contract with multiple service providers to mitigate the risk of business operations interruptions. However, in many situations this has proven to be a less effective method than a strong partnership between the Enterprise Customer and its network provider. To a large extent, this is because many rights-of-way are shared between carriers, and common conditions will cause failures for multiple providers' networks. A good example of this situation is the 2006 earthquake in the Luzon Strait off the coast of Taiwan, in which facilities for multiple providers of trans-Pacific networks were impacted.

6.3 Infrastructure Support

With more and more critical applications running globally, across cities, countries or continents,
it becomes imperative that networking services deliver the reliability sought by enterprises. As previously noted, service reliability is supported by risk mitigation at different levels. High-end communications services often come with a slew of options for mitigating the risk factors (from simple
fiber cuts to catastrophic events leading to loss of central offices or enterprise locations) and thereby improving service reliability. These services must be carried in an infrastructure equally resilient to failure. Several key network resiliency-related mechanisms of redundancy, restoration and rerouting employed in modern telecommunications networks are briefly discussed here. These mechanisms are widely deployed in core backbone networks.

6.3.1 Redundancy Mechanisms

There are multiple types of redundancy used extensively in telecommunications networks, including equipment and facility redundancy. While redundant equipment and facilities are most often thought of as protection against failures, they are also useful in preserving service when scheduled maintenance activities are required.

Equipment redundancy has been used extensively to protect against equipment outages which may occur due to software or hardware failures. In order to maximize network resilience, network element architectures have evolved to provide complete redundancy for critical core subsystems such as power supplies and common controllers. Service-supporting, customer-facing line cards may also be configured in a 1+1 or 1:N protection scheme based upon the level of resiliency required. Network elements may be deployed in a redundant manner, coupled with dual connectivity between nodes and advanced routing mechanisms. Often dual connectivity is coupled with diverse routing of the connecting links to provide end-to-end protection.

Facility redundancy requires route diversity and is generally used in conjunction with either restoration or dual connectivity as described below.

6.3.2 Restoration and Re-Routing

At Layer 1, restoration is used primarily to mitigate degradation or loss of signal transmission, such as that which would result from fiber intrusions or other failures. As a practical point, restoration schemes use re-routing capabilities in network nodes to accomplish transmission restoration. They use the network capacity more efficiently than an approach based solely on using diverse connectivity between nodes. In addition, restoration can also mitigate risks associated with losing intermediate nodes, by providing rerouting capabilities around a failed node. Note, however, that restoration is never instantaneous; there is added value in having diversity arrangements.

Facility
restoration mechanisms have been in place and have evolved over many years. SONET/SDH self-healing ring technology is bandwidth intensive and is constrained by a maximum ring circumference and number of nodes that can be accommodated while still meeting restoration time targets of 50 ms (for span switching) and 200 ms (for loopback switching) [ATIS-0900105]. Technology and economic considerations have prompted development of alternatives based on mesh topology for core backbone networks. Intelligent cross-connect nodes capable of supporting computationally intensive re-routing deliver sub-second restoration in mesh networks [G.8080]. Regardless of the technique, effective facility restoration requires careful capacity planning and monitoring to ensure the presence of sufficient restoration capacity. This is easiest for self-healing rings, but often requires m