ATIS-0100002

RELIABILITY ASPECTS OF NEXT GENERATION NETWORKS

TECHNICAL REPORT

The Alliance for Telecommunications Industry Solutions (ATIS) is a technical planning and standards development organization that is committed to rapidly developing and promoting technical and operations standards for the
communications and related information technologies industry worldwide using a pragmatic, flexible and open approach. Over 1,100 participants from over 300 communications companies are active in ATIS' 22 industry committees and its Incubator Solutions Program.

Notice of Disclaimer

- Study Group 12: End-to-End Transmission Performance of Networks and Terminals (see ITU-T Recommendation I.350);
- Study Group 13: Multi-Protocol and IP-based Networks and their Interworking; and
- Study Group 15: Optical and Other Transport Networks.

6.3 Other Forums and Committees

Forums involved in network reliability include the following (see T1.TR.70-2001):
- IETF: Internet Engineering Task Force;
- NRIC: Network Reliability and Interoperability Council;
- NRSC: Network Reliability Steering Committee;
- OIF: Optical Internetworking Forum; and
- Cable Labs (see pkt-tr-voipar-v01-001128).

6.4 Federal Government Related Work

See Appendix A on Data Elements For Reporting Cyber And Physical Events Affecting Telecommunications Networks.

7 NEXT GENERATION NETWORKS OVERVIEW

7.1 NGN Scope

The NGN is an edge-to-edge packet-based network that seamlessly supports data and voice services, video and multimedia services, and other advanced features (Figure 1). NGN Reliability Performance (as discussed in this document) has two perspectives: (i) the service view; and (ii) the network view. Generally, the service view will be important to both end users and other service providers. The network view will be most important to the owner and operator of the network. The service user experiences service outages, failed service attempts, etc., while the service provider experiences maintenance costs as well as OAMP (Operations, Administration, Management, and Provisioning) outages such as loss of the ability to diagnose. These and other key concepts are discussed in Section 6 of T1.TR.70-2001 [1].

Figure 1 - NGN Scope

The architectural components are:
- Access Edge, to interface network customers, add services functionality, and aggregate traffic to the backbone transport network.
- Services Edge, to provide key service functionalities such as Services and Network Management, authentication, VoIP call control, etc.
- Core Network, to provide transport connectivity.
- Packet and PSTN Edge, to interface to the Internet and the PSTN.

7.2 Next Generation Network Functionality

The NGN functionality is depicted in three levels as shown in Figure 2: (i) Content and Applications, (ii) Communications Services, and (iii) the Communication Paths to provide multiple services and applications.

[Figure 2 labels: Content and Applications (end-users; applications, e.g., real-time media, e-commerce; content servers). Communication Services (gateways, gatekeepers, etc.; enhancing communications, e.g., security, policy engines, services management; enabling communications, e.g., network management). Communication Paths (nodes, control; any path connectivity; Layer 0-3 technologies: IP, Optical Ethernet, ATM, Frame Relay, SONET, WDM, radio, satellite, etc.)]

Figure 2 - Next Generation Network Functionality

These NGN levels inter-operate to provide end-users or end-devices with services and network operators with remote OAMP capabilities. The Communication Path level provides path connectivity between end-users and end-devices using Layer 0-3 technologies such as SONET, WDM, ATM, Frame Relay, Optical Ethernet and IP. The Communications Services level enhances communications by providing capabilities such as network management and security. The Content and Applications level provides the end-user applications such as voice and video.

8 NGN DESIGN CONSIDERATIONS

8.1 Design Challenges

The NGN is a multi-service packet-based network that is multi-vendor supplied and multi-provider operated. This presents some key reliability-related design challenges:
- Wide range of service reliability expectations over the same network.
- Wide range of application sensitivity to failure.
- Inter-operability of functions, OAMP, state information, fault information, etc.
- Containment of failures (fault propagation causing high-impact outages).
- Non-deterministic nature of packet networks, which may lead to long protocol convergence time at the time of control plane failure.
- Increased system functionality and, in turn, hardware and software complexity, resulting in lower fault detection and recovery coverage.
- Frequent software upgrades for new features and services, which may result in increased annual down time.
- Distributed
VOP control requiring inter-NE co-operation to save and use state information for successful recovery.
- VOP applications and services provided by multiple individual systems, which may lower end-to-end service availability (due to concatenation of the systems along the service path).

8.2 NGN Requirements

NGN reliability-related requirements that only consider the communication path have limited value because they do not address reliability and availability as experienced by end-users or end-devices. For the service user reliability experience to be satisfactory, all three network levels must operate together reliably as a system. This means that the failure mode “behavior” of the NGN system must be mapped to the impact on the end-user or end-device “quality of experience”. It is necessary to relate reliability-related NGN design parameters (e.g., network restoration time, port failure frequency, etc.) to the reliability-related metrics that capture the end-user or end-device experience (e.g., Service Downtime).

Objectives for the end-user metrics, such as Service Downtime, should not be standardized because these numbers are technology- and implementation-specific and will improve over time based on competition. A requirement should be phrased as “no single point of failure causing a service outage requiring a field repair”. Depending on the technology and network design, the end-user downtime to meet this requirement could vary from 0.5 to 3 minutes per year.

Because the applications vary in terms of their timeliness needs, the performance criterion that constitutes a failure varies. These thresholds also vary depending on the usage state (e.g., access vs. use). Values for these parameters should be characterized and agreed to across the industry.

8.3 NGN Design Strategies

The network architect's role is to optimize the network design to satisfy service-based reliability/availability expectations and to minimize cost. The design strategies, shown in Figure 3, are grouped into two areas: those meant to minimize or eliminate the frequency of occurrence of the failure events that originate within the network system (prevention design strategies), and those meant to reduce the impact of the failure event on the service (mitigation/masking strategies).

[Figure 3 labels: NGN Solutions; Maximize Service Reliability]

the impact of the failure event; failure mode; and the business value.
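As a rough illustration of two points made in Sections 8.1 and 8.2 (concatenating systems along a service path lowers end-to-end availability, and a yearly downtime figure corresponds to an availability value), the following minimal Python sketch converts between the two. It assumes the systems are in series and fail independently, which the report does not state. The function names and the per-system availability of 0.99999 are hypothetical and chosen only for illustration.

```python
# Minimal sketch, assuming independent systems in series (an assumption,
# not something the report specifies). All names here are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def series_availability(availabilities):
    """End-to-end availability of systems concatenated along a service path."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_minutes_per_year(availability):
    """Expected yearly downtime (minutes) for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(minutes_per_year):
    """Availability implied by a yearly downtime figure (minutes)."""
    return 1.0 - minutes_per_year / MINUTES_PER_YEAR


if __name__ == "__main__":
    # The 0.5 to 3 minutes/year range quoted in Section 8.2.
    for minutes in (0.5, 3.0):
        print(minutes, "min/year ->", availability_from_downtime(minutes))

    # Hypothetical path of five systems, each with availability 0.99999.
    path = [0.99999] * 5
    end_to_end = series_availability(path)
    print("end-to-end availability:", end_to_end)
    print("end-to-end downtime (min/year):", downtime_minutes_per_year(end_to_end))
```

Under these assumptions the end-to-end downtime grows roughly in proportion to the number of concatenated systems, which is why per-element targets alone say little about the end-user experience.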
The network-wide use case approach requires that reliability/availability requirements not be specified for Network Elements (NEs) independently of the network solution. As Figure 4 illustrates, network use case requirements are set based on both market business drivers (top-down) and technology capabilities (bottom-up). The solution-specific requirements result in network element options to allow for design flexibility. This flexibility allows the network designer to mitigate or mask failure modes via networking or network element (or both) design strategies, whichever is of most value for the specific network solution. An example is 1:N port protection as a network element option.

The baseline set of requirements is independent of any specific network solution. These attributes, which are to be met for all network solutions, are ones that are most economically resolved within
the product, not by networking design strategies. An example is traffic overload robustness, where it is better to contain the impact within the network element rather than relying on network restoration to recover.

Figure 4 - Network Element Requirements

8.4 NGN Design Considerations

The key network reliability design principles for the NGN start with the prevention strategies. The masking and mitigation strategies simply add cost to the NGN and therefore must be selected based on value.

1. Simplify the functional design to minimize the NGN's technology failure rates (e.g., increase Mean-Time-Between-Failures).
2. Ensure that the functional design is robust to events such as climate, ESD/EMI and traffic overloads.
3. Design partitioning to facilitate good fault isolation to reduce equipment return rates and improve mean-time-to-repair.
4. Human factors design to minimize procedural error.
5. Security features to prevent denial of service attacks.

The masking and mitigation strategies should be selected assuming good implementation quality of the functional design. Cost and complexity should not be added to the NGN design to compensate for poor quality; they should be added based on the inherent reliability of the design. That is, select those areas where the failure frequency-duration-impact combination warrants the cost versus the Service Customer expectations.

1. To minimize cost of reliability and complexity of design, fault handling technologies
should be selected based on:
   - Failure mode risk;
   - Bandwidth efficiency; and
   - Ability to mitigate failure mode impact on end-user services.
2. To expand the failure event scope to include all failure events, it is necessary to go beyond the widely used case-based scope. These events include:
   - Hardware and software failures.
   - Normal OAMP activities done by the network operator, both remotely and on-site.
   - People: both procedural errors and acts of sabotage.
   - Traffic overloads: both bearer traffic and control messages.
   - Environmental incidents such as floods, earthquakes, etc.

Although Communication Servers must be built with the best current practices for building carrier-grade platforms with no single point of failure, the impact of a catastrophic failure of the building in which these network elements are located has to be taken into account when designing these networks. Therefore, the architecture for an NGN must include strategies to handle events beyond direct control (e.g., fires, earthquakes, flooding).

One possible strategy is to operate the processors that compose the Communication Server in an N+M configuration (N working servers and M backup servers), where the extra processors can take over call processing in the event of a failure of one of the other processors. The redundant processors would be located in a separate physical location. In normal operation, the Media Gateways communicate with the processor to which they are assigned. In the event that a processor fails completely, the impacted Media Gateways would then try to communicate with a standby processor and resume processing. In parallel, the standby processor would have detected or have been informed of the failure, and would have loaded the right system configuration to assume the role of the failed processor.

Because of the distributed architecture of an NGN, the Media Gateways can be physically located in several different locations, minimizing the risk of one catastrophic event cutting off service to all users. Service to end users can be restored very quickly with this architecture, much faster than with the older approach, which consisted of shipping new equipment to the affected site, sometimes in a pre-configured container, and physically rebuilding the connectivity.

8.5 Design Consideration Example

To discuss the design considerations, a generic State Diagram (Figure 5) describes
the classes of states the network can exist in. For each state type and transition, there are design considerations to prevent, mitigate, or mask the failures.

[Figure 5 states: 1. Normal Operating State; 2. Auto-Recovering States; 3. NOC-Recovering States; 4. Recovered States; 5. Dormant States; 6. Outage States Requiring Repair Action]

Figure 5 - Design Considerations Framework

The following sections describe the design considerations for each of the states and transitions.

8.5.1 Normal Operating State

Because the network system spends most of its time in this state,
the network protection/restoration features are often overhead resources waiting to be used, thus representing a cost that is not directly generating revenue. Reserved bandwidth and static pre-provisioned back-up paths are examples. The design considerations for this state are:
- Network architecture to minimize single points of failure (Transition 1 to 6).
- The design complexity and its ability to ensure the ongoing integrity of the back-up facilities to minimize dormant faults.
- Robustness of the network elements to the operating and traffic environments to ensure the network does not change failure states.
- Human factors design (e.g., fail-safe commands) to prevent procedural errors.

8.5.2 Detection and Recovery

There are three generic intermediate state types:

1. Auto-recovering States are of two types: Transients or software failures are detected and automatically recovered back to the
normal operating state and require no repair (Transitions 1 to 2 and 2 to 1). Hard faults are where there is protection using hold-off timers (Transitions 1 to 2 and 2 to 4). These states are service-affecting and, if long enough, will be considered a service outage.
2. NOC Recovering States are those that require the personnel at the Network Operations Center (NOC) to manually recover the network to a recovered state (Transitions 1 to 3 and 3 to 4). The intent is to recover to the non-service-affecting recovered state (State 4 in Figure 5); however, some designs can only recover to a reduced service-affecting state.
3. Recovered States are non-service-affecting states in which the network has recovered from the failure and raised an alarm, but repair is still required.
4. Dormant States are non-service-affecting states where there are hard faults that are undetected (e.g., of a back-up resource). They are of two types: detectable by a scheduled background integrity test (Transitions 1 to 5 to 4) and undetectable (Transitions 1 to 5 to 6 or any of the recovering states).

The design considerations for these states are:
- High failure mode detection coverage for both the operational resources, to eliminate single points of failure, and the back-up resources, to eliminate dormant faults.
- Integrity of recovery requires that the speed of detection and recovery is fast enough to mask the impact on the higher laye
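
The state classes and transitions described in Sections 8.5.1 and 8.5.2 can be sketched as a small transition table. The following Python sketch is illustrative only. It encodes just the transitions that the text names explicitly. Repair transitions back to the Normal Operating State are not stated in this excerpt and are therefore left out. The identifiers (`State`, `TRANSITIONS`, `is_valid_path`) are hypothetical.

```python
# Minimal sketch of the Figure 5 state classes, limited to the transitions
# named in the text. Identifier names are hypothetical.
from enum import IntEnum


class State(IntEnum):
    NORMAL = 1                   # 1. Normal Operating State
    AUTO_RECOVERING = 2          # 2. Auto-Recovering States
    NOC_RECOVERING = 3           # 3. NOC-Recovering States
    RECOVERED = 4                # 4. Recovered States
    DORMANT = 5                  # 5. Dormant States
    OUTAGE_REQUIRING_REPAIR = 6  # 6. Outage States Requiring Repair Action


# Transitions taken from the text: 1->2, 2->1, 2->4, 1->3, 3->4, 1->5,
# 5->4, 5->6, 5->"any of the recovering states" (2 or 3), and 1->6.
TRANSITIONS = {
    State.NORMAL: {
        State.AUTO_RECOVERING,
        State.NOC_RECOVERING,
        State.DORMANT,
        State.OUTAGE_REQUIRING_REPAIR,
    },
    State.AUTO_RECOVERING: {State.NORMAL, State.RECOVERED},
    State.NOC_RECOVERING: {State.RECOVERED},
    State.DORMANT: {
        State.RECOVERED,
        State.OUTAGE_REQUIRING_REPAIR,
        State.AUTO_RECOVERING,
        State.NOC_RECOVERING,
    },
}


def is_valid_path(path):
    """Return True if every consecutive pair in `path` is a named transition."""
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # Dormant fault found by a scheduled background integrity test (1 -> 5 -> 4).
    print(is_valid_path([State.NORMAL, State.DORMANT, State.RECOVERED]))
    # A jump from Normal straight to Recovered is not named in the text.
    print(is_valid_path([State.NORMAL, State.RECOVERED]))
```

A table of this kind makes it easy to attach a design consideration (prevent, mitigate, or mask) to each state or transition and to check that every failure scenario under study follows a path the framework actually covers.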