TECHNICAL REPORT
ATIS-0100026

A METHODOLOGY FOR DESIGN OF END-TO-END NETWORK RELIABILITY FOR PROACTIVE NETWORK RELIABILITY PLANNING

ATIS is the leading technical planning and standards development organization committed to the rapid development of global, market-driven standards for the information, entertainment and communications industry. More than 200 companies actively formulate standards in ATIS Committees, covering issues including: IPTV, Cloud Services, Energy Efficiency, IP-Based and Wireless Technologies, Quality of Service, Billing and Operational Support, Emergency Services, Architectural Platforms and Emerging Networks. In addition, numerous Incubators, Focus and Exploratory Groups address evolving industry priorities including Smart Grid, Machine-to-Machine, Networked Car, IP Downloadable Security, Policy Management and Network Optimization. ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications Sectors, and a member of the Inter-American Telecommunication Commission (CITEL). ATIS is accredited by the American National Standards Institute (ANSI). For more information, please visit .

Notice of Disclaimer

The TR describes a methodology for approaching this key task. To enable a more informed decision on the appropriate reliability level for a given network element (NE), it introduces the Significant Point
of Failure (SgPoF) metric. Computing the SgPoF early in the design process, and then implementing the recommended design, will yield Capital Expenditure (CapEx) savings and reduce the cost of design changes that would otherwise occur late in the design for reliability process.

FOREWORD

The Alliance for Telecommunications Industry Solutions (ATIS) serves the public through improved understanding between providers, customers, and manufacturers. The Network Performance, Reliability, and Quality of Service Committee (PRQC) develops and recommends standards, requirements, and technical reports related to the performance, reliability, and associated security aspects of communications networks, as well as the processing of voice, audio, data, image, and video signals, and their multimedia integration. PRQC also develops and recommends positions on, and fosters consistency with, standards and related subjects under consideration in other North American and international standards bodies.

Mandatory requirements are designated by the word shall and recommendations by the word should. Where both a mandatory requirement and a recommendation are specified for the same criterion, the recommendation represents a goal currently identifiable as having distinct compatibility or performance advantages. The word may denotes an optional capability that could augment the standard. The standard is fully functional without the incorporation of this optional capability.

Suggestions for improvement of this document are welcome. They should be sent to the Alliance for Telecommunications Industry Solutions, PRQC Secretariat, 1200 G Street NW, Suite 500, Washington, DC 20005.

At the time it approved this document, PRQC, which is responsible for the development of
this Technical Report, had the following members:

M. Neibert, PRQC Chair (Telcordia)
P. Tarapre, PRQC Vice Chair (AT

... in many cases the hardware and software may already be in place, and provisioning entails only configuration tasks such as creating (or modifying) a customer record in a database and associating it with the service(s) and service level to which the customer has subscribed. AT-PROV.

4 ACRONYMS & ABBREVIATIONS

A/S = Active Standby
CapEx = Capital Expenditure
COPT = Capacity Outage Probability Table
CSCF = Call Session Control Function
DT = Downtime
HW = Hardware
IP = Internet Protocol
NE = Network Element
NGN = Next Generation Network
OAM = Operations, Administration and Maintenance
OpEx = Operations Expenditure
PSTN = Public Switched Telephone Network
SgPoF = Significant Point of Failure
SIP = Session Initiation Protocol
SLA = Service Level Agreement
SPoF = Single Point of Failure
SW = Software
TR = Technical Report

5 INTRODUCTION & RATIONALE

The planning and provisioning of telecommunications services is a complex task that requires a careful balance between business and technical considerations. It requires a detailed understanding of the sophisticated systems that make up the various network elements, and the use of that understanding to design and implement the end-to-end network that supports the desired services. Mistakes in the planning and provisioning steps have a direct business impact in terms of Service Level Agreement (SLA) penalties and lost revenue due to the loss of dissatisfied customers. At the same time, business constraints require care not to overspend on CapEx and Operations Expenditure (OpEx).

⁴ Telcordia documents are available from Industry Direct Sales, Telcordia, 8 Corporate Place, PYA 3A-184, Piscataway, NJ 08854-4156, or: .

The planning and provisioning processes have been developed based on experience with legacy networks. At a high level, the usual process for planning through provisioning is as follows:

Figure 1 - A High-Level View of the Planning & Provisioning Process (blocks: Inputs; Network Architecture; Design for Reliability; Link & Node Sizing; Network Design; To Provisioning)

As shown in Figure 1, design for reliability is fundamental to the planning and provisioning of networks for service availability. It is also a task that falls squarely within the purview of reliability engineering. Hence, this TR focuses on design for reliability as a key activity for planning and provisioning in the delivery of network service availability.

The design for network reliability in legacy networks incorporates the following essential steps:

- Data collection: Reliability data for each NE in the network architecture and, for each in-scope service, a reference connection representing the end-to-end network that supports that service.
- Network reliability modeling and analysis: Obtain the end-to-end network reliability using the reference connection and the equipment reliability data. (If needed, the same reference connection can be used to calculate other reliability metrics as well.)
- Comparison of model-derived network availability and other reliability metrics with objectives: Ensure the designed network meets the reliability objectives. If the objective(s) are not met, the reliability engineer performs a gap analysis to identify design areas where improvements can help meet them.
- Network redesign: The reliability engineer and the designer work together on changes to the network design to ensure the objectives are met.

With the advent of IP technology, traditional design for reliability methodologies will be inadequate for NGNs. The distributed nature of IP networks and the multiple services they support lead to a very large number of reference connections that must be considered in the design for network reliability. As services move to IP-based networks, the requirements often have to be ported over from legacy technologies. This requires using reference connections and reliability budgeting to map existing requirements onto the components of the new-technology network. An example of this approach is provided by CL-1128. Thus it
is important for the network designer to adopt a systematic way of organizing the reference connections.

Early in the planning and provisioning process, the impact of each NE's failure should be assessed when deciding on that NE's reliability design. Making reliability-related changes late in the design process may entail changes to other design decisions (e.g., capacity sizing). This is especially true for NGNs, which will be more complex due to the disaggregated nature of IP technology. Secondly, if these design changes are made without considering the failure impact, they could result in CapEx that may not be justified when evaluated against the impact on subscribers. As a simple example, a decision to remove all Single Points of Failure (SPoF) to improve reliability will be expensive, and probably unnecessary for the many network elements whose failure does not impact a large number of service subscribers. The impact of an NE failure depends both on the reliability of the NE and on the traffic (load) that it has to process: an NE that fails but has no traffic to serve will not impact any subscribers.

In the design for reliability of NGNs, new failure sources need to be considered that are typically not of concern when designing legacy networks. Examples of such non-traditional failure sources are power failures and cyber-security attacks. In legacy networks, failures in the telecommunications network are fairly independent of failures in the power infrastructure. The Public Switched
Telephone Network (PSTN) wireline phone service remains available during a commercial power outage because of power from backup batteries or diesel generators. In IP networks, the distributed access media will depend largely on power supplied from commercial sources, with many NEs placed in non-Central Office environments without the benefit of backup batteries or diesel generators. Design for reliability methodologies therefore have to put adequate emphasis on power.

Lack of security is another new failure source. With the use of IP technology, cyber-threats are an increasing concern. In legacy Time Division Multiplexing (TDM) networks, the SS7 signaling/control network was separate from the transport network. Furthermore, because the hardware and software were proprietary, it was not possible for hackers to access and attack these networks. IP technology, with its widespread use of off-the-shelf hardware and software components, has changed this. Planning and provisioning through design for reliability has to factor in these threats.

6 A DESIGN FOR RELIABILITY METHODOLOGY

To enable proactive network reliability planning that addresses the new challenges described above, the following four-step methodology is proposed. Figure 3 summarizes the concepts. In Section 7, we provide an example of how to calculate the SgPoF metric and use it to improve the design for reliability.

1. Input Collection

The first step consists of collecting the input data required at the various steps of the design for network reliability process. The data include the following:

a) NE-specific data: The list of planned network elements, their component configurations, reliability features, and component failure rates.
b) Traffic Profile: The expected traffic behavior on the planned network. This data is available to network architects, as it is used to size the links and nodes.
c) List of the planned applications and services.
d) Reliability requirements for services and networks: The reliability requirements are conveyed as part of the customer contract. If these are not explicitly stated, the network designers can rely on their own experience or derive them from standards using reliability budgeting.
e) Proposed metrics for reliability measurement: If these are not specified, the network designers can rely on experience or obtain them from published standards.
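The inputs listed in Step 1 are what the later steps consume. The Python sketch below is a minimal illustration under stated assumptions, not the TR's method: it holds hypothetical per-NE records (the class `NetworkElement` and its fields `failure_rate_per_hour`, `mttr_hours` and `subscribers_served` are our own names), derives each NE's steady-state unavailability from a failure rate and a mean time to repair (MTTR), and forms an expected-subscribers-affected figure as unavailability times subscribers served. That product is only an assumed stand-in for the SgPoF metric defined in Step 3 and calculated in Section 7. The end-to-end figure uses a simple series model that assumes independent failures and no redundancy, and every numeric value is hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class NetworkElement:
    """Hypothetical Step 1 input record for one planned network element (NE)."""
    name: str
    failure_rate_per_hour: float  # lambda, rolled up from component failure rates
    mttr_hours: float             # mean time to restore service after a failure
    subscribers_served: int       # subscribers whose traffic the NE carries (traffic profile)

    def unavailability(self) -> float:
        # Steady-state unavailability of a single repairable unit with
        # exponential failure and repair times: U = lambda / (lambda + mu), mu = 1 / MTTR.
        mu = 1.0 / self.mttr_hours
        return self.failure_rate_per_hour / (self.failure_rate_per_hour + mu)

    def expected_subscribers_affected(self) -> float:
        # ASSUMED SgPoF-style figure: probability the NE is down multiplied by
        # the number of subscribers it serves. Not necessarily the TR's formula.
        return self.unavailability() * self.subscribers_served


def rank_by_impact(elements: list[NetworkElement]) -> list[NetworkElement]:
    """Order NEs from highest to lowest expected subscriber impact."""
    return sorted(elements, key=lambda ne: ne.expected_subscribers_affected(), reverse=True)


def series_path_availability(path: list[NetworkElement]) -> float:
    # End-to-end availability of a reference connection in which every NE is
    # required (series model), ASSUMING independent failures and no redundancy.
    return prod(1.0 - ne.unavailability() for ne in path)


if __name__ == "__main__":
    # Hypothetical input values, for illustration only.
    p_cscf = NetworkElement("P-CSCF", failure_rate_per_hour=2e-4, mttr_hours=2.0, subscribers_served=50_000)
    s_cscf = NetworkElement("S-CSCF", failure_rate_per_hour=1e-4, mttr_hours=4.0, subscribers_served=200_000)

    for ne in rank_by_impact([p_cscf, s_cscf]):
        print(f"{ne.name}: U={ne.unavailability():.2e}, "
              f"expected subscribers affected={ne.expected_subscribers_affected():.1f}")

    objective = 0.9999  # hypothetical end-to-end availability objective
    a_path = series_path_availability([p_cscf, s_cscf])
    print(f"path availability={a_path:.6f}, meets objective: {a_path >= objective}")
```

With these illustrative numbers the two NEs have the same steady-state unavailability, yet the S-CSCF ranks first because it serves four times as many subscribers, which mirrors the point that an NE's impact depends on both its reliability and its load. The series path also falls short of the hypothetical 0.9999 objective, which in the terms of Figure 3 would mean looping back to Step 2 to improve the design.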
2. Network Architecture (Preliminary Network Design)

This step creates the network architecture (preliminary network design) based on the network elements and the traffic profile.

a) For each in-scope service, identify the call path and the end-to-end reliability requirements. To make the numerous call paths easier to manage, they can be collected in a service matrix as shown in Figure 2. To ensure that the correct level of reliability is designed for, objectives should be set both for the SgPoF and for the end-to-end reliability.
b) Take a first cut at improving the design, e.g., through the choice of reliability features or of alternative equipment.

Figure 2 - Example of a Service Matrix for an IMS Network (services: IMS H.248 to PSTN call, IMS SIP to PSTN call, IMS SIP to IMS SIP call, each with a call-setup path and a bearer path; critical network elements: NG-DSLAM, AGCF/VGW, SBC, Routers, MRF, MGW, P-CSCF, I-CSCF, S-CSCF, TAS, HSS, DNS/ENUM, BGCF, MGCF, IN, CCF; cells mark whether an element is in the call set-up or in the bearer)

3. Model Significant Points of Failure (SgPoF)

Prioritize network elements by the impact of their failure on service subscribers. The service matrix helps the network designer limit this modeling to critical network elements only. Critical network elements are those that appear in the
call paths most frequently (Figure 2).

a) Measure the significance of an NE's failure by modeling its impact on service. The impact can be measured by the expected number of subscribers affected by the NE's failure. We will refer to this measure as the SgPoF metric.
b) If the SgPoF metric value for a
given NE is too high (e.g., it does not meet an objective for the NE), improve the design to mitigate the impact of that NE. Such a redesign could add redundancy or select an alternative reliability feature available on that NE.

4. Reliability Validation

a) Model the end-to-end reliability to ensure compliance with the end-to-end network and/or service reliability requirements. This reliability model will depend on the metrics under consideration. For the service availability metric, standard definitions of service availability are provided by ATIS AT-100016.
b) Incorporate the effect of
non-traditional failure sources in the reliability models. These failure sources are unique to IP networks (e.g., power and security).

In a practical design effort, the design is improved iteratively. As shown in Figure 3, there are decision points after Step 3 and after Step 4 to check whether the requirements are met. Failure to meet the reliability requirements results in the designer looping back to improve the design for reliability. Such improvements are achieved through the addition of reliability features, a change in the choice of NE, or rearranging the links so that particular network elements handle less traffic.

Figure 3 - An End-to-End Design for Reliability Process for NGN (flow: Inputs (1) → Network Architecture/Design (2) → Identify Significant Points of Failure (3) → SgPoF Objective Met? → Reliability Validation (4) → Objective Met?; each "No" branch returns to (2): Improve Design)

7 EXAMPLE: SIGNIFICANT POINT OF FAILURE (SgPoF) METRIC CALCULATION

We provide an example SgPoF metric calculation for an NE of an IMS network BL-2008. The SgPoF for an NE is a measure of the expected number of subscribers affected by failure of the NE being analyzed. The calculation and its use in i