TECHNICAL REQUIREMENT
T1.TRQ.11-2004

Technical Requirement on Recording Outages in Packet Network Elements

Prepared by T1A1.2 Working Group on Network Survivability Performance

Problem Solvers to the Telecommunications Industry

A Word from ATIS and Committee T1

Established in February 1984, Committee T1 develops technical standards, reports, and requirements regarding interoperability of telecommunications networks at interfaces with end-user systems, carriers, information and enhanced-service providers, and customer premises equipment (CPE). Committee T1 is sponsored by ATIS and is accredited by ANSI.

NOTE - The user's attention is called to the possibility that compliance with this standard may require use of an invention covered by patent rights. By publication of this standard, no position is taken with respect to the validity of this claim or any patent rights in connection therewith. The patent holder has, however, filed a statement of willingness to grant license under these rights on reasonable and nondiscriminatory terms and conditions to applicants desiring to obtain such a license. Details may be obtained from the publisher.

T1.TRQ.11-2004

Published by
Alliance for Telecommunications Industry Solutions
1200 G Street, NW, Suite 500
Washington, DC 20005

Committee T1 is sponsored by the Alliance for Telecommunications Industry Solutions (ATIS) and accredited by the American National Standards Institute (ANSI).

Copyright 2003 by Alliance for Telecommunications Industry Solutions
All rights reserved.

No part of this publication may be reproduced in any form, in an electronic retrieval system or otherwise, without the prior written permission of the publisher. For information contact ATIS at 202.628.6380. ATIS is online at . Printed in the United States of America.
T1.TRQ.11-2004

Technical Requirement on Recording Outages in Packet Network Elements

Secretariat
Alliance for Telecommunications Industry Solutions

Approved February 2004

Abstract

Currently, outage alarm messages from packet network elements (e.g., IP routers) are generated in large numbers. Without either human intervention or external operational support systems, these messages are not automatically translated into meaningful inferences about the cause, impact, and duration of the underlying failure. This Technical Requirements Document provides a set of requirements to accurately capture critical details related to such element outages and to create the necessary framework for automated delivery of outage reports. The goal is to fully capture these critical requirements for all packet network technologies, such as IP, ATM, and Frame Relay, and to enable service providers to swiftly initiate remedial solutions in real time as well as to accurately develop the necessary reliability programs for these elements.

Foreword

This Technical Requirements (TRQ) document provides a set of outage recording requirements for evolving packet network technologies such as IP, ATM, and Frame Relay. This TRQ is intended for providers of packet-based telecommunications networks and services, telecommunications equipment suppliers, and government agencies responsible for addressing emergency situations.

Suggestions for enhancement of this report are welcome. These should be sent to the Alliance for Telecommunications Industry Solutions, Suite 500, 1200 G Street N.W., Washington, D.C. 20005.

Working Group T1A1.2 on Network Survivability Performance, which developed this report, has the following officers and participants:

O. Avellaneda, T1A1.2 Chair
S. Makris, T1A1.2 Vice-Chair
F. Kaudel, T1A1.2 Chief Editor
R. Holley, J. Huang, A. McCain, P. Tarapore; T1A1.2 Technical Editors

Active Participants:
O. Avellaneda, J. Bennett, J. Colombo, C. Dvorak, R. Holley, J. Huang, P. Kimbrough, Y. Kogan, J. Lankford, S. Makris, A. McCain, A. Nguyen, R. Paterson, P. Tarapore, A. Thiessen, K. Trahan, A. Webster, R. Wohlert

Table of Contents

EXECUTIVE SUMMARY
1 PURPOSE, SCOPE, APPLICATION, AND OUTLINE
1.1 PURPOSE
1.2 SCOPE
1.3 APPLICATION
1.4 OUTLINE
2 INTRODUCTION
3 RELATED WORK
4 OUTAGE RECORDING REQUIREMENTS
5 MEASUREMENT FRAMEWORK
6 DATA DEFINITION AND COLLECTION
6.1 DATA DEFINITION
6.2 DATA STORAGE
6.3 NOTIFICATION AND DATA COLLECTION
7 MANAGEMENT OF THE OUTAGE MEASUREMENT PROCESS
7.1 CONFIGURATION AND EXECUTION
7.2 USAGE INFORMATION
8 CONCLUSION
9 BIBLIOGRAPHY
10 DEFINITIONS
11 ABBREVIATIONS AND ACRONYMS
A USAGE EXAMPLES
A.1 APPLICATION EXAMPLE 1: USING AOT TO CALCULATE COMPONENT AVAILABILITY OVER A PERIOD OF TIME
A.2 APPLICATION EXAMPLE 2: USING AOT TO CALCULATE COMPONENT AVAILABILITY OVER A PERIOD OF TIME
A.3 APPLICATION EXAMPLE 3: USING AOT AND NAO TO DETERMINE COMPONENT MTTR
A.4 APPLICATION EXAMPLE 4: USING AOT AND NAO TO DETERMINE COMPONENT MTBF AND MTTF
A.5 APPLICATION EXAMPLE 5: USING AN EVENT RECORD AS A DIAGNOSTIC TOOL
A.6 APPLICATION EXAMPLE 6: USING AN EVENT RECORD TO COLLECT ELEMENT RELIABILITY DATA
B SCHEMA FOR EVENT RECORD AND OBJECT DATA RECORD

Table of Figures

FIGURE 1 - OUTAGE MEASUREMENT FRAMEWORK
FIGURE 2 - OUTAGE CHARACTERIZATION
FIGURE 3 - EXAMPLE OF UP AND DOWN STATES FOR DEFECT THRESHOLD BASED ON CPU UTILIZATION
FIGURE 4 - AVAILABILITY CALCULATION USING AOT
FIGURE 5 - DERIVING MTTR FROM RAW OUTAGE DATA
FIGURE 6 - DERIVING MTBF AND MTTF FROM RAW OUTAGE DATA

Table of Tables

TABLE 1 - EVENT RECORD
TABLE 2 - OBJECT DATA

Technical Requirement on Requirements for Recording Outages in Packet Network Elements

Executive Summary

This Technical Requirements Document provides a set of outage measurement requirements for packet network technologies such as IP, ATM, and Frame Relay. It is intended as a guide for service providers operating packet networks and for equipment suppliers designing or manufacturing packet network elements.

1 Purpose, Scope, Application, and Outline

1.1 Purpose

This Technical Requirement (TRQ) is intended to provide the industry with an initial set of requirements to enable the automated generation and presentation of outage data from packet network technologies. This TRQ will enable equipment vendors and network management system vendors to provide outage information online with a greater degree of accuracy and reliability. The ultimate goal is to move away from a manual process that requires human intervention toward a more efficient process that can be automated for network operators and services.

1.2 Scope

Outage identification in today's packet networks is based on alarm messages generated by network elements (e.g., cards and facilities). Significant (and costly) correlation and troubleshooting by a human operator is usually required to examine all alarm messages, filter out false alarms, identify the failed elements, recommend remedial solutions (if the failure is detected in time), and determine reliability impacts.

This document focuses on the need for developing the necessary requirements for the delivery of automated and detailed outage reports for physical elements in packet networks. Such requirements¹ should explicitly specify critical data such as the identification of the failed element and the start and end times of the outage. The objective is to enable service providers to obtain complete reports on element outages instead of sifting through thousands of alarm messages, most of which do not require immediate attention.

The report generation requirements call for real-time delivery as well as periodic or polled report generation. These reports can be used for swift remediation at the onset of an outage (real-time report use) as well as in the management of life-cycle reliability programs for physical elements.

This document focuses on reliability data from physical element outages, external conditions impacting a physical element, and element utilization overloads. Specific guidance for setting the necessary outage measurement thresholds is beyond the scope of this document. Service outages are also beyond the scope of this document. It is also recognized that outages can result from causes other than physical elements, such as protocols and databases. Outage recording requirements for such causes will be pursued in future TRQs.

¹ It is recognized that the “requirements” necessary for achieving the goals of this document can eventually be formalized by an industry standard.

1.3 Application

This document examines the necessary requirements for the automated recording of outages from physical elements in packet networks. The goal of these requirements is to promote the recording of these outages beyond the normal creation of existing alarm messages, which are generated in large numbers and are difficult to analyze. These outage reports will convey the start of an outage of a physical element in real time and follow up with additional summaries of the outage when the outage ends.

1.4 Outline

Clause 1 provides the necessary Purpose, Scope, Application, and Outline. The Introduction is presented in Clause 2. Related work is briefly summarized in Clause 3. Outage recording requirements are detailed in Clause 4. Clause 5 describes the outage measurement methodology. Data Definition and Collection is presented in Clause 6.
Configuration is described in Clause 7. Clause 8 contains the conclusion and recommendations. References, definitions, and a list of abbreviations and acronyms are presented in Clauses 9, 10, and 11, respectively. Usage examples are presented in Annex A. The recording content schema for the defined outputs is defined in Annex B.

2 Introduction

Currently, outage impacts in packet networks are very difficult to characterize from a service provider's perspective. For instance, existing router alarm messages are not automatically translated into meaningful inferences about the cause, impact, and duration of the underlying outage without either human intervention or external operational support systems. Specifically, alarm messages are generated in large numbers, and they tend to be very brief and lacking in clarity. This results in a great deal of variation in how outage measurements are collected and interpreted by equipment operators, which imposes a substantial cost burden. It also prevents service providers from swiftly identifying the start of an outage in a network element and taking remedial measures in real time.

At the same time, packet networks are carrying increasing volumes of time- and fault-sensitive traffic. IP networks in particular are carrying more and more non-traditional traffic, such as voice, video, and data, over packet-based technologies such as Frame Relay and ATM services. These types of traffic are more sensitive to short element outages (e.g., a card reset) than traditional best-effort traffic.

The first step in improving the reliability of packet networks in such an environment is to identify the start of an element outage in real time so that remediation procedures can commence and, upon successful outage resolution, to carefully understand the cause, duration, and impact of the outage. This leads to the critical need for automating the recording of network element outages.

3 Related Work

This TRQ addresses a new area of work. This clause lists some of the relevant industry work that is complementary to the concepts of this document. These works are related in that each seeks to further the monitoring of network performance and reliability, which in many cases can be related to certain types of outage conditions.

3.1 ITU-T SG 13

This working group has recommended metrics for IP service in Recommendations Y.1540 [1] and Y.1541 [2]. This work describes measurements that can be conducted to provide a view of source-destination (e.g., path) outages for a service environment.

3.2 IETF IPPM Workgroup

This working group is developing an extensive set of test protocols for monitoring IP service and connectivity [3][4]. This work extends the work of ITU-T SG 13 by providing a wide range of IP performance and connectivity-related metrics and measurements. However, little work has been done to address automation of outage measurement as considered in this report.

3.3 Cross-Industry Working Team

The objective of this work was to promulgate a common set of metrics and a common measurement methodology that can be used to assess, monitor, negotiate, and test compliance of service quality for ISPs and their customers. The goal was to apply the metrics and methodology to improve Internet performance and foster greater cooperation between customers and service providers. A peer-to-peer measurement methodology was proposed for service-level performance and outage monitoring [5].

3.4 Telcordia Technologies Generic Requirements

There has been substantial work by Telcordia Technologies to define network reliability and quality requirements, published as industry generic requirements. Prominent examples include:

GR-929, Reliability and Quality Measurements for Telecommunications Systems [6]. This document is an effort to define metrics of network reliability for widely used classes of elements in telecommunications. The metrics are defined along with performance objectives and detailed instructions for collecting data and presenting results.

GR-512, LSSGR: Reliability Section 12 [7]. This document defines reliability requirements for voice circuit switching equipment. This project was the first to define many aspects of reliability for voice switching, including the concept of event threshold, hardware reliability modeling, and field reliability performance.

3.5 Quality Excellence for Suppliers of Telecommunications (QuEST) Forum

The QuEST Forum develops TL 9000, the telecommunications industry variant of ISO 9000. TL 9000 builds upon the foundation of ISO 9000 with many added requirements to address the special needs of the telecommunications industry.

TL 9000, Quality Management System Requirements Handbook Release 3.0 [8]. This document lists the requirements of a quality system for a supplier of telecommunications equipment.

TL 9000, Quality Management System Measurements Handbook Release 3.5 [9]. This document defines the metrics that are to be collected by suppliers of telecommunications equipment for further analysis. Many of the measurements that are required to be reported for packet-based elements are extremely difficult to collect with the tools available today, but will be collectible using the processes described in this document.

3.6 ITU-T SG 4

This working group is the lead for the telecommunication management network (TMN) framework and for the management of telecommunication services, networks, and equipment using the TMN framework. It is additionally responsible for other telecommunication management studies relating to designations, transport-related operations procedures, and test and measurement technique