TECHNICAL REPORT ATIS-0100025

A METHODOLOGY FOR ESTIMATING AVAILABILITY OF ACCESS IP ROUTERS IN TERMS OF CUSTOMER FACING LINE CARD AVAILABILITY

ATIS is the leading technical planning and standards development organization committed to the rapid development of global, market-driven standards for the
information, entertainment and communications industry. More than 250 companies actively formulate standards in ATIS' 18 Committees, covering issues including: IPTV, Service Oriented Networks, Energy Efficiency, IP-Based and Wireless Technologies, Quality of Service, and Billing and Operational Support. In addition, numerous Incubators, Focus and Exploratory Groups address emerging industry priorities including "Green", IP Downloadable Security, Next Generation Carrier Interconnect, IPv6 and Convergence. ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications Sectors, and a member of the Inter-American Telecommunication Commission (CITEL). For more information, please visit .

Notice of Disclaimer

[...] hence, availability metric estimation for network elements plays a critical role in SLA interactions with equipment vendors and suppliers. IP-based networks and related evolving technologies, such as Multi-Protocol Label Switching (MPLS), are expected to form the basis for Next Generation Networks (NGN) and Services. The specifics of SLA interactions between service providers and IP equipment vendors are driven by the following:

- Degree of reliability and availability of individual network elements (e.g., Line Cards, Routers, etc.).
- Degree of redundancy built into the network design (e.g., redundant line cards).

Thus, the development of an appropriate availability metric, and subsequent techniques for metric estimation, can be very beneficial to service providers. An ATIS Technical Report, T1.TR.78-2003, proposed a metric for assessing the access availability of routers in IP-based networks by characterizing fractional availability of access routers in terms of lost ports that can be further weighted by some factor (e.g., port bandwidth).

This Technical Report proposes a methodology for estimating the availability of IP-based access routers in terms of customer facing router line card availability. This is based on what a customer experiences during network failure occurrences. If such failures result in downtime for the customer facing Line Card, then the Line Card is considered to be unavailable regardless of the actual failure. The purpose is to stimulate interactions between service providers, equipment vendors, and suppliers in the development of appropriate reliability/availability SLAs.

It should be noted that this methodology can also be applied to other types of packet network technologies. For example, it can be utilized to assess the access availability of line cards in Frame Relay switches.

2 NORMATIVE REFERENCES

The following standards contain provisions which, through reference in this text, constitute provisions of this Technical Report. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Technical Report are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below.

ITU-T Recommendation Y.1540 (2007), IP Packet Transfer and Availability Performance Parameters. [1]
ATIS-0100020.2008, Quantifying the Impact on IP Service Availability from Network Element Outages. [2]
T1.TR.78-2003, Access Availability of Routers in IP Networks. [2]
ATIS-0100008.2007, DPM Metric for Transaction Services such as VoIP. [2]

3 DEFINITIONS

3.1 Short Duration Outage: For the purposes of this document,
a Line Card outage of duration less than one minute is considered to be a short duration outage.

4 ACRONYMS

[...] hence, availability metric estimation for network elements plays a critical role in interactions with equipment vendors and suppliers. An ATIS Technical Report, T1.TR.78-2003, proposed a metric for assessing the access availability of routers in IP-based networks. This metric expresses average access router availability in terms of customer facing router ports, weighted by some factor (e.g., port bandwidth or number of customers), which are lost due to an element outage over a specified period of time. The use of customer facing ports as a unit enables the determination of fractional loss of any given router.

Whether current and future operational systems can effectively track the status of customer facing ports, further weighted by some factor (e.g., port bandwidth), is a critical question. The advantage of such systems is that the granularity offered in tracking ports is significant; accurate estimation of router availability is greatly enhanced by such systems.

[1] This document is available from the International Telecommunication Union.
[2] This document is available from the Alliance for Telecommunications Industry Solutions (ATIS), 1200 G Street N.W., Suite 500, Washington, DC 20005.

On the other hand, there are considerable challenges in the use of customer ports to assess access availability:

- In a large IP backbone network, the number of customer facing ports could be excessively large, requiring significant development in tracking systems with additional downstream analytical capabilities.
- Port weighting can be done by some factor that is arbitrarily chosen. For example, port bandwidth may be convenient in some cases, whereas the number of customers may be more appropriate in others. Regardless of the choice of factors, additional strains on operational complexity are introduced as a result.

A service provider may therefore find it advantageous to consider router Line Cards as a unit for estimating the access availability metric. While Line Card use for metric estimation offers a lesser degree of granularity, it does offer the following advantages over customer ports:

- A very common cause of customer port failure is failure of the Line Card itself. Consequently, port failure is typically resolved by replacing the entire Line Card.
- Tracking the status of router Line Cards offers a simplified approach towards development of operational systems having acceptable levels of complexity. The number of Line Cards to track is significantly less than the number of ports. Further, Line Cards need not be weighted by some arbitrary factor.

This document describes a system for tracking router Line Card status to estimate this metric. However, this method can be applied to ports as well. The choice and expense of developing the proper estimation process, whether utilizing Line Cards or ports, is left up to individual service providers.

6 IP NETWORK ELEMENTS

This clause describes a fairly general IP Access Network. This network comprises all elements responsible for delivering transactions from the Customer Premises Equipment (CPE) at a customer location into the IP network backbone. One common component
to all these elements is the Line Card that houses Customer Ports. These Line Cards are on the "drop side" of an Access Router, where facilities from a customer's CPE terminate on individual ports on the Line Cards. A failure in any element in the Access Network may result in downtime for individual ports on the Line Cards or for the entire Line Card on the Access Router. Such failures prevent delivery of customer transactions to the backbone.

[Figure 1: Access Network Elements. Key: AR = Access Router; BR = Backbone Router; LC = Customer Facing Line Card]

There are five element types in
a typical IP Access Network topology (Figure 1) whose failure can cause downtime for Line Cards [3] directly or indirectly. Elements whose failures directly impact downtime are:

1. Line Card on the customer facing side. Any failure in the electronic or optical components of the Line Card that causes traffic interruption will result in Line Card downtime.
2. Access Routers that form an edge on an Internet Service Provider (ISP) backbone network. Line Card downtime can be caused by a failure in a router component or by a total router failure.

Line Card availability estimation can then be done (see Clause 7) based on these failures for each (Router Class, Line Card Type) combination. A Router Class is a set of identical access routers from a single vendor. The use of such sets can enable metric estimation for different router vendors. For example, if a network has routers from two separate
vendors and each vendor produces two unique types of routers, then the total number of access routers in the network can be grouped into four Router Classes, one for each (vendor, router type) combination. The metric can then be estimated for each Line Card type within any given Router Class.

[3] Only transport layer failures that directly impact customer facing Line Cards are considered in this document, as shown in Figure 1 (access and backbone routers, their components, and facilities linking them). Failures of non-transport layer elements (e.g., service/application layer elements) are not considered.

Network elements whose failures may indirectly impact Line Card downtime are:

3. Facilities and supporting elements such as cross-connects, which link Access Routers to Backbone Routers. To increase the availability of the Access Network, an ISP usually provides redundancy by connecting each Access Router to two Backbone Routers at the same access node using two independent sets of uplinks [4] (Figure 1 depicts a typical access node with several Access Routers and two Backbone Routers). This permits customer traffic to enter the backbone in the following failure scenarios:
   - A failed uplink.
   - A failed card supporting an uplink.
   - A failed Backbone Router at the access node.
   If all facilities linking an access router to a backbone router fail, then all Line Cards at the access router will experience downtime.
4. Backbone Routers linked to Access Routers. As shown in Figure 1, if
both Backbone Routers at an access node fail (a rare event), then all Line Cards on the Access Routers at this node lose connection to the backbone.
5. Facilities linking Backbone Routers at an access node to backbone routers at other backbone nodes. Such facility failures decrease the available bandwidth from Access Routers to the backbone. Note that if all Backbone Router uplinks at an access node fail (a rare event), then all Line Cards on the Access Routers at this node lose connection to the backbone.

Impacts on Line Cards from such failures are extremely rare, as service providers typically have redundancy in the backbone (all elements that may indirectly cause Line Card downtime). Full redundancy, in the form of facilities dual-homing the access routers to pairs of backbone routers, is intended to serve this purpose.

In summary, the access Line Card acts as a common denominator for all of the above failure types. Any one of these failures results in downtime for the impacted Line Cards. The goal of this document is to describe how the Line Card unavailability contribution from these failure types attributed to vendor related defects can be estimated.

7 AVAILABILITY METRIC FOR ROUTER LINE CARDS

The increasing variety of access routers and access line cards justifies an approach where average availability is evaluated separately for each (Router Class, Line Card Type) combination (see Clause 6). Consider a set of access routers of the same class with $J$ types of access line cards
which are monitored for failures during a time interval of length $T$. For each customer-impacting failure $i$, $i = 1, \ldots, L$, the number $n_{ij}$ of type $j$ cards affected and the failure duration $t_{ij}$ are recorded. In case of redundancy, the failure of the active (primary) line card is not counted if the failover to the backup card was hitless. Otherwise, only failures of active cards are counted.

[4] An uplink is a facility (e.g., DS3, OC-3, OC-48) connecting any access router to a backbone router.

The average unavailability of a type $j$ access line card is calculated as:

$$U_j = \frac{\sum_{i=1}^{L} n_{ij} t_{ij}}{N_j T}$$

where $N_j$ is the total number of type $j$ active cards. The average unavailability can be expressed as:

$$U_j = \frac{R_j}{M_j} \qquad (1)$$

where:

$$R_j = \frac{\sum_{i=1}^{L} n_{ij} t_{ij}}{\sum_{i=1}^{L} n_{ij}} \qquad (2)$$

is the average repair time, and:

$$M_j = \frac{N_j T}{\sum_{i=1}^{L} n_{ij}} \qquad (3)$$

can be interpreted as the average time between router failures impacting customers on access line cards of type $j$. Metric $M_j$ can be
considered as an extension of the traditional field hardware MTBF. For the field MTBF, only individual line card failures which require card replacement are counted in the denominator. In $M_j$, all card failures outside the maintenance window are counted, including those caused by resets and software bugs, as well as all impacted cards of type $j$ in case of entire router failure. This distinction is important since the metric needs to accurately represent the impact on customers resulting from all types of card failures, not just total card failures. For example, each time a line card is reset, a protocol re-convergence event occurs, resulting in short duration packet loss.

Metrics $R$, $M$, and $U$ can also be defined for the entire population of access line cards without differentiating failures by LC type. Denote:

$$N = \sum_{j=1}^{J} N_j, \qquad n = \sum_{j=1}^{J} \sum_{i=1}^{L} n_{ij}, \qquad t = \sum_{j=1}^{J} \sum_{i=1}^{L} n_{ij} t_{ij}.$$

Then:

$$R = \frac{t}{n}, \qquad M = \frac{N T}{n} \qquad (4)$$

and the average
unavailability:

$$U = \frac{R}{M}. \qquad (5)$$

The value of using $M_j$ in addition to the average unavailability is demonstrated by the following example.

Example: Consider a set of 400 access routers of a given class (from a single vendor) and let $T = 1000$ hours. Each router has two cards of type 1, three cards of type 2,
and five cards of type 3. In case of single card failures, $n_{ij} = 1$ if a Line Card of type $j$ failed and $n_{ij} = 0$ otherwise. In the case of entire router failure, $(n_{i1}, n_{i2}, n_{i3}) = (2, 3, 5)$. In this example, assume a constant failure duration $t_{ij} = t_j$ of type $j$ cards and a constant duration of the entire router failure. The failure duration is measured in hours. The number of failures for the entire router and each card type with their duration is given in Table 1. The failure parameters in Table 1 are referred to as Scenario 1. We also consider Scenario 2, where the only difference from Scenario 1 is the increase in the number of router failures from 1 to 5.

Table 1: Scenario 1 - Number of failures and their duration

Failure      # Failures   Duration (hours)
Router       1            0.1
LC Type 1    30           0.8
LC Type 2    6            1.5
LC Type 3    2            0.5

The reliability metrics for the two scenarios are given in Table 2. The results in columns R and M for Line Card Type $j$, $j = 1, 2, 3$, and for All Cards are calculated using (2), (3) and (4) respectively. The unavailability for LC Type $j$, $j = 1, 2, 3$, and for All Cards is calculated using (1) and (5) respectively. The Defects per Million (
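
As a rough illustration of how equations (1) through (5) combine, the following Python sketch computes R, M, and U for the example data in Table 1. The names `Failure`, `line_card_metrics`, and `scenario` are hypothetical and not part of this Technical Report. The sketch assumes every card affected by a given failure is down for the same duration, and its outputs have not been checked against Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One customer-impacting failure i (hypothetical record layout)."""
    cards_affected: tuple  # n_ij for each card type j
    duration_h: float      # assumed identical for all affected cards


def line_card_metrics(failures, active_cards, period_h):
    """Return ([(R_j, M_j, U_j), ...], (R, M, U)).

    active_cards[j] is N_j, period_h is T (hours).
    """
    num_types = len(active_cards)
    n = [0] * num_types             # sum over i of n_ij
    card_hours = [0.0] * num_types  # sum over i of n_ij * t_ij
    for f in failures:
        for j, n_ij in enumerate(f.cards_affected):
            n[j] += n_ij
            card_hours[j] += n_ij * f.duration_h

    def rmu(n_sum, hours, cards):
        if n_sum == 0:
            # No failures observed: convention chosen for this sketch.
            return 0.0, float("inf"), 0.0
        r = hours / n_sum             # equation (2), or R in (4)
        m = cards * period_h / n_sum  # equation (3), or M in (4)
        return r, m, r / m            # equation (1) or (5)

    per_type = [rmu(n[j], card_hours[j], active_cards[j])
                for j in range(num_types)]
    overall = rmu(sum(n), sum(card_hours), sum(active_cards))
    return per_type, overall


def scenario(router_failures):
    """Build the example: 400 routers, T = 1000 h, cards per router (2, 3, 5)."""
    per_router = (2, 3, 5)
    events = [Failure(per_router, 0.1)] * router_failures  # whole-router failures
    events += [Failure((1, 0, 0), 0.8)] * 30               # LC Type 1
    events += [Failure((0, 1, 0), 1.5)] * 6                # LC Type 2
    events += [Failure((0, 0, 1), 0.5)] * 2                # LC Type 3
    active = tuple(400 * c for c in per_router)
    return line_card_metrics(events, active, 1000.0)
```

Calling `scenario(1)` and `scenario(5)` gives the per-type and all-card values for Scenario 1 and Scenario 2 under these assumptions. Comparing how the M values shift between the two calls while U changes only slightly is one way to see why M is reported alongside the average unavailability.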