AMERICAN NATIONAL STANDARD FOR TELECOMMUNICATIONS

ATIS-0100037.2013

IMPACT WEIGHTED MTBF - A METRIC FOR ASSESSING RELIABILITY OF HIERARCHICAL SYSTEMS

As a leading technology and solutions development organization, ATIS brings together the top global ICT companies to advance the industry's most-pressing business priorities. Through ATIS committees and forums, nearly 200 companies address cloud services, device solutions, emergency services, M2M communications, cyber security, ehealth, network evolution, quality of service, billing support, operations, and more. These priorities follow a fast-track development lifecycle from design and innovation through solutions that include standards, specifications, requirements, business use cases, software toolkits, and interoperability testing.

ATIS is accredited by the American National Standards Institute (ANSI). ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a founding Partner of oneM2M, a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications sectors, and a member of the Inter-American Telecommunication Commission (CITEL). For more information, visit the ATIS website.

AMERICAN NATIONAL STANDARD

Approval of an American National Standard requires review by ANSI that the requirements for due process, consensus, and other criteria for approval have been met by the standards developer.

Consensus is established when, in the judgment of the ANSI Board of Standards Review, substantial agreement has been reached by directly and materially affected interests. Substantial agreement means much more than a simple majority, but not necessarily unanimity. Consensus requires that all views and objections be considered, and that a concerted effort be
made towards their resolution.

The use of American National Standards is completely voluntary; their existence does not in any respect preclude anyone, whether he has approved the standards or not, from manufacturing, marketing, purchasing, or using products, processes, or procedures not conforming to the standards.

The American National Standards Institute does not develop standards and will in no circumstances give an interpretation of any American National Standard. Moreover, no person shall have the right or authority to issue an interpretation of an American National Standard in the name of the American National Standards Institute. Requests for interpretations should be addressed to the secretariat or sponsor whose name appears on the title page of this standard.

CAUTION NOTICE: This American National Standard may be revised or withdrawn at any time. The procedures of the American
National Standards Institute require that action be taken periodically to reaffirm, revise, or withdraw this standard. Purchasers of American National Standards may receive current information on all standards by calling or writing the American National Standards Institute.

Notice of Disclaimer

[Figure 4 - 4G LTE Segment Architecture. Diagram of a 4G LTE network segment with three levels (Level 1, Level 2, Level 3). Level 2: Mobile Telephone Switch Office (MTSO) Aggregation Pair; Level 3: Provider Edge (PE) Pair. Redundancy or silent failure in the Aggregation pair affects the K eNBs connected to it. Redundancy or silent failure in the PE pair affects N eNBs; N ≥ K.]

7 Definition of Impact Weighted MTBF

Consider a system that consists of three subsystems with increasing hierarchical levels $i = 1, 2, 3$. In general, all subsystems are redundant. Only traffic-impacting failures which occur due to the limitations of the adopted redundancy are counted. Let $N$ be the number of elements in subsystem 1. The failure impact of an element in subsystem 1 is:

  0  If the redundancy protects against that failure.
  1  Otherwise.

However, the failure impact of the highest-level subsystem is $N$ because it impacts all elements in subsystem 1. The impact of a failure of a subsystem at level 2 is the number of elements $K$ in subsystem 1 that are connected to it.

For subsystem $i = 1, 2, 3$, the mean time until the first customer-impacting failure, starting at a state where all active and redundant elements of subsystem $i$ are “up”, is referred to as level-$i$ uptime
$U_i$. Then the IW-MTBF metric is defined as

$$\text{IW-MTBF} = \left( \frac{1}{U_1} + \frac{K}{U_2} + \frac{N}{U_3} \right)^{-1} \qquad (1)$$

Note that a level-2 subsystem may consist of several components, as in the case of a single-chassis router (two components: a pair of RPs and SW) or an RNC (three components: three pairs of IPs, MPs, and fabrics). For component $j = 1, 2, \ldots, m$, the mean time until the first customer-impacting failure, starting at a state where all active and redundant elements of component $j$ are “Up”, is referred to as component-$j$ Uptime $U_{2,j}$. Let the failure of each component impact the same number of elements $K$ in the level-1 subsystem. Then level-2 Uptime is calculated as

$$U_2 = \left( \sum_{j=1}^{m} \frac{1}{U_{2,j}} \right)^{-1} \qquad (2)$$

Level-3 uptime is calculated similarly when level 3 consists of several components and the failure of each component impacts the same number of elements $N$ in the level-1 subsystem. It is not difficult to generalize this example to a hierarchy with a greater number of levels.

8 Impact Weighted MTBF Practical Examples

This clause illustrates the benefits of the IW-MTBF metric for the hierarchical systems considered in clause 6 through numerical examples, using equations (2) and (1) to calculate level-2 Uptime and IW-MTBF, respectively. Time is measured in hours.

8.1 Single Chassis Router

Consider the router in Figure 1 with $K = 10$ line cards carrying customer traffic. Table 1 provides the reduction in IW-MTBF in comparison with line-card (LC) MTBF due to failures of the level-2 subsystem consisting of two redundant components: RP and SF. Level-2 uptime is calculated in Table 2 using given uptimes for RP and SF. The expectation is that failures of the RP and SF components should have minimal impact on line cards. In our example, this is indeed the case when LC-MTBF is low (50,000 hours). However, for LC-MTBF = 150,000 hours, the reduction of 25% is quite large. The only way to have a smaller reduction is to increase the uptime of the RP and SF components.

Table 1 - IW-MTBF Reduction in Comparison with LC-MTBF

Reduction %    10%       18%        25%
IW-MTBF        44,944    81,633     112,150
LC-MTBF        50,000    100,000    150,000

Table 2 - Uptime for RP, SF, and Level 2

Component    RP            SF           Level 2
Uptime       10,000,000    8,000,000    4,444,444

Consider a set of routers with the architecture given in Figure 1 where the RP and SW subsystems and the number $K$ of line cards
are the same for all routers. Then the MTBO measured in production for this set of routers would be a field estimate of IW-MTBF, where the latter was calculated based on the expected uptime of the RP and SW components.

8.2 Multi-chassis Router

Consider a router with $L = 6$ line-card chassis (LCC), with 15 line cards per LCC ($K = 15$) and a total number of line cards $N = 6 \times 15 = 90$. Each LCC has two RPs, and the six LCCs are interconnected by fabric. Table 3 provides the reduction in IW-MTBF in comparison with line-card LC-MTBF = 300,000 hours due to failures of the level-2 RP subsystem and the level-3 fabric subsystem. The IW-MTBF and respective reduction are calculated for RP-Uptime = 100,000,000 hours. Note that FCC-Uptime has a major impact on the Reduction, which decreases from 98% for FCC-Uptime = 500,000 hours to 21% for FCC-Uptime = 125,000,000 hours. However, even a 21% Reduction is not low enough, which emphasizes the importance of a sufficiently high FCC-Uptime in a multi-chassis router.

Table 3 - G-Uptime Reduction in Comparison with LC-MTBF

Router         A              B             C
Reduction %    21%            37%           98%
IW-MTBF        237,906        189,274       5,450
FCC-Uptime     125,000,000    50,000,000    500,000

Consider a set of multi-chassis routers with the architecture given in Figure 2 where the RP and FCC subsystems, as well as the number of chassis $L$ and the number $N$ of line cards, are the same for all routers. Then the MTBO measured in production for this set of routers would be a field estimate of IW-MTBF, where the latter was calculated based on the expected uptime of the RP
and FCC subsystems.

Routers may have the same line cards and RPs in single-chassis and multi-chassis configurations. However, their IW-MTBF values will generally differ for the following two reasons. First, the FCC is more complex than the SF in single-chassis routers, which may result in a lower FCC-Uptime. Second, the impact of an FCC failure in multi-chassis routers is generally larger than the impact of an SW failure in single-chassis routers. Therefore, the MTBO calculation for the combined set of single- and multi-chassis routers would likely underestimate the IW-MTBF for single-chassis routers
and overestimate the IW-MTBF for multi-chassis routers. In addition, the IP backbone may have a hierarchical design with the largest multi-chassis routers at the top of the hierarchy. In such a case, the actual failure impact depends on the router's place in the hierarchy. Hence, the existing formula for MTBO calculation based on the number of affected line cards can be applied only to a set of routers at the same hierarchical level.

8.3 Radio Network Controller

Consider an RNC with $K = 8$ call processors (CPs). Table 4 provides the reduction in IW-MTBF in comparison with CP-subsystem uptime (CP-Uptime) due
to failures of the level-2 subsystem caused by redundancy failures in the MP, IP, or SF component in level 2. Uptimes for MP, IP, and SF, along with the Level 2 uptime calculated using (2), are provided in Table 5. The Reduction is quite large (in the range 47% - 57%), which indicates that the uptime of the level 2 components is not large enough.

Table 4 - IW-MTBF Reduction in Comparison with CP-Uptime

Reduction %    47%        53%          57%
IW-MTBF        423,529    473,684      514,286
CP-Uptime      800,000    1,000,000    1,200,000

Table 5 - Uptime for MP, IP, SF, and L2-Uptime

Component    MP            IP            SF            Level 2
Uptime       20,000,000    18,000,000    30,000,000    7,200,000

Consider a set of RNCs with the architecture given in Figure 2 and the same number of CPs. Then the MTBO measured in production using one CP as a unit of failure impact would be a field estimate of IW-MTBF, where the latter was calculated based on the expected uptime of the IP, MP, and fabric
components. The impact of an RNC failure can be measured more granularly by the number of affected base stations (nodeBs). Then IW-MTBF and its MTBO estimate in production must be compared with nodeB-Uptime.

8.4 Access Segment of LTE Network

Consider the network segment in Figure 4 with parameter values
$N = 90$ and $K = 15$. Table 6 provides the reduction in IW-MTBF in comparison with an eNB uptime of 10,000 hours due to failures of Aggregation routers (level-2 subsystem) and PE routers (level-3 subsystem). As expected, the reduction increases as Aggregation-Uptime and PE-Uptime decrease.

Table 6 - G-Uptime Reduction in Comparison with eNB-Uptime

Reduction %           14%           25%          40%
G-Uptime              8,562         7,485        5,981
Aggregation-Uptime    2,000,000     1,000,000    500,000
PE-Uptime             10,000,000    5,000,000    2,500,000

Consider an access segment of an LTE (4G) network with the interconnection architecture shown in Figure 4 where routers are identical
in terms of their architecture and vendor inside each of the two sets of Aggregation and PE routers. In general, there could be two different vendors for Aggregation and PE routers, respectively. By selecting an eNB (evolved node B) as the impact unit for failures at the Aggregation and PE levels, the MTBO measured in a production network with the architecture in Figure 4 will be an estimate of IW-MTBF.

9 Concluding Remarks

Examples in clauses 8.1 through 8.4 demonstrate a fairly broad application of the new metric IW-MTBF, which incorporates the hierarchical structure of network elements and segments. The application of IW-MTBF requires knowledge of the Uptime for redundant subsystems at the upper hierarchical levels. For a sufficiently large coverage factor exceeding 99%, the Uptime of a redundant system is very close to the mean time between silent failures (MTBSF). Thus knowledge of MTBSF is critical for the application of IW-MTBF. For a brand new system, MTBSF can be provided only by its vendor, similar to the predicted MTBF for system components. MTBSF can also be measured in production, but the observation interval must be fairly large (e.g., 18-24 months), as silent failures are expected to be rare.
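
The calculations behind the examples in clause 8 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the standard. It assumes that equation (1) has the form IW-MTBF = 1 / (1/U1 + K/U2 + N/U3), and that equation (2) combines the component uptimes of one level by adding their failure rates. The function names (series_uptime, iw_mtbf, reduction_pct) are hypothetical. Under these assumptions, the sketch recomputes the quantities listed in Tables 1 through 5 from the stated uptimes.

```python
def series_uptime(component_uptimes):
    """Assumed form of equation (2): failure rates of the components add."""
    return 1.0 / sum(1.0 / u for u in component_uptimes)


def iw_mtbf(u1, upper_levels):
    """Assumed form of equation (1).

    u1           -- level-1 uptime (e.g. LC-MTBF or CP-Uptime), in hours
    upper_levels -- list of (impact, uptime) pairs, e.g. [(K, U2), (N, U3)]
    """
    rate = 1.0 / u1 + sum(impact / uptime for impact, uptime in upper_levels)
    return 1.0 / rate


def reduction_pct(u1, value):
    """Percentage reduction of IW-MTBF relative to the level-1 uptime."""
    return 100.0 * (1.0 - value / u1)


# Clause 8.1: single-chassis router, K = 10 line cards (compare Tables 1 and 2).
u2_router = series_uptime([10_000_000, 8_000_000])
table1 = [iw_mtbf(lc, [(10, u2_router)]) for lc in (50_000, 100_000, 150_000)]

# Clause 8.2: multi-chassis router, K = 15, N = 90, RP-Uptime = 100,000,000
# (compare Table 3).
table3 = [
    iw_mtbf(300_000, [(15, 100_000_000), (90, fcc)])
    for fcc in (125_000_000, 50_000_000, 500_000)
]

# Clause 8.3: RNC with K = 8 call processors (compare Tables 4 and 5).
u2_rnc = series_uptime([20_000_000, 18_000_000, 30_000_000])
table4 = [iw_mtbf(cp, [(8, u2_rnc)]) for cp in (800_000, 1_000_000, 1_200_000)]

if __name__ == "__main__":
    print("Level-2 uptime (router):", round(u2_router))
    print("Table 1 IW-MTBF:", [round(v) for v in table1])
    print("Table 3 IW-MTBF:", [round(v) for v in table3])
    print("Level-2 uptime (RNC):", round(u2_rnc))
    print("Table 4 IW-MTBF:", [round(v) for v in table4])
```

Passing the upper levels as a list of (impact, uptime) pairs keeps the same function usable for two-level systems (clauses 8.1 and 8.3) and three-level systems (clause 8.2), and extends naturally to hierarchies with more levels.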