Top-Level Energy and Environmental Dashboard for Data Center Monitoring

Magnus K. Herrlin, PhD, Member ASHRAE
Craig M. Compiano, Associate Member ASHRAE

Magnus K. Herrlin is president at ANCIS Incorporated, San Francisco, CA. Craig M. Compiano is president at Modius Inc., San Francisco, CA.

OR-10-003 © 2010, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc. (www.ashrae.org). Published in ASHRAE Transactions 2010, Vol. 116, Part 1. For personal use only. Additional reproduction, distribution, or transmission in either print or digital form is not permitted without ASHRAE's prior written permission.

ABSTRACT
This paper presents a top-level energy and environmental dashboard for data center monitoring. It consists of four gauges: one for infrastructure energy efficiency, two for IT-equipment intake air temperature compliance, and one for air management effectiveness. The color-coded dials indicate good, acceptable, and poor operation. The overall goal is to move all four analog needles towards the 12 o'clock position, which represents ideal operating conditions. In addition, the operator can select both the averaging period and the sampling frequency for the readings. A glance at this dashboard provides instant visual information on the operational status of the data center. This strikingly simple presentation is made possible by using selected non-dimensional performance metrics to intelligently summarize a large amount of data and avoid operator fatigue. The dashboard also has a warning icon for each gauge for out-of-bound data, as well as access to detailed data when needed.

INTRODUCTION
Have you ever asked yourself why the automobile's dashboard looks the way it does? The four top-level gauges are generally the speedometer, the fuel gauge, the engine-temperature gauge, and the clock. They provide the most important information a driver needs to stay out of trouble and keep moving. A second tier of warning icons is hidden from view until something goes wrong with the vehicle (e.g., the check engine icon) or with the behavior of the passengers (e.g., the fasten seatbelt icon). This hierarchy is a time-tested way of arranging the information provided to the driver.

A data center facility should provide an adequate thermal IT-equipment environment while minimizing infrastructure energy usage. Curbing energy consumption in energy-intensive data centers is important for economic reasons, and ensuring a satisfactory thermal operating environment is important for protecting IT-equipment from failure.

However, there is a perceived conflict between these two important goals. The most scrutinized link is "air management," which is essentially about keeping cold and hot air from mixing. Cold supply air from the air handler should enter the heat-generating IT-equipment without mixing with ambient air, and the hot exhaust air should return to the air handler without mixing. Managing the cold and hot air streams in data centers is important both for infrastructure energy management and for IT-equipment thermal management.

Air management has great potential to make data centers more energy efficient. Correctly implemented, air management also has great potential to improve the thermal IT-equipment conditions. The information required to implement effective air management can be obtained by monitoring only three data entities: infrastructure energy efficiency, thermal IT-equipment conditions, and air management effectiveness. A number of useful metrics have been developed over the past years. This paper lays out a rationale analogous to that of the automobile dashboard, as well as the justification for selecting three of those metrics for the proposed top-level energy and environmental dashboard for data center monitoring.

DATA CENTER MONITORING
The old adage "you can't manage what you don't measure" was never truer than it is for data centers. As data center operators take action to improve the energy efficiency of their data centers, they need data to give them visibility into this dynamic environment. Without these data, they are unable to make informed decisions regarding data center optimization.

Historically, data centers have been over-provisioned, especially when it comes to cooling. As data center operators improve cooling efficiency, the safety margin for error created by the over-provisioning is removed. With a greatly reduced response-time cushion, effective monitoring becomes a mission-critical application for data centers. Such monitoring systems must provide near-real-time data collection and an alarm notification and escalation system. Monitoring systems must be robust, enterprise-wide systems capable of providing multi-site reporting and analysis.

To fully realize the benefits of monitoring, these systems must provide an effective way to access data, including:
- Reports
- Dashboards
- Trending
- Statistical analysis: covariance, regression, etc.
- Conditional alarming: notifications based on complex conditional exceptions
- Access via standard Business Intelligence tools, including Excel

Monitoring systems are the basis for:
- Continuous improvement cycles
- Improved operational effectiveness
- Availability: downtime avoidance
- Capacity planning
- Utility rebates

Typical capabilities of monitoring systems include:
- Ability to monitor and record granular data for data center devices
- Support for metrics such as DCiE, RCI, and RTI
- Simple data access: ODBC, reporting, dashboards, SQL, etc.
- Ability to run in a high-availability environment
- Near-real-time access
- Alarm escalation and management

Monitoring systems get their data either from instrumentation built into data center equipment or from separate sensors and meters. Most data center devices such as
PDUs, UPSs, CRACs, and intelligent power strips are capable of transmitting data via communication protocols such as Modbus, SNMP, BACnet, etc. A wide range of power meters, environmental sensors, pressure sensors, etc. is available on the market. Sufficient instrumentation is a prerequisite to effective monitoring.

The level of instrumentation determines the level of detail with which performance data can be measured. For example, the computation of a basic DCiE simply requires one data point for the total amount of power coming into the data center and one data point for the total amount of power going to the IT equipment. With additional instrumentation, however, a more detailed DCiE can be computed.

With more performance data collected, there is a need to intelligently summarize the information. Performance metrics play a key role in making sense of data from comprehensive and continuous monitoring of data center devices. The next sections discuss three such metrics.

METRICS
Metrics and the ability to monitor and track the performance of data centers are integral to successful operation. One of the most powerful features of metrics is the capability of trending complex data over time. Generally speaking, a "metric" is defined as a standard for measuring or evaluating something. All three top-level metrics selected for the proposed energy and environmental dashboard conform to this definition.

Data center infrastructure efficiency (DCiE) is a metric used to determine the energy efficiency of a data center. The Rack Cooling Index (RCI) is a measure of how well the IT-equipment is cooled within the manufacturer's specifications. Since a thermal guideline becomes truly useful only when there is an unbiased and objective way of determining operating compliance with it, the RCI is included in the ASHRAE Thermal Guideline (ASHRAE 2008) for the purpose of showing compliance. Finally, the Return Temperature Index (RTI) is a measure of the performance of the air-management system.

These metrics are individually used in the DOE's "DC Pro" data center energy assessment software tool suite (DOE 2009a) as well as in the Data Center Certified Energy Practitioner (DC-CEP) Program (DOE 2009b). They reduce a great amount of data to understandable numbers that can easily be trended and analyzed. The rationale and definition of the three metrics are presented next.

DCiE
Data center infrastructure efficiency (DCiE) and power usage effectiveness (PUE) have become commonly used metrics for data center efficiency. The PUE is essentially the reciprocal of the DCiE. These metrics were developed by members of The Green Grid, an industry group focused on data center energy efficiency. One benefit of using the DCiE rather than the PUE is its easily understood scale of 0% to 100% (Green Grid 2008).

$$\mathrm{DCiE} = \frac{\text{IT-Equipment Power}}{\text{Total Facility Power}} \times 100\% \qquad (1)$$

Standard guidelines for the use and reporting of these metrics have been developed by The Green Grid. All DCiE measurements should be reported with subscripts that identify (1) the accuracy of the measurements, (2) the averaging period of the measurements (e.g., yearly, monthly, weekly, daily), and (3) the frequency of the measurement (e.g., monthly, weekly, daily, continuous). For the purpose of the proposed dashboard, the user can select both the averaging period and the frequency (limited by the actual measurement frequency).

Table 1 shows ratings of the DCiE. A value of 100% simply indicates 100% efficiency, i.e., all energy is used by the IT-equipment (ideal). However, a typical value is only 50% (EPA 2007). State-of-the-Art installations have values around 85% (Google 2009).

The DCiE allows data center operators to quickly estimate the energy efficiency of their data centers and determine whether any energy efficiency improvements need to be made. DCiE will represent infrastructure energy efficiency on the proposed dashboard.

RACK COOLING INDEX (RCI)
The main task of a data center facility is to provide an adequate equipment environment; therefore, a relevant metric for IT-equipment intake temperatures should be used to gauge the thermal environment. The Rack Cooling Index (RCI) is a measure of how effectively equipment racks are cooled within a given thermal guideline, both at the high end and at the low end of the temperature range (Herrlin 2005). Specifically, the RCI is a performance metric explicitly designed to gauge compliance with the thermal guidelines of ASHRAE (2008) and NEBS (Telcordia 2001, 2006) for a given data center. The index is included in the ASHRAE thermal guideline for the purpose of showing compliance.

Both guidelines use recommended and allowable ranges. The recommended intake temperature range is a statement of reliability (facility operation), whereas the allowable range is a statement of functionality (equipment testing). The numerical values of the recommended and allowable ranges depend on the applied environmental guideline. In the ASHRAE specification, the recommended and allowable temperature ranges are 64.4°F to 80.6°F (18°C to 27°C) and 59.0°F to 89.6°F (15°C to 32°C), respectively.

Over-temperature conditions exist once one or more intake temperatures exceed the maximum recommended temperature. Similarly, under-temperature conditions exist when intake temperatures drop below the minimum recommended temperature. The RCI "compresses" the equipment intake temperatures into two numbers: the RCI_HI and the RCI_LO. An RCI_HI of 100% means no over-temperatures, whereas an RCI_LO of 100% means no under-temperatures. Both numbers at 100% mean that all temperatures are within the recommended temperature range, i.e., absolute compliance. The lower the percentage, the greater the probability (risk) that intake temperatures are above the maximum allowable temperature (RCI_HI) or below the minimum allowable temperature (RCI_LO). A value below 90% is often characterized as "poor."

Figure 1 provides a graphical representation of the RCI_HI (the RCI_LO is analogous). The bold curve is the intake temperature distribution for all N intakes, with the temperatures arranged in order of increasing temperature. The Total Over-Temperature represents a summation of all over-temperatures (triangular area). The Maximum Allowable Over-Temperature is also defined in the figure (rectangular area). The definition of RCI_HI
is as follows:

$$\mathrm{RCI}_{\mathrm{HI}} = \left[1 - \frac{\text{Total Over-Temperature}}{\text{Maximum Allowable Over-Temperature}}\right] \times 100\% = \left[1 - \frac{\sum_{x=1}^{N} \left(T_x - T_{\text{max-rec}}\right)_{T_x > T_{\text{max-rec}}}}{\left(T_{\text{max-all}} - T_{\text{max-rec}}\right) N}\right] \times 100\% \qquad (2)$$

where T_x is the temperature at intake x, T_max-rec and T_max-all are the maximum recommended and maximum allowable intake temperatures, and N is the number of intakes.

Table 2 shows the proposed rating of the RCI, based on numerous numerical analyses (Herrlin 2007). Lawrence Berkeley National Laboratory (LBNL) is also in the process of benchmarking this performance metric. The risk of temperatures above (below) the maximum (minimum) allowable temperature increases with declining values. A warning flag "*" appended to the index indicates that one or several intake temperatures are above (below) the allowable range. The index value for the intake temperatures shown in Figure 1 is RCI_HI = 95%*.

Table 1. Rating of the DCiE
Rating                   DCiE (%)
Ideal (maximum)          100
State-of-the-Art         85
Best Practice            70
Improved Operations      60
Current Trend            55
Typical (average)        50

Table 2. Proposed Rating of the RCI
Proposed Rating          RCI
Ideal                    100%
Good                     95% to 100%

Net By-Pass Air          < 100%

Figure 2 Preproduction dashboard (dial coloring example only; user-defined).

- RCI_LO of 81% indicates an over-cooled space (90% is often considered poor).
- RTI of 77% indicates an under-utilization of the available equipment temperature differential AND an over-ventilated space; by-pass air of 30% (1/0.77).

The overall goal is to move all four needles towards the 12 o'clock position (100%). The crux of the matter is knowing which corrective actions may be needed. In this example, improved air management could reduce the by-pass air (increase RTI), reduce the fan energy (improve DCiE), and allow a higher supply air temperature (raise RCI_LO).

Both the alarm levels and the coloring of the dials in green, orange, and red (indicating good, acceptable, and poor operation, respectively) are user-defined. In addition, the operator can select both the averaging period and the sampling frequency. A second tier of gauges presents data of higher granularity, as well as trends of the metrics used.

A glance at the proposed dashboard provides instant visual information on the operational status of infrastructure energy efficiency (DCiE), IT-equipment intake air temperature compliance (RCI_HI and RCI_LO), and air management effectiveness (RTI). The dashboard is not only a monitoring tool but also a diagnostic tool for reconfiguring the site and resolving air management issues. Furthermore, the alarm functionality provides important information on out-of-bound conditions. All t
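The four gauges reduce to a handful of simple calculations. The Python sketch below shows one way to compute and color the dashboard needles; it is an illustration under stated assumptions, not the implementation behind the dashboard in Figure 2. DCiE follows Equation (1) and RCI_HI follows Equation (2), with RCI_LO as its low-end analogue and the boolean flag standing in for the "*" warning. The RTI section is not part of this excerpt, so the sketch assumes the commonly published definition RTI = (T_return - T_supply) / ΔT_equipment × 100%, where ΔT_equipment is the temperature rise across the IT-equipment. The function names (dcie, rci_hi, rci_lo, rti, net_bypass_percent), the Dial class, and all color thresholds are hypothetical placeholders, since the paper states that alarm levels and dial coloring are user-defined.

```python
"""Minimal sketch of the four dashboard needles: DCiE, RCI_HI, RCI_LO, and RTI.

All names and the color thresholds are hypothetical placeholders.
"""
from dataclasses import dataclass
from typing import Sequence, Tuple


def dcie(it_power_kw: float, total_facility_power_kw: float) -> float:
    """Equation (1): IT-equipment power / total facility power x 100%."""
    if total_facility_power_kw <= 0:
        raise ValueError("total facility power must be positive")
    return 100.0 * it_power_kw / total_facility_power_kw


def rci_hi(intakes: Sequence[float], max_rec: float, max_all: float) -> Tuple[float, bool]:
    """Equation (2); the bool mirrors the '*' flag (some intake above max_all)."""
    total_over = sum(t - max_rec for t in intakes if t > max_rec)  # triangular area
    max_allowable_over = (max_all - max_rec) * len(intakes)  # rectangular area
    flagged = any(t > max_all for t in intakes)
    return 100.0 * (1.0 - total_over / max_allowable_over), flagged


def rci_lo(intakes: Sequence[float], min_rec: float, min_all: float) -> Tuple[float, bool]:
    """Low-end analogue of RCI_HI; the bool flags any intake below min_all."""
    total_under = sum(min_rec - t for t in intakes if t < min_rec)
    max_allowable_under = (min_rec - min_all) * len(intakes)
    flagged = any(t < min_all for t in intakes)
    return 100.0 * (1.0 - total_under / max_allowable_under), flagged


def rti(t_supply: float, t_return: float, dt_equipment: float) -> float:
    """ASSUMED definition: (return - supply) / equipment temperature rise x 100%."""
    return 100.0 * (t_return - t_supply) / dt_equipment


def net_bypass_percent(rti_percent: float) -> float:
    """Excess air implied by an RTI below 100%, e.g. 1/0.77 -> roughly 30%."""
    return (100.0 / rti_percent - 1.0) * 100.0


@dataclass
class Dial:
    """HYPOTHETICAL user-defined bands: green >= good_min, red < poor_below."""
    good_min: float
    poor_below: float

    def color(self, score: float) -> str:
        if score >= self.good_min:
            return "green"
        if score < self.poor_below:
            return "red"
        return "orange"


if __name__ == "__main__":
    # Hypothetical intake readings (deg C) checked against the ASHRAE 2008
    # recommended (18 to 27 C) and allowable (15 to 32 C) ranges quoted in the text.
    intakes_c = [20.0, 24.0, 26.0, 28.0, 30.0]
    hi, hi_flag = rci_hi(intakes_c, max_rec=27.0, max_all=32.0)
    lo, lo_flag = rci_lo(intakes_c, min_rec=18.0, min_all=15.0)
    d = dcie(it_power_kw=500.0, total_facility_power_kw=1000.0)
    r = rti(t_supply=15.0, t_return=25.0, dt_equipment=13.0)

    rci_dial = Dial(good_min=95.0, poor_below=90.0)   # cf. Table 2 and "below 90% ... poor"
    dcie_dial = Dial(good_min=70.0, poor_below=50.0)  # hypothetical, cf. Table 1
    rti_dial = Dial(good_min=95.0, poor_below=85.0)   # hypothetical; scored on distance from 100%

    print(f"DCiE   {d:5.1f}%  {dcie_dial.color(d)}")
    print(f"RCI_HI {hi:5.1f}%{'*' if hi_flag else ''}  {rci_dial.color(hi)}")
    print(f"RCI_LO {lo:5.1f}%{'*' if lo_flag else ''}  {rci_dial.color(lo)}")
    print(f"RTI    {r:5.1f}%  {rti_dial.color(100.0 - abs(r - 100.0))}"
          f"  (by-pass about {net_bypass_percent(r):.0f}%)")
```

With the hypothetical readings above, RCI_HI comes out at 84% (two over-temperatures, but no intake above 32°C, so no "*" flag), and an RTI of about 77% corresponds to roughly 30% by-pass air, mirroring the 1/0.77 example in the text. Scoring the RTI dial on its distance from 100% is one possible design choice, because the RTI target sits at 100% with deviations in either direction indicating a problem.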