Ranojoy Dutta is a High Performance Building Specialist with View Inc., Milpitas, CA. T. Agami Reddy is SRP Professor at the Design School and the School of Sustainable Engineering in the Built Environment, and George Runger is Professor at the School of Computing, Informatics and Decision Systems, Arizona State University, Tempe, AZ.

A Visual Analytics Based Methodology for Multi-Criteria Evaluation of Building Design Alternatives

Ranojoy Dutta, Associate Member ASHRAE
T. Agami Reddy, PhD, PE, Fellow ASHRAE
George Runger, PhD

ABSTRACT
The objective of this paper is to illustrate a novel visualization methodology that can enhance the complementary relationship between computers and designers of high performance buildings. The proposed approach facilitates multi-criteria decision making via interactive visualization that allows dynamic adjustment of important variables while providing a visual range of allowable variation for the other design parameters. The parallel coordinates visualization technique is the culmination of a methodology that includes the application of Monte Carlo techniques to create a database of solutions using whole-building energy simulations, along with data mining methods to rank variable importance and reduce the multi-dimensionality of the design problem. The solution set is then fitted by a second-order regression model that can instantaneously provide bounds on specific regressor variables when other variables' values are changed, while satisfying pre-set energy performance limits. The methodology is illustrated using the USDOE medium office building configuration with 15 design variables.

INTRODUCTION
Designing buildings to be energy efficient can be described as a multi-criteria constrained optimization problem whose complexity originates from
the large number of variables involved, the dynamic nature of building loads and processes, the intricacy of interaction effects among variables, and the inability of the designer to visualize cause and effect in multi-dimensional space. In multi-criteria optimization problems, the search for a single optimal solution is often futile, since the objectives are usually competitive. Instead, one or more feasible intermediary solutions may be found through an interactive search procedure involving both designer and computer. A strategy suited to this type of search has been demonstrated by Addison (1988)
based on the idea of satisficing (satisfy + suffice), a term coined by H.A. Simon in the context of economic theory. Simon suggests that, in general, individuals look for alternatives that are “good enough” rather than optimal. An alternative is “good enough” if it satisfies the individual's aspiration levels and suffices in the absence of a practically obtainable optimum. In the context of building design, these aspiration levels may alternately be considered to be performance thresholds (Addison 1988). Choosing from the wide variety of innovative technologies and energy efficiency measures available today, a designer has to balance environmental, energy, and financial factors in order to reach the best possible solution that will maximize the energy efficiency of a building while satisfying the final user/owner needs (Diakaki et al. 2008). Thus, the need to address multi-criteria requirements makes it more valuable for a designer to know the “latitude” or “degrees of freedom” he/she has in varying certain design variables while achieving satisfactory levels of energy performance as well as addressing other relevant criteria such as life cycle cost, environmental impacts, etc. What is
required is a methodology that will allow designers to explore the consequences of decisions relating to varying these variables at the conceptual stage of design, and thereby design a building that achieves a good balance between multiple objectives (D'Cruz and Radford 1987).

BACKGROUND
While performance prediction can be highly automated through the use of computers, performance evaluation is usually not amenable to automation unless it is with respect to a single criterion. Multi-criteria decision making is the critical, non-delegable design task that requires human intervention. Computers can, however, facilitate the evaluation process through appropriate user interfaces that provide graphical representation of results and allow for direct comparison of multiple solutions with respect to multiple performance criteria. Thus, the design of high performance (low energy) buildings requires a synergy between automated performance prediction/visualization and the human capabilities to perceive, relate, and ultimately select a satisficing solution. Such a comprehensive design framework has been discussed by Dutta (2013), who addresses the need for a complementary relationship between human designers and computers for Multi-Criteria Decision Making (MCDM) in the domain of low energy building design. The MCDM process has two elements (Thomas and Cook 2006): (a) a procedure that allows searching for one or more solutions that reflect the desired pay-off between the criteria; and (b) a decision-making step
wherein the designer selects the most desirable solution among the feasible solutions. Dutta (2013) has implemented the MCDM search element using data mining techniques, while the MCDM decision-making component has been supported through interactive visualization. The complete MCDM process has been incorporated into a new methodology for high performance building design referred to as the Visual Analytics based Decision Support Methodology (VADSM). This paper discusses the decision-making component of MCDM via visual analytics, which is defined as the science of analytical reasoning facilitated by interactive visual interfaces (Thomas and Cook 2006). Historically, visual analytics evolved out of the fields of information and scientific visualization. However, visual analytics is more than just information visualization; by definition, it is an integrated approach combining visualization, human factors, and data analysis (Keim and Andrienko 2008).

REVIEW OF EXISTING WORK
In the domain of building energy analysis there is increasing interest in the use of machine learning techniques such as neural networks, support vector machines, and Random Forests (RF) for the prediction of building energy consumption (Zhao and Magoules 2012). State-of-the-art nonlinear and nonparametric machine learning techniques such as RF do not require any prior knowledge of variable distributions or of the structure of the feature space, and inherently overcome assumptions of linear correlation and normality which are known to be ill-suited for many complicated applications. Tsanas and Xifara (2012) used RF to study the effect of eight input variables of residential buildings. They compared RF to a classical linear regression technique (Iteratively Reweighted Least Squares, IRLS) and found that RF greatly outperformed IRLS in finding an accurate functional relationship between the input and output variables. Classical regression settings may also fail to account for the presence of multicollinearity, wherein variables appear to have large-magnitude but opposite-sign regression coefficients. By contrast, the RF learning mechanism randomizes the selection of a subset of features for each split, and thus can internally account for redundant and interacting variables (Breiman 2001). A growing body of research clearly supports the fact that machine learning techniques are viable alternatives to
physical modeling and traditional statistical analysis of building energy data (Dutta 2013). However, the disadvantages of such tools are that they often require extensive training data and are complex black-box models that are not interpretable without advanced statistical knowledge, while allowing only limited model visualization insights, which are a critical component of knowledge discovery. Hence, this research presents a methodology geared towards combining automated data analysis with interactive visualization techniques. There is also a history of research on visualization techniques (Haberl and Abbas 1998) and graphical user interfaces (Papamichael 1999) for the analysis of building simulation outputs. 3-D surface plots were used to view small differences between the simulated data and the measured data for non-weather-dependent loads (Haberl et al. 1993). For weather-dependent loads, carpet matrix plots were used to detect different trends between DOE-2 simulations and measured consumption. While these techniques do assist the building energy analyst in reviewing large amounts of building energy consumption data for errors or in establishing time- and temperature-related trends, the maximum number of dimensions (variables) that can be accommodated at a time in a single display is still limited to four: three axial and one using color. The fourth dimension could also be time, as illustrated by Haberl et al. (1996) in their use of animated (time-sequenced) displays of energy use data. These techniques are rather limiting for the visual analysis of the higher-dimensional datasets involved in energy simulations, as well as of usage data recorded by sensors and BMS at short time intervals. There are very few examples of graphical user interfaces that allow a designer to use simulation results for design synthesis. One interesting prototype is the BDA, or Building Design Advisor (LBNL 2006), designed to facilitate informed decisions from the early schematic phases of building design to the detailed specification of building components and systems (Papamichael 1999). The BDA provides a graphical user interface that consists of two main elements: the Building Browser and the Decision Desktop. The Decision Desktop supports a large variety of data types, including 2-D and 3-D distributions, images, sound, and video, which can be displayed and edited in their own windows. The limitation here, again, is from the data mining and knowledge discovery perspective: potentially useful patterns are not highlighted by machine learning algorithms; rather, the user has to discover these through somewhat tedious trial-and-error visual analysis.

METHODOLOGY
VADSM (Fig. 1) begins by identifying key design variables and their ranges, defined by the owner/designer, and two or more response variables deemed key for decision making, such as annual energy use and peak electric demand. In the conventional use of simulation tools, the inputs are in essence already selected, and the resulting outputs are a function of those choices. This strategy only allows a limited trial-and-error design analysis. However, if simulated output targets are used instead to fine-tune the inputs, then a design analysis activity can be transformed into a design synthesis opportunity. This is one of the
key concepts incorporated in VADSM. Once the variable ranges have been determined, appropriate experimental design techniques are adopted to generate a feasible number of simulation runs (variable combinations) with respect to run time, and to ensure uniform sampling over the entire solution space.
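One widely used space-filling experimental design of this kind is Latin Hypercube Sampling, which the case study below also employs. The sketch that follows is purely illustrative (it is not the paper's actual tooling): it uses SciPy's quasi-Monte Carlo module to build a run matrix for three hypothetical design variables, where the variable names and ranges are invented for demonstration.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical design variables: name -> (low, high).
# A real study would use the full variable set and ranges (cf. Table 1).
bounds = {
    "wall_insulation_R": (10.0, 30.0),
    "window_U_value": (0.25, 0.60),
    "lighting_W_per_ft2": (0.6, 1.2),
}

# Latin Hypercube Sampling: each variable's range is divided into n
# equal strata, and each stratum is sampled exactly once.
sampler = qmc.LatinHypercube(d=len(bounds), seed=42)
unit_sample = sampler.random(n=1000)            # 1000 points in [0, 1)^3

lows = np.array([lo for lo, hi in bounds.values()])
highs = np.array([hi for lo, hi in bounds.values()])
runs = qmc.scale(unit_sample, lows, highs)      # rescale to physical units

print(runs.shape)  # (1000, 3): one row per simulation run
```

Each row of `runs` would then become one batch-simulation input file, giving stratified coverage of the design space with far fewer runs than a full factorial design.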
Batch-simulated data can then be analyzed using state-of-the-art data mining algorithms such as RF to ascertain variable importance, thereby discarding irrelevant variables and reducing the dimensionality of the problem. This allows simpler predictive models to be built using well-established traditional regression techniques such as Ordinary Least Squares (OLS). OLS regression generates a single global model that is well suited to fitting the reduced variable space and offers the added benefit of an explicit analytical equation that can be used to calculate inputs for specific outputs. The regression model is also easier to visualize and manipulate via a graphical interface. In contrast, an RF model is essentially a black-box predictor that cannot be as easily visualized or manipulated as an OLS model. The subsequent stage in VADSM requires a graphical user interface (GUI) to provide designers a way to visualize the OLS predictive models and perform what-if assessments dynamically, i.e., in real time. The Decision Support Model Viewer (DSMV) application has been developed to fulfill this requirement. It allows designers to quickly and easily specify the characteristics of potential design options through direct manipulation of multiple inputs and get real-time information about their energy performance. The DSMV also allows a designer to visually keep track of how a specific change in a single variable affects the “degrees of freedom” of other variables by dynamically updating variable ranges.

CASE STUDY DESCRIPTION
A widely used commercial building energy simulation program based on the DOE-2.2 engine was used to study two outputs (response variables): the annual Energy Use Index (EUI) in kBtu/ft2/yr (MJ/m2/yr) and the annual peak electric demand (PED) in kW,
for the U.S. Department of Energy's (DOE) medium office prototype building. Reddy et al. (2007) have suggested a list of heuristically identified influential parameters that have a simple and clear correspondence to specific inputs of the DOE-2 simulation program. Based on that list, 15 variables (Table 1) representing all three major load categories of interest (building, system, plant) were chosen for investigation. All the remaining energy modeling parameters were set to the prototype description.

Figure 1: Flow diagram of the proposed design methodology - VADSM

SIMULATION
A minimum of three levels for each of the 15 independent variables was selected so as to capture quadratic behavior. However, even with three levels, an exhaustive combination of the 15 variables would lead to 3^15 ≈ 14 x 10^6 combinations, an impractical number of simulations. An experimental design technique was thus essential to select fewer runs while ensuring stratified (representative) sampling of the variable space. Latin Hypercube Sampling (LHS) was used to generate a relatively sparse set of 15,000 variable combinations for simulation. LHS is often used to construct computer experiments for
performing sensitivity and uncertainty analysis on complex systems (Helton and Davis 2003). See Table 2 for descriptive statistics of the two simulated responses. A Random Forest (RF) ensemble of 500 regression trees was then applied to the 15,000 rows of simulated data to generate variable rankings for each response. RF works by building an ensemble of individual decision trees on bootstrapped samples. Since it includes many trees, this ensemble is called a forest (Breiman 2001). This algorithm has an inherently robust variable selection technique that allows accurate identification of the top predictors. Once the best predictive variables were ranked according to importance, several second-order linear OLS regression models were evaluated using the top-ranked 5-8 design variables out of the initial 15. The two best OLS models (one for each response) were selected.
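The two modeling steps just described, RF-based variable ranking followed by a reduced second-order OLS fit, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins for the 15,000 simulation rows (only the first four of fifteen variables actually drive the response), and scikit-learn substitutes for whatever RF/OLS software was actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 15))                 # 15 "design variables"
# Synthetic response: only variables 0-3 matter (quadratic + interaction)
y = (3.0 * X[:, 0]**2 + 2.0 * X[:, 1] * X[:, 2] + X[:, 3]
     + rng.normal(scale=0.01, size=2000))

# Step 1: rank variable importance with a 500-tree Random Forest
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
top = ranking[:6]                                # keep a top subset (5-8 in the paper)

# Step 2: fit a second-order OLS model (quadratic and interaction terms)
# on the reduced variable set
ols = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
ols.fit(X[:, top], y)

print("top-ranked variables:", sorted(top.tolist()))
print("R^2 on training data: %.3f" % ols.score(X[:, top], y))
```

Because the second-order OLS model is an explicit polynomial, its coefficients can be manipulated analytically, which is what makes the real-time bound calculations and what-if assessments in the DSMV feasible.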