Designation: D6589 – 05 (Reapproved 2015)

Standard Guide for
Statistical Evaluation of Atmospheric Dispersion Model Performance¹

This standard is issued under the fixed designation D6589; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This guide provides techniques that are useful for the comparison of modeled air concentrations with observed field data. Such comparisons provide a means for assessing a model's performance (for example, bias and precision or uncertainty) relative to other candidate models. Methodologies for such comparisons are still evolving; hence, modifications will occur in the statistical tests, procedures, and data analysis as work progresses in this area. Until the interested parties agree upon standard testing protocols, differences in approach will occur. This guide describes a framework, or philosophical context, within which one determines whether a model's performance is significantly different from that of other candidate models. It is suggested that the first step should be to determine which model's estimates are closest on average to the observations, and that the second step should then test whether the differences seen in the performance of the other models are significantly different from that of the model chosen in the first step. An example procedure is provided in Appendix X1 to illustrate an existing approach for a particular evaluation goal. This example is not intended to inhibit alternative approaches or techniques that will produce equivalent or superior results. As discussed in Section 6, statistical evaluation of model performance is viewed as part of a larger process that collectively is referred to as model evaluation.

1.2 This guide has been designed with flexibility to allow expansion to address various characterizations of atmospheric dispersion, which might involve dose or concentration fluctuations, to allow
development of application-specific evaluation schemes, and to allow use of various statistical comparison metrics. No assumptions are made regarding the manner in which the models characterize the dispersion.

1.3 The focus of this guide is on end results, that is, the accuracy of model predictions and the discernment of whether differences seen between models are significant, rather than on operational details such as the ease of model implementation or the time required for model calculations to be performed.

1.4 This guide offers an organized collection of information or a series of options and does not recommend a specific course of action. This guide cannot replace education or experience and should be used in conjunction with professional judgment. Not all aspects of this guide may be applicable in all circumstances. This guide is not intended to represent or replace the standard of care by which the adequacy of a given professional service must be judged, nor should it be applied without consideration of a project's many unique aspects. The word “Standard” in the title of this guide means only that the document has been approved through the ASTM consensus process.

1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this guide.

1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices
and to determine the applicability of regulatory limitations prior to use.

2. Referenced Documents

2.1 ASTM Standards:²
D1356 Terminology Relating to Sampling and Analysis of Atmospheres

3. Terminology

3.1 Definitions: For definitions of terms used in this guide, refer to Terminology D1356.

3.2 Definitions of Terms Specific to This Standard:

3.2.1 atmospheric dispersion model, n: an idealization of atmospheric physics and processes to calculate the magnitude and location of pollutant concentrations based on fate, transport, and dispersion in the atmosphere. This may take the form of an equation, algorithm, or series of equations/algorithms used to calculate average or time-varying concentration. The model may involve numerical methods for solution.

¹ This guide is under the jurisdiction of ASTM Committee D22 on Air Quality and is the direct responsibility of Subcommittee D22.11 on Meteorology.
Current edition approved April 1, 2015. Published April 2015. Originally approved in 2000. Last previous edition approved in 2010 as D6589 – 05 (2010)ε1. DOI: 10.1520/D6589-05R15.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.2.2 dispersion, absolute, n: the characterization of the spreading of material released into the atmosphere based on a coordinate system fixed in space.

3.2.3 dispersion, relative, n: the characterization of the spreading of material released into the atmosphere based on a coordinate system that is relative to the local median position of the dispersing material.

3.2.4 evaluation objective, n: a feature or characteristic, which can be defined through an analysis of the observed concentration pattern (for example, maximum centerline concentration or lateral extent of the average concentration pattern as a function of downwind distance), that one wishes to assess the models' skill in reproducing.

3.2.5 evaluation procedure, n: the analysis steps to be taken to compute the value of the evaluation objective from the observed and modeled patterns of concentration values.

3.2.6 fate, n: the destiny of a chemical or biological pollutant after release into the environment.

3.2.7 model input value, n: characterizations that must be estimated or provided by the model developer or user before model calculations can be performed.

3.2.8 regime, n: a repeatable narrow range of conditions, defined in terms of the model input values needed for dispersion model calculations (values that may or may not be explicitly employed by all models being tested). It is envisioned that the dispersion observed should be similar for all cases having similar model input values.

3.2.9 uncertainty, n: refers to a lack of knowledge about specific factors or parameters. This includes measurement errors, sampling errors, systematic errors, and differences arising from simplification of real-world processes. In principle, uncertainty can be reduced with further information or knowledge (1).³

3.2.10 variability, n: refers to differences attributable to true heterogeneity or diversity in atmospheric processes that result in part from natural
random processes. Variability usually is not reducible by further increases in knowledge, but it can in principle be better characterized (1).

4. Summary of Guide

4.1 Statistical evaluation of dispersion model performance with field data is viewed as part of a larger process that collectively is called model evaluation. Section 6 discusses the components of model evaluation.

4.2 To assess model performance statistically, one must define an overall evaluation goal or purpose. This goal will suggest features (evaluation objectives) within the observed and modeled concentration patterns to be compared, for example, maximum surface concentrations or the lateral extent of a dispersing plume. The selection and definition of evaluation objectives typically are tailored to the models' capabilities and intended uses. The very nature of the problem of characterizing air quality, and the way models are applied, make it impossible to define a single or absolute evaluation objective that is suitable for all purposes. The definition of the evaluation objectives will be restricted by the limited range of conditions experienced in the available comparison data suitable for use. For each evaluation objective, a procedure will need to be defined that allows the value of the evaluation objective to be determined from the available observations of concentration values.

4.3 In assessing the performance of air quality models to characterize a particular evaluation objective, one should consider what the models are capable of providing. As discussed in Section 7, most models attempt to characterize the ensemble average concentration pattern. If such models provide favorable comparisons with observed concentration maxima, this is the result of happenstance rather than skill in the model. Therefore, in this discussion, it is suggested that a model be assessed on its ability to reproduce what it was designed to produce, because at least in these comparisons one can be assured that zero bias with the least amount of scatter is, by definition, good model performance.

4.4 As an illustration of the principles espoused in this guide, a procedure is provided in Appendix X1 for comparison of observed and modeled near-centerline concentration values, which accommodates the fact that observed concentration values include a large component of stochastic, and possibly deterministic, variability unaccounted for by current models. The procedure provides an objective
statistical test of whether differences seen in model performance are significant.

5. Significance and Use

5.1 Guidance is provided on designing model evaluation performance procedures and on the difficulties that arise in statistical evaluation of model performance caused by the stochastic nature of dispersion in the atmosphere. It is recognized that there are examples in the literature where, knowingly or unknowingly, models were evaluated on their ability to describe something that they were never intended to characterize. This guide attempts to heighten awareness and thereby to reduce the number of “unknowing” comparisons. A goal of this guide is to stimulate the development and testing of evaluation procedures that accommodate the effects of natural variability. A technique is illustrated to provide information from which subsequent evaluation and standardization can be derived.

³ The boldface numbers in parentheses refer to the list of references at the end of this standard.

6. Model Evaluation

6.1 Background: Air quality simulation models have been used for many decades to characterize the transport and dispersion of material in the atmosphere (2-4). Early evaluations of model performance usually relied on linear least-squares analyses of observed versus modeled values, using traditional scatter plots of the values (5-7). During the 1980s, attempts were made to encourage the standardization of methods used to judge air quality model performance (8-11). Further development of these proposed statistical evaluation procedures was needed. It was found that the rote application of statistical metrics, such as those listed in (8), was incapable of discerning differences in model performance (12), whereas if the evaluation results were sorted by stability and distance downwind, then differences in modeling skill could be discerned (13). It was becoming increasingly evident that the models were characterizing only a small portion of the observed variations in the concentration values (14). To better deduce the statistical significance of differences seen in model performance in the face of large, unaccounted-for uncertainties and variations, investigators began to explore the use of bootstrap techniques (15). By the late 1980s, most model performance evaluations involved the use of bootstrap techniques in the comparison of maximum values of modeled and observed cumulative frequency distributions of the concentration values (16). Even though the procedures and metrics to be employed in describing the performance of air quality simulation models are still evolving (17-19), there has been general acceptance that defining the performance of air quality models
needs to address the large uncertainties inherent in attempting to characterize atmospheric fate, transport, and dispersion processes. A consensus also has been reached on the philosophical reasons why models of earth science processes can never be validated, in the sense of claiming that a model is truthfully representing natural processes. No general empirical proposition about the natural world can be certain, since there will always remain the prospect that future observations may call the theory into question (20). Numerical models of air pollution are thus a form of highly complex scientific hypothesis concerning natural processes, which can be confirmed through comparison with observations but never validated.

6.2 Components of Model Evaluation: A model evaluation includes science peer reviews and statistical evaluations with field data. The completion of each of these components assumes that specific model goals and evaluation objectives (see Section 10) have been defined.

6.3 Science Peer Reviews: Given the complexity of characterizing atmospheric processes, and the inevitable necessity of limiting model algorithms to a resolvable set, one component of a model evaluation is to review the model's science to confirm that the construct is reasonable and defensible for the defined evaluation objectives. A key part of the scientific peer review will include the review of residual plots, in which modeled and observed evaluation objectives are compared over a range of model inputs, for example, maximum concentrations as a function of estimated plume rise or as a function of distance downwind.

6.4 Statistical Evaluations with Field Data: The objective comparison of modeled concentrations with observed field data provides a means for assessing model performance. Because of the limited supply of evaluation data sets, there are severe practical limits in assessing model performance. For this reason, the conclusions reached in the science peer reviews (see 6.3) and the supportive analyses (see 6.5) have particular relevance in deciding whether a model can be applied for the defined model evaluation objectives. To conduct a statistical comparison, one will have to define one or more evaluation objectives for which objective comparisons are desired (Section 10). As discussed in 8.4.4, summarizing the overall performance of a model over the range of conditions experienced within a field experiment typically involves determining two points for each of the model evaluation objectives. The first is which of the models being assessed has, on average, the smallest combined bias and scatter in comparisons with observations. The second is whether the differences seen in the comparisons with the other models are statistically significant in light of the uncertainties in the observations.

6.5 Other Tasks Supportive to Model Evaluation: As atmospheric dispersion models become more sophisticated, it is not easy to detect coding errors in the implementation of the model algo