Designation: D7915 – 18
An American National Standard

Standard Practice for
Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set¹

This standard is issued under the fixed designation D7915; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics.
Current edition approved July 1, 2018. Published August 2018. Originally approved in 1988. Last previous edition approved in 2014 as D7915 – 14. DOI: 10.1520/D7915-18.

This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version of the standard as published by ASTM is to be considered the official document.

*A Summary of Changes section appears at the end of this standard
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope*
1.1 This practice provides a step-by-step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)
1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.
1.3 This practice is applicable to a data set comprising a minimum of six observations.
1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.
1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.
1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.
1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.8 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Terminology
2.1 Definitions of Terms Specific to This Standard:
2.1.1 outlier, n: an observation (or a subset of observations) which appears to be inconsistent with the remainder of the data set.

3. Significance and Use
3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.
3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.
3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as an automated computer algorithm.

4. Procedure
4.1 Specify the maximum number of outliers (r) in a data set to be identified. This is the number of cycles required to be executed (see 4.2) for the identification of up to r outliers.
4.1.1 The maximum number of outliers (r) recommended by this practice is two (2) for data sets with six to twelve observations.
4.1.2 For data sets with more than twelve observations, the recommended maximum number of outliers (r) is the lesser of ten (10) or 20 % of the number of observations.
4.1.3 The recommended values for r
in 4.1.1 and 4.1.2 are not intended to be mandatory. Users can specify other values based on their specific needs.
4.2 Set the current cycle number c to 1 (c = 1).
4.2.1 Assign the original data set to be assessed (in 4.1) as the data set for cycle 1 and label it as DTS1.
4.3 Compute the test statistic T for each observation in the data set assigned to the current cycle (DTSc) as follows:

$$T = \frac{|x - \bar{x}|}{s} \tag{1}$$

where:
$x$ = an observation in the data set,
$\bar{x}$ = average calculated using all observations in the data set, and
$s$ = sample standard deviation calculated using all observations in the data set.
4.4 Identify the observation associated with the largest absolute magnitude of the test statistic T in the data set of the current cycle.
4.5 If the current cycle c is less than r, execute 4.5.1 to 4.5.4; otherwise go to 4.6.
4.5.1 Remove the observation identified in 4.4 from the data set of the current cycle.
4.5.2 Increment the current cycle number by 1: c = c_current + 1.
4.5.3 Assign the reduced data set in 4.5.1 (that is, the data set with the observation identified in 4.4 removed) as the data set for the new cycle number and label it as DTSc.
4.5.4 Repeat steps 4.3 to 4.5.
4.6 Beginning with c = r, compare the maximum T computed in the data set DTSc to a critical value λcritical associated with the data set for cycle c, where λcritical is chosen based on a false identification probability of 0.01. See Table A1.1 in Annex A1 for critical values applicable to different data set sizes and cycle numbers (c).
4.7 Identify the data set DTSm for which the maximum T exceeds λcritical, and m (number of observations removed from the initial data set DTS0) is the largest value (0 ...

           DTS1  T1     DTS2  T2     DTS3  T3     DTS4  T4     DTS5  T5     DTS6  T6
           35.0  0.30   35.0  0.44   35.0  0.64   35.0  0.97   35.0  0.94   35.0  1.05
           36.6  0.05   36.6  0.04   36.6  0.17   36.6  0.37   36.6  0.32   36.6  0.40
           34.7  0.37   34.7  0.52   34.7  0.73   34.7  1.08   34.7  1.06   34.7  1.17
           36.2  0.04   36.2  0.14   36.2  0.29   36.2  0.52   36.2  0.48   36.2  0.56
           37.0  0.14   37.0  0.06   37.0  0.05   37.0  0.22   37.0  0.17   37.0  0.24
           25.3  2.44   25.3  2.85
           37.2  0.18   37.2  0.11   37.2  0.00   37.2  0.15   37.2  0.09   37.2  0.16
           41.3  1.09   41.3  1.12   41.3  1.20   41.3  1.38   41.3  1.50   41.3  1.49
           26.0  2.29   26.0  2.68   26.0  3.27
           24.6  2.60
           33.5  0.63   33.5  0.81   33.5  1.08   33.5  1.53   33.5  1.52   33.5  1.65
           35.5  0.19   35.5  0.32   35.5  0.49   35.5  0.78   35.5  0.75   35.5  0.85
           35.4  0.21   35.4  0.34   35.4  0.52   35.4  0.82   35.4  0.79   35.4  0.89
           39.9  0.78   39.9  0.78   39.9  0.79   39.9  0.86   39.9  0.96   39.9  0.93
           39.2  0.62   39.2  0.60   39.2  0.59   39.2  0.60   39.2  0.69   39.2  0.65
           36.6  0.05   36.6  0.04   36.6  0.17   36.6  0.37   36.6  0.32   36.6  0.40
           37.2  0.18   37.2  0.11   37.2  0.00   37.2  0.15   37.2  0.09   37.2  0.16
           33.2  0.70   33.2  0.89   33.2  1.16   33.2  1.64   33.2  1.64
           34.0  0.52   34.0  0.69   34.0  0.93   34.0  1.34   34.0  1.33   34.0  1.45
           35.7  0.15   35.7  0.27   35.7  0.43   35.7  0.71   35.7  0.67   35.7  0.77
           39.2  0.62   39.2  0.60   39.2  0.59   39.2  0.60   39.2  0.69   39.2  0.65
           42.1  1.26   42.1  1.32   42.1  1.43   42.1  1.68
           35.7  0.15   35.7  0.27   35.7  0.43   35.7  0.71   35.7  0.67   35.7  0.77
           40.2  0.84   40.2  0.85   40.2  0.88   40.2  0.97   40.2  1.08   40.2  1.05
           36.6  0.05   36.6  0.04   36.6  0.17   36.6  0.37   36.6  0.32   36.6  0.40
           41.1  1.04   41.1  1.07   41.1  1.14   41.1  1.31   41.1  1.43   41.1  1.41
           41.1  1.04   41.1  1.07   41.1  1.14   41.1  1.31   41.1  1.43   41.1  1.41
           39.1  0.60   39.1  0.58   39.1  0.56   39.1  0.56   39.1  0.65   39.1  0.61
           40.6  0.93   40.6  0.95   40.6  1.00   40.6  1.12   40.6  1.23   40.6  1.21
           41.3  1.09   41.3  1.12   41.3  1.20   41.3  1.38   41.3  1.50   41.3  1.49
average    36.37        36.78        37.19        37.60        37.43        37.60
std dev    4.54         4.02         3.42         2.68         2.58         2.48
Tmax             2.60         2.85         3.27         1.68         1.64         1.65
λcritical        3.24         3.22         3.20         3.18         3.16         3.14
cycle      c = 1        c = 2        c = 3        c = 4        c = 5        c = 6

6. Keywords
6.1 GESD; outliers

ANNEX
(Mandatory Information)
A1. λcritical FOR VARIOUS DATA SET SIZES
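As an illustration only, the cycle structure of Section 4 can be sketched in Python using the 30 observations and the N = 30 critical values from the example table. The function names gesd_cycles and gesd_outliers are hypothetical and are not part of this practice. Because the final step of the procedure is cut off in this text, the sketch assumes the usual GESD convention: working backward from c = r, the first cycle whose maximum T exceeds its critical value fixes the number of outliers, and the observations identified in cycles 1 through c are reported.

```python
from statistics import mean, stdev

# The 30 observations of the worked example (column DTS1 of the example table).
DATA = [35.0, 36.6, 34.7, 36.2, 37.0, 25.3, 37.2, 41.3, 26.0, 24.6,
        33.5, 35.5, 35.4, 39.9, 39.2, 36.6, 37.2, 33.2, 34.0, 35.7,
        39.2, 42.1, 35.7, 40.2, 36.6, 41.1, 41.1, 39.1, 40.6, 41.3]

# Critical values for N = 30, cycles c = 1..6, copied from the example table.
CRITICAL_N30 = [3.24, 3.22, 3.20, 3.18, 3.16, 3.14]


def gesd_cycles(data, r):
    """Run up to r cycles (4.2 to 4.5); return [(candidate, T_max), ...]."""
    remaining = list(data)
    cycles = []
    for _ in range(r):
        avg = mean(remaining)
        s = stdev(remaining)  # sample standard deviation (n - 1 divisor)
        if s == 0:
            break  # all remaining values identical; T is undefined
        # 4.3 and 4.4: the observation with the largest T = |x - avg| / s
        candidate = max(remaining, key=lambda x: abs(x - avg))
        cycles.append((candidate, abs(candidate - avg) / s))
        # 4.5.1: drop it to form the data set of the next cycle
        # (after the last cycle this removal is harmless).
        remaining.remove(candidate)
    return cycles


def gesd_outliers(data, r, critical):
    """Assumed decision rule: scan from the last cycle back to the first."""
    cycles = gesd_cycles(data, r)
    for c in range(len(cycles), 0, -1):
        if cycles[c - 1][1] > critical[c - 1]:
            # Observations identified in cycles 1..c are reported as outliers.
            return [x for x, _ in cycles[:c]]
    return []


if __name__ == "__main__":
    for c, (x, t) in enumerate(gesd_cycles(DATA, 6), start=1):
        print(f"cycle {c}: candidate {x}, Tmax {t:.2f}")
    print("outliers:", gesd_outliers(DATA, 6, CRITICAL_N30))
```

With these data the maximum T in cycles 1 and 2 (about 2.60 and 2.85) falls below the corresponding critical values (3.24 and 3.22), yet the cycle 3 value (about 3.27 against 3.20) exceeds its critical value, so 24.6, 25.3, and 26.0 are all identified. This is the masking behavior described in 3.2: a test that examined only the most extreme value first would have stopped without flagging anything.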
TABLE A1.1 λcritical for Various Data Set Sizes (0.01 significance level)
NOTE 1: Values (in italic) for cycles greater than r are shown for information only.

 r   N   c = 1  c = 2  c = 3  c = 4  c = 5  c = 6  c = 7  c = 8  c = 9  c = 10
 2   6   1.97   1.76   1.50
 2   7   2.14   1.97   1.76
 2   8   2.27   2.14   1.97
 2   9   2.39   2.27   2.14
 2  10   2.48   2.39   2.27
 2  11   2.56   2.48   2.39
 2  12   2.64   2.56   2.48
 3  13   2.70   2.64   2.56   2.48
 3  14   2.76   2.70   2.64   2.56
 3  15   2.81   2.76   2.70   2.64
 3  16   2.85   2.81   2.76   2.70
 3  17   2.89   2.85   2.81   2.76
 4  18   2.93   2.89   2.85   2.81   2.76
 4  19   2.97   2.93   2.89   2.85   2.81
 4  20   3.00   2.97   2.93   2.89   2.85
 4  21   3.03   3.00   2.97   2.93   2.89
 4  22   3.06   3.03   3.00   2.97   2.93
 5  23   3.09   3.06   3.03   3.00   2.97   2.93
 5  24   3.11   3.09   3.06   3.03   3.00   2.97
 5  25   3.14   3.11   3.09   3.06   3.03   3.00
 5  26   3.16   3.14   3.11   3.09   3.06   3.03
 6  27   3.18   3.16   3.14   3.11   3.09   3.06
 6  28   3.20   3.18   3.16   3.14   3.11   3.09   3.06
 6  29   3.22   3.20   3.18   3.16   3.14   3.11   3.09
 6  30   3.24   3.22   3.20   3.18   3.16   3.14   3.11
 6  31   3.25   3.24   3.22   3.20   3.18   3.16   3.14
 6  32   3.27   3.25   3.24   3.22   3.20   3.18   3.16
 7  33   3.29   3.27   3.25   3.24   3.22   3.20   3.18   3.16
 7  34   3.30   3.29   3.27   3.25   3.24   3.22   3.20   3.18
 7  35   3.32   3.30   3.29   3.27   3.25   3.24   3.22   3.20
 7  36   3.33   3.32   3.30   3.29   3.27   3.25   3.24   3.22
 7  37   3.34   3.33   3.32   3.30   3.29   3.27   3.25   3.24
 8  38   3.36   3.34   3.33   3.32   3.30   3.29   3.27   3.25   3.24
 8  39   3.37   3.36   3.34   3.33   3.32   3.30   3.29   3.27   3.25
 8  40   3.38   3.37   3.36   3.34   3.33   3.32   3.30   3.29   3.27
 8  41   3.39   3.38   3.37   3.36   3.34   3.33   3.32   3.30   3.29
 8  42   3.40   3.39   3.38   3.37   3.36   3.34   3.33   3.32   3.30
 9  43   3.41   3.40   3.39   3.38   3.37   3.36   3.34   3.33   3.32   3.30
 9  44   3.43   3.41   3.40   3.39   3.38   3.37   3.36   3.34   3.33   3.32
 9  45   3.44   3.43   3.41   3.40   3.39   3.38   3.37   3.36   3.34   3.33
 9  46   3.45   3.44   3.43   3.41   3.40   3.39   3.38   3.37   3.36   3.34
 9  47   3.46   3.45   3.44   3.43   3.41   3.40   3.39   3.38   3.37   3.36
10  48   3.46   3.46   3.45   3.44   3.43   3.41   3.40   3.39   3.38   3.37
10  49   3.47   3.46   3.46   3.45   3.44   3.43   3.41   3.40   3.39   3.38
10  50   3.48   3.47   3.46   3.46   3.45   3.44   3.43   3.41   3.40   3.39
10  51   3.49   3.48   3.47   3.46   3.46   3.45   3.44   3.43   3.41   3.40
10  52   3.50   3.49   3.48   3.47   3.46   3.46   3.45   3.44   3.43   3.41
10  53   3.51   3.50   3.49   3.48   3.47   3.46   3.46   3.45   3.44   3.43
10  54   3.52   3.51   3.50   3.49   3.48   3.47   3.46   3.46   3.45   3.44
10  55   3.52   3.52   3.51   3.50   3.49   3.48   3.47   3.46   3.46   3.45
10  56   3.53   3.52   3.52   3.51   3.50   3.49   3.48   3.47   3.46   3.46
10  57   3.54   3.53   3.52   3.52   3.51   3.50   3.49   3.48   3.47   3.46
10  58   3.55   3.54   3.53   3.52   3.52   3.51   3.50   3.49   3.48   3.47
10  59   3.55   3.55   3.54   3.53   3.52   3.52   3.51   3.50   3.49   3.48
10  60   3.56   3.55   3.55   3.54   3.53   3.52   3.52   3.51   3.50   3.49
10  61   3.57   3.56   3.55   3.55   3.54   3.53   3.52   3.52   3.51   3.50
10  62   3.57   3.57   3.56   3.55   3.55   3.54   3.53   3.52   3.52   3.51
10  63   3.58   3.57   3.57   3.56   3.55   3.55   3.54   3.53   3.52   3.52
10  64   3.59   3.58   3.57   3.57   3.56   3.55   3.55   3.54   3.53   3.52
10  65   3.59   3.59   3.58   3.57   3.57   3.56   3.55   3.55   3.54   3.53
10  66   3.60   3.59   3.59   3.58   3.57   3.57   3.56   3.55   3.55   3.54
10  67   3.60   3.60   3.59   3.59   3.58   3.57   3.57   3.56   3.55   3.55
10  68   3.61   3.60