Designation: E 178 – 02
An American National Standard

Standard Practice for Dealing With Outlying Observations¹

This standard is issued under the fixed designation E 178; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope
1.1 This practice covers outlying observations in samples and how to test the statistical significance of them. An outlying observation, or “outlier,” is one that appears to deviate markedly from other members of the sample in which it occurs. In this connection, the following two alternatives are of interest:
1.1.1 An outlying observation may be merely an extreme manifestation of the random variability inherent in the data. If this is true, the value should be retained and processed in the same manner as the other observations in the sample.
1.1.2 On the other hand, an outlying observation may be the result of gross deviation from prescribed experimental procedure or an error in calculating or recording the numerical value. In such cases, it may be desirable to institute an investigation to ascertain the reason for the aberrant value. The observation may even actually be rejected as a result of the investigation, though not necessarily so. At any rate, in subsequent data analysis the outlier or outliers will be recognized as probably being from a different population than that of the other sample values.
1.2 It is our purpose here to provide statistical rules that will lead the experimenter almost unerringly to look for causes of outliers when they really exist, and hence to decide whether alternative 1.1.1 above is not the more plausible hypothesis to accept, as compared to alternative 1.1.2, in order that the most appropriate action in further data analysis may be taken. The procedures covered herein apply primarily to the simplest kind of experimental data, that is, replicate measurements of some property of a given material, or observations in a supposedly single random sample. Nevertheless, the tests suggested do cover a wide enough range of cases in practice to have broad utility.

2. Referenced Documents
2.1 ASTM Standards:
E 456 Terminology Relating to Quality and Statistics²

3. Terminology
3.1 Definitions: The terminology defined in Terminology E 456 applies to this standard unless modified herein.
3.1.1 outlier: see outlying observation.
3.1.2 outlying observation, n: an observation that appears to deviate markedly in value from other members of the sample in which it appears.

4. Significance and Use
4.1 When the experimenter is clearly aware that a gross deviation from prescribed experimental procedure has taken place, the resultant observation should be discarded, whether or not it agrees with the rest of the data and without recourse to statistical tests for outliers. If a reliable correction procedure, for example,
for temperature, is available, the observation may sometimes be corrected and retained.
4.2 In many cases evidence for deviation from prescribed procedure will consist primarily of the discordant value itself. In such cases it is advisable to adopt a cautious attitude. Use of one of the criteria discussed below will sometimes permit a clear-cut decision to be made. In doubtful cases the experimenter's judgment will have considerable influence. When the experimenter cannot identify abnormal conditions, he should at least report the discordant values and indicate to what extent they have been used in the analysis of the data.
4.3 Thus, for purposes of orientation relative to the over-all problem of experimentation, our position on the matter of screening samples for outlying observations is precisely the following:
4.3.1 Physical Reason Known or Discovered for Outlier(s):
4.3.1.1 Reject observation(s).
4.3.1.2 Correct observation(s) on physical grounds.
4.3.1.3 Reject it (them) and possibly take additional observation(s).
4.3.2 Physical Reason Unknown (Use Statistical Test):
4.3.2.1 Reject observation(s).
4.3.2.2 Correct observation(s) statistically.
4.3.2.3 Reject it (them) and possibly take additional observation(s).
4.3.2.4 Employ truncated-sample theory for censored observations.

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling. Current edition approved May 10, 2002. Published July 2002. Originally published as E 178 – 61 T. Last previous edition E 178 – 00.
² Annual Book of ASTM Standards, Vol 14.02.
Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.

4.4 The statistical test may always be used to support a judgment that a physical
reason does actually exist for an outlier, or the statistical criterion may be used routinely as a basis to initiate action to find a physical cause.

5. Basis of Statistical Criteria for Outliers
5.1 There are a number of criteria for testing outliers. In all of these, the doubtful observation is included in the calculation of the numerical value of a sample criterion (or statistic), which is then compared with a critical value based on the theory of random sampling to determine whether the doubtful observation is to be retained or rejected. The critical value is that value of the sample criterion which would be exceeded by chance with some specified (small) probability on the assumption that all the observations did indeed constitute a random sample from a common system of causes, a single parent population, distribution or universe. The specified small probability is called the “significance level”
or “percentage point” and can be thought of as the risk of erroneously rejecting a good observation. It becomes clear, therefore, that if there exists a real shift or change in the value of an observation that arises from nonrandom causes (human error, loss of calibration of instrument, change of measuring instrument, or even change of time of measurements, etc.), then the observed value of the sample criterion used would exceed the “critical value” based on random-sampling theory. Tables of critical values are usually given for several different significance levels, for example, 5 %, 1 %. For statistical tests of outlying observations, it is generally recommended that a low significance level, such as 1 %, be used and that significance levels greater than 5 % should not be common practice.
NOTE 1: In this practice, we will usually illustrate the use of the 5 % significance level. Proper choice of level in probability depends on the particular problem and just what may be involved, along with the risk that one is willing to take in rejecting a good observation, that is, if the null-hypothesis stating “all observations in the sample come from the same normal population” may be assumed correct.
5.2 It should be pointed out that almost all criteria for outliers are based on an assumed underlying normal (Gaussian) population or distribution. When the data are not normally or approximately normally distributed, the probabilities associated with these tests will be different. Until such time as criteria not sensitive to the normality assumption are developed, the experimenter is cautioned against interpreting the probabilities too literally.
5.3 Although our primary interest here is that of detecting outlying observations, we remark that some of the statistical criteria presented may also be used to
test the hypothesis of normality or that the random sample taken did come from a normal or Gaussian population. The end result is for all practical purposes the same, that is, we really wish to know whether we ought to proceed as if we have in hand a sample of homogeneous normal observations.

6. Recommended Criteria for Single Samples
6.1 Let the sample of $n$ observations be denoted in order of increasing magnitude by $x_1 \le x_2 \le x_3 \le \dots \le x_n$. Let $x_n$ be the doubtful value, that is, the largest value. The test criterion, $T_n$, recommended here for a single outlier is as follows:

$$T_n = (x_n - \bar{x})/s \tag{1}$$

where:
$\bar{x}$ = arithmetic average of all $n$ values, and
$s$ = estimate of the population standard deviation based on the sample data, calculated as follows:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$

If $x_1$ rather than $x_n$ is the doubtful value, the criterion is as follows:

$$T_1 = (\bar{x} - x_1)/s \tag{2}$$

The critical
values for either case, for the 1 and 5 % levels of significance, are given in Table 1. Table 1 and the following tables give the “one-sided” significance levels. In the previous tentative recommended practice (1961), the tables listed values of significance levels double those in the present practice, since it was considered that the experimenter would test either the lowest or the highest observation (or both) for statistical significance. However, to be consistent with actual practice and in an attempt to avoid further misunderstanding, single-sided significance levels are tabulated here so that both viewpoints can be represented.
6.2 The hypothesis that we are testing in every case is that all observations in the sample come from the same normal population. Let us adopt, for example, a significance level of 0.05. If we are interested only in outliers that occur on the high side, we should always use the statistic $T_n = (x_n - \bar{x})/s$ and take as critical value the 0.05 point of Table 1. On the other hand, if we are interested only in outliers occurring on the low side, we would always use the statistic $T_1 = (\bar{x} - x_1)/s$ and again take as a critical value the 0.05 point of Table 1. Suppose, however, that we are interested in outliers occurring on either side, but do not believe that outliers can occur on both sides simultaneously. We might, for example, believe that at some time during the experiment something possibly happened to cause an extraneous variation on the high side or on the low side, but that it was very unlikely that two or more such events could have occurred, one being an extraneous variation on the high side and the other an extraneous variation on the low side. With this point of view we should use the statistic $T_n = (x_n - \bar{x})/s$ or the statistic $T_1 = (\bar{x} - x_1)/s$, whichever is larger. If in this instance
we use the 0.05 point of Table 1 as our critical value, the true significance level would be twice 0.05 or 0.10. If we wish a significance level of 0.05 and not 0.10, we must in this case use as a critical value the 0.025 point of Table 1. Similar considerations apply to the other tests given below.
6.2.1 Example 1: As an illustration of the use of $T_n$ and Table 1, consider the following ten observations on breaking strength (in pounds) of 0.104-in. hard-drawn copper wire: 568, 570, 570, 570, 572, 572, 572, 578, 584, 596. The doubtful observation is the high value, $x_{10} = 596$. Is the value of 596 significantly high? The mean is $\bar{x} = 575.2$ and the estimated standard deviation is $s = 8.70$. We compute

$$T_{10} = (596 - 575.2)/8.70 = 2.39 \tag{3}$$

From Table 1, for $n = 10$, note that a $T_{10}$ as large as 2.39 would occur by chance with probability less than 0.05. In fact, so large a value would occur by chance not much more often than 1 % of the time. Thus, the weight of the evidence is against the doubtful value having come from the same population as the others (assuming the population is normally distributed). Investigation of the doubtful value is therefore indicated.

TABLE 1 Critical Values for T (One-Sided Test) When Standard Deviation is Calculated from the Same Sampleᴬ

Number of           Upper Significance Level
Observations, n     0.1 %    0.5 %    1 %      2.5 %    5 %      10 %
3                   1.155    1.155    1.155    1.155    1.153    1.148
4                   1.499    1.496    1.492    1.481    1.463    1.425
5                   1.780    1.764    1.749    1.715    1.672    1.602
6                   2.011    1.973    1.944    1.887    1.822    1.729
7                   2.201    2.139    2.097    2.020    1.938    1.828
8                   2.358    2.274    2.221    2.126    2.032    1.909
9                   2.492    2.387    2.323    2.215    2.110    1.977
10                  2.606    2.482    2.410    2.290    2.176    2.036
11                  2.705    2.564    2.485    2.355    2.234    2.088
12                  2.791    2.636    2.550    2.412    2.285    2.134
13                  2.867    2.699    2.607    2.462    2.331    2.175
14                  2.935    2.755    2.659    2.507    2.371    2.213
15                  2.997    2.806    2.705    2.549    2.409    2.247
16                  3.052    2.852    2.747    2.585    2.443    2.279
17                  3.103    2.894    2.785    2.620    2.475    2.309
18                  3.149    2.932    2.821    2.651    2.504    2.335
19                  3.191    2.968    2.854    2.681    2.532    2.361
20                  3.230    3.001    2.884    2.709    2.557    2.385
21                  3.266    3.031    2.912    2.733    2.580    2.408
22                  3.300    3.060    2.939    2.758    2.603    2.429
23                  3.332    3.087    2.963    2.781    2.624    2.448
24                  3.362    3.112    2.987    2.802    2.644    2.467
25                  3.389    3.135    3.009    2.822    2.663    2.486
26                  3.415    3.157    3.029    2.841    2.681    2.502
27                  3.440    3.178    3.049    2.859    2.698    2.519
28                  3.464    3.199    3.068    2.876    2.714    2.534
29                  3.486    3.218    3.085    2.893    2.730    2.549
30                  3.507    3.236    3.103    2.908    2.745    2.563
31                  3.528    3.253    3.119    2.924    2.759    2.577
32                  3.546    3.270    3.135    2.938    2.773    2.591
33                  3.565    3.286    3.150    2.952    2.786    2.604
34                  3.582    3.301    3.164    2.965    2.799    2.616
35                  3.599    3.316    3.178    2.979    2.811    2.628
36                  3.616    3.330    3.191    2.991    2.823    2.639
37                  3.631    3.343    3.204    3.003    2.835    2.650
38                  3.646    3.356    3.216    3.014    2.846    2.661
39                  3.660    3.369    3.228    3.025    2.857    2.671
40                  3.673    3.381    3.240    3.036    2.866    2.682
41                  3.687    3.393    3.251    3.046    2.877    2.692
42                  3.700    3.404    3.261    3.057    2.887    2.700
43                  3.712    3.415    3.271    3.067    2.896    2.710
44                  3.724    3.425    3.282    3.075    2.905    2.719
45                  3.736    3.435    3.292    3.085    2.914    2.727
46                  3.747    3.445    3.302    3.094    2.923    2.736
47                  3.757    3.455    3.310    3.103    2.931    2.744
48                  3.768    3.464    3.319    3.111    2.940    2.753
49                  3.779    3.474    3.329    3.120    2.948    2.760
50                  3.789    3.483    3.336    3.128    2.956    2.768
51                  3.798    3.491    3.345    3.136    2.964    2.775
52                  3.808    3.500    3.353    3.143    2.971    2.783
53                  3.816    3.507    3.361    3.151    2.978    2.790
54                  3.825    3.516    3.368    3.158    2.986    2.798
55                  3.834    3.524    3.376    3.166    2.992    2.804
56                  3.842    3.531    3.383    3.172    3.000    2.811
57                  3.851    3.539    3.391    3.180    3.006    2.818
58                  3.858    3.546    3.397    3.186    3.013    2.824
59                  3.867    3.553    3.405    3.193    3.019    2.831
60                  3.874    3.560    3.411    3.199    3.025    2.837
61                  3.882    3.566    3.418    3.205    3.032    2.842
62                  3.889    3.573    3.424    3.212    3.037    2.849
63                  3.896    3.579    3.430    3.218    3.044    2.854
64                  3.903    3.586    3.437    3.224    3.049    2.860
65                  3.910    3.592    3.442    3.230    3.055    2.866
66                  3.917    3.598    3.449    3.235    3.061    2.871
67                  3.923    3.605    3.454    3.241    3.066    2.877
68                  3.930    3.610    3.460    3.246    3.071    2.883
69                  3.936    3.617    3.466    3.252    3.076    2.888
70                  3.942    3.622    3.471    3.257    3.082    2.893
71                  3.948    3.627    3.476    3.262    3.087    2.897
72                  3.954    3.633    3.482    3.267    3.092    2.903
73                  3.960    3.638    3.487    3.272    3.098    2.908
74                  3.965    3.643    3.492    3.278    3.102    2.912
75                  3.971    3.648    3.496    3.282    3.107    2.917
76                  3.977    3.654    3.502    3.287    3.111    2.922
77                  3.982    3.658    3.507    3.291    3.117    2.927
78                  3.987    3.663    3.511    3.297    3.121    2.931
79                  3.992    3.669    3.516    3.301    3.125    2.935
80                  3.998    3.673    3.521    3.305    3.130    2.940
81                  4.002    3.677    3.525    3.309    3.134    2.945
82                  4.007    3.682    3.529    3.315    3.139    2.949
83                  4.012    3.687    3.534    3.319    3.143    2.953
84                  4.017    3.691    3.539    3.323    3.147    2.957
85                  4.021    3