Designation: E2586 – 14    An American National Standard

Standard Practice for
Calculating and Using Basic Statistics¹

This standard is issued under the fixed designation E2586; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This practice covers methods and equations for computing and presenting basic descriptive statistics using a set of sample data containing a single variable or two variables. This practice includes simple descriptive statistics for variable data, tabular and graphical methods for variable data, and methods for summarizing simple attribute data. Some interpretation and guidance for use is also included.

1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with
its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents

2.1 ASTM Standards:²
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method

2.2 ISO Standards:³
ISO 3534-1 Statistics – Vocabulary and Symbols, part 1: Probability and General Statistical Terms
ISO 3534-2 Statistics – Vocabulary and Symbols, part 2: Applied Statistics

3. Terminology

3.1 Definitions:

3.1.1 Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.

3.1.2 characteristic, n – a property of items in a sample or population which, when measured, counted, or otherwise observed, helps to distinguish among the items. E2282

3.1.3 coefficient of determination, n – square of the correlation coefficient, r.

3.1.4 coefficient of variation, CV, n – for a nonnegative characteristic, the ratio of the standard deviation to the mean for a population or sample.

3.1.4.1 Discussion – The coefficient of variation is often expressed as a percentage.

3.1.4.2 Discussion – This statistic is also known as the relative standard deviation, RSD.

3.1.5 confidence bound, n – see confidence limit.

3.1.6 confidence coefficient, n – see confidence level.

3.1.7 confidence interval, n – an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 − α, where Pr(L ≤ θ ≤ U) ≥ 1 − α.

3.1.7.1 Discussion – The confidence level, 1 − α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense “confidence” applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.

3.1.8 confidence level, n – the value, 1 − α, of the probability associated with a confidence interval, often expressed as a percentage.

3.1.8.1 Discussion – α is generally a small number. Confidence level is often 95 % or 99 %.

3.1.9 confidence limit, n – each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval.

3.1.10 correlation coefficient, n – for a population, ρ, a dimensionless measure of
association between two variables X and Y, equal to the covariance divided by the product of σ_X times σ_Y.

3.1.11 correlation coefficient, n – for a sample, r, the quantity:

    r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}    (1)

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics. Current edition approved June 1, 2014. Published January 2015. Originally approved in 2007. Last previous edition approved in 2014 as E2586 – 13. DOI: 10.1520/E2586-14.
² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
³ Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.12 covariance, n – of a population, cov (X, Y), for two variables, X and Y, the expected value of (X − μ_X)(Y − μ_Y).

3.1.13 covariance, n – of a sample; the quantity:

    \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}    (2)

3.1.14 dependent variable, n – a variable to be predicted using an equation.

3.1.15 degrees of freedom, n – the number of independent data points minus the number of parameters that have to be estimated before calculating the variance.

3.1.16 estimate, n – sample statistic used to approximate a population parameter.

3.1.17 histogram, n – graphical representation of the frequency distribution of a characteristic consisting of a set of rectangles with area proportional to the frequency. ISO 3534-1

3.1.17.1 Discussion – While not required, equal bar or class widths are recommended for histograms.

3.1.18 independent variable, n – a variable used
to predict another using an equation.

3.1.19 interquartile range, IQR, n – the 75th percentile (0.75 quantile) minus the 25th percentile (0.25 quantile), for a data set.

3.1.20 kurtosis, γ₂, g₂, n – for a population or a sample, a measure of the weight of the tails of a distribution relative to the center, calculated as the ratio of the fourth central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the fourth power, minus 3 (also referred to as excess kurtosis).

3.1.21 mean, n – of a population, μ, average or expected value of a characteristic in a population; of a sample, x̄, sum of the observed values in the sample divided by the sample size.

3.1.22 median, X̃, n – the 50th percentile in a population or sample.

3.1.22.1 Discussion – The sample median is the (n + 1)/2 order statistic if the sample size n is odd and is the average of the n/2 and n/2 + 1 order statistics if n is even.

3.1.23 midrange, n – average of the minimum and maximum values in a sample.

3.1.24 order statistic, x_(k), n – value of the kth observed value in a sample after sorting by order of magnitude.

3.1.24.1 Discussion – For a sample of size n, the first order statistic x_(1) is the minimum value, x_(n) is the maximum value.

3.1.25 parameter, n – see population parameter.

3.1.26 percentile, n – quantile of a sample or a population, for which the fraction less than or equal to the value is expressed as a percentage.

3.1.27 population, n – the totality of items or units of material under consideration.

3.1.28 population parameter, n – summary measure of the values of some characteristic of a population. ISO 3534-2

3.1.29 prediction interval, n – an interval for a future value or set of values, constructed from a current set of data, in a way that has a specified probability for the inclusion of the future
value.

3.1.30 regression, n – the process of estimating parameter(s) of an equation using a set of data.

3.1.31 residual, n – observed value minus fitted value, when a model is used.

3.1.32 statistic, n – see sample statistic.

3.1.33 quantile, n – value such that a fraction f of the sample or population is less than or equal to that value.

3.1.34 range, R, n – maximum value minus the minimum value in a sample.

3.1.35 sample, n – a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection.

3.1.36 sample size, n, n – number of observed values in the sample.

3.1.37 sample statistic, n – summary measure of the observed values of a sample.

3.1.38 skewness, γ₁, g₁, n – for population or sample, a measure of symmetry of a distribution, calculated as the ratio of the third central moment (empirical if a sample, and theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the third power.

3.1.39 standard error – standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate of it.

3.1.39.1 Discussion –
If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the sample size.

3.1.40 standard deviation – of a population, σ, the square root of the average or expected value of the squared deviation of a variable from its mean; of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample divided by the sample size minus 1.

3.1.41 variance, σ², s², n – square of the standard deviation of the population or sample.

3.1.41.1 Discussion – For a finite population, σ² is calculated as the sum of squared deviations of values
from the mean, divided by n. For a continuous population, σ² is calculated by integrating (x − μ)² with respect to the density function. For a sample, s² is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.

3.1.42 Z-score, n – observed value minus the sample mean divided by the sample standard deviation.

4. Significance and Use

4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.

4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.

4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.

4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.

4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in
40、e data that will vary inrepeating the sampling process. As such, a statistic is a randomvariable subject to variation in its own right. The samplestatistic usually has a corresponding parameter in the popula-tion that is unknown (see Section 5). The point of using astatistic is to summarize the data
41、 set and estimate a correspond-ing population characteristic or parameter.4.4 Descriptive statistics consider numerical, tabular, andgraphical methods for summarizing a set of data. The methodsconsidered in this practice are used for summarizing theobservations from a single variable.4.5 The descrip
42、tive statistics described in this practice are:4.5.1 Mean, median, min, max, range, mid range, orderstatistic, quartile, empirical percentile, quantile, interquartilerange, variance, standard deviation, Z-score, coefficient ofvariation, skewness and kurtosis, and standard error.4.6 Tabular methods d
43、escribed in this practice are:4.6.1 Frequency distribution, relative frequencydistribution, cumulative frequency distribution, and cumulativerelative frequency distribution.4.7 Graphical methods described in this practice are:4.7.1 Histogram, ogive, boxplot, dotplot, normal probabilityplot, and q-q
plot.

4.8 While the methods described in this practice may be used to summarize any set of observations, the results obtained by using them may be of little value from the standpoint of interpretation unless the data quality is acceptable and satisfies certain requirements. To be useful for inductive generalization, any sample of observations that is treated as a single group for presentation purposes must represent a series of measurements, all made under essentially the same test conditions, on a material or product, all of which have been produced under essentially the same conditions. When these criteria are met, we are minimizing the danger of mixing two or more distinctly different sets of data.

4.8.1 If a given collection of data consists of two or more samples collected under different test conditions or representing material produced under different conditions (that is, different populations), it should be considered as two or more separate subgroups of observations, each to be treated independently in a data analysis program. Merging of such subgroups, representing significantly different conditions, may lead to a presentation that will be of little practical value. Briefly, any sample of observations to which these methods are applied should be homogeneous or, in the case of a process, have originated from a process in a state of statistical control.

4.9 The methods developed in Sections 6, 7, and 8 apply to the sample data. There will be no misunderstanding when, for example, the term “mean” is indicated, that the meaning is sample mean, not population mean, unless indicated otherwise. It is understood that there is a data set containing n observations. The data set may be denoted as:

    x_1, x_2, x_3, \ldots, x_n    (3)

4.9.1 There is no order of magnitude implied by the subscript notation unless subscripts are contained in parentheses (see 6.7).

5. Characteristics of Populations

5.1 A population is the totality of a set of items under consideration. Populations may b