ASTM E2586-2007 Standard Practice for Calculating and Using Basic Statistics


Designation: E 2586 – 07

Standard Practice for Calculating and Using Basic Statistics¹

This standard is issued under the fixed designation E 2586; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This practice covers methods and formulas for computing and presenting basic descriptive statistics using a set of sample data containing a single variable. This practice includes simple descriptive statistics for variable data, tabular and graphical methods for variable data, and methods for summarizing simple attribute data. Some interpretation and guidance for use is also included.

1.2 The system of units for this Practice is not specified. Dimensional quantities in the Practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents

2.1 ASTM Standards:²
E 178 Practice for Dealing With Outlying Observations
E 456 Terminology Relating to Quality and Statistics

2.2 ISO Standards:³
ISO 3534-1 Statistics - Vocabulary and Symbols, Part 1: Probability and General Statistical Terms
ISO 3534-2 Statistics - Vocabulary and Symbols, Part 2: Applied Statistics

3. Terminology

3.1 Definitions: Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E 456.

3.1.1 coefficient of variation, CV, n: for a nonnegative characteristic, the ratio of the standard deviation to the mean for a population or sample.
3.1.1.1 Discussion: The coefficient of variation is often expressed as a percentage.
3.1.1.2 Discussion: This statistic is also known as the relative standard deviation, RSD.

3.1.2 characteristic, n: a property of items in a sample or population which, when measured, counted, or otherwise observed, helps to distinguish between the items. E 2282

3.1.3 empirical percentile, n: estimate of a population percentile using the sample data. This is a sample value such that a percentage p of the sample is less than that value.

3.1.4 histogram, n: graphical representation of the frequency distribution of a characteristic consisting of a set of rectangles with area proportional to the frequency. ISO 3534-1
3.1.4.1 Discussion: While not required, equal bar or class widths are recommended for histograms.

3.1.5 interquartile range, IQR, n: the 75th percentile (0.75 quantile) minus the 25th percentile (0.25 quantile), for a data set.

3.1.6 kurtosis, γ₂, g₂, n: for a population or a sample, a measure of the weight of the tails of a distribution relative to the center, calculated as the ratio of the fourth central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the fourth power, minus 3 (also referred to as excess kurtosis).

3.1.7 mean, n: of a population, μ, the average or expected value of a characteristic in a population; of a sample, x̄, the sum of the observed values in the sample divided by the sample size.

3.1.8 median, X̃, n: the 50th percentile in a population or sample.
3.1.8.1 Discussion: The sample median is the (n + 1)/2 order statistic if the sample size n is odd and is the average of the n/2 and n/2 + 1 order statistics if n is even.

3.1.9 midrange, n: average of the minimum and maximum values in a sample.

3.1.10 order statistic, x(k), n: value of the kth observed value in a sample after sorting by order of magnitude.
3.1.10.1 Discussion: For a sample of size n, the first order statistic x(1) is the minimum value, and x(n) is the maximum value.

3.1.11 population parameter, n: summary measure of the values of some characteristic of a population. ISO 3534-2

3.1.12 population, n: the totality of items or units of material under consideration.

3.1.13 quantile, n: value such that a fraction f of the sample or population is less than or equal to that value.

3.1.14 range, R, n: maximum value minus the minimum value in a sample.

3.1.15 sample, n: a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection.

3.1.16 sample size, n, n: number of observed values in the sample.

3.1.17 sample statistic, n: summary measure of the observed values of a sample.

3.1.18 skewness, γ₁, g₁, n: for a population or sample, a measure of symmetry of a distribution, calculated as the ratio of the third central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the third power.

3.1.19 standard deviation: of a population, σ, the square root of the average or expected value of the squared deviation of a variable from its mean; of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample divided by the sample size minus 1.

3.1.20 variance, σ², s², n: square of the standard deviation of the population or sample.
3.1.20.1 Discussion: For a finite population, σ² is calculated as the sum of squared deviations of values from the mean, divided by n. For a continuous population, σ² is calculated by integrating (x − μ)² with respect to the density function. For a sample, s² is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.

3.1.21 Z-score, n: observed value minus the sample mean divided by the sample standard deviation.

4. Significance and Use

4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.

4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.

4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.

4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.

4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize the data set and estimate a corresponding population characteristic or parameter.

4.4 Descriptive statistics include numerical, tabular, and graphical methods for summarizing a set of data. The methods considered in this practice are used for summarizing the observations from a single variable.

4.5 The descriptive statistics described in this practice are:
4.5.1 Mean, median, min, max, range, midrange, order statistic, quartile, empirical percentile, quantile, interquartile range, variance, standard deviation, Z-score, coefficient of variation, skewness and kurtosis, and standard error.

4.6 Tabular methods described in this practice are:
4.6.1 Frequency distribution, relative frequency distribution, cumulative frequency distribution, and cumulative relative frequency distribution.

4.7 Graphical methods described in this practice are:
4.7.1 Histogram, ogive, boxplot, dotplot, normal probability plot, and q-q plot.

4.8 While the methods described in this practice may be used to summarize any set of observations, the results obtained by using them may be of little value from the standpoint of interpretation unless the data quality is acceptable and satisfies certain requirements. To be useful for inductive generalization, any sample of observations that is treated as a single group for presentation purposes must represent a series of measurements, all made under essentially the same test conditions, on a material or product, all of which have been produced under essentially the same conditions. When these criteria are met, we are minimizing the danger of mixing two distinctly different sets of data.

4.8.1 If a given collection of data consists of two or more samples collected under different test conditions or representing material produced under different conditions (that is, different populations), it should be considered as two or more separate subgroups of observations, each to be treated independently in a data analysis program. Merging of such subgroups, representing significantly different conditions, may lead to a presentation that will be of little practical value. Briefly, any sample of observations to which these methods are applied should be homogeneous or, in the case of a process, have originated from a process in a state of statistical control.

4.9 The methods developed in Sections 6, 7, and 8 apply to the sample data. Unless indicated otherwise, a term such as “mean” refers to the sample mean, not the population mean. It is understood that there is a data set containing n observations. The data set may be denoted as:

$x_1, x_2, x_3, \ldots, x_n$     (1)

4.9.1 There is no order of magnitude implied by the subscript notation unless subscripts are contained in parentheses (see 6.7).

5. Characteristics of Populations

5.1 A population is the totality of a set of items under consideration. Populations may be finite or unlimited in size and may be existing or continuing to emerge as, for example, in a process. For continuous variables, X, representing an essentially unlimited population or a process, the population is mathematically characterized by a probability density function, f(x). The density function visually describes the shape of the distribution as, for example, in Fig. 1. Mathematically, the only requirements of a density function are that its ordinates all be nonnegative and that the total area under the curve be equal to 1.

5.1.1 Area under the density function curve is equivalent to probability for the variable X. The probability that X shall occur between any two values, say s and t, is given by the area under the curve bounded by the two given values of s and t. This is expressed mathematically as a definite integral over the density function between s and t:

$P(s < X \le t) = \int_s^t f(x)\,dx$     (2)

5.1.2 A great variety of distribution shapes are theoretically possible. When the curve is symmetric, we say that the distribution is symmetric; otherwise, it is asymmetric. A distribution having a longer tail on the right side is called right skewed; a distribution having a longer tail on the left is called left skewed.

5.1.3 For a given density function, f(x), the relationship to cumulative area under the curve may be graphically shown in the form of a cumulative distribution function, F(x). The function F(x) plots the cumulative area under f(x) as x moves to the right. Fig. 2 shows a symmetric distribution with its density function, f(x), plotted on the left-hand axis and distribution function, F(x), plotted on the right-hand axis.

5.1.4 Referring to the F(x) axis in Fig. 2, observe that F(30) = 0.5. The point x = 30 divides the distribution into two equal halves with respect to probability (50 % on each side of x). In general, where F(x) = 0.5, we call the point x the median or 50th percentile of the distribution. In like manner, we may define any percentile, for example, the 25th or the 90th percentile. In general, for 0 [...] 0 as in Fig. 4. If the distribution has a longer tail on the left, we say that it is left skewed and γ₁ < 0.

[...] parameters using the sample data. These estimates are called descriptive statistics. For example, the sample mean and standard deviation estimate the parameters μ and σ, the sample skewness and kurtosis estimate γ₁ and γ₂, and sample percentiles may be calculated to estimate population percentiles. In some cases, more than one statistic may be used for the same purpose.

5.4.1 In addition to estimation, descriptive statistics serve to organize and give meaning to the raw sample data. By itself, a set of numbers in columnar format may yield little useful information. The methods of descriptive statistics include numerical, tabular, and graphical methods that can lead to great insight into the underlying phenomena being studied.

6. Descriptive Statistics

6.1 Mean or Arithmetic Average: The mean is a m[...]

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling. Current edition approved Oct. 1, 2007. Published October 2007.
² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
³ Available from the American National Standards Institute, 25 W. 43rd St., 4th Floor, New York, NY 10036.

Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
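The sample definitions in Section 3 translate almost line for line into code: 3.1.7 (mean), 3.1.8.1 (median from order statistics), 3.1.9 (midrange), 3.1.14 (range), 3.1.19 and 3.1.20.1 (standard deviation and variance with the n − 1 divisor), 3.1.1 (coefficient of variation) and 3.1.21 (Z-score). The Python below is a minimal illustrative sketch, not part of the standard. The function names `basic_statistics` and `z_scores` and the sample values are hypothetical.

```python
import math
from statistics import median, stdev, variance


def basic_statistics(data: list[float]) -> dict[str, float]:
    """Sample statistics following the Section 3 definitions (illustrative sketch)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are needed for s")
    x = sorted(data)                           # order statistics x(1) <= ... <= x(n)
    x_bar = sum(data) / n                      # 3.1.7: sum of values / sample size
    if n % 2 == 1:                             # 3.1.8.1: odd n -> (n + 1)/2 order statistic
        med = x[(n + 1) // 2 - 1]              # convert the 1-based rank to a 0-based index
    else:                                      # even n -> average of n/2 and n/2 + 1
        med = (x[n // 2 - 1] + x[n // 2]) / 2
    s2 = sum((v - x_bar) ** 2 for v in data) / (n - 1)  # 3.1.20.1: divisor n - 1
    s = math.sqrt(s2)                          # 3.1.19
    return {
        "mean": x_bar,
        "median": med,
        "min": x[0],
        "max": x[-1],
        "range": x[-1] - x[0],                 # 3.1.14
        "midrange": (x[0] + x[-1]) / 2,        # 3.1.9
        "variance": s2,
        "std_dev": s,
        "cv_percent": 100 * s / x_bar,         # 3.1.1 as a percentage; mean must be nonzero
    }


def z_scores(data: list[float]) -> list[float]:
    """3.1.21: (observed value - sample mean) / sample standard deviation."""
    x_bar = sum(data) / len(data)
    s = stdev(data)                            # the stdlib stdev also divides by n - 1
    return [(v - x_bar) / s for v in data]


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = basic_statistics(sample)
print(stats["median"], stats["variance"])      # 3.0 2.5
# Cross-check against Python's statistics module, which uses the same conventions
assert stats["median"] == median(sample)
assert math.isclose(stats["variance"], variance(sample))
```

The hand-written median follows the odd/even rule of 3.1.8.1 explicitly so the link to the definition stays visible. In practice `statistics.median` and `statistics.variance` give the same results.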
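Definitions 3.1.18 and 3.1.6 describe sample skewness g₁ and kurtosis g₂ as moment ratios. Each is the empirical third (or fourth) central moment divided by s raised to the third (or fourth) power, with 3 subtracted for kurtosis. The sketch below reads those definitions literally and assumes two things: the empirical central moment is an average over n, and s uses the n − 1 divisor. The standard's later formulas (not included in this excerpt) may add small-sample adjustments, and other software packages use other conventions. The function names are hypothetical.

```python
import math


def central_moment(data: list[float], k: int) -> float:
    """Empirical k-th central moment: average of (x - x_bar)**k over n values (assumed)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n


def sample_sd(data: list[float]) -> float:
    """Sample standard deviation s with the n - 1 divisor (3.1.19)."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


def skewness_g1(data: list[float]) -> float:
    """3.1.18: third central moment divided by s**3."""
    return central_moment(data, 3) / sample_sd(data) ** 3


def kurtosis_g2(data: list[float]) -> float:
    """3.1.6: fourth central moment divided by s**4, minus 3 (excess kurtosis)."""
    return central_moment(data, 4) / sample_sd(data) ** 4 - 3


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]
print(skewness_g1(symmetric))          # 0.0 for a symmetric sample
print(skewness_g1(right_tailed) > 0)   # True: a longer right tail gives g1 > 0
```

The sign convention matches 5.1.4. A longer right tail makes the cubed deviations sum to a positive value, and mirroring the data makes g₁ negative.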
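Definitions 3.1.3, 3.1.5 and 3.1.13 define empirical percentiles, the interquartile range and quantiles. This excerpt does not reproduce the interpolation rule the standard applies, and several conventions are in use. The sketch below assumes a common rule. It places the p-quantile at rank r = p(n + 1) among the order statistics and interpolates linearly between neighbors, clamping to x(1) and x(n) outside that range. For n ≥ 3, its quartiles should agree with Python's `statistics.quantiles(..., method="exclusive")`, which is used here as a cross-check. The names `empirical_quantile` and `iqr` are hypothetical.

```python
import math
from statistics import quantiles


def empirical_quantile(data: list[float], p: float) -> float:
    """p-quantile at rank r = p * (n + 1) with linear interpolation (assumed rule)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    x = sorted(data)                 # order statistics x(1) ... x(n)
    n = len(x)
    r = p * (n + 1)                  # 1-based, possibly fractional, rank
    if r <= 1:
        return x[0]                  # clamp below x(1)
    if r >= n:
        return x[-1]                 # clamp above x(n)
    k = math.floor(r)                # lower neighbor x(k)
    frac = r - k
    return x[k - 1] + frac * (x[k] - x[k - 1])


def iqr(data: list[float]) -> float:
    """3.1.5: 0.75 quantile minus 0.25 quantile."""
    return empirical_quantile(data, 0.75) - empirical_quantile(data, 0.25)


sample = [7.0, 1.0, 3.0, 9.0, 5.0, 11.0, 13.0]
q1, q3 = empirical_quantile(sample, 0.25), empirical_quantile(sample, 0.75)
print(q1, q3, iqr(sample))  # 3.0 11.0 8.0
# Cross-check: for n >= 3 the stdlib "exclusive" method uses the same (n + 1)p ranks
lib_q1, _, lib_q3 = quantiles(sample, n=4, method="exclusive")
assert math.isclose(q1, lib_q1) and math.isclose(q3, lib_q3)
```

With p = 0.5 this rule reproduces the median of 3.1.8.1 for both odd and even n. For example, it returns 2.5 for [1, 2, 3, 4].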

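Equation (2) says that P(s < X ≤ t) is the area under f(x) between s and t, which is the same as F(t) − F(s) for the cumulative distribution function of 5.1.3. The sketch below checks this numerically. It computes the area with a composite trapezoid rule and compares it with the closed-form normal CDF from Python's `statistics.NormalDist`. The normal population with mean 30 (echoing F(30) = 0.5 in Fig. 2) and standard deviation 5 is a hypothetical choice, because the excerpt does not state which distribution Fig. 2 plots.

```python
from collections.abc import Callable
from statistics import NormalDist


def trapezoid_area(f: Callable[[float], float], s: float, t: float,
                   steps: int = 10_000) -> float:
    """Approximate the definite integral of f from s to t (composite trapezoid rule)."""
    h = (t - s) / steps
    total = 0.5 * (f(s) + f(t))
    for i in range(1, steps):
        total += f(s + i * h)
    return total * h


dist = NormalDist(mu=30.0, sigma=5.0)   # hypothetical symmetric population
s, t = 25.0, 40.0

# Equation (2): area under f(x) between s and t; endpoints carry no probability
area = trapezoid_area(dist.pdf, s, t)
exact = dist.cdf(t) - dist.cdf(s)       # the same probability via F(t) - F(s)
print(area, exact)
print(dist.cdf(30.0))                   # 0.5: x = 30 is the median of this population
```

For a continuous X, whether each endpoint is included does not change the probability. That is why integrating over the closed interval [s, t] matches the half-open (s, t] in Equation (2).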
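Clause 4.2 describes two kinds of simple attribute data. One is binary trials coded 0 or 1, often governed by the binomial distribution. The other is counts of an event per inspection interval, often governed by the Poisson distribution. As a minimal sketch, the natural summaries are the fraction of items coded 1 and the average count per interval, both ordinary sample means. The standard's own attribute-summary clauses are not part of this excerpt, and the function names and data below are hypothetical.

```python
def proportion_with_attribute(trials: list[int]) -> float:
    """Binary trials: fraction of inspected items coded 1 (exhibited the attribute)."""
    if any(v not in (0, 1) for v in trials):
        raise ValueError("binary trials must be coded 0 or 1")
    return sum(trials) / len(trials)


def mean_count_per_interval(counts: list[int]) -> float:
    """Event counts: average number of events over the n inspection intervals."""
    if any(c < 0 for c in counts):
        raise ValueError("counts cannot be negative")
    return sum(counts) / len(counts)


inspections = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]      # 10 items, 2 exhibit the attribute
defects_per_interval = [3, 0, 2, 1, 4]            # events counted on 5 intervals
print(proportion_with_attribute(inspections))     # 0.2
print(mean_count_per_interval(defects_per_interval))  # 2.0
```

Under a binomial model, the proportion estimates the probability that an item exhibits the attribute. Under a Poisson model, the mean count estimates the expected number of events per interval.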