ASTM E2586-2018 Standard Practice for Calculating and Using Basic Statistics.pdf

Designation: E2586-18

An American National Standard

Standard Practice for Calculating and Using Basic Statistics¹

This standard is issued under the fixed designation E2586; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This practice covers methods and equations for computing and presenting basic descriptive statistics. This practice includes simple descriptive statistics for variable data and attribute data, elementary methods of statistical inference, and tabular and graphical methods for variable data. Some interpretation and guidance for use is also included.

1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.

1.3 This standard does
not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Referenced Documents

2.1 ASTM Standards:²
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2234 Practice for Sampling a Stream of Product by Attributes Indexed by AQL
E2282 Guide for Defining the Test Result of a Test Method
E3080 Practice for Regression Analysis

2.2 ISO Standards:³
ISO 3534-1 Statistics - Vocabulary and Symbols, Part 1: Probability and General Statistical Terms
ISO 3534-2 Statistics - Vocabulary and Symbols, Part 2: Applied Statistics

3. Terminology

3.1 Definitions: Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.

3.1.1 alternative hypothesis, Ha, n: a probability distribution or type of probability distribution distinguished from the null hypothesis.

3.1.1.1 Discussion: The alternative hypothesis is typically a research hypothesis or a statement that we hope to show is more plausible than the null hypothesis using real data.

3.1.2 characteristic, n: a property of items in a sample or population which, when measured, counted, or otherwise observed, helps to distinguish among the items. E2282

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics. Current edition approved May 15, 2018. Published May 2018. Originally approved in 2007. Last previous edition approved in 2016 as E2586-16. DOI: 10.1520/E2586-18.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

³ Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.

This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version of the standard as published by ASTM is to be considered the official document.

Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.3 coefficient of variation, CV, n: for a nonnegative characteristic, the ratio of the standard deviation to the mean for a population or sample.

3.1.3.1 Discussion: The coefficient of variation is often expressed as a percentage.

3.1.3.2 Discussion: This statistic is also known as the relative standard deviation, RSD.

3.1.4 confidence bound, n: see confidence limit.

3.1.5 confidence coefficient, n: see confidence level.

3.1.6 confidence interval, n: an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 − α, where Pr(L ≤ θ ≤ U) ≥ 1 − α.

3.1.6.1 Discussion: The confidence level, 1 − α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In
this sense “confidence” applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.

3.1.7 confidence level, n: the value, 1 − α, of the probability associated with a confidence interval, often expressed as a percentage.

3.1.7.1 Discussion: α is generally a small number. Confidence level is often 95 % or 99 %.

3.1.8 confidence limit, n: each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval.

3.1.9 critical value, n: in hypothesis testing, the boundary (number) of the rejection region for a test statistic in a hypothesis test.

3.1.10 degrees of freedom, df, n: the number of independent data points minus the number of parameters that have to be estimated before calculating the variance.

3.1.11 estimate, n: sample statistic used to approximate a population parameter.

3.1.12 histogram, n: graphical representation of the frequency distribution of a characteristic consisting of a set of rectangles with area proportional to the frequency. ISO 3534-1

3.1.12.1 Discussion: While not required, equal bar or class widths are recommended for histograms.

3.1.13 interquartile range, IQR, n: the 75th percentile (0.75 quantile) minus the 25th percentile (0.25 quantile), for a data set.

3.1.14 kurtosis, γ₂, g₂, n: for a population or a sample, a measure of the weight of the tails of a distribution relative to the center, calculated as the ratio of the fourth central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the fourth power, minus 3 (also referred to as excess kurtosis).

3.1.15 mean, n: of a population, μ, average or expected value of a characteristic in a population; of a sample, X̄, sum of the observed values in the sample divided by the sample size.

3.1.16 median, X̃, n: the 50th percentile in a population or sample.

3.1.16.1 Discussion: The sample median is the (n + 1)/2 order statistic if the sample size n is odd and is the average of the n/2 and n/2 + 1 order statistics if n is even.

3.1.17 midrange, n: average of the minimum and maximum values in a sample.

3.1.18 null hypothesis, H₀, n: a statement about a parameter of a probability distribution or about the type of probability distribution, tentatively regarded as true until rejected using a statistical hypothesis test.

3.1.19 order statistic, x(k), n: value of the kth observed value in a sample after sorting by order of magnitude.

3.1.19.1 Discussion: For a sample of size n, the first order statistic x(1) is the minimum value, x(n) is the maximum value.

3.1.20 parameter, n: see population parameter.

3.1.21 percentile, n: quantile of a sample or a population, for which the fraction less than or equal to the value is expressed as a percentage.

3.1.22 population, n: the totality of items or units of material under consideration.

3.1.23 population parameter, n: summary measure of the values of some characteristic of a population. ISO 3534-2

3.1.24 power, n: in hypothesis testing, the probability that a statistical hypothesis test rejects a null hypothesis, calculated using an alternative hypothesis.

3.1.25 prediction interval, n: an interval for a future value or set of values, constructed from a current set of data, in a way that has a specified probability for the inclusion of the future value.

3.1.26 p-value, n: in hypothesis testing, the probability of observing a test statistic at least as extreme as what was actually obtained, under the assumption of the null hypothesis.

3.1.26.1 Discussion: The p-value must not be thought of as the probability the null hypothesis is true.

3.1.27 quantile, n: value such that a fraction f of the sample or population is less than or equal to that value.

3.1.28 range, R, n: maximum value minus the minimum value in a sample.

3.1.29 residual, n: observed value minus fitted value, when a model is used. E3080

3.1.30 sample, n: a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection.

3.1.31 sample size, n, n: number of observed values in the sample.

3.1.32 sample statistic, n: summary measure of the observed values of a sample.

3.1.33 significance level, α, n: the probability a hypothesis test would reject the null hypothesis, based on the distribution of the test statistic and assuming the null hypothesis to be true.

3.1.33.1 Discussion: For a composite hypothesis, the maximum probability of rejecting.
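
As an illustration of how the terms critical value, p-value, significance level, and test statistic relate to one another, the following is a minimal Python sketch of a two-sided z-test for a mean with a known population standard deviation. The choice of test, the function name `two_sided_z_test`, and the example data are assumptions made for illustration; they are not taken from this practice.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known.

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    xbar = sum(sample) / n
    # Test statistic: standardized distance of the sample mean from mu0.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    # p-value: probability, under H0, of a statistic at least this extreme.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical value: boundary of the rejection region at level alpha.
    critical_value = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return {
        "z": z,
        "p_value": p_value,
        "critical_value": critical_value,
        "reject_h0": abs(z) > critical_value,
    }


# Made-up measurements, tested against mu0 = 10 with sigma = 0.3.
result = two_sided_z_test(
    [10.2, 9.9, 10.4, 10.1, 10.3, 10.0, 10.5, 10.2, 10.1],
    mu0=10.0, sigma=0.3, alpha=0.05,
)
print(result)
```

Rejecting when |z| exceeds the critical value is equivalent to rejecting when the p-value is below α, so the two fields always agree; choosing α before looking at the data fixes the type I error probability.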

3.1.34 skewness, γ₁, g₁, n: for population or sample, a measure of symmetry of a distribution, calculated as the ratio of the third central moment (empirical if a sample, and theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the third power.

3.1.35 standard error: standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate of it.

3.1.35.1 Discussion: If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the sample size.

3.1.36 standard deviation: of a population, σ, the square root of the average or expected value of the squared deviation of a variable from its mean; of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample from their mean divided by the sample size minus 1.

3.1.37 statistic, n: see sample statistic.

3.1.38 statistical hypothesis test, n: a procedure and decision criteria used to decide whether or not to reject a null hypothesis.

3.1.38.1 Discussion: Synonyms include statistical test, hypothesis test, and significance test.

3.1.39 test statistic, n: a statistic, calculable from the sample observations of the variable of interest, whose probability distribution is known under the assumption of a null hypothesis.

3.1.40 type I error, n: the error of rejecting a null hypothesis when it is actually true.

3.1.41 type II error, n: the error of not rejecting a null hypothesis when it is actually false.

3.1.42 variance, σ², s², n: square of the standard deviation of the population or sample.

3.1.42.1 Discussion: For a finite population, σ² is calculated as the sum of squared deviations of values from the mean, divided by n. For a continuous population, σ² is calculated by integrating (x − μ)² with
respect to the density function. For a sample, s² is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.

3.1.43 Z-score, n: observed value minus the sample mean divided by the sample standard deviation.

4. Significance and Use

4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.

4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.

4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.

4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.

4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize the data set and estimate a corresponding population characteristic or parameter, or to test a hypothesis.

4.4 Descriptive statistics consider numerical, tabular, and graphical methods for summarizing a set of data. The methods considered in this practice are used for summarizing the observations from a single variable. The descriptive

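The Section 3 definitions of the sample mean, median, range, midrange, variance, standard deviation, coefficient of variation, skewness, kurtosis, and Z-score can be sketched in Python for a single-variable data set as follows. This is a minimal illustration, not the practice's prescribed calculation: the function name `describe` and the data are hypothetical, and the empirical third and fourth central moments are assumed to use divisor n, which this excerpt does not specify. The interquartile range is left out because the excerpt does not fix a rule for computing sample quantiles.

```python
import math


def describe(data):
    """Basic sample statistics for a list of numbers (hypothetical helper)."""
    x = sorted(data)  # order statistics x(1) .. x(n)
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")

    mean = sum(x) / n
    # Sample median: the (n + 1)/2 order statistic if n is odd, otherwise
    # the average of the n/2 and n/2 + 1 order statistics (1-based).
    if n % 2 == 1:
        median = x[n // 2]
    else:
        median = (x[n // 2 - 1] + x[n // 2]) / 2

    deviations = [v - mean for v in x]
    # Sample variance: sum of squared deviations divided by n - 1.
    variance = sum(d ** 2 for d in deviations) / (n - 1)
    sd = math.sqrt(variance)

    # Assumption: empirical central moments are computed with divisor n.
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n

    return {
        "n": n,
        "mean": mean,
        "median": median,
        "range": x[-1] - x[0],
        "midrange": (x[0] + x[-1]) / 2,
        "variance": variance,
        "sd": sd,
        # CV is defined for a nonnegative characteristic; skipped if mean <= 0.
        "cv": sd / mean if mean > 0 else None,
        "skewness": m3 / sd ** 3 if sd > 0 else None,
        "kurtosis": m4 / sd ** 4 - 3 if sd > 0 else None,  # excess kurtosis
        # Z-scores are returned in the original order of the observations.
        "z_scores": [(v - mean) / sd for v in data] if sd > 0 else None,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(name, value)
```

Returning a plain dictionary keeps the sketch free of dependencies; the same quantities could be cross-checked against Python's `statistics.mean`, `statistics.median`, and `statistics.stdev`.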