Designation: E3080-17
An American National Standard

Standard Practice for
Regression Analysis¹

This standard is issued under the fixed designation E3080; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope
1.1 This practice covers regression analysis methodology for estimating, evaluating, and using the simple linear regression model to define the statistical relationship between two numerical variables.
1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Referenced Documents
2.1 ASTM Standards:²
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
E2586 Practice for Calculating and Using Basic Statistics

3. Terminology
3.1 Definitions: Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.
3.1.1 characteristic, n: a property of items in a sample or population which, when measured, counted, or otherwise observed, helps to distinguish among the items. E2282
3.1.1 coefficient of determination, $r^2$, n: square of the correlation coefficient.
3.1.3 confidence interval, n: an interval estimate $[L, U]$ with the statistics L and U as limits for the parameter $\theta$ and with confidence level $1 - \alpha$, where $\Pr(L \le \theta \le U) \ge 1 - \alpha$. E2586
3.1.3.1 Discussion: The confidence level, $1 - \alpha$, reflects the proportion of cases that the
confidence interval $[L, U]$ would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense “confidence” applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.
3.1.4 confidence level, n: the value, $1 - \alpha$, of the probability associated with a confidence interval, often expressed as a percentage. E2586
3.1.4.1 Discussion: $\alpha$ is generally a small number. Confidence level is often 95 % or 99 %.

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics. Current edition approved Nov. 1, 2017. Published January 2018. Originally approved in 2016. Last previous edition approved in 2016 as E3080-16. DOI: 10.1520/E3080-17.
² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version of the standard as published by ASTM is to be considered the official document.
Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.5 correlation coefficient, n: for a population, $\rho$, a dimensionless measure of association between two variables X and Y, equal to the covariance divided by the product of $\sigma_X$ times $\sigma_Y$.
3.1.6 correlation coefficient, n: for a sample, r, the estimate of the parameter $\rho$ from the data.
3.1.7 covariance, n: of a population, cov(X, Y), for
two variables, X and Y, the expected value of $(X - \mu_X)(Y - \mu_Y)$.
3.1.8 covariance, n: of a sample; the estimate of the parameter cov(X, Y) from the data.
3.1.9 dependent variable, n: a variable to be predicted using an equation.
3.1.2 degrees of freedom, n: the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. E2586
3.1.11 deviation, d, n: the difference of an observed value from its mean.
3.1.12 estimate, n: sample statistic used to approximate a population parameter. E2586
3.1.13 independent variable, n: a variable used to predict another using an equation.
3.1.14 mean, n: of a population, $\mu$, average or expected value of a characteristic in a population; of a sample, $\bar{X}$, sum of the observed values in the sample divided by the sample size. E2586
3.1.15 parameter, n: see population parameter. E2586
3.1.16 population, n: the totality of items or units of
material under consideration. E2586
3.1.17 population parameter, n: summary measure of the values of some characteristic of a population. E2586
3.1.18 prediction interval, n: an interval for a future value or set of values, constructed from a current set of data, in a way that has a specified probability for the inclusion of the future value. E2586
3.1.19 regression, n: the process of estimating parameter(s) of an equation using a set of data.
3.1.3 residual, n: observed value minus fitted value, when a model is used.
3.1.21 statistic, n: see sample statistic. E2586
quantile, n: value such that a fraction f of the sample or population is less than or equal to that value.
3.1.4 predictor variable, X, n: a variable used to predict a response variable using a regression model. E2586
3.1.4.1 Discussion: Also called an independent or explanatory variable.
sample, n: a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection.
3.1.5 regression analysis, n: statistical procedure used to characterize the association between two numerical variables for prediction of the response variable from the predictor variable. E2586
3.1.24 sample size, $n$, n: number of observed values in the sample. E2586
sample statistic, n: summary measure of the observed values of a sample.
3.1.6 response variable, Y, n: a variable predicted from a regression model. E2586
3.1.6.1 Discussion: Also called a dependent variable.
3.1.26 standard error: standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate of it. E2586
3.1.26.1 Discussion: If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the
sample size.
standard deviation, n: of a population, $\sigma$, the square root of the average or expected value of the squared deviation of a variable from its mean; of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample from their mean divided by the sample size minus 1.
3.1.7 sample correlation coefficient, r, n: dimensionless measure of association between two variables estimated from the data. E2586
3.1.8 sample covariance, $s_{xy}$, n: an estimate of the association of the response variable and predictor variable calculated from the data. E2586
variance, $\sigma^2$, $s^2$, n: square of the standard deviation of the population or sample.
3.1.28.1 Discussion: For a finite population, $\sigma^2$ is calculated as the sum of squared deviations of values from the mean, divided by n. For a continuous population, $\sigma^2$ is calculated by integrating $(x - \mu)^2$ with respect to the density function. For a sample, $s^2$
is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.

3.2 Definitions of Terms Specific to This Standard:
3.2.1 intercept, n: of a regression model, $\beta_0$, the value of the response variable when the predictor variable is zero.
3.2.2 regression model parameter, n: a descriptive constant defining a regression model that is to be estimated.
3.2.3 residual standard deviation, n: of a regression model, $\sigma$, the square root of the residual variance.
3.2.4 residual variance, n: of a regression model, $\sigma^2$, the variance of the residuals (see residual).
3.2.5 slope, n: of a regression model, $\beta_1$, the incremental change in the response variable due to a unit change in the predictor variable.

3.3 Symbols:
$b_0$ = intercept estimate (5.2.2)
$b_1$ = slope estimate (5.2.2)
$\beta_0$ = intercept parameter in model (5.1.2)
$\beta_1$ = slope parameter in model (5.1.2)
$E$ = general point estimate of a parameter (5.4.2)
$e_i$ = residual for data point i (5.2.5)
$\varepsilon$ = residual parameter in model (5.1.3)
$F$ = F statistic (X1.3.2)
$h$ = index for any value in data range (5.4.5)
$i$ = index for a data point (5.2.1)
$n$ = number of data points (5.2.1)
$r$ = sample correlation coefficient (5.3.2.1)
$r^2$ = coefficient of determination (5.3.2.2)
$S(b_0, b_1)$ = sum of squared deviations of $Y_i$ to the regression line (X1.1.2)
$s_{b_1}$ = standard error of slope estimate (5.4.3)
$s_{b_0}$ = standard error of intercept estimate (5.4.4)
$s_E$ = general standard error of a point estimate (5.4.2)
$\sigma$ = residual standard deviation (5.1.3)
$s$ = estimate of $\sigma$ (5.2.6)
$\sigma^2$ = residual variance (5.1.3)
$s^2$ = estimate of $\sigma^2$ (5.2.6)
$s_X^2$ = variance of X data (X1.2.1)
$s_Y^2$ = variance of Y data (X1.2.1)
$S_{XX}$ = sum of squares of deviations of X data from average (5.2.3)
$S_{XY}$ = sum of cross products of X and Y from their averages (5.2.3)
$s_{XY}$ = sample covariance of X and Y (X1.2.1)
$s_{\hat{Y}_h}$ = standard error of $\hat{Y}_h$ (5.4.5)
$s_{\hat{Y}_{h(ind)}}$ = standard error of future individual Y value (5.4.6)
$S_{YY}$ = sum of squares of deviations of Y data from average (5.2.3)
$t$ = Student's t distribution (5.4.2)
$X$ = predictor variable (5.1.1)
$\bar{X}$ = average of X data (5.2.3)
$X_h$ = general value of X in its range (5.4.5)
$X_i$ = value of X for data point i (5.2.1)
$Y$ = response variable (5.1.1)
$\bar{Y}$ = average of Y data (5.2.3)
$\hat{Y}_{h(ind)}$ = predicted future individual Y for a value $X_h$ (5.4.6)
$Y_i$ = value of Y for data point i (5.2.1)
$\hat{Y}_h$ = predicted value of Y for any value $X_h$ (5.4.5)
$\hat{Y}_i$ = predicted value of Y for data point i (5.2.4)

3.4 Acronyms:
3.4.1 ANOVA, n: Analysis of Variance
3.4.2 df, n: Degrees of Freedom
3.4.3 LOF, n: Lack of Fit
3.4.4 MS, n: Mean Square
3.4.5 MSE, n: Mean Square Error
3.4.6 MSR, n: Mean Square Regression
3.4.7 MST, n: Mean Square Total
3.4.8 PE, n: Pure Error
3.4.9 SS, n: Sum of Squares
3.4.10 SSE, n: Sum
of Squares Error
3.4.11 SSR, n: Sum of Squares Regression
3.4.12 SST, n: Sum of Squares Total

4. Significance and Use
4.1 Regression analysis is a statistical procedure that studies the statistical relationships between two or more numerical variables, Ref. (1, 2).³ In general, one of these variables is designated as a response variable and the rest of the variables are designated as predictor variables. Then the objective of the model is to predict the response from the predictor variables.
4.1.1 This standard considers a numerical response variable and only a single numerical predictor variable.
4.1.2 The regression model consists of: (1) a mathematical function that relates the mean values of the response variable distribution to fixed values of the predictor variable, and (2) a description of the statistical distribution that describes the variability in the response variable at fixed levels of the predictor variable.
4.1.3 The regression procedure utilizes experimental or observational data to estimate the parameters defining a regression model and their precision. Diagnostic procedures are utilized to assess the resulting model fit and can suggest other models for improved prediction performance.
4.1.4 The regression model can be useful for developing process knowledge through description of the variable relationship, in making predictions of future values, and in developing control methods for the process generating values of the variables.
4.2 Section 5 in this standard deals with the simple linear regression model using a straight line mathematical relationship between the two variables where variability of the response variable over the range of values of the predictor variable is described by a normal distribution with constant variance. Appendix X1 provides supplemental information.

5. Simple Linear Regression Analysis
5.1 Simple Linear Regression Model: The data set includes two variables, X and Y, measured over a collection of sampling units, experimental units, or other type of observational units. Each variable occurs the same number of times and the two variables are paired one to one.
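As a minimal sketch of this one-to-one pairing (an illustration only, not part of the standard), the two variables can be stored as equal-length sequences and combined into pairs. The function name `make_pairs` and all numerical values below are hypothetical.

```python
# Hypothetical illustration: these values are made up, not taken from the standard.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # predictor values, one per observational unit
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # response values, paired one to one with x


def make_pairs(x_values, y_values):
    """Return the data as a list of pairs (x_i, y_i), one per observational unit."""
    # Each variable must occur the same number of times for the pairing to hold.
    if len(x_values) != len(y_values):
        raise ValueError("X and Y must occur the same number of times")
    return list(zip(x_values, y_values))


pairs = make_pairs(x, y)
n = len(pairs)  # number of data points
```

Checking the lengths before pairing guards against silently dropping unmatched observations, which `zip` alone would do.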
Data of this type constitute a set of n ordered pairs of the form $(x_i, y_i)$, where the index variable (i) runs from 1 through n.
5.1.1 Select the response variable Y and the predictor variable X. The predictor X is assumed to have known values with little or no measurement error. The response Y has a distribution of values for a given X value, and this distribution is defined for all X values in a given range.
5.1.2 The regression function for the straight line relationship is $Y = \beta_0 + \beta_1 X$. The two parameters for the function are the intercept $\beta_0$ and the slope $\beta_1$. The intercept is the value of Y when X = 0, but