ASTM E3080-17 Standard Practice for Regression Analysis


Designation: E3080 − 17
An American National Standard
Standard Practice for Regression Analysis¹

This standard is issued under the fixed designation E3080; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This practice covers regression analysis methodology for estimating, evaluating, and using the simple linear regression model to define the statistical relationship between two numerical variables.

1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Referenced Documents

2.1 ASTM Standards:²
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2586 Practice for Calculating and Using Basic Statistics

3. Terminology

3.1 Definitions: Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.

3.1.1 coefficient of determination, $r^2$, n: square of the correlation coefficient.

3.1.2 degrees of freedom, n: the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. E2586

3.1.3 residual, n: observed value
minus fitted value, when a model is used.

3.1.4 predictor variable, X, n: a variable used to predict a response variable using a regression model.

3.1.4.1 Discussion: Also called an independent or explanatory variable.

3.1.5 regression analysis, n: a statistical procedure used to characterize the association between two numerical variables for prediction of the response variable from the predictor variable.

3.1.6 response variable, Y, n: a variable predicted from a regression model.

3.1.6.1 Discussion: Also called a dependent variable.

3.1.7 sample correlation coefficient, r, n: a dimensionless measure of association
between two variables estimated from the data.

3.1.8 sample covariance, $s_{XY}$, n: an estimate of the association of the response variable and predictor variable calculated from the data.

3.2 Definitions of Terms Specific to This Standard:

3.2.1 intercept, n: of a regression model, $\beta_0$, the value of the response variable when the predictor variable is zero.

3.2.2 regression model parameter, n: a descriptive constant defining a regression model that is to be estimated.

3.2.3 residual standard deviation, n: of a regression model, the square root of the residual variance.

3.2.4 residual variance, n: of a regression model, $\sigma^2$, the variance of the residuals (see residual).

3.2.5 slope, n: of a regression model, $\beta_1$, the incremental change in the response variable due to a unit change in the predictor variable.

3.3 Symbols:
$b_0$ = intercept estimate (5.2.2)
$b_1$ = slope estimate (5.2.2)
$\beta_0$ = intercept parameter in model (5.1.2)
$\beta_1$ = slope parameter in model (5.1.2)
$E$ = general point estimate of a parameter (5.4.2)
$e_i$ = residual for data point i (5.2.5)
$\varepsilon$ = residual parameter in model (5.1.3)
$F$ = F statistic (X1.3.2)
$h$ = index for any value in data range (5.4.5)
$i$ = index for a data point (5.2.1)
$n$ = number of data points (5.2.1)
$r$ = sample correlation coefficient (5.3.2.1)
$r^2$ = coefficient of determination (5.3.2.2)
$S(b_0, b_1)$ = sum of squared deviations of $Y_i$ to the regression line (X1.1.2)
$s_{b_1}$ = standard error of slope estimate (5.4.3)
$s_{b_0}$ = standard error of intercept estimate (5.4.4)
$s_E$ = general standard error of a point estimate (5.4.2)
$\sigma$ = residual standard deviation (5.1.3)
$s$ = estimate of $\sigma$ (5.2.6)
$\sigma^2$ = residual variance (5.1.3)
$s^2$ = estimate of $\sigma^2$ (5.2.6)
$s_X^2$ = variance of X data (X1.2.1)
$s_Y^2$ = variance of Y data (X1.2.1)
$S_{XX}$ = sum of squares of deviations of X data from average (5.2.3)
$S_{XY}$ = sum of cross products of X and Y from their averages (5.2.3)
$s_{XY}$ = sample covariance of X and Y (X1.2.1)
$s_{\hat{Y}_h}$ = standard error of $\hat{Y}_h$ (5.4.5)
$s_{\hat{Y}_{h(ind)}}$ = standard error of future individual Y value (5.4.6)
$S_{YY}$ = sum of squares of deviations of Y data from average (5.2.3)
$t$ = Student's t distribution (5.4.2)
$X$ = predictor variable (5.1.1)
$\bar{X}$ = average of X data (5.2.3)
$X_h$ = general value of X in its range (5.4.5)
$X_i$ = value of X for data point i (5.2.1)
$Y$ = response variable (5.1.1)
$\bar{Y}$ = average of Y data (5.2.3)
$\hat{Y}_{h(ind)}$ = predicted future individual Y for a value $X_h$ (5.4.6)
$Y_i$ = value of Y for data point i (5.2.1)
$\hat{Y}_h$ = predicted value of Y for any value $X_h$ (5.4.5)
$\hat{Y}_i$ = predicted value of Y for data point i (5.2.4)

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics.
Current edition approved Nov. 1, 2017. Published January 2018. Originally approved in 2016. Last previous edition approved in 2016 as E3080 − 16. DOI: 10.1520/E3080-17.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.4 Acronyms:
3.4.1 ANOVA, n: Analysis of Variance
3.4.2 df, n: Degrees of Freedom
3.4.3 LOF, n: Lack of Fit
3.4.4 MS, n: Mean Square
3.4.5 MSE, n: Mean Square Error
3.4.6 MSR, n: Mean Square Regression
3.4.7 MST, n: Mean Square Total
3.4.8 PE, n: Pure Error
3.4.9 SS, n: Sum of Squares
3.4.10 SSE, n: Sum of Squares Error
3.4.11 SSR, n: Sum of Squares Regression
3.4.12 SST, n: Sum of Squares Total

4. Significance and Use

4.1 Regression analysis is a statistical procedure that studies the statistical relationships between two or more variables, Ref. (1, 2).³ In general, one of these variables is designated as a response variable and the rest of the variables are designated as predictor variables. The objective of the model is then to predict the response from the predictor variables.

4.1.1 This standard considers a numerical response variable and only a single numerical predictor variable.

4.1.2 The regression model consists of: (1) a mathematical function that relates the mean values of the response variable distribution to fixed values of the predictor variable, and (2) a description of the statistical distribution that
describes the variability in the response variable at fixed levels of the predictor variable.

4.1.3 The regression procedure uses experimental or observational data to estimate the parameters defining a regression model and their precision. Diagnostic procedures are used to assess the resulting model fit and can suggest other models for improved prediction performance.

4.1.4 The regression model can be useful for developing process knowledge through description of the variable relationship, for making predictions of future values, and for developing control methods for the process generating values of the variables.

4.2 Section 5 of this standard deals with the simple linear regression model, which uses a straight line mathematical relationship between the two variables and describes the variability of the response variable over the range of values of the predictor variable by a normal distribution with constant variance. Appendix X1 provides supplemental information.

5. Simple Linear Regression Analysis

5.1 Simple Linear Regression Model:

5.1.1 Select the response variable Y and the predictor variable X. The predictor X is assumed to have known values with little or no measurement error. The response Y has a distribution of values for a given X value, and this distribution is defined for all X values in a given range.

5.1.2 The regression function for the straight line relationship is $Y = \beta_0 + \beta_1 X$. The two parameters for the function are the intercept $\beta_0$ and the slope $\beta_1$. The intercept is the value of
Y when X = 0, but this parameter may not be of practical interest when the range of X is far removed from zero. The slope is the amount of incremental change in Y units for a unit change in X.

5.1.3 The statistical distribution for Y is assumed to be a normal (Gaussian) distribution having a mean of $\beta_0 + \beta_1 X$ with a standard deviation $\sigma$. The simple linear regression model is then stated as $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is a random error that is normally distributed with mean zero and standard deviation $\sigma$ (variance $\sigma^2$).

5.1.4 An example of a linear regression model is depicted in Fig. 1 over a range of X from 0 to 40 X units. Normal distributions of response Y with $\sigma = 1.3$ Y units are depicted at X = 10, 20, and 30 X units.

³ The boldface numbers in parentheses refer to a list of references at the end of this standard.

5.2 Estimating Regression Model Parameters:

5.2.1 The model parameters $\beta_0$ and $\beta_1$ are estimated from a sample
of data consisting of n pairs of values designated as $(X_i, Y_i)$, with the sample number i ranging from 1 through n. The data can arise in two different ways. Observational data consist of X and Y values measured on a set of n random samples. Experimental data consist of Y values measured on n experimental units with X values set at fixed values. In both cases the Y values may have measurement error, but the X values are assumed known with negligible measurement error.

5.2.2 The regression line parameters $\beta_0$ and $\beta_1$ are estimated by the method of least squares, which finds their corresponding estimates
$b_0$ and $b_1$ that minimize the sum of the squares of the vertical distances between the $Y_i$ values and their respective line values at $X_i$. (For a further discussion of the least squares method, see X1.1.2.)

5.2.3 Calculate the following statistics from the X and Y values in the data set.

5.2.3.1 Calculate the averages of X and Y:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{1}$$

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \tag{2}$$

5.2.3.2 Calculate the sums of squared deviations $S_{XX}$ and $S_{YY}$ of X and Y from their respective averages and the sum of cross products $S_{XY}$ of the X and Y deviations from their averages:

$$S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{3}$$

$$S_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \tag{4}$$

$$S_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \tag{5}$$

$S_{XX}$ is a known fixed constant. $S_{YY}$ and $S_{XY}$ are random variables.

5.2.3.3 The least squares solution gives the parameter estimates:

$$b_1 = \frac{S_{XY}}{S_{XX}} \tag{6}$$

$$b_0 = \bar{Y} - b_1 \bar{X} \tag{7}$$

$S_{YY}$ is not used here but will be used in subsequent sections.

5.2.4 The fitted values $\hat{Y}_i$ for each data point $Y_i$ are calculated from the estimated regression function
as:

$$\hat{Y}_i = b_0 + b_1 X_i \tag{8}$$

5.2.5 The residual $e_i$ is the difference between the response data point $Y_i$ and its fitted value $\hat{Y}_i$:

$$e_i = Y_i - \hat{Y}_i \tag{9}$$

Graphically, the residuals are the vertical distances on the scatter plot between the response data points $Y_i$ and the estimated regression line.

5.2.6 The estimate $s^2$ of the variance $\sigma^2$ and the estimate $s$ of the standard deviation $\sigma$ of the Y distribution are calculated from the sum of the squared residuals divided by their degrees of freedom:

$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} \tag{10}$$

$$s = \sqrt{s^2} \tag{11}$$

These estimates have n − 2 degrees of freedom because two parameters, the slope and the intercept of the line, were estimated before the residuals were calculated, which removes two degrees of freedom from the data set of n data points.

FIG. 1 Graphical Depiction of a Straight Line Regression Model

5.2.7 Regression Analysis Procedure with Example: The steps in the regression analysis procedure for the simple linear model, which are illustrated in the example below, are as follows:
(1) Choose the predictor variable X and response variable Y.
(2) Obtain data pairs of X and Y from available data or by conducting an experiment.
(3) Evaluate the distribution of the predictor variable and the XY relationship using plots.
(4) If the model is supported by the data plots, estimate the model parameters from the data.
(5) Evaluate the fitted model against the model assumptions.
(6) Use the regression model for future prediction of Y from X.

5.2.7.1 A data set from Duncan, Ref. (3), lists measurements of shear strength (inch-pounds) and weld diameter (mils) measured on 10 random test specimens, so this is an observational data set with n = 10 pairs. Regression analysis will be used to investigate the relationship between weld diameter and shear strength, with the objective of predicting shear strength Y from weld
diameter X. The weld diameters are considered to be measured with small error. The data are listed in Table 1.

5.2.7.2 A dot plot of the X data is shown as Fig. 2. The plot indicates that the data are spread out fairly evenly across the range of 190 to 270 mils and that some of the parts have the same diameters.

5.2.7.3 A scatter plot of the data is recommended as a first or concurrent step for a visual look at the relationship, and most computer packages have this as an option. This is a plot of Y (on the vertical axis) versus X (on the horizontal axis) for each data pair. If a straight line relationship exists, the cluster of points will appear elongated in a particular direction along a straight line, and the plot will visually reveal any curvature or other deviations from a straight line relationship, as well as any outlying data points. The estimated regression line can also be included on the plot to give a visual impression of the fit of the model to the data. The scatter plot for this example is shown in Fig. 3. The shear strength appears to increase linearly with weld diameter. There is some scatter but no apparent outlying data points.

5.2.7.4 The calculations, with equation numbers for each calculation, are shown in Table 1. The averages of X and Y are respectively 223.9 mils and 975.0 inch-pounds. The deviations of X and Y from their averages are listed for each observation, and these are used to calculate values of the statistics $S_{XX}$, $S_{YY}$, and $S_{XY}$. The least squares estimates of the slope and intercept are calculated, resulting in the estimated model equation giving fitted values $\hat{Y}_i = -569.47 + 6.898 X_i$, and these values are listed for each observation. The residuals $e_i = Y_i - \hat{Y}_i$ are also listed for each observation. Estimates of the variance and standard deviation of the Y distribution are calculated from the squares of the residuals. The estimated standard deviation is 99.90 inch-pounds.

5.2.7.5 The least squares straight line is depicted with …
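The calculations of 5.2.3 through 5.2.6 (Eqs 1 to 11), which Table 1 carries out for the weld data, can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not part of the practice: the function name `fit_simple_linear`, the `LinearFit` container, and the five-point data set in the usage lines are hypothetical, chosen only so the arithmetic can be checked by hand; they are not the Duncan data of Table 1.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass
class LinearFit:
    """Hypothetical container for the quantities of Eqs 1 to 11."""
    x_bar: float
    y_bar: float
    sxx: float
    syy: float
    sxy: float
    b0: float
    b1: float
    fitted: list[float]
    residuals: list[float]
    s2: float
    s: float


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> LinearFit:
    """Least squares fit of Y = b0 + b1*X (sketch of 5.2.3 to 5.2.6)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        # Eq 10 divides by n - 2, so at least 3 pairs are needed
        raise ValueError("need at least 3 data pairs")
    # Eqs 1 and 2: averages of X and Y
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Eqs 3 to 5: sums of squares and cross products of deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    # Eqs 6 and 7: least squares estimates of slope and intercept
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Eqs 8 and 9: fitted values and residuals
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Eqs 10 and 11: residual variance and standard deviation, n - 2 df
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    return LinearFit(x_bar, y_bar, sxx, syy, sxy, b0, b1,
                     fitted, residuals, s2, sqrt(s2))


if __name__ == "__main__":
    # Hypothetical illustrative data, not the weld data of Table 1
    fit = fit_simple_linear([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(round(fit.b0, 4), round(fit.b1, 4), round(fit.s, 4))  # 2.2 0.6 0.8944
```

For these five hypothetical points, $S_{XX} = 10$ and $S_{XY} = 6$, so Eq 6 and Eq 7 give $b_1 = 0.6$ and $b_0 = 4 - 0.6 \times 3 = 2.2$, and Eq 10 gives $s^2 = 2.4/3 = 0.8$. Plain Python sums are used so that each line maps to one equation; for larger data sets a library routine such as `numpy.polyfit(x, y, 1)` returns the same slope and intercept estimates.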

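Terms 3.1.1 and 3.1.7 define the sample correlation coefficient r and the coefficient of determination r², but this excerpt does not reproduce the practice's own equations for them. The sketch below assumes the usual textbook formula $r = S_{XY} / \sqrt{S_{XX} S_{YY}}$, built from the same sums as Eqs 3 to 5, and squares r as 3.1.1 states. The function name `correlation_and_r2` and the example data are hypothetical.

```python
from math import sqrt
from typing import Sequence


def correlation_and_r2(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (r, r^2), assuming the textbook formula r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Same sums of squares and cross products as Eqs 3 to 5
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    # Dimensionless measure of association (3.1.7)
    r = sxy / sqrt(sxx * syy)
    # Coefficient of determination is the square of r (3.1.1)
    return r, r * r


if __name__ == "__main__":
    # Hypothetical illustrative data
    r, r2 = correlation_and_r2([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(round(r, 4), round(r2, 4))  # 0.7746 0.6
```

For a straight line least squares fit, r² also equals 1 − SSE/$S_{YY}$; for these hypothetical points SSE = 2.4 and $S_{YY}$ = 6, giving 1 − 2.4/6 = 0.6, in agreement with the printed value.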