Applied Business Forecasting and Planning
Multiple Regression Analysis

Introduction
In simple linear regression we studied the relationship between one explanatory variable and one response variable. Now we look at situations where several explanatory variables work together to explain the response.

Following our principles of data analysis, we look first at each variable separately, then at the relationships among the variables. We examine the distribution of each variable to be used in the multiple regression to determine whether there are any unusual patterns that may be important in building our regression analysis.

Multiple Regression
Example. In a study of direct operating cost, Y, for 67 branch offices of a consumer finance chain, four independent variables were considered:
- X1: average size of loan outstanding during the year
- X2: average number of loans outstanding
- X3: total number of new loan applications processed
- X4: office salary scale index

The model for this example is
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$$

Formal Statement of the Model
General regression model:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i, \quad i = 1, \ldots, n$$
where
- $\beta_0, \beta_1, \ldots, \beta_k$ are parameters,
- $X_{i1}, X_{i2}, \ldots, X_{ik}$ are known constants,
- the error terms $\varepsilon_i$ are independent $N(0, \sigma^2)$.

Estimating the parameters of the model
The values of the regression parameters $\beta_i$ are not known; we estimate them from data. As in the simple linear regression case, we use the least-squares method to fit a linear function
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
to the data. The least-squares method chooses the $b$'s that make the sum of squares of the residuals as small as possible.

The least-squares estimates are the values $b_0, b_1, \ldots, b_k$ that minimize the quantity
$$\sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_{i1} - b_2 X_{i2} - \cdots - b_k X_{ik}\right)^2$$
Since the formulas for the least-squares estimates are complicated and hand calculation is out of the question, we are content to understand the least-squares principle and let software do the computations.
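To make the least-squares principle concrete, here is a minimal sketch in plain Python. It is an illustration only (these notes do their fitting in Excel): it solves the normal equations $(X'X)b = X'y$, whose solution is the set of least-squares estimates, by Gaussian elimination. The helper names `solve` and `least_squares` and the data are hypothetical, and the data are noise-free so the fit should recover the generating coefficients.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(xs: List[List[float]], y: List[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimizing sum_i (y_i - b0 - b1*x_i1 - ... - bk*x_ik)^2."""
    rows = [[1.0] + list(r) for r in xs]  # leading column of 1s carries the intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


# Hypothetical, noise-free data generated from y = 3 + 2*x1 - 1.5*x2
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3 + 2 * a - 1.5 * b for a, b in xs]
print(least_squares(xs, y))  # approximately [3.0, 2.0, -1.5]
```

Forming $X'X$ explicitly is the simplest route to explain; statistical software typically uses numerically sturdier factorizations, which give the same estimates on well-conditioned data.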
The estimate of $\beta_i$ is $b_i$, and it indicates the change in the mean response per unit increase in $X_i$ when the rest of the independent variables in the model are held constant. The parameters $\beta_i$ are frequently called partial regression coefficients because they reflect the partial effect of one independent variable when the rest of the independent variables are included in the model and are held constant.

The observed variability of the responses about this fitted model is measured by the variance
$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1} = \frac{SSE}{n - k - 1}$$
and the regression standard error
$$s = \sqrt{s^2}$$

In the model, $\sigma^2$ and $\sigma$ measure the variability of the responses about the population regression equation. It is natural to estimate $\sigma^2$ by $s^2$ and $\sigma$ by $s$.

Analysis of Variance Table
The basic ideas of the regression ANOVA table are the same in simple and multiple regression. The sum of squares decomposition and the associated degrees of freedom are:
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad SST = SSR + SSE$$
df: $\; n - 1 = k + (n - k - 1)$

Source        df          SS     MS                        F
Regression    k           SSR    MSR = SSR / k             F = MSR / MSE
Error         n - k - 1   SSE    MSE = SSE / (n - k - 1)
Total         n - 1       SST

F-test for the overall fit of the model
To test the statistical significance of the regression relation between the response variable y and the set of variables $x_1, \ldots, x_k$, i.e., to choose between the alternatives
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
$$H_a: \text{not all } \beta_i \; (i = 1, \ldots, k) \text{ equal } 0$$
we use the test statistic
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$

The decision rule at significance level $\alpha$ is: reject $H_0$ if
$$F > F(\alpha; k, n - k - 1)$$
where the critical value $F(\alpha; k, n - k - 1)$ can be found from an F-table. The existence of a regression relation by itself does not assure that useful predictions can be made by using it.
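The ANOVA quantities and the overall F statistic can be computed directly from a least-squares fit. The sketch below assumes the fit comes from the normal equations (solved here by Gauss-Jordan elimination); `fit` and `anova` are hypothetical helper names and the data are made up. The Python standard library has no F distribution, so the critical value $F(\alpha; k, n-k-1)$ is left to an F-table.

```python
from typing import Dict, List


def fit(xs: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [b0, ..., bk] via the normal equations (Gauss-Jordan)."""
    rows = [[1.0] + list(r) for r in xs]
    p = len(rows[0])
    # augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def anova(xs: List[List[float]], y: List[float]) -> Dict[str, float]:
    """Regression ANOVA quantities and the overall F statistic."""
    n, k = len(y), len(xs[0])
    b = fit(xs, y)
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in xs]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    msr, mse = ssr / k, sse / (n - k - 1)
    return {"SST": sst, "SSR": ssr, "SSE": sse, "df_reg": k, "df_err": n - k - 1,
            "MSR": msr, "MSE": mse, "F": msr / mse, "s": mse ** 0.5}


# Hypothetical data: n = 8 observations, k = 2 explanatory variables
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 7], [7, 6], [8, 8]]
y = [5.1, 6.9, 6.2, 8.8, 9.1, 9.9, 12.2, 12.8]
res = anova(xs, y)
print({name: round(val, 4) for name, val in res.items()})
# Reject H0: beta_1 = beta_2 = 0 at level alpha if res["F"] exceeds F(alpha; 2, 5) from an F-table.
```

The identity SST = SSR + SSE holds (up to rounding) for any least-squares fit that includes an intercept, so it makes a convenient sanity check on the output.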
Note that when k = 1, this test reduces to the F-test in simple linear regression of whether or not $\beta_1 = 0$.

Interval estimation of $\beta_i$
For our regression model, we have
$$\frac{b_i - \beta_i}{s_{b_i}} \sim t(n - k - 1)$$
Therefore, an interval estimate for $\beta_i$ with confidence coefficient $1 - \alpha$ is
$$b_i \pm t(\alpha/2;\, n - k - 1)\, s_{b_i}$$
where $s_{b_i}$ is the estimated standard deviation (standard error) of $b_i$.

Significance tests for $\beta_i$
To test
$$H_0: \beta_i = 0 \quad \text{versus} \quad H_a: \beta_i \neq 0$$
we may use the test statistic
$$t = \frac{b_i}{s_{b_i}}$$
Reject $H_0$ if $|t| > t(\alpha/2;\, n - k - 1)$.

Multiple regression model building
Often we have many explanatory variables, and our goal is to use them to explain the variation in the response variable. A model using just a few of the variables often predicts about as well as the model using all the explanatory variables.

We may find that the reciprocal of a variable is a better choice than the variable itself, or that including the square of an explanatory variable improves prediction. We may also find that the effect of one explanatory variable depends on the value of another explanatory variable. We account for this situation by including interaction terms.

The simplest way to construct an interaction term is to multiply the two explanatory variables together, for example adding $X_1 X_2$ to a model that already contains $X_1$ and $X_2$. How can we find a good model?

Selecting the best regression equation
After a lengthy list of potentially useful independent variables has been compiled, some of them can be screened out. An independent variable
- may not be fundamental to the problem,
- may be subject to large measurement error, or
- may effectively duplicate another independent variable in the list.

Once the investigator has tentatively decided upon the functional forms of the regression relations (linear, quadratic, etc.), the next step is to obtain a subset of the explanatory variables (x) that "best" explains the variability in the response variable y.

An automatic search procedure that sequentially develops the subset of explanatory variables to be included in the regression model is called a stepwise procedure. It was developed to economize on computational effort, and it ends with the identification of a single regression model as "best".

Example: Sales Forecasting
Multiple regression is a popular technique for predicting product sales with the help of other variables that are likely to have a bearing on sales.

Example. The growth of cable television has created vast new potential in the home entertainment business. The following table gives the values of several variables measured in a random sample of 20 local television stations that offer their programming to cable subscribers. A TV industry analyst wants to build a statistical model for predicting the number of subscribers that a cable station can expect.

- Y = number of cable subscribers (SUBSCRIB)
- X1 = advertising rate that the station charges local advertisers for one minute of prime-time space (ADRATE)
- X2 = kilowatt power of the station's non-cable signal (KILOWATT)
- X3 = number of families living in the station's area of dominant influence (ADI), a geographical division of radio and TV audiences (APIPOP)
- X4 = number of competing stations in the ADI (COMPETE)

The sample data are fitted by a multiple regression model using Excel. The marginal t-test provides a way of choosing the variables for inclusion in the equation. The fitted model has the form
$$\widehat{SUBSCRIB} = b_0 + b_1\,ADRATE + b_2\,KILOWATT + b_3\,APIPOP + b_4\,COMPETE$$
with the estimates given in the Excel summary output.

Do we need all four variables in the model? Based on the partial t-tests, the variables signal (KILOWATT) and COMPETE are the least significant variables in our model. Let's drop the least significant variables one at a time.
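Dropping the least significant variable one at a time is usually called backward elimination. The sketch below implements it with the partial t statistics $t = b_i / s_{b_i}$, where $s_{b_i}^2$ is MSE times the i-th diagonal element of $(X'X)^{-1}$. It rests on stated assumptions: the fixed cutoff of 2.0 is only a rough stand-in for $t(\alpha/2; n-k-1)$, the helper names are hypothetical, and the coded ±1 data are constructed so that x3 has no effect. Stepwise routines in statistical packages use their own entry and removal rules, which may differ from this one.

```python
from typing import Dict, List


def invert(a: List[List[float]]) -> List[List[float]]:
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]


def t_stats(cols: Dict[str, List[float]], y: List[float]) -> Dict[str, float]:
    """Fit y on an intercept plus the named columns; return t = b_j / s(b_j) per predictor."""
    names = list(cols)
    n, p = len(y), len(names) + 1
    rows = [[1.0] + [cols[nm][i] for nm in names] for i in range(n)]
    inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))
    mse = sse / (n - p)  # n - k - 1 error degrees of freedom
    # s(b_j)^2 = MSE times the j-th diagonal entry of (X'X)^-1
    return {nm: b[j + 1] / (mse * inv[j + 1][j + 1]) ** 0.5 for j, nm in enumerate(names)}


def backward_eliminate(cols: Dict[str, List[float]], y: List[float],
                       cutoff: float = 2.0) -> List[str]:
    """Repeatedly drop the predictor with the smallest |t| until all |t| >= cutoff."""
    current = dict(cols)
    while current:
        t = t_stats(current, y)
        weakest = min(t, key=lambda nm: abs(t[nm]))
        if abs(t[weakest]) >= cutoff:
            break
        del current[weakest]
    return list(current)


# Hypothetical coded (+1/-1) design: y depends on x1 and x2 only; the noise column
# is orthogonal to every predictor, so x3's estimate is 0 up to rounding.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, 1, 1, 1, -1, -1, -1, -1]
noise = [1, -1, -1, 1, -1, 1, 1, -1]
y = [1 + 2 * a + 3 * b + 0.1 * e for a, b, e in zip(x1, x2, noise)]
print(backward_eliminate({"x1": x1, "x2": x2, "x3": x3}, y))
```

Here x3 has a t statistic of essentially 0 and is removed first; in the refitted model x1 and x2 have t values of about 45 and 67, so the search stops with x1 and x2.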
Excel summary output after dropping KILOWATT: the variable COMPETE is the next variable to get rid of.

Excel summary output after dropping COMPETE: all the variables in the model are statistically significant; therefore our final model is
$$\widehat{SUBSCRIB} = b_0 + b_1\,ADRATE + b_2\,APIPOP$$
with the estimates read from that output.

Interpreting the Final Model
What is the interpretation of the estimated parameters? Is each association positive or negative? Does this make sense intuitively, based on what the data represent? What other variables could be confounders? Are there other analyses you might consider doing? What new questions are raised?

Multicollinearity
In multiple regression analysis, one is often concerned with the nature and significance of the relations between the explanatory variables and the response variable. Questions that are frequently asked are:
- What is the relative importance of the effects of the different independent variables?
- What is the magnitude of the effect of a given independent variable on the dependent variable?
- Can any independent variable be dropped from the model because it has little or no effect on the dependent variable?
- Should any independent variables not yet included in the model be considered for possible inclusion?

Simple answers can be given to these questions if
- the independent variables in the model are uncorrelated among themselves, and
- they are uncorrelated with any other independent variables that are related to the dependent variable but omitted from the model.

When the independent variables are correlated among themselves, multicollinearity (or collinearity) among them is said to exist. In many non-experimental situations in business, economics, and the social and biological sciences, the independent variables tend to be correlated among themselves. For example, in a regression of family food expenditures on the variables family income, family savings, and the age of the head of household, the explanatory variables will be correlated among themselves. Further, the explanatory variables will also be correlated with other socioeconomic variables not included in the model that do affect family food expenditures, such as family size.

Some key problems that typically arise when the explanatory variables being considered for the regression model are highly correlated among themselves are:
- Adding or deleting an explanatory variable changes the regression coefficients.
- The estimated standard deviations of the regression coefficients become large when the explanatory variables in the regression model are highly correlated with each other.
- The estimated regression coefficients individually may not be statistically significant even though a definite statistical relation exists between the response variable and the set of explanatory variables.

Multicollinearity Diagnostics
A widely used formal method of detecting the presence of multicollinearity is the variance inflation factor (VIF). It measures how much the variances of the estimated regression coefficients are inflated compared with when the independent variables are not linearly related:
$$VIF_j = \frac{1}{1 - R_j^2}, \quad j = 1, \ldots, k$$
where $R_j^2$ is the coefficient of determination from the regression of the jth independent variable on the remaining k - 1 independent variables.

A VIF near 1 suggests that multicollinearity is not a problem for that independent variable: its estimated coefficient and associated t value will not change much as the other independent variables are added to or deleted from the regression equation. A VIF much greater than 1 indicates the presence of multicollinearity. A maximum VIF value in excess of 10 is often taken as an indication that multicollinearity may be unduly influencing the least-squares estimates; in that case the estimated coefficient attached to the variable is unstable, and its associated t statistic may change considerably as the other independent variables are added or deleted.
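The VIF definition translates directly into code: regress each explanatory variable on all the others, take that auxiliary regression's $R_j^2$, and compute $1/(1 - R_j^2)$. This is a sketch with hypothetical data in which x3 is nearly x1 + x2; `r_squared` and `vifs` are illustrative names, not a library API.

```python
from typing import List


def r_squared(target: List[float], others: List[List[float]]) -> float:
    """R^2 from regressing `target` on an intercept plus the `others` columns."""
    n = len(target)
    rows = [[1.0] + [col[i] for col in others] for i in range(n)]
    p = len(rows[0])
    # augmented normal equations [X'X | X'target], solved by Gauss-Jordan elimination
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    mean = sum(target) / n
    sse = sum((t - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, t in zip(rows, target))
    sst = sum((t - mean) ** 2 for t in target)
    return 1.0 - sse / sst


def vifs(cols: List[List[float]]) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses column j on all the other columns."""
    return [1.0 / (1.0 - r_squared(cols[j], cols[:j] + cols[j + 1:]))
            for j in range(len(cols))]


# Hypothetical predictors: x3 is almost exactly x1 + x2, so the three are collinear.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
wiggle = [1, -1, -1, 1, -1, 1, 1, -1]
x3 = [a + b + 0.1 * w for a, b, w in zip(x1, x2, wiggle)]
print([round(v, 1) for v in vifs([x1, x2, x3])])  # [101.0, 101.0, 201.0]
print([round(v, 1) for v in vifs([x1, x2])])      # [1.0, 1.0]
```

All three VIFs exceed the rule-of-thumb value of 10, while the two uncorrelated columns x1 and x2 on their own give VIFs of exactly 1.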
The simple correlation coefficients between all pairs of explanatory variables (i.e., $X_1, X_2, \ldots, X_k$) are helpful in selecting appropriate explanatory variables for a regression model and are also critical for examining multicollinearity. While it is true that a correlation very close to +1 or -1 does suggest multicollinearity, it is not true (unless there are only two explanatory variables) that multicollinearity can be ruled out when there are no high correlations between any pair of explanatory variables.

Example: Sales Forecasting
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0 (second line of each row)

            SUBSCRIB   ADRATE     KILOWATT   APIPOP     COMPETE
SUBSCRIB    1.00000    -0.02848   0.44762    0.90447    0.79832
                       0.9051     0.0478     <.0001     <.0001
ADRATE      -0.02848   1.00000    -0.01021   0.32512    0.34147
            0.9051                0.9659     0.1619     0.1406
KILOWATT    0.44762    -0.01021   1.00000    0.45303    0.46895
            0.0478     0.9659                0.0449     0.0370
APIPOP      0.90447    0.32512    0.45303    1.00000    0.87592
            <.0001     0.1619     0.0449                <.0001
COMPETE     0.79832    0.34147    0.46895    0.87592    1.00000
            <.0001     0.1406     0.0370     <.0001
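Each p-value in the correlation table tests $H_0: \rho = 0$ with the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which has n - 2 = 18 degrees of freedom here. The sketch below recomputes two of those tests from the printed r values, assuming the two-sided 5% critical value t(0.025; 18) = 2.101 from a t-table; `pearson` shows how r itself is computed from paired observations. All names are illustrative.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Sample (Pearson) correlation coefficient r between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0; it has n - 2 degrees of freedom under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


T_CRIT_18 = 2.101  # two-sided 5% critical value t(0.025; 18), taken from a t-table

# Two correlations taken from the N = 20 correlation table
for pair, r in [("SUBSCRIB-KILOWATT", 0.44762), ("ADRATE-APIPOP", 0.32512)]:
    t = t_for_r(r, 20)
    print(f"{pair}: r = {r:.5f}, t = {t:.3f}, reject H0 at 5%: {abs(t) > T_CRIT_18}")

# With exactly two predictors, R_j^2 is their squared correlation, so VIF = 1 / (1 - r^2)
print(f"VIF for a model with only ADRATE and APIPOP: {1 / (1 - 0.32512 ** 2):.3f}")
```

The first correlation clears the cutoff and the second does not, which agrees with the table's p-values of 0.0478 and 0.1619. The last line shows the two-predictor special case of the VIF formula for the pair ADRATE and APIPOP.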
VIF calculation: fit the model that regresses each explanatory variable on the remaining explanatory variables, and compute $VIF_j = 1/(1 - R_j^2)$ from its coefficient of determination.

VIF calculation results: there is no significant multicollinearity.

Qualitative Independent Variables
Many variables of interest in business, economics, and the social and biological sciences are not quantitative but qualitative. Examples of qualitative variables are gender (male, female), purchase status (purchase, no purchase), and type of firm. Qualitative variables can also be used in multiple regression.

An economist wished to relate the speed with which a particular insurance innovation is adopted (y) to the size of the insurance firm (x1) and the type of firm. The dependent variable is measured by the number of months elapsed between the time the first firm adopted the innovation and the time the given firm adopted the innovation. The first independent variable, size of the firm, is quantitative and is measured by the amount of total assets of the firm. The second independent variable, type of firm, is qualitative and is composed of two classes: stock companies and mutual companies.

Indicator variables
Indicator, or dummy, variables are used to determine the relationship between qualitative independent variables and a dependent variable. Indicator variables take on the values 0 and 1. For the insurance innovation example, where the qualitative variable has two classes, we might define the indicator variable x2 as follows:
$$x_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise (mutual company)} \end{cases}$$

A qualitative variable with c classes will be represented by c - 1 indicator variables. A regression function with an indicator variable with two levels (c = 2) will yield two estimated lines.

Interpretation of Regression Coefficients
In our insurance innovation example, the regression model is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
where $x_1$ is the size of the firm and $x_2$ is the indicator variable for type of firm defined above.

To understand the meaning of the regression coefficients in this model, consider first the case of a mutual firm. For such a firm, $x_2 = 0$ and we have
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(0) = \beta_0 + \beta_1 x_1$$
For a stock firm, $x_2 = 1$ and the response function is
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(1) = (\beta_0 + \beta_2) + \beta_1 x_1$$

The response function for mutual firms is a straight line with y intercept $\beta_0$ and slope $\beta_1$. For stock firms, this is also a straight line, with the same slope $\beta_1$ but with y intercept $\beta_0 + \beta_2$. With reference to the insurance innovation example, the mean time elapsed before the innovation is adopted is a linear function of the size of the firm ($x_1$), with the same slope $\beta_1$ for both types of firms.

$\beta_2$ indicates how much lower or higher the response function for stock firms is than the one for mutual firms; that is, $\beta_2$ measures the differential effect of type of firm. In general, $\beta_2$ shows how much higher (lower) the mean response line is for the class coded 1 than the line for the class coded 0, for any level of $x_1$.
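A sketch of indicator coding and the two parallel response lines it produces. The hypothetical helper `indicators` builds c - 1 columns of 0/1 values, with the chosen baseline class coded 0 in every column, following the c - 1 rule above. The firm sizes and adoption times are invented and generated without noise from hypothetical coefficients, so the fit recovers them exactly; they are not the economist's data.

```python
from typing import Dict, List


def indicators(labels: List[str], baseline: str) -> Dict[str, List[float]]:
    """Code a c-class qualitative variable as c - 1 indicator (0/1) columns.

    The baseline class is the one coded 0 in every indicator column.
    """
    classes = sorted(set(labels) - {baseline})
    return {c: [1.0 if lab == c else 0.0 for lab in labels] for c in classes}


def fit(xs: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [b0, ..., bk] via the normal equations (Gauss-Jordan)."""
    rows = [[1.0] + list(r) for r in xs]
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical firms: size x1 (total assets) and type; y = months until adoption.
size = [50, 100, 150, 200, 60, 120, 180, 240]
firm_type = ["mutual"] * 4 + ["stock"] * 4
x2 = indicators(firm_type, baseline="mutual")["stock"]  # 1 = stock, 0 = mutual
y = [30 - 0.1 * s + 8 * d for s, d in zip(size, x2)]  # noise-free for clarity
b0, b1, b2 = fit([[s, d] for s, d in zip(size, x2)], y)
print(f"mutual: E(y) = {b0:.2f} + ({b1:.2f}) x1")
print(f"stock:  E(y) = {b0 + b2:.2f} + ({b1:.2f}) x1")
```

The two printed lines share the slope b1 and differ in intercept by b2, which is the interpretation of $\beta_2$ given above; a three-class variable would produce two indicator columns and three parallel lines.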