Association between two variables

Example: University fees for the Big Ten universities
Data were collected to study the association between the percentage of students who came from out of state and the tuition paid by nonresident students (in thousands of dollars). Does the tuition money increase with
the percentage of nonresident students? (Does the percentage of nonresident students increase with the tuition money?)

Example: Size of diamond and price of ring
The source of the data is a full-page advertisement placed in the Straits Times newspaper issue of February 29, 1992, by a Singapore-based retailer of diamond jewelry. The variables are the size of the diamond in carats (1 carat = 0.2 gram) and the price of ladies' rings (single diamond stone) in Singapore dollars.

Carats   Price (Singapore dollars)
0.17     355
0.16     328
0.17     350
0.18     325
0.25     642
...

How would you describe the association between the two variables?

Association between variables
Data are pairs $(x_i, y_i)$ collected for two variables X and Y on each individual or unit. Two variables are associated if changes in one variable correspond to changes in the second variable. If there is a strong association, knowing one variable helps in predicting
the other. Examples: diamond carat size and ring price; blood pressure level and number of cigarettes smoked per day. If the association is weak, information about one variable is not very useful in studying the other. In neither case is there any implied causality.

Useful terminology
The following terms are often
used:
Response variable: measures the outcome of the study (dependent variable).
Explanatory variable: explains or causes changes in the response variable (independent variable).
Can you identify this distinction in the examples shown earlier?
1) Tuition = response variable; percentage of nonresident students = explanatory variable.
2) Carat = explanatory variable; price = response variable. In this case, knowledge of the data may lead us to believe there is causality.

Scatter plots: displaying data about two variables
Scatter plots show the relationship between two quantitative variables. The independent (explanatory) variable appears on the x-axis (horizontal axis) and the dependent (response) variable appears on the y-axis (vertical axis). Each observation is represented by a point in the plot.
[Figure: scatter plot of tuition vs. nonresident students; labeled points include NWU and UMich]

Interpreting scatter plots
Look for the overall pattern and for striking deviations. Describe the form, direction and strength of the relationship:
Form: roughly linear if the points follow a straight line, or nonlinear.
Direction: positive or negative?
Strength: how closely the points follow a clear form.
Check for the presence of outliers, individual values that fall outside the overall pattern.
Two variables are positively (negatively) associated if an increase in one variable corresponds to an increase (decrease) in the other variable.

2000 Presidential Elections
Did the butterfly ballots confuse voters? Did voters for Al Gore instead cast their votes for other candidates? Bush spokesman Ari Fleischer stated on
Nov. 9 that "Palm Beach County is a Pat Buchanan stronghold and that's why Pat Buchanan received 3,407 votes there."

What is the level of support that Pat Buchanan enjoys in Palm Beach County? The published election results show the association between the vote totals for Pat Buchanan and the total population of Florida counties.
Is the association positive or negative? Is the form of the relationship nearly linear?

The Correlation Coefficient r
The correlation coefficient r measures the direction and the strength of the linear relationship between two variables. It is a value between -1 and 1.
If r is negative, Y tends to decrease linearly with X.
If r is positive, Y tends to increase linearly with X.
The closer r is to -1 or 1, the stronger the linear association. Values of r close to 0 imply a weak linear association.
For n pairs $(x_i, y_i)$, r is defined as

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

where X has average $\bar{x}$ and standard deviation $s_x$, and Y has average $\bar{y}$ and standard deviation $s_y$.

Examples of correlation
[Figure: two scatter plots. Negative association: birth rate (per 1,000 population) and log G.N.P., r = -0.74. Positive association: selling price (hundreds of dollars) and annual taxes (dollars), r = 0.65.]
[Figure: Diamond rings data, carats vs. price; strong positive association, r = 0.989.]

Positive Correlation
In each plot there are 100 points. The correlation coefficient measures the amount of clustering around a line. If r is close to 1, the points lie close to a straight line.

Negative Correlation
Negative correlation: as x increases, y tends to decrease.

Guess the correlation
Match the diagrams with the following correlations: 0.93, 0.75, 0.20, 0.27, 0.63, 1.0.

Different correlations?
In which diagram below is the correlation coefficient the largest? The smallest?

Summary
The correlation coefficient r varies between -1 and 1. If r = 0, there is no linear association between X and Y. Positive r indicates positive association between X and Y; negative r indicates negative association between X and Y.
Both variables X and Y must be quantitative.
The correlation between X and Y is the same as the correlation between Y and X.
The correlation measures only the linear relationship between two variables.
r can be strongly affected by the presence of outliers.

Compute correlation in Excel
The correlation coefficient can be computed with the Correlation tool of the Analysis ToolPak: click Tools > Data Analysis > Correlation. Alternatively, use the worksheet function =CORREL(data range X, data range Y). For instance, if the X values are in B2:B25 and the Y values are in C2:C25:
=CORREL(B2:B25, C2:C25)