Introduction of Thomas H. Taylor, Jr., PE
- Georgia Institute of Technology, BS Applied Mathematics, 1975
- Georgia State University, MS Decision Sciences, Statistics Concentration, 1985
- Registered Professional Engineer, Industrial
- 25 years in private-sector energy industry + 8 years in microbiology and public health, in federal government
- Senior Executive in utility consulting industry
- Senior federal employee, well published in scientific journals
- Holder of Methods Patent for new computational approach and associated SAS™-based software for series-dilution bioassays
- Career conclusions:
  - Modeling (and much of statistics in general) is transferable across sectors, industries, and disciplines.
  - The jargon varies across sectors, industries, and disciplines.

Presentation Outline
- Introduction of T. Taylor
- Regression Modeling Motivation
  - Implicit in the development of a real-world model is the expectation that it be used for decision making.
  - The decision-making is the guiding principle for model development.
- Modeling Examples
  - Course of Disease: response decisions
  - Epidemiological, Chronic: policy and treatment decisions
  - Epidemiological, Outbreak: announcements & recalls
- Software for modeling
  - SAS™ is superior to Excel™ in modeling situations, due to documentation, reproducibility, and audit-worthiness.
- Regression modeling in the real world is not as clean as it is in many textbooks.

Decision-making and Risk
- Implicit in decision making is the minimization of risk
- Risk = probability (event) ×
  loss function (event)
- Loss functions are different in different industries and sectors
- “Risk” is used incorrectly in some sectors and industries.
- Government decision criteria are considerably different from private sector
  - Public welfare is not expected to be cost-effective
- Epidemiology
  - Objective: Reduce burden of disease or rate of mortality
  - Intervention: Vaccine introduction; educational campaigns, e.g. hand-washing; avoidance of specific behaviors; food and drug recalls
- Energy
  - Objective: reduce energy use, or re-arrange energy use
  - Actions: green marketing; efficiency mandates; development of alternatives
- Classic Marketing
  - Objective: increase sales; maximize profit; minimize risk
  - Decisions: pricing, product/service choice; R&D

Decision/Outcome Criterion
[Figure: individual tolerance plotted against exposure (spores; spore equivalent of toxin level). The line y = x (Exposure = Personal Tolerance) separates "sick" from "not sick"; the Prodromal Stage and Fulminant Stage are marked.]

Decision Timepoints (from Model!)
[Figure: individual tolerance plotted against exposure, both axes marked at 600, 50,000, and 100,000. Regions are labeled: not sick; 10-11 days to peak toxin level (asymptomatic); 10-11 days to prodromal disease; 6-7 days till prodromal; 4-5 days till prodromal; 2-3 days; 3 hrs.]

Popular Regression Models
- Time series
  - Simple Trends, e.g. energy increase per year
  - Application-specific functions, e.g. sigmoidal
  - ARIMA et al
- “Causal” (not really: association ≠ cause)
  - Energy
    - End-use: BTU=f(appliance stock, efficiency)
    - Econometric: BTU=f(cost of energy, income, inflation)
  - Epidemiological
    - Case-status=f(age, sex, race, genetic factors)
    - Case-status=f(exposure1, exposure2, ...)
- “Survival” (Time-to-Event) models

SAS™ Regression Procedures
- General Regression: The REG Procedure
- Nonlinear Regression: The NLIN Procedure
- Response Surface Regression: The RSREG Procedure
- Partial Least Squares Regression: The PLS Procedure
- Regression for Ill-conditioned Data: The ORTHOREG Procedure
- Local Regression: The LOESS Procedure
- Robust Regression: The ROBUSTREG Procedure
- Logistic Regression: The LOGISTIC Procedure
- Regression with Transformations: The TRANSREG Procedure
- Regression Using the GLM, CATMOD, LOGISTIC, PROBIT, and LIFEREG Procedures
- Interactive Features in the CATMOD, GLM, and REG Procedures
http:/

Regression Help (1)
- CATMOD analyzes data that can be represented by a contingency table. PROC CATMOD fits linear models to functions of response frequencies, and it can be used for
  linear and logistic regression. The CATMOD procedure is discussed in detail in Chapter 5, “Introduction to Categorical Data Analysis Procedures.”
- GENMOD fits generalized linear models. PROC GENMOD is especially suited for responses with discrete outcomes, and it performs logistic regression and Poisson regression as well as fitting Generalized Estimating Equations for repeated measures data. See Chapter 5, “Introduction to Categorical Data Analysis Procedures,” and Chapter 29, “The GENMOD Procedure,” for more information.
- GLM uses the method of least squares to fit general linear models. In addition to many other analyses, PROC GLM can perform simple, multiple, polynomial, and weighted regression. PROC GLM has many of the same input/output capabilities as PROC REG, but it does not provide as many diagnostic tools or allow interactive changes in the model or data. See Chapter 4, “Introduction to Analysis-of-Variance Procedures,” for a more detailed overview of the GLM procedure.
- LIFEREG fits parametric models to failure-time data that may be right censored. These types of models are commonly used in survival analysis. See Chapter 10, “Introduction to Survival Analysis Procedures,” for a more detailed overview of the LIFEREG procedure.

Regression Help (2)
- LOGISTIC fits logistic models for binomial and ordinal outcomes. PROC LOGISTIC provides a wide variety of model-building methods and computes numerous regression diagnostics. See Chapter 5, “Introduction to Categorical Data Analysis Procedures,” for a brief comparison of PROC LOGISTIC with other procedures.
- NLIN builds nonlinear regression models. Several different iterative methods are available.
- ORTHOREG performs regression using the Gentleman-Givens computational method. For ill-conditioned data, PROC ORTHOREG
  can produce more accurate parameter estimates than other procedures such as PROC GLM and PROC REG.
- PLS performs partial least squares regression, principal components regression, and reduced rank regression, with cross validation for the number of components.

Regression Help (3)
- PROBIT performs probit regression as well as logistic regression and ordinal logistic regression. The PROBIT procedure is useful when the dependent variable is either dichotomous or polychotomous and the independent variables are continuous.
- REG performs linear regression with many diagnostic capabilities, selects models using one of nine methods, produces scatter plots of raw data and statistics, highlights scatter plots to identify particular observations, and allows interactive changes in both the regression model and the data used to fit the model.
- RSREG builds quadratic response-surface regression models. PROC RSREG analyzes the fitted response surface to determine the factor levels of optimum response and performs a ridge analysis to search for the region of optimum response.
- TRANSREG fits univariate and multivariate linear models, optionally with spline and other nonlinear transformations. Models include ordinary regression and ANOVA, multiple and multivariate regression, metric and nonmetric conjoint analysis, metric and nonmetric vector and ideal point preference mapping, redundancy analysis, canonical correlation, and response surface regression.

Regression Help (4)
Several SAS/ETS procedures also perform regression. The following procedures are documented in the SAS/ETS User's Guide.
- AUTOREG implements regression models using time-series data where the errors are autocorrelated.
- PDLREG performs regression analysis with polynomial distributed lags.
- SYSLIN handles linear simultaneous systems of equations, such as econometric models.
- MODEL handles nonlinear simultaneous systems of equations, such as econometric models.

vs. SAS™ code
- SAS™ has tremendously more capability
- Use of SAS™ procedures provides documentation, formally and operationally
- Spreadsheets and point-and-click environments cannot withstand audits
  - Regulatory agencies: FERC, FDA, NRC, USDA (FDA: 21 CFR Part 11)
- Labor-intensive point-and-click can be replaced with SAS™ code to save time and, therefore, focus on analysis, not mechanics.

Specific Models
- Disease A (used as decision/outcome example above)
  - Course of disease - NOT regression
- Disease P
  - Time series
  - Simple periodic with exception!

Seasonal Data with Aberrations
[Figure: seasonal time series, 1996 through 1999, fitted by Sinusoidal Piecewise Regression with Trend.]

Specific Models
- Disease A
  - Course of disease - NOT regression
- Disease P
  - Time series
  - Simple periodic with
    exception!
- Sigmoid
  - Laboratory applications

Plot of Measured Response* by Dilution: “Well-behaved” Specimen
[Figure: sigmoid curve of Measured Response (0% to 100%) against Dilution, marking the True Midpoint (LD50, ED50, etc.), the True 50% Titer, and the Observed 50% Titer.]
*Measured response can be cell counts, optical density, luminescence, or other lab-measured quantity.

What about?
- High-Variance Specimens
- Robustness of True 50% Endpoint
[Figure: Observed Response against Dilution for a high-variance specimen, with the Midpoint (50%) marked.]

Specific Models
- Disease A
  - Course of disease - NOT regression
- Disease P
  - Time series
  - Simple periodic with exception!
- Sigmoid
  - Laboratory applications
- Investigation of foodborne disease outbreak
  - Not a laboratory
  - Not a controlled experiment
  - Not even a designed experiment
  - Observational data

Foodborne Disease Outbreak
- Associative (not causal) models
- Epidemiological: Case-status=f(exposure1, exposure2, ...)

George Box: “all models are wrong, but some are useful.”
George Edward Pelham Box (born 18 October 1919) is one of the most influential statisticians of the 20th century and a pioneer in the areas of quality control, time series analysis, design of experiments and Bayesian inference. He served as President of the American Statistical Association in 1978 and of the Institute of Mathematical Statistics in 1979. He received the Shewhart Medal from the American Society for Quality Control in 1968, the Wilks Memorial Award from the American Statistical Association in 1972, the R. A. Fisher Lectureship in 1974, and the Guy Medal in Gold from the Royal Statistical Society in 1993. He was elected a member of the American Academy of Arts and Sciences in 1974 and a Fellow of the Royal Society in 1985.
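
Illustration for the Decision-making and Risk slide. The relation Risk = probability (event) × loss function (event) can be turned into a small decision comparison. The sketch below is in Python rather than SAS™, purely for illustration; the decision names, probabilities, and loss values are hypothetical and do not come from the presentation.

```python
def risk(probability, loss):
    """Risk of one event: probability of the event times its loss."""
    return probability * loss


def total_risk(events):
    """Sum the risk over (probability, loss) pairs attached to one decision."""
    return sum(risk(p, loss) for p, loss in events)


# Hypothetical numbers (assumptions for illustration only).
# Each decision maps to the events it can lead to: (probability, loss in $M).
decisions = {
    "recall product": [(1.0, 2.0), (0.01, 50.0)],  # certain recall cost + small residual outbreak chance
    "no recall": [(0.10, 50.0)],                   # larger chance of a costly outbreak
}

risks = {name: total_risk(events) for name, events in decisions.items()}
best_decision = min(risks, key=risks.get)  # decision with the smallest total risk

for name, value in risks.items():
    print(f"{name}: risk = {value:.2f}")
print("lowest-risk decision:", best_decision)
```

Changing the loss values changes the preferred decision, which is the slide's point that loss functions differ across industries and sectors.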
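
Illustration for the Seasonal Data with Aberrations slide. The slide names "Sinusoidal Piecewise Regression with Trend"; the sketch below shows only the simplest ingredient, a single annual sinusoid plus a linear trend fitted by ordinary least squares. It is not piecewise, it is not the presenter's model, and it is written in Python rather than SAS™. The function names, the 12-month period, and the synthetic monthly counts are all assumptions.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_seasonal_trend(times, values, period=12.0):
    """Least-squares fit of y = b0 + b1*t + b2*sin(wt) + b3*cos(wt), w = 2*pi/period."""
    w = 2.0 * math.pi / period
    rows = [[1.0, t, math.sin(w * t), math.cos(w * t)] for t in times]
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free monthly series over four years (assumption for illustration).
months = list(range(48))
counts = [20.0 + 0.5 * t + 8.0 * math.sin(2.0 * math.pi * t / 12.0) for t in months]

b0, b1, b_sin, b_cos = fit_seasonal_trend(months, counts)
amplitude = math.hypot(b_sin, b_cos)  # size of the seasonal swing
print(f"baseline={b0:.3f} trend/month={b1:.3f} seasonal amplitude={amplitude:.3f}")
```

Writing the seasonal term as a sine plus a cosine keeps the model linear in its coefficients, so no iterative nonlinear fit is needed. Aberrations (the "exception" on the slide) would show up as large residuals from a fit like this; handling them is outside this sketch.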
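
Illustration for the Plot of Measured Response by Dilution slide. One simple way to read an observed 50% titer off a dilution series is to interpolate between the two dilutions that bracket the 50% response, working on a log10 dilution scale. This is a generic textbook-style interpolation, not the patented computational approach mentioned in the introduction, and it is written in Python rather than SAS™. The function name and the example dilution series are hypothetical.

```python
import math


def observed_50_percent_titer(dilutions, responses, target=50.0):
    """Return the dilution at which the response crosses `target` percent.

    dilutions: increasing dilution factors (e.g. 10, 100, 1000).
    responses: measured response in percent at each dilution.
    Interpolates linearly in log10(dilution) between the bracketing points;
    returns None if the response never crosses the target.
    """
    points = list(zip(dilutions, responses))
    for (d_lo, r_lo), (d_hi, r_hi) in zip(points, points[1:]):
        crosses = (r_lo - target) * (r_hi - target) <= 0
        if crosses and r_lo != r_hi:
            frac = (r_lo - target) / (r_lo - r_hi)
            log_titer = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_titer
    return None


# Hypothetical "well-behaved" specimen (assumption for illustration).
example_dilutions = [10, 100, 1000, 10000]
example_responses = [95.0, 80.0, 20.0, 5.0]

titer = observed_50_percent_titer(example_dilutions, example_responses)
print(f"observed 50% titer is about 1:{titer:.0f}")
```

With high-variance specimens the bracketing points themselves are noisy, so a two-point interpolation like this is not robust; that is the motivation for fitting the whole sigmoid instead.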