A Constrained Regression Technique for COCOMO Calibration
Presented by Vu Nguyen
On behalf of Vu Nguyen, Bert Steece, Barry Boehm
{nguyenvu, berts, boehm}@usc.edu

Outline
Introduction
Multiple Linear Regression
  OLS, Stepwise, Lasso, Ridge
Constrained Linear Regression
Validation and Comparison
  COCOMO overview
  Cross validation
Conclusions
Limitations
Future Work

Introduction
Building software estimation models is a search problem: find the best possible parameters that
  generate high prediction accuracy
  satisfy predefined constraints

Multiple Linear Regression
Multiple linear regression is presented as
  $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \dots, n$
where
  $\beta_0, \beta_1, \dots, \beta_k$ are the coefficients
  $n$ is the number of observations
  $k$ is the number of variables
  $x_{ij}$ is the value of the $j$th variable for the $i$th observation
  $y_i$ is the response of the $i$th observation
  $\varepsilon_i$ is the error term of the $i$th observation

Ordinary Least Squares
OLS is the most common method to estimate the coefficients $\beta_0, \beta_1, \dots, \beta_k$
OLS estimates the coefficients by minimizing the sum of squared errors (SSE)
  Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
  where $\hat{y}_i$ is the estimate of the $i$th observation

Some Limitations of OLS
Highly sensitive to outliers
Low bias but high variance (e.g., caused by collinearity or overfitting)
Unable to constrain the estimates of coefficients
  Estimated coefficients may be counter-intuitive
  Example: the OLS coefficient estimate for RUSE is negative, i.e., increasing the RUSE rating results in a decrease in effort
[Figure: Develop for Reuse (RUSE), OLS estimates]

Some Other Approaches
Stepwise (forward selection)
  Start with no variables and gradually add variables until an “optimal” solution is achieved
Ridge
  Minimize SSE and impose a penalty on the sum of squared coefficients
Lasso
  Minimize SSE and impose a penalty on the sum of absolute coefficients

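To make the difference between the OLS, Ridge, and Lasso estimates concrete, here is a minimal Python sketch for a single-variable model without an intercept, y ≈ βx, where each estimate has a closed form. The function names, the penalty weight `lam`, and the sample data are illustrative assumptions and are not taken from the presented work (which used R).

```python
import math


def _sums(x, y):
    """Return (sum of x*y, sum of x*x) for paired samples."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy, sxx


def ols_slope(x, y):
    """Minimize SSE = sum((y_i - b * x_i)^2)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Minimize SSE + lam * b^2 (penalty on the squared coefficient)."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimize SSE + lam * |b| (penalty on the absolute coefficient)."""
    sxy, sxx = _sums(x, y)
    # Soft-thresholding: shrink |sxy| by lam / 2 and clip at zero.
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]  # made-up predictor values
    y = [1.1, 1.9, 3.2, 3.9]  # made-up responses
    print("OLS  :", ols_slope(x, y))
    print("Ridge:", ridge_slope(x, y, lam=10.0))
    print("Lasso:", lasso_slope(x, y, lam=10.0))
    print("Lasso, large penalty:", lasso_slope(x, y, lam=100.0))
```

With a large enough penalty the Lasso estimate becomes exactly zero, which is how it drops variables, whereas Ridge only shrinks the estimate toward zero.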
Constrained Regression
Principles
  Use the optimization paradigm: optimize an objective function subject to constraints
    Minimize $f(y, X)$ subject to cf(z)
  Impose constraints on coefficients and relative error
  Expect to reduce variance by reducing the number of variables (variance and bias tradeoff)

Constrained Regression (cont)
General form: minimize an objective function subject to constraints on the coefficients and the relative errors
Constrained Minimum Sum of Squared Errors (CMSE)
  Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Constrained Minimum Sum of Absolute Errors (CMAE)
  Minimize $\sum_{i=1}^{n} |y_i - \hat{y}_i|$
Constrained Minimum Sum of Relative Errors (CMRE)
  Minimize $\sum_{i=1}^{n} |y_i - \hat{y}_i| / y_i$

Solve the Equations
Solving the equations is an optimization problem
  CMSE: quadratic programming
  CMRE and CMAE: transformed to the form of linear programming
We used the lpSolve and quadprog packages in R
Determine the parameter c using cross-validation

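As a toy illustration of the constrained idea, the sketch below assumes the constraints take the form β ≥ 0 and MRE_i ≤ c for every observation; the exact constraint formulas are not preserved in these slides, so this form is an assumption. For a one-coefficient model y ≈ βx with positive data, CMRE reduces to a one-dimensional linear program whose optimum lies at a breakpoint or at an end of the feasible interval, so no solver is needed. The function name and sample data are hypothetical; a calibration with many cost drivers would use an LP or QP solver such as the R packages named above.

```python
def cmre_one_coefficient(x, y, c):
    """Toy CMRE for y ~ b * x.

    Minimizes sum(|y_i - b * x_i| / y_i) subject to b >= 0 and MRE_i <= c.
    Assumes every x_i and y_i is positive. Returns None when no b satisfies
    all constraints.
    """
    pairs = list(zip(x, y))

    # MRE_i <= c  <=>  (1 - c) * y_i / x_i <= b <= (1 + c) * y_i / x_i
    lo = max([0.0] + [(1.0 - c) * yi / xi for xi, yi in pairs])
    hi = min((1.0 + c) * yi / xi for xi, yi in pairs)
    if lo > hi:
        return None  # the constraints cannot all hold at once

    def objective(b):
        return sum(abs(yi - b * xi) / yi for xi, yi in pairs)

    # The objective is convex and piecewise linear in b, so a minimum lies at
    # a breakpoint y_i / x_i inside [lo, hi] or at an end of the interval.
    candidates = [lo, hi]
    candidates += [yi / xi for xi, yi in pairs if lo <= yi / xi <= hi]
    return min(candidates, key=objective)


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]  # made-up predictor values
    y = [2.0, 5.0, 6.0]  # made-up responses
    for c in (1.0, 0.3, 0.1):
        print("c =", c, "-> b =", cmre_one_coefficient(x, y, c))
```

Tightening c narrows the feasible interval and can move the estimate away from the unconstrained optimum or leave no feasible estimate at all, which is one reason c has to be tuned (the slides do this with cross-validation).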
Validation and Comparison
Two COCOMO datasets
  COCOMO 2000: 161 projects
  COCOMO 81: 63 projects
Comparing with popular model building approaches
  OLS
  Stepwise
  Lasso
  Ridge
Cross-validation
  10-fold cross validation

COCOMO
Constructive Cost Model (COCOMO) first published in 1981
  Calibrated using 63 projects (COCOMO 81 dataset)
  Uses SLOC as a size measure and 15 cost drivers
COCOMO II published in 2000
  Reflects changes in technologies and practices
  Uses 22 cost drivers plus size measure
  Introduces 5 scale factors
  Calibrated using 161 data points (COCOMO II dataset)

COCOMO Overview (cont)
COCOMO effort equation, non-linear (COCOMO II form)
  $PM = A \cdot Size^{B + 0.01 \sum_i SF_i} \cdot \prod_j EM_j$
Linearize the model using log-transformation
  COCOMO 81
    $\log(PM) = \beta_0 + \beta_1 \log(Size) + \beta_2 \log(EM_1) + \dots + \beta_{16} \log(EM_{15})$
  COCOMO II
    $\log(PM) = \beta_0 + \beta_1 \log(Size) + \sum_i \beta_i \, SF_i \log(Size) + \sum_j \beta_j \log(EM_j)$
Estimate coefficients using a linear regression method

Model Accuracy Measures
Magnitude of relative error (MRE)
  $MRE_i = |y_i - \hat{y}_i| / y_i$
Mean of MRE (MMRE)
  $MMRE = \frac{1}{N} \sum_{i=1}^{N} MRE_i$
Prediction Level: $PRED(l) = k/N$
  where $k$ is the number of estimates with $MRE \le l$ and $N$ is the total number of estimates

Cross Validation
10-fold cross validation was used
  Step 1. Randomly split the dataset into K = 10 subsets
  Step 2. For each i = 1, ..., 10
    Remove the ith subset and build the model
    Use the ith subset as the testing set to calculate MMRE_i and PRED(l)_i
  Step 3. Repeat Steps 1 and 2 r = 15 times

Non-cross validation results
[Tables: COCOMO II dataset (N = 161); COCOMO 81 dataset (N = 63)]
OLS: Max MRE = 1.23, PRED = 0.78
* PRED(0.3)

Cross-validation Results
[Figures: COCOMO II dataset; COCOMO 81 dataset]

Statistical Significance
Results of statistical significance tests on MMRE (0.05 significance level used)
Mann-Whitney U hypothesis test
[Diagram: pairwise comparisons with p-value labels]
  CMSE outperforms Ridge, OLS (labels: p 0.10, p 0.10)
  CMSE outperforms Lasso, Stepwise (label: p 0.05)
  CMAE outperforms Lasso, Ridge, OLS, Stepwise (labels: p $10^{-3}$, p 0.02)
  CMRE outperforms Lasso, Ridge, OLS, Stepwise (labels: p $10^{-4}$, p $10^{-4}$)

Comparing With Published Results
Some of the best published results for COCOMO datasets
  Bayesian analysis (Boehm et al., 2000)
  Chen et al., 2005
[Table: best cross-validated mean PRED(.30)]

Productivity Range
COCOMO II.2000: A = 2.94, B = 0.91
CMRE: A = 2.27, B = 0.98

Conclusions
The technique imposes constraints on the estimates of coefficients and the magnitude of the error terms
  Directly resolves the unexpected estimates of coefficients determined by data
Estimation accuracies are favorable
  CMRE and CMAE outperform OLS, Stepwise, Ridge, Lasso, and CMSE
  MRE and MAE are favorable objective functions
The technique can be applied not only to COCOMO-like models but also to other linear models
  An alternative for researchers and practitioners to build models

Limitations
Because the technique relies on optimization, a sub-optimal solution may be returned instead of the globally optimal one
  Multiple solutions exist for the estimates of coefficients
Only two datasets were investigated; the technique might not work well on other datasets

Future Work
Validate the technique using other datasets (e.g., NASA datasets)
Compare results from the technique with others such as neural networks, genetic programming
Apply and compare with other objective functions
  MdMRE (median of MRE)
  Z measure (z = estimate/actual)

References
Boehm et al., 2000. B. Boehm, E. Horowitz, R. Madachy, D. Reifer, B. K. Clark, B. Steece, A. W. Brown, S. Chulani, and C. Abts. Software Cost Estimation with COCOMO II. Prentice Hall, 2000.
Chen et al., 2005. Z. Chen, T. Menzies, D. Port, and B. Boehm. Finding the right data for software cost modeling. IEEE Software, Nov 2005.

Thank You
Q&A
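
The evaluation procedure from the Model Accuracy Measures and Cross Validation slides can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: the `fit` and `predict` callables, the ratio model used as a stand-in, the seed, and the choice to average MMRE and PRED over all folds and repetitions are assumptions made for the example.

```python
import random


def mre(actual, estimate):
    """Magnitude of relative error for one observation."""
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    """Mean MRE over a set of estimates."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, level=0.3):
    """PRED(l): fraction of estimates whose MRE is at most `level`."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return hits / len(actuals)


def cross_validate(x, y, fit, predict, k=10, repeats=15, level=0.3, seed=0):
    """Repeated K-fold cross-validation; requires len(x) >= k.

    Returns (mean MMRE, mean PRED(level)) over all folds and repetitions.
    """
    rng = random.Random(seed)
    n = len(x)
    mmres, preds = [], []
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)                        # Step 1: random split
        folds = [indices[i::k] for i in range(k)]
        for test_idx in folds:                      # Step 2: hold out each fold
            held_out = set(test_idx)
            train = [i for i in range(n) if i not in held_out]
            model = fit([x[i] for i in train], [y[i] for i in train])
            actuals = [y[i] for i in test_idx]
            estimates = [predict(model, x[i]) for i in test_idx]
            mmres.append(mmre(actuals, estimates))
            preds.append(pred(actuals, estimates, level))
    return sum(mmres) / len(mmres), sum(preds) / len(preds)


def fit_ratio(x, y):
    """Hypothetical stand-in model: least-squares slope for y ~ b * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def predict_ratio(slope, xi):
    return slope * xi


if __name__ == "__main__":
    xs = list(range(1, 21))            # made-up sizes
    ys = [2 * v for v in xs]           # made-up efforts, exactly proportional
    print(cross_validate(xs, ys, fit_ratio, predict_ratio))
```

Any model-building approach from the comparison (OLS, Stepwise, Lasso, Ridge, or a constrained variant) could be plugged in through `fit` and `predict`, so that all approaches are scored on the same folds.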