A Practical Guide to SVM
Yihua Liao, Dept. of Computer Science
2/3/03

Outline
- Support vector machine basics
- Gist
- LIBSVM (SVMlight)

Classification problems
Given: n training pairs (xi, yi), where xi = (xi1, xi2, ..., xil) is an input vector and yi = +1/-1 is the corresponding classification into H+ / H-.
Output: a label y for a new vector x.

Support vector machines
Goal: to find a discriminant that maximizes the margin.

A little math
Primal problem
Decision function

Example
Functional classification of yeast genes based on DNA microarray expression data.
Training dataset:
- genes that are known to have the same function f
- genes that are known to have a different function than f
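The equations on the "A little math" slide did not survive text extraction. As a reconstruction in standard soft-margin SVM notation (not necessarily the exact formulas the slide showed), the primal problem and decision function are:

```latex
% Soft-margin primal problem: maximize the margin 2/||w||
% while penalizing margin violations \xi_i with cost C
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^\top x_i + b\bigr) \ge 1-\xi_i,\qquad \xi_i \ge 0.

% Decision function in dual form, with kernel K and Lagrange
% multipliers \alpha_i (nonzero only for the support vectors)
f(x) = \operatorname{sgn}\Bigl(\sum_{i=1}^{n}\alpha_i\,y_i\,K(x_i,x)+b\Bigr)
```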
Gist
http://microarray.cpmc.columbia.edu/gist/
Developed by William Stafford Noble et al.
Contains tools for SVM classification, feature selection, and kernel principal components analysis.
Linux/Solaris. Installation is straightforward.

Data files
Sample.mtx (tab-delimited; the test matrix has the same format):
gene     alpha_0X  alpha_7X  alpha_14X  alpha_21X
YMR300C  -0.1      0.82      0.25       -0.51
YAL003W  0.01      -0.56     0.25       -0.17
YAL010C  -0.2      -0.01     -0.01      -0.36

Sample.labels:
gene     Respiration_chain_complexes.mipsfc
YMR300C  -1
YAL003W  1
YAL010C  -1

Usage of Gist
$ compute-weights -train sample.mtx -class sample.labels > sample.weights
$ classify -train sample.mtx -learned sample.weights -test test.mtx > test.predict
$ score-svm-results -test test.labels test.predict sample.weights

Test.predict
# Generated by classify
# Gist, version 2.0
gene     classification  discriminant
YKL197C  -1              -3.349
YGL022W  -1              -4.682
YLR069C  -1              -2.799
YJR121W  1               0.7072

Output of score-svm-results
Number of training examples: 1644 (24 positive, 1620 negative)
Number of support vectors: 60 (14 positive, 46 negative) 3.65%
Training results: FP=0 FN=3 TP=21 TN=1620
Training ROC: 0.99874
Test results: FP=12 FN=1 TP=9 TN=801
Test ROC: 0.99397
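The ROC scores reported by score-svm-results summarize how well the discriminant values in test.predict rank positives above negatives. A minimal sketch of that computation, using the rank-sum (Mann-Whitney) form of the ROC area (the function name and +1/-1 label convention are mine, not Gist's):

```python
def roc_area(labels, discriminants):
    """ROC area from +1/-1 labels and real-valued discriminants.

    Counts the fraction of (positive, negative) pairs whose
    discriminants are ordered correctly; ties count as half.
    """
    pos = [d for y, d in zip(labels, discriminants) if y == 1]
    neg = [d for y, d in zip(labels, discriminants) if y == -1]
    pairs = len(pos) * len(neg)
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / pairs

# Example: a perfect ranking gives ROC = 1.0
print(roc_area([1, 1, -1, -1], [0.71, 0.33, -2.8, -3.3]))  # 1.0
```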
Parameters
compute-weights: -power, -radial, -widthfactor, -posconstraint, -negconstraint

Rules of thumb
- The radial basis kernel usually performs better.
- Scale your data: scale each attribute to [0,1] or [-1,+1] to avoid over-fitting.
- Try different penalty parameters C for the two classes in case of unbalanced data.

LIBSVM
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Developed by Chih-Jen Lin et al.
Tools for (multi-class) SV classification and regression.
C++/Java/Python/Matlab/Perl. Linux/UNIX/Windows.
SMO implementation, fast!

Data files for LIBSVM
Training.dat (sparse "index:value" format; Testing.dat is the same):
+1 1:0.708333 2:1 3:1 4:-0.320755
-1 1:0.583333 2:-1 4:-0.603774 5:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962
-1 1:0.458333 2:1 3:1 4:-0.358491 5:0.374429

Usage of LIBSVM
$ svm-train -c 10 -w1 1 -w-1 5 Train.dat My.model
  (trains a classifier with penalty 10 for class +1 and penalty 50 for class -1, using the default RBF kernel)
$ svm-predict Test.dat My.model My.out
$ svm-scale Train_Test.dat > Scaled.dat

Output of LIBSVM
svm-train:
optimization finished, #iter = 219
nu = 0.431030
obj = -100.877286, rho = 0.424632
nSV = 132, nBSV = 107
Total nSV = 132

Output of LIBSVM
svm-predict:
Accuracy = 86.6667% (234/270) (classification)
Mean squared error = 0.533333 (regression)
Squared correlation coefficient = 0.532639 (regression)
Calculate FP, FN, TP, TN from My.out.
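The last step, calculating FP, FN, TP, TN from My.out, can be sketched as below. It assumes My.out holds one predicted +1/-1 label per line and that the true label is the first token of each line of Test.dat; the helper names are mine, not part of LIBSVM:

```python
def confusion_counts(truth, predicted):
    """Count FP, FN, TP, TN for parallel lists of +1/-1 labels."""
    tp = fn = fp = tn = 0
    for y, p in zip(truth, predicted):
        if y == 1:
            tp += p == 1   # true positive
            fn += p != 1   # false negative
        else:
            fp += p == 1   # false positive
            tn += p != 1   # true negative
    return {"FP": fp, "FN": fn, "TP": tp, "TN": tn}

def labels_from_files(test_dat_text, my_out_text):
    """True labels from Test.dat's first column, predictions from My.out."""
    truth = [int(line.split()[0]) for line in test_dat_text.splitlines() if line.strip()]
    pred = [int(line.split()[0]) for line in my_out_text.splitlines() if line.strip()]
    return truth, pred

# Example with three test vectors
truth, pred = labels_from_files("+1 1:0.7\n-1 1:0.5\n+1 1:0.2\n", "+1\n+1\n-1\n")
print(confusion_counts(truth, pred))  # {'FP': 1, 'FN': 1, 'TP': 1, 'TN': 0}
```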