Format: PPT, Pages: 32, Size: 556 KB
Additive Models, Trees, etc..ppt

Additive Models, Trees, etc.

Based in part on Chapter 9 of Hastie, Tibshirani, and Friedman
David Madigan

Predictive Modeling

Goal: learn a mapping y = f(x; θ)
Need:
1. A model structure
2. A score function
3. An optimization strategy
- Categorical y ∈ {c1, ..., cm}: classification
- Real-valued y: regression
- Note: usually assume c1, ..., cm are mutually exclusive and exhaustive

Generalized Additive Models

Highly flexible form of predictive modeling for regression and classification:

$$g(\mu(X)) = \alpha + f_1(X_1) + \cdots + f_p(X_p), \qquad \mu(X) = E[Y \mid X]$$

- g (the "link function") could be the identity, logit, log, or another function
- The f's are smooth functions, often fit using natural cubic splines

Basic Backfitting Algorithm

[Algorithm listing not preserved in the extracted text.]
- The smoother used in each update is arbitrary; it could be natural cubic splines

Example using R's gam function

    library(mgcv)
    set.seed(0)
    n <- 400
    x0 <- runif(n, 0, 1)
    x1 <- runif(n, 0, 1)
    x2 <- runif(n, 0, 1)
    x3 <- runif(n, 0, 1)
    pi <- asin(1) * 2
    f <- 2 * sin(pi * x0)
    f <- f + exp(2 * x1) - 3.75887
    f <- f + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
    e <- rnorm(n, 0, 2)
    y <- f + e
    b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3))
    summary(b)
    plot(b, pages = 1)

http://www.math.mcgill.ca/sysdocs/R/library/mgcv/html/gam.html

Tree Models

- Easy to understand: recursively divide predictor space into regions where the response variable has small variance
- Predicted value is the majority class (classification) or the average value (regression)
- Can handle mixed data, missing values, etc.
- Usually grow a large tree and prune it back rather than attempt to optimally stop the growing process

Training Dataset

This follows an example from Quinlan's ID3.
[Table not preserved in the extracted text.]

Output: A Decision Tree for "buys_computer"

[Figure: decision tree with root test "age?" (branches <=30, 30..40, >40), internal tests "student?" (no / yes) and "credit rating?" (fair / excellent), and leaves labelled no / yes]

Confusion matrix

[Table not preserved in the extracted text.]

Algorithms for Decision Tree Induction

Basic algorithm (a greedy algorithm)
- Tree is constructed in a top-down recursive divide-and-conquer manner
- At start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
- There are no samples left

Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Assume there are two classes, P and N
- Let the set of examples S contain p elements of class P and n elements of class N
- The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

$$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$

For example, I(0.5, 0.5) = 1 and I(0.9, 0.1) = 0.47.
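The definition above can be checked numerically. The following is a minimal Python sketch, assuming the usual two-class entropy with base-2 logarithms (which reproduces the values quoted above); the function name `information` is illustrative and does not come from the slides.

```python
import math


def information(p, n):
    """Expected information (in bits) needed to decide between classes P and N.

    p and n may be counts or proportions; only their ratio matters.
    The name `information` is a hypothetical helper, not taken from the slides.
    """
    total = p + n
    if total <= 0:
        raise ValueError("p + n must be positive")
    bits = 0.0
    for count in (p, n):
        if count > 0:  # treat 0 * log2(0) as 0
            frac = count / total
            bits -= frac * math.log2(frac)
    return bits


print(round(information(0.5, 0.5), 2))  # 1.0
print(round(information(0.9, 0.1), 2))  # 0.47
```

Because only the ratio of p to n matters, the same function accepts raw class counts as well as proportions.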

Similarly, I(0.99, 0.01) = 0.08.

Information Gain in Decision Tree Induction

- Assume that using attribute A, a set S will be partitioned into sets {S1, S2, ..., Sv}
- If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

$$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$

- The encoding information that would be gained by branching on A is

$$\mathrm{Gain}(A) = I(p, n) - E(A)$$

Attribute Selection by Information Gain Computation

- Class P: buys_computer = "yes"
- Class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age: [table and computation not preserved in the extracted text]
- Hence Gain(age) = I(9, 5) - E(age); the gains for the other attributes are computed similarly [numeric values not preserved in the extracted text]

Gini Index (IBM IntelligentMiner)

- If a data set T contains examples from n classes, the gini index gini(T) is defined as

$$\mathrm{gini}(T) = 1 - \sum_{j=1}^{n} p_j^2$$

  where p_j is the relative frequency of class j in T.
- If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively (N = N1 + N2), the gini index of the split data is defined as

$$\mathrm{gini}_{split}(T) = \frac{N_1}{N}\,\mathrm{gini}(T_1) + \frac{N_2}{N}\,\mathrm{gini}(T_2)$$

- The attribute that provides the smallest gini_split(T) is chosen to split the node

Avoid Overfitting in Classification

- The generated tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - The result is poor accuracy for unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    - Difficult to choose an appropriate threshold
  - Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the "best pruned tree"

Approaches to Determine the Final Tree Size

- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation
- Use the minimum description length (MDL) principle: halting growth of the tree when
  the encoding is minimized

Dietterich (1999) Analysis of 33 UCI datasets

[Figure not preserved in the extracted text.]

Missing Predictor Values

- For categorical predictors, simply create a value "missing"
- For continuous predictors, evaluate the split using the complete cases; once a split is chosen, find a first "surrogate predictor" that gives the most similar split
- Then find the second-best surrogate, etc.
- At prediction time, use the surrogates in order

Bagging and Random Forests

- Big trees tend to have high variance and low bias
- Small trees tend to have low variance and high bias
- Is there some way to drive the variance down without increasing
  bias?
- Bagging can do this to some extent

Naive Bayes Classification

Recall:

$$p(c_k \mid x) \propto p(x \mid c_k)\, p(c_k)$$

Now suppose:

$$p(x \mid c_k) = \prod_{j=1}^{p} p(x_j \mid c_k)$$

Then:

$$p(c_k \mid x) \propto p(c_k) \prod_{j=1}^{p} p(x_j \mid c_k)$$

Equivalently:

$$\log\frac{p(c_k \mid x)}{p(\neg c_k \mid x)} = \log\frac{p(c_k)}{p(\neg c_k)} + \sum_{j=1}^{p} \log\frac{p(x_j \mid c_k)}{p(x_j \mid \neg c_k)}$$

The terms in the sum are the "weights of evidence".

[Figure: graphical model in which the class node C is the parent of x1, x2, ..., xp]

Evidence Balance Sheet

[Figure not preserved in the extracted text.]

Naive Bayes (cont.)

- Despite the crude conditional independence assumption, works well in practice (see Friedman, 1997 for a partial explanation)
- Can be further enhanced with boosting, bagging, model averaging, etc.
- Can relax the conditional independence assumptions in myriad ways ("Bayesian networks")

Patient Rule Induction (PRIM)

- Looks for regions of predictor space where the response variable has a high average value
- Iterative procedure. Starts with a region including all points. At each step, PRIM removes a slice on one dimension
- If the slice size α is small, this produces a very patient rule induction algorithm

PRIM Algorithm

1. Start with all of the training data, and a maximal box containing all of the
   data
2. Consider shrinking the box by compressing along one face, so as to peel off the proportion α of observations having either the highest values of a predictor Xj or the lowest. Choose the peeling that produces the highest response mean in the remaining box
3. Repeat step 2 until some minimal number of observations remain in the box
4. Expand the box along any face so long as the resulting box mean increases
5. Use cross-validation to choose a box from the sequence of boxes constructed above. Call the box B1
6. Remove the data in B1 from the dataset and repeat steps 2-5
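
The peeling phase of the algorithm above (steps 1 to 3) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a full PRIM implementation: the expansion step, the cross-validated choice of box, and the removal of B1 are omitted, and the names `prim_peel`, `min_obs`, and the toy data are hypothetical, not from the slides.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def prim_peel(X, y, alpha=0.1, min_obs=10):
    """Top-down peeling only (steps 1-3); pasting and cross-validation omitted.

    X is a list of rows (one list of floats per observation), y the responses.
    Returns the sequence of boxes as (box, box_mean, n_obs) tuples, where
    box is a list of (low, high) bounds, one pair per predictor.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if min_obs < 1:
        raise ValueError("min_obs must be at least 1")
    n_pred = len(X[0])
    inside = list(range(len(X)))  # indices of observations still in the box
    # Step 1: maximal box containing all of the data.
    box = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(n_pred)]
    trajectory = [(list(box), mean(y[i] for i in inside), len(inside))]
    while len(inside) > min_obs:
        k = max(1, int(alpha * len(inside)))  # observations to peel off
        best = None
        for j in range(n_pred):
            vals = sorted(X[i][j] for i in inside)
            candidates = (
                ("lo", vals[k], lambda v, b: v >= b),       # peel lowest values
                ("hi", vals[-k - 1], lambda v, b: v <= b),  # peel highest values
            )
            for side, bound, keep_rule in candidates:
                keep = [i for i in inside if keep_rule(X[i][j], bound)]
                if len(keep) == len(inside) or len(keep) < min_obs:
                    continue  # peel removed nothing (ties) or too much
                m = mean(y[i] for i in keep)
                # Step 2: keep the peel with the highest remaining box mean.
                if best is None or m > best[0]:
                    best = (m, j, side, bound, keep)
        if best is None:
            break
        m, j, side, bound, inside = best
        lo, hi = box[j]
        box[j] = (bound, hi) if side == "lo" else (lo, bound)
        trajectory.append((list(box), m, len(inside)))
    return trajectory


# Toy data (an assumption for illustration): y is 1 exactly when x0 >= 0.5.
X = [[i / 100, ((i * 37) % 100) / 100] for i in range(100)]
y = [1.0 if row[0] >= 0.5 else 0.0 for row in X]
trajectory = prim_peel(X, y, alpha=0.1, min_obs=20)
final_box, final_mean, final_n = trajectory[-1]
print(final_box, final_mean, final_n)
```

On this toy data the peels come off the low end of x0 until only the high-response region remains. A small `alpha` removes few observations per step, which is what makes the procedure patient.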
