Cancer Classification with Data-dependent Kernels.ppt

Slide 1
Cancer Classification with Data-dependent Kernels
Anne Ya Zhang (with Xue-wen Chen & Huilin Xiong)
EECS & ITTC, University of Kansas
DIMACS Workshop on Machine Learning Techniques in Bioinformatics, 2018/10/10

Slide 2: Outline
- Introduction
- Data-dependent Kernel
- Results
- Conclusion

Slide 3: Cancer facts
- Cancer is a group of many related diseases.
  - Cells continue to grow and divide and do not die when they should.
  - The cause is changes in the genes that control normal cell growth and death.
- Cancer is the second leading cause of death in the United States.
  - Cancer causes 1 of every 4 deaths.
  - NIH estimates the overall cost of cancer in 2004 at $189.8 billion ($64.9 billion for direct medical costs).
- Cancer types include breast cancer, lung cancer, and colon cancer.
- Death rates vary greatly by cancer type and stage at diagnosis.

Slide 4: Motivation
Why do we need to classify cancers?
- The general way of treating cancer is to:
  - categorize the cancers into different classes;
  - use a specific treatment for each class.
- Traditional ways to classify cancers:
  - Morphological appearance: not accurate!
  - Enzyme-based histochemical analyses, immunophenotyping, cytogenetic analysis: complicated, and they need highly specialized laboratories.

Slide 5: Motivation
Why are the traditional ways not enough?
- Some tumors in the same class have completely different clinical courses.
  - A more accurate classification may be needed.
- Assigning new tumors to known cancer classes is not easy.
  - Example: assigning an acute leukemia tumor to either AML (acute myeloid leukemia) or ALL (acute lymphoblastic leukemia).

Slide 6: DNA Microarray-based Cancer Diagnosis
- Cancer is caused by changes in the genes that control normal cell growth and death.
- Molecular diagnostics offer the promise of precise, objective, and systematic cancer classification.
- These tests are not widely applied because characteristic molecular markers for most solid tumors have yet to be identified.
- Recently, microarray tumor gene expression profiles have been used for cancer diagnosis.

Slide 7: Microarray
- A microarray experiment monitors the expression levels of thousands of genes simultaneously.
- Microarray techniques will lead to a more complete understanding of the molecular variations among tumors, and hence to a more reliable classification.

Slide 8: Microarray
- Microarray analysis allows the activities of thousands of genes to be monitored over many different conditions.
- From a machine learning point of view, the large volume of data requires computational aid in analyzing the expression data.

Slide 9: Machine learning tasks in cancer classification
There are three main types of machine learning problems associated with cancer classification:
- the identification of new cancer classes using gene expression profiles;
- the classification of cancer into known classes;
- the identification of "marker" genes that characterize the different cancer classes.
This presentation focuses on the second type of problem.

Slide 10: Project Goals
- Develop a more systematic machine learning approach to cancer classification using microarray gene expression profiles.
- Use an initial collection of samples belonging to known classes of cancer to create a "class predictor" for new, unknown samples.

Slide 11: Challenges in cancer classification
- Gene expression data are typically characterized by
  - high dimensionality (i.e. a large number of genes),
  - small sample size.
- Curse of dimensionality!
- Methods:
  - Kernel techniques
  - Data resampling
  - Gene selection

Slide 12: Outline
- Introduction
- Data-dependent Kernel
- Results
- Conclusion

Slide 13: Data-dependent kernel model
[Kernel formula lost in extraction; one term is annotated "data dependent".]
- Optimizing the data-dependent kernel means choosing the coefficient vector.

Slide 14: Optimizing the kernel
- Criterion for kernel optimization: maximum class separability of the training data in the kernel-induced feature space.

Slide 15: The Kernel Optimization
[Equations lost in extraction.]
- In reality, the matrix N0 is usually singular.
- [Symbol lost]: the eigenvector corresponding to the largest eigenvalue.

Slide 16: Kernel optimization
[Figure: training data and test data, before and after kernel optimization.]
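The formulas on slides 13 to 15 are not present in the extracted text, so the following is only a sketch under stated assumptions. It assumes the product form described in the Xiong, Swamy, and Ahmad (2005) reference listed on slide 34, k(x, y) = q(x) q(y) k0(x, y) with q(x) = alpha_0 + sum_i alpha_i k1(x, a_i), and it assumes that class separability is measured by the ratio tr(S_b) / tr(S_w) of between-class to within-class scatter in the feature space. The function names, the `centers` argument, and the parameters `gamma0` and `gamma1` are hypothetical. The sketch only evaluates the kernel and the criterion for a given coefficient vector; it does not perform the eigenvector-based optimization from slide 15.

```python
import math

def gaussian(x, y, gamma):
    """Gaussian kernel exp(-gamma * ||x - y||^2) between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def factor(x, centers, alpha, gamma1):
    """Assumed factor q(x) = alpha[0] + sum_i alpha[i + 1] * k1(x, centers[i])."""
    return alpha[0] + sum(a * gaussian(x, c, gamma1)
                          for a, c in zip(alpha[1:], centers))

def data_dependent_kernel(X, centers, alpha, gamma0, gamma1):
    """Kernel matrix K[i][j] = q(x_i) * q(x_j) * k0(x_i, x_j)."""
    q = [factor(x, centers, alpha, gamma1) for x in X]
    n = len(X)
    return [[q[i] * q[j] * gaussian(X[i], X[j], gamma0) for j in range(n)]
            for i in range(n)]

def class_separability(K, y):
    """tr(S_b) / tr(S_w) in the kernel-induced feature space, from K alone."""
    n = len(y)
    total_sq = sum(map(sum, K)) / n ** 2              # ||m||^2, m = overall mean
    tr_sb = tr_sw = 0.0
    for label in sorted(set(y)):
        idx = [i for i in range(n) if y[i] == label]
        nc = len(idx)
        class_sq = sum(K[i][j] for i in idx for j in idx) / nc ** 2     # ||m_c||^2
        cross = sum(K[i][j] for i in idx for j in range(n)) / (nc * n)  # <m_c, m>
        tr_sb += nc * (class_sq - 2.0 * cross + total_sq)
        tr_sw += sum(K[i][i] for i in idx) - nc * class_sq
    return tr_sb / tr_sw
```

With `alpha = [1.0, 0.0, ...]` the factor q is identically 1 and the kernel reduces to the basic kernel k0, which makes a convenient sanity check before trying other coefficient vectors.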

Slide 17: Distributed resampling
- Original training data: [expression lost in extraction]
- Training data with resampling: [expression lost in extraction]

Slide 18: Gene selection
- A filter method: class separability.
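Slide 18 names a filter method based on class separability but the scoring formula is not in the extracted text. A minimal sketch, assuming a per-gene Fisher-style score (between-class scatter divided by within-class scatter) and a hypothetical cutoff `k` for how many genes to keep:

```python
def gene_scores(X, y):
    """Score each gene (column of X) by between-class / within-class scatter."""
    n = len(X)
    labels = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        col = [row[g] for row in X]
        mean = sum(col) / n
        between = within = 0.0
        for label in labels:
            vals = [v for v, t in zip(col, y) if t == label]
            m = sum(vals) / len(vals)
            between += len(vals) * (m - mean) ** 2
            within += sum((v - m) ** 2 for v in vals)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant gene.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def select_genes(X, y, k):
    """Return the indices of the k highest-scoring genes, best first."""
    scores = gene_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]
```

Because this is a filter, genes are ranked once from the training data and independently of the classifier that is trained afterwards.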

Slide 19: Outline
- Introduction
- Data-dependent Kernel
- Results
- Conclusion

Slide 20: Comparison with other methods
- k-Nearest Neighbor (kNN)
- Diagonal Linear Discriminant Analysis (DLDA)
- Uncorrelated Linear Discriminant Analysis (ULDA)
- Support Vector Machines (SVM)

Slide 21: Data sets
[Table flattened in extraction; the data set names are not preserved. The classification tasks listed are:]
- Subtypes: ALL vs. AML
- Status of estrogen receptor
- Status of lymph node
- Outcome of treatment
- Tumor vs. healthy tissue
- Subtypes: MPM vs. ADCA
- Different lymphoma cells
- Cancer vs. non-cancer
- Tumor vs. healthy tissue

Slide 22: Experimental setup
- Data normalization: zero mean and unit variance in the gene direction.
- Randomly partition the data into two disjoint subsets of equal size: training data and test data.
- Repeat each experiment 100 times.

Slide 23: Parameters
- DLDA: no parameters.
- KNN: Euclidean distance, K=3.
- ULDA: K=3.
- SVM: Gaussian kernel; leave-one-out on the training data is used to tune the parameters.
- KerNN: Gaussian kernel for the basic kernel k0; two parameters (symbols lost in extraction, the first with subscript 0) are set empirically; leave-one-out on the training data is used to tune the remaining parameters; KNN is used for classification.

Slide 24: Effect of data resampling
[Figure panels: Prostate (102 samples), Lung (181 samples).]

Slide 25: Effect of gene selection
[Figure: ALL-AML.]

Slide 26: Effect of gene selection
[Figure: Colon.]
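The evaluation protocol from slides 22 and 23 (per-gene normalization, a random split into equal halves, repeated 100 times, with a K=3 Euclidean nearest-neighbor baseline) can be sketched as below. This is an illustration rather than the authors' code: the function names and the `seed` argument are hypothetical, and normalizing the whole data set before splitting is an assumption, since the slides do not say whether the statistics come from the training half only.

```python
import random

def standardize(X):
    """Scale every gene (column) to zero mean and unit variance."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    # A constant gene has zero spread; divide by 1.0 instead of 0.
    stds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)

def repeated_holdout_error(X, y, repeats=100, k=3, seed=0):
    """Average test error over random equal-size train/test partitions."""
    rng = random.Random(seed)
    X = standardize(X)  # assumption: normalization applied before the split
    n, half = len(X), len(X) // 2
    errors = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        train, test = order[:half], order[half:]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        wrong = sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in test)
        errors.append(wrong / len(test))
    return sum(errors) / repeats
```

Averaging over many random partitions matters here because, with so few samples, a single split can give a misleadingly high or low error estimate.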

Slide 27: Effect of gene selection
[Figure: Prostate.]

Slide 28: Comparison results
[Figure panels: ALL-AML, BreastER, BreastLN, Colon.]

Slide 29: Comparison results
[Figure panels: CNS, Lung, Ovarian, Prostate.]

Slide 30: Outline
- Introduction
- Data-dependent Kernel
- Results
- Conclusion

Slide 31: Conclusion
- By maximizing the class separability of the training data, the data-dependent kernel is also able to increase the separability of the test data.
- The kernel method is robust to high-dimensional microarray data.
- The distributed resampling strategy helps to alleviate the problem of overfitting.

Slide 32: Conclusion
- The classifier assigns samples more accurately than the other approaches, so each class can be matched with a better-suited treatment.
- The method can be used for clarifying unusual cases, e.g. a patient who was diagnosed with AML but shows atypical morphology.
- The method can be applied to distinctions relating to future clinical outcomes.

Slide 33: Future work
- How to estimate the parameters
- Study the genes selected

Slide 34: Reference
- H. Xiong, M.N.S. Swamy, and M.O. Ahmad. Optimizing the data-dependent kernel in the empirical feature space. IEEE Trans. on Neural Networks 2005, 16:460-474.
- H. Xiong, Y. Zhang, and X. Chen. Data-dependent kernels for cancer classification. Under review.
- A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer, and Z. Yakhini. Tissue classification with gene expression profiles. J. Computational Biology 2000, 7:559-584.
- S. Dudoit, J. Fridlyand, and T.P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Statistical Assoc. 2002, 97:77-87.
- T.S. Furey, N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000, 16:906-914.
- J. Ye, T. Li, T. Xiong, and R. Janardan. Using uncorrelated discriminant analysis for tissue classification with gene expression data. IEEE/ACM Trans. on Computational Biology and Bioinformatics 2004, 1:181-190.

Slide 35
Thanks! Questions?
