Format: PPT, Pages: 68, Size: 2.29MB

BCB 444-544.ppt

BCB 444/544 F07 ISU Dobbs #28 - Promoter Prediction

BCB 444/544, Lecture 28
Gene Prediction - finish it
Promoter Prediction
#28_Oct29

Required Reading (before lecture)
- Mon Oct 29 - Lecture 28: Promoter & Regulatory Element Prediction, Chp 9 - pp 113-126
- Wed Oct 31 - Lecture 29: Phylogenetics Basics, Chp 10 - pp 127-141
- Thurs Nov 1 - Lab 9: Gene & Regulatory Element Prediction
- Fri Nov 2 - Lecture 30: Phylogenetic Tree Construction Methods & Programs, Chp 11 - pp 142-169

Assignments & Announcements
- Mon Oct 29 - HW#5 will be posted today
  HW#5 = Hands-on exercises with phylogenetics and tree-building software
  Due: Mon Nov 5 (not Fri Nov 2 as previously posted)

BCB 544 "Team" Projects
- The last week of classes will be devoted to projects
- Written reports due: Mon Dec 3 (no class that day)
- Oral presentations (20-30) will be Wed-Fri Dec 5, 6, 7; 1 or 2 teams will present during each class period
- See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2
Due: PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
- Part 1: Brief outline of project, emailed to Drena & Michael
- After response/approval, then Part 2: More detailed outline of project
  - Read a few papers and summarize the status of the problem
  - Schedule a meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
- Nov 1 Thurs - BBMB Seminar 4:10 in 1414 MBB: Todd Yeates, UCLA. TBA - something cool about structure and evolution?
- Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI: Bob Jernigan, BBMB, ISU. Control of Protein Motions by Structure

Chp 8 - Gene Prediction
SECTION III GENE AND PROMOTER PREDICTION
Xiong: Chp 8 Gene Prediction
- Categories of Gene Prediction Programs
- Gene Prediction in Prokaryotes
- Gene Prediction in Eukaryotes

Computational Gene Prediction: Approaches
- Ab initio methods
  - Search by signal: find DNA sequences involved in gene expression
  - Search by content: test statistical properties distinguishing coding from non-coding DNA
- Similarity-based methods
  - Database search: exploit similarity to proteins, ESTs, cDNAs
  - Comparative genomics: exploit aligned genomes. Do other organisms have similar sequence?
- Hybrid methods - best

Computational Gene Prediction: Algorithms
- Neural Networks (NNs) (more on these later), e.g., GRAIL
- Linear discriminant analysis (LDA) (see text), e.g., FGENES; MZEF uses the related quadratic discriminant analysis (QDA)
- Markov Models (MMs) & Hidden Markov Models (HMMs), e.g.,
  - GeneSeqer - uses MMs
  - GENSCAN - uses 5th order HMMs (see text)
  - HMMgene - uses conditional maximum likelihood (see text)

Signals Search
Approach: Build models (PSSMs, profiles, HMMs, …) and search against DNA. Detected instances provide evidence for genes.

Content Search
Observation: Encoding a protein affects the statistical properties of a DNA sequence:
- Nucleotide/amino acid distribution
- GC content (CpG islands, exon/intron)
- Uneven usage of synonymous codons (codon bias)
- Hexamer frequency - the most discriminative of these for identifying coding potential
Method: Evaluate these differences (coding statistics) to differentiate between coding and non-coding regions.

Human Codon Usage

Predicting Genes based on Codon Usage Differences
Algorithm:
- Process a sliding window along the sequence
- Use codon frequencies to compute the probability of coding versus non-coding
- Plot the log-likelihood ratio $\log \frac{P(S \mid \text{coding})}{P(S \mid \text{non-coding})}$, where $S$ is the sequence in the window

Similarity-Based Methods: Database Search
- In different genomes: translate DNA into all 6 reading frames and search against proteins (TBLASTX, BLASTX, etc.)
- Within the same genome: search with an EST/cDNA database (EST2genome, BLAT, etc.)
Problems:
- Will not find "new" or RNA genes (non-coding genes)
- Limits of similarity are hard to define
- Small exons might be overlooked

Similarity-Based Methods: Comparative Genomics
Idea: Functional regions are more conserved than non-functional ones; high similarity in an alignment indicates a gene.
Advantages: May find uncharacterized or RNA genes
Problems:
- Finding a suitable evolutionary distance
- Finding the limits of high similarity (functional regions)
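The codon-usage scan described under "Predicting Genes based on Codon Usage Differences" can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular program: it reads a single reading frame, expects uppercase A/C/G/T input, treats codons as independent, uses a hypothetical coding-codon table (the four "preferred" codons are made up for illustration), and assumes a uniform non-coding model of 1/64 per codon.

```python
import math
from itertools import product

# All 64 codons; used to build the assumed uniform non-coding model.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_llr(window, coding_freq, noncoding_freq):
    """log[P(window | coding) / P(window | non-coding)], reading codons in
    frame 0 and treating them as independent draws from each model."""
    llr = 0.0
    for i in range(0, len(window) - 2, 3):  # a trailing partial codon is skipped
        codon = window[i:i + 3]
        llr += math.log(coding_freq[codon] / noncoding_freq[codon])
    return llr


def sliding_llr(seq, coding_freq, noncoding_freq, window=60, step=3):
    """(start, LLR) for each window; step=3 keeps every window in the same frame."""
    return [
        (start, codon_llr(seq[start:start + window], coding_freq, noncoding_freq))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical coding model: four "preferred" codons are 5x more frequent.
weights = {c: 1.0 for c in CODONS}
for c in ("CTG", "GCC", "GAG", "AAG"):
    weights[c] = 5.0
total = sum(weights.values())
CODING = {c: w / total for c, w in weights.items()}
NONCODING = {c: 1 / 64 for c in CODONS}

# Usage: scan a sequence whose first half uses only "preferred" codons.
for start, score in sliding_llr("CTGGCC" * 20 + "ACGTTA" * 20, CODING, NONCODING)[::10]:
    print(start, round(score, 2))
```

Windows dominated by the preferred codons score positive and the rest score negative, so plotting the scores against `start` gives the kind of coding-potential profile the slide refers to. A realistic version would scan all six frames and estimate both tables from training data.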

Human-Mouse Homology
Comparison of 1196 orthologous genes; sequence identity between genes in human vs mouse:
- Exons: 84.6%
- Protein: 85.4%
- Introns: 35%
- 5' UTRs: 67%
- 3' UTRs: 69%

Thanks to Volker Brendel, ISU for the following Figs & Slides
Slightly modified from: BBSI Genome Informatics Module, http://www.bioinformatics.iastate.edu/BBSI/course_desc_2005.html#moduleB
V Brendel, vbrendel@iastate.edu
Brendel et al (2004) Bioinformatics 20: 1157

Spliced Alignment Algorithm
GeneSeqer - Brendel et al. - ISU, http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
- Perform pairwise alignment with large gaps in one sequence (due to introns)
- Align genomic DNA with cDNA, ESTs, protein sequences
- Score semi-conserved sequences at splice junctions, using a Bayesian probability model & 1st order MM
- Score coding constraints in translated exons, using a Bayesian model
Brendel et al (2004) Bioinformatics 20: 1157, http://bioinformatics.oxfordjournals.org/cgi/content/abstract/20/7/1157
(Brendel 2005)

Splice Site Detection
Do DNA sequences surrounding splice "consensus" sequences contribute to the splicing signal? YES
Figure legend: i = ith position in sequence; average information content over all positions 20 nt from the splice site; average sample standard deviation of this average.
(Brendel 2005)

Information Content vs Position
Which sequences are exons & which are introns? How can you tell?
(Brendel 2005)

Markov Model for Spliced Alignment
(Brendel 2005)

Evaluation of Splice Site Prediction
Fig 5.11, Baxevanis & Ouellette 2005
- TP = positive instance correctly predicted as positive
- FP = negative instance incorrectly predicted as positive
- TN = negative instance correctly predicted as negative
- FN = positive instance incorrectly predicted as negative

Evaluation of Predictions
- Sensitivity (= Coverage = Recall): $TP / (TP + FN)$
- Specificity: $TP / (TP + FP)$, i.e., true positives among predicted positives
- Normalized specificity
- Misclassification rates
Do not memorize this!

Evaluation of Predictions - in English
- Sensitivity (= Coverage = Recall). In English? Sensitivity is the fraction of all positive instances having a true positive prediction.
- Specificity. In English? Specificity is the fraction of all predicted positives that are, in fact, true positives.
IMPORTANT: in medical jargon, Specificity is sometimes defined differently (what we define here as "Specificity" is sometimes referred to as "Positive predictive value").
IMPORTANT: Sensitivity alone does not tell us much about performance, because 100% sensitivity can be achieved trivially by labeling all test cases positive!

Best Measures for Comparison?
- ROC curves (Receiver Operating Characteristic (?!)), http://en.wikipedia.org/wiki/Roc_curve
- Correlation Coefficient: Matthews correlation coefficient (MCC)
  MCC = 1 for a perfect prediction, 0 for a completely random assignment, -1 for a "perfectly incorrect" prediction
Do not memorize this!
In signal detection theory, a receiver operating characteristic (ROC), or ROC curve, is a plot of sensitivity vs (1 - specificity) for a binary classifier system as its discrimination threshold is varied. Here "specificity" has its conventional meaning, TN / (TN + FP), not the positive predictive value defined above. The ROC can also be represented equivalently by plotting the fraction of true positives (TPR = true positive rate) vs the fraction of false positives (FPR = false positive rate).

GeneSeqer: Input
http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
(Brendel 2005)

GeneSeqer: Output
(Brendel 2005)

GeneSeqer: Gene Evidence Summary
(Brendel 2005)

Gene Prediction - Problems & Status?
Common errors?
- False positive intergenic regions: 2 annotated genes actually correspond to a single gene
- False negative intergenic region: one annotated gene structure actually contains 2 genes
- False negative gene prediction: missing gene (no annotation)
- Other: partially incorrect gene annotation; missing annotation of alternative transcripts
Current status?
- For ab initio prediction in eukaryotes: HMMs have better overall performance for detecting intron/exon boundaries
  Limitation? Training data: predictions are organism specific
- Combined ab initio/homology based predictions: improved accuracy
  Limitation? Availability of identifiable sequence homologs in databases

Recommended Gene Prediction Software
- Ab initio:
  - GENSCAN: http://genes.mit.edu/GENSCAN.html
  - GeneMark.hmm: http://exon.gatech.edu/GeneMark/
  - others: GRAIL, FGENES, MZEF, HMMgene
- Similarity-based: BLAST, GenomeScan, EST2Genome, Twinscan
- Combined: GeneSeqer, http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi; ROSETTA
- Consensus: because results depend on organisms & specific task, always use more than one program!
  Two servers that report consensus predictions: GeneComber, DIGIT
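The evaluation measures on the slides above (sensitivity, this lecture's specificity, MCC, and the ROC curve) can all be computed from TP/FP/TN/FN counts. A minimal Python sketch; the function names are hypothetical. `specificity_lecture` follows the lecture's definition TP/(TP+FP), while the ROC points use the conventional false positive rate FP/(FP+TN). The MCC formula is the standard one; returning 0.0 when any marginal count is zero is a common convention, assumed here.

```python
import math


def confusion(truth, pred):
    """Count (TP, FP, TN, FN) from parallel lists of booleans."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, fp, tn, fn


def sensitivity(tp, fp, tn, fn):
    """Fraction of all positive instances predicted positive (= coverage = recall)."""
    return tp / (tp + fn)


def specificity_lecture(tp, fp, tn, fn):
    """This lecture's 'specificity': fraction of predicted positives that are
    true positives (elsewhere called positive predictive value or precision)."""
    return tp / (tp + fp)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient: +1 perfect, 0 random, -1 perfectly wrong."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention


def roc_points(scores, truth):
    """(FPR, TPR) pairs as the threshold sweeps down through each distinct score.
    FPR = FP / (FP + TN); TPR = sensitivity."""
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion(truth, [s >= thr for s in scores])
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


truth = [True, True, False, False, False]
everything = confusion(truth, [True] * 5)  # label every case positive
print(sensitivity(*everything), specificity_lecture(*everything), mcc(*everything))
# 1.0 0.4 0.0
```

The usage line illustrates the slide's warning: labeling everything positive gives 100% sensitivity, but specificity drops to 0.4 and MCC is 0.0, which is why MCC and ROC curves are preferred for comparing predictors.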

Other Gene Prediction Resources: at ISU
http://www.bioinformatics.iastate.edu/bioinformatics2go/

Other Gene Prediction Resources: GaTech, MIT, Stanford, etc.
Current Protocols in Bioinformatics (BCB/ISU owns a copy - currently in my lab!)
Chapter 4 Finding Genes
- 4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
- 4.2 Using MZEF To Find Internal Coding Exons
- 4.3 Using GENEID to Identify Genes
- 4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
- 4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
- 4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
- 4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
- 4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
- 4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
- 4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences
Lists of Gene Prediction Software:
- http://www.bioinformaticsonline.org/links/ch_09_t_1.html
- http://cmgm.stanford.edu/classes/genefind/

Chp 9 - Promoter & Regulatory Element Prediction
SECTION III GENE AND PROMOTER PREDICTION
Xiong: Chp 9 Promoter & Regulatory Element Prediction
- Promoter & Regulatory Elements in Prokaryotes
- Promoter & Regulatory Elements in Eukaryotes
- Prediction Algorithms

Eukaryotes vs Prokaryotes: Genomes
Eukaryotic genomes:
- Are packaged in chromatin & sequestered in a nucleus
- Are larger and have multiple linear chromosomes
- Contain mostly non-protein coding DNA (98-99%)
Prokaryotic genomes:
- DNA is associated with a nucleoid, but there is no nucleus
- Much smaller, usually a single, circular chromosome
- Contain mostly protein encoding DNA

Eukaryotes vs Prokaryotes: Gene Structure

Eukaryotes vs Prokaryotes: Genes
Eukaryotic genes:
- Are larger and more complex than in prokaryotes
- Contain introns that are "spliced" out to generate mature mRNAs*
- Often undergo alternative splicing, giving rise to multiple RNAs*
- Are transcribed by 3 different RNA polymerases (instead of 1, as in prokaryotes)*
* In biology, statements such as this include an implicit "usually" or "often"

Eukaryotes vs Prokaryotes: Levels of Gene Regulation
Primary level of control?
- Prokaryotes: transcription initiation
- Eukaryotes: transcription is also very important, but expression is regulated at multiple levels, many of which are post-transcriptional:
  - RNA processing, transport, stability
  - Translation initiation
  - Protein processing, transport, stability
  - Post-translational modification (PTM)
  - Subcellular localization
Recent important discoveries: small regulatory RNAs (miRNA, siRNA) are abundant and play very important roles in controlling gene expression in eukaryotes, often at post-transcriptional levels.

Eukaryotes vs Prokaryotes: Regulatory Elements
Prokaryotes:
- Promoters & operators (for operons) - cis-acting DNA signals
- Activators & repressors - trans-acting proteins (we won't discuss these)
Eukaryotes:
- Promoters & enhancers (for single genes) - cis-acting
- Transcription factors - trans-acting
Important difference? What the RNA polymerase actually binds

Prokaryotic Promoters
- The RNA polymerase complex recognizes promoter sequences located very close to and on the 5' side ("upstream") of the transcription initiation site
- The prokaryotic RNA polymerase complex binds directly to the promoter, by virtue of its sigma subunit - no requirement for "transcription factors" binding first
- Prokaryotic promoter sequences are highly conserved: -10 region, -35 region

Eukaryotic Promoters
- Eukaryotic RNA polymerase complexes do not bind directly to promoter sequences
- Transcription factors must bind first and serve as landmarks recognized by RNA polymerase complexes
- Eukaryotic promoter sequences are less highly conserved, but many promoters (for RNA polymerase II) contain:
  - -30 region: "TATA" box
  - -100 region: "CCAAT" box

Eukaryotic Promoters vs Enhancers
- Both promoters & enhancers are binding sites for transcription factors (TFs)
- Promoters: essential for initiation of transcription, located "relatively" close to the start site (usually … 100 kb)

Eukaryotic genes are transcribed by 3 different RNA polymerases (location of promoter regions, TFBSs & TFs differ, too)
BIOS Scientific Publishers Ltd, 1999, Brown Fig 9.18
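As a toy illustration of why the highly conserved -10 and -35 regions make prokaryotic promoters a natural target for signal search, the sketch below scans a sequence for a sigma70-style promoter. Assumptions not taken from the slides: the textbook E. coli consensus hexamers TTGACA (-35) and TATAAT (-10), a spacer of 15-19 bp between them, and a cutoff of at most one mismatch per box. The function names are hypothetical, and counting mismatches against a consensus is a deliberately crude stand-in for the PSSM, profile, or HMM scoring mentioned under "Signals Search".

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like(seq, box35="TTGACA", box10="TATAAT",
                      spacers=range(15, 20), max_mm=1):
    """Return (start_of_-35_box, start_of_-10_box, total_mismatches) for every
    candidate where both boxes match within max_mm mismatches and the spacer
    between them has an allowed length."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        mm35 = mismatches(seq[i:i + len(box35)], box35)
        if mm35 > max_mm:
            continue
        for gap in spacers:
            j = i + len(box35) + gap
            if j + len(box10) > len(seq):
                break  # longer spacers would run off the end as well
            mm10 = mismatches(seq[j:j + len(box10)], box10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return hits


# Synthetic sequence: consensus -35 box, 17 bp spacer, consensus -10 box.
demo = "GC" * 5 + "TTGACA" + "A" * 17 + "TATAAT" + "GC" * 5
print(find_sigma70_like(demo))  # [(10, 33, 0)]
```

The single hit places the -35 box at index 10 and the -10 box at index 33, separated by the 17 bp spacer. On real genomic DNA, a consensus-and-mismatch scan like this one reports many false positives, so weighted position-specific scoring is the usual refinement.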
