BCB 444-544.ppt

Uploaded by: inwarn120 | Document ID: 378875 | Uploaded: 2018-10-09 | Format: PPT | Pages: 68 | Size: 2.29 MB

BCB 444/544 F07 ISU Dobbs #28 - Promoter Prediction

BCB 444/544, Lecture 28
  Gene Prediction - finish it
  Promoter Prediction
  #28_Oct29

Required Reading (before lecture)
  Mon Oct 29 - Lecture 28: Promoter & Regulatory Element Prediction, Chp 9 - pp 113 - 126
  Wed Oct 31 - Lecture 29: Phylogenetics Basics, Chp 10 - pp 127 - 141
  Thurs Nov 1 - Lab 9: Gene & Regulatory Element Prediction
  Fri Nov 2 - Lecture 30: Phylogenetic Tree Construction Methods & Programs, Chp 11 - pp 142 - 169

Assignments & Announcements
  Mon Oct 29 - HW#5 will be posted today
  HW#5 = Hands-on exercises with phylogenetics and tree-building software
  Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 "Team" Projects
  Last week of classes will be devoted to Projects
  Written reports due: Mon Dec 3 (no class that day)
  Oral presentations (20-30 min) will be: Wed-Fri Dec 5, 6, 7
  1 or 2 teams will present during each class period
  See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
  544 Extra#2
  Due: PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
  Part 1 - Brief outline of Project, email to Drena & Michael
    after response/approval, then:
  Part 2 - More detailed outline of project
    Read a few papers and summarize status of problem
    Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
  BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
  Nov 1 Thurs - BBMB Seminar, 4:10 in 1414 MBB
    Todd Yeates, UCLA: TBA - something cool about structure and evolution?
  Nov 2 Fri - BCB Faculty Seminar, 2:10 in 102 ScI
    Bob Jernigan, BBMB, ISU: Control of Protein Motions by Structure

Chp 8 - Gene Prediction
  SECTION III GENE AND PROMOTER PREDICTION
  Xiong: Chp 8 Gene Prediction
    - Categories of Gene Prediction Programs
    - Gene Prediction in Prokaryotes
    - Gene Prediction in Eukaryotes

Computational Gene Prediction: Approaches
  Ab initio methods
    - Search by signal: find DNA sequences involved in gene expression
    - Search by content: test statistical properties distinguishing coding from non-coding DNA
  Similarity-based methods
    - Database search: exploit similarity to proteins, ESTs, cDNAs
    - Comparative genomics: exploit aligned genomes. Do other organisms have similar sequence?
  Hybrid methods - best

Computational Gene Prediction: Algorithms
  Neural Networks (NNs) (more on these later), e.g., GRAIL
  Linear discriminant analysis (LDA) (see text), e.g., FGENES, MZEF
  Markov Models (MMs) & Hidden Markov Models (HMMs), e.g.,
    - GeneSeqer - uses MMs
    - GENSCAN - uses 5th order HMMs (see text)
    - HMMgene - uses conditional maximum likelihood (see text)
  This is a new slide

Signals Search
  Approach: Build models (PSSMs, profiles, HMMs, ...) and search against DNA.
  Detected instances provide evidence for genes.
  This is a new slide

Content Search
  Observation: Encoding a protein affects statistical properties of the DNA sequence:
    - Nucleotide/amino acid distribution
    - GC content (CpG islands, exon/intron)
    - Uneven usage of synonymous codons (codon bias)
    - Hexamer frequency - most discriminative of these for identifying coding potential
  Method: Evaluate these differences (coding statistics) to differentiate between coding and non-coding regions
  This is a new slide

Human Codon Usage
  This is a new slide

Predicting Genes based on Codon Usage Differences
  Algorithm: Process sliding window
    - Use codon frequencies to compute probability of coding versus non-coding
    - Plot log-likelihood ratio
  This is a new slide

Similarity-Based Methods: Database Search
  In different genomes: Translate DNA into all 6 reading frames and search against proteins (TBLASTX, BLASTX, etc.)
  Within same genome: Search with EST/cDNA database (EST2genome, BLAT, etc.)
  Problems:
    - Will not find "new" or RNA genes (non-coding genes)
    - Limits of similarity are hard to define
    - Small exons might be overlooked
  This is a new slide

Similarity-Based Methods: Comparative Genomics
  Idea: Functional regions are more conserved than non-functional ones; high similarity in alignment indicates gene
  Advantages: May find uncharacterized or RNA genes
  Problems:
    - Finding suitable evolutionary distance
    - Finding limits of high similarity (functional regions)
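
The sliding-window idea from the "Predicting Genes based on Codon Usage Differences" slide can be sketched roughly as follows. The formula on that slide did not survive extraction, so the score used here, the sum over codons in the window of log(P(codon | coding) / P(codon | non-coding)), is an assumed standard form rather than the slide's own. The function names, the add-one pseudocount, and the window and step sizes are all hypothetical choices for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"
ALL_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

def codon_log_probs(training_seqs, pseudocount=1.0):
    """Estimate log P(codon) from frame-0 codons of upper-case DNA strings.

    The pseudocount (an assumption) keeps unseen codons from getting log(0).
    """
    counts = Counter()
    for seq in training_seqs:
        usable = len(seq) - len(seq) % 3
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: math.log((counts[c] + pseudocount) / total) for c in ALL_CODONS}

def sliding_llr(seq, coding_lp, noncoding_lp, window=30, step=3):
    """Return (window_start, log-likelihood ratio) pairs along seq.

    window should be a multiple of 3. A positive score means the window's
    codons look more like the coding training set than the non-coding one.
    Codons with ambiguous bases (e.g. N) or partial codons are skipped.
    """
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        llr = 0.0
        for i in range(start, start + window, 3):
            codon = seq[i:i + 3]
            if codon in coding_lp:
                llr += coding_lp[codon] - noncoding_lp[codon]
        scores.append((start, llr))
    return scores

# Toy training data, invented for illustration only.
coding_lp = codon_log_probs(["ATGGCTGCTGCTGCTGCT"])
noncoding_lp = codon_log_probs(["ATATATATATATATATAT"])
for start, score in sliding_llr("GCTGCTATAATAGCTGCT", coding_lp, noncoding_lp, window=6):
    print(start, round(score, 2))
```

In practice the score would be computed in all six reading frames and plotted against position; real training sets would be far larger than these toy strings.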

This is a new slide

Human-Mouse Homology
  Comparison of 1196 orthologous genes
  Sequence identity between genes in human vs mouse:
    - Exons: 84.6%
    - Protein: 85.4%
    - Introns: 35%
    - 5' UTRs: 67%
    - 3' UTRs: 69%
  This is a new slide

Thanks to Volker Brendel, ISU for the following Figs & Slides
  Slightly modified from: BBSI Genome Informatics Module
  http://www.bioinformatics.iastate.edu/BBSI/course_desc_2005.html#moduleB
  V Brendel, vbrendel@iastate.edu
  Brendel et al (2004) Bioinformatics 20: 1157

Spliced Alignment Algorithm
  GeneSeqer - Brendel et al. - ISU, http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
  Perform pairwise alignment with large gaps in one sequence (due to introns)
    - Align genomic DNA with cDNA, ESTs, protein sequences
  Score semi-conserved sequences at splice junctions
    - Using Bayesian probability model & 1st order MM
  Score coding constraints in translated exons
    - Using Bayesian model
  Brendel et al (2004) Bioinformatics 20: 1157
  http://bioinformatics.oxfordjournals.org/cgi/content/abstract/20/7/1157
  Brendel 2005

Splice Site Detection
  Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal? YES
  Quantities shown in the figure:
    - i: ith position in sequence
    - avg information content over all positions 20 nt from splice site
    - avg sample standard deviation of the avg information content
  Brendel 2005

Information Content vs Position
  Which sequences are exons & which are introns? How can you tell?
  Brendel 2005

Markov Model for Spliced Alignment
  Brendel 2005

Evaluation of Splice Site Prediction
  Fig 5.11 Baxevanis & Ouellette 2005
  TP = positive instance correctly predicted as positive
  FP = negative instance incorrectly predicted as positive
  TN = negative instance correctly predicted as negative
  FN = positive instance incorrectly predicted as negative
  Right!
  This is a new slide

Evaluation of Predictions
  Figure labels: Predicted Positives, True Positives, False Positives
  Measures defined on the slide: Sensitivity (= Coverage = Recall), Specificity, Normalized specificity, Misclassification rates
  Do not memorize this!

Evaluation of Predictions - in English
  Sensitivity (= Coverage = Recall) = TP / (TP + FN)
    In English? Sensitivity is the fraction of all positive instances having a true positive prediction.
  Specificity = TP / (TP + FP)
    In English? Specificity is the fraction of all predicted positives that are, in fact, true positives.
  IMPORTANT: in medical jargon, Specificity is sometimes defined differently (what we define here as "Specificity" is sometimes referred to as "Positive predictive value")
  IMPORTANT: Sensitivity alone does not tell us much about performance because a 100% sensitivity can be achieved trivially by labeling all test cases positive!

Best Measures for Comparison?
  ROC curves (Receiver Operating Characteristic (?!)) http://en.wikipedia.org/wiki/Roc_curve
  Correlation Coefficient
    Matthews correlation coefficient (MCC)
      - MCC = 1 for a perfect prediction
      - MCC = 0 for a completely random assignment
      - MCC = -1 for a "perfectly incorrect" prediction
  Do not memorize this!
  In signal detection theory, a receiver operating characteristic (ROC), or ROC curve, is a plot of sensitivity vs (1 - specificity) for a binary classifier system as its discrimination threshold is varied. Here "specificity" has its medical meaning, TN / (TN + FP), not the definition used on the previous slides. The ROC can also be represented equivalently by plotting fraction of true positives (TPR = true positive rate) vs fraction of false positives (FPR = false positive rate).
  This slide has been changed

GeneSeqer: Input
  http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
  Brendel 2005

GeneSeqer: Output
  Brendel 2005

GeneSeqer: Gene Evidence Summary
  Brendel 2005

Gene Prediction - Problems & Status?
  Common errors?
    - False positive intergenic regions: 2 annotated genes actually correspond to a single gene
    - False negative intergenic region: One annotated gene structure actually contains 2 genes
    - False negative gene prediction: Missing gene (no annotation)
    - Other: Partially incorrect gene annotation; missing annotation of alternative transcripts
  Current status?
    - For ab initio prediction in eukaryotes: HMMs have better overall performance for detecting intron/exon boundaries
      Limitation? Training data: predictions are organism specific
    - Combined ab initio/homology based predictions: Improved accuracy
      Limitation? Availability of identifiable sequence homologs in databases

Recommended Gene Prediction Software
  Ab initio
    - GENSCAN: http://genes.mit.edu/GENSCAN.html
    - GeneMark.hmm: http://exon.gatech.edu/GeneMark/
    - others: GRAIL, FGENES, MZEF, HMMgene
  Similarity-based
    - BLAST, GenomeScan, EST2Genome, Twinscan
  Combined
    - GeneSeqer, http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
    - ROSETTA
  Consensus: because results depend on organisms & specific task, always use more than one program!
    Two servers that report consensus predictions:
    - GeneComber
    - DIGIT
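
The measures from the "Evaluation of Predictions" slides can be computed from the four counts TP, FP, TN and FN. The sketch below follows the lecture's definitions (Sensitivity = TP / (TP + FN); Specificity in the lecture's sense, i.e. positive predictive value = TP / (TP + FP)) and adds the usual formula for the Matthews correlation coefficient, which the slides name but do not spell out. The function name `evaluate`, the dictionary keys, and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
import math

def evaluate(tp, fp, tn, fn):
    """Return sensitivity, lecture-style specificity (PPV) and MCC.

    Undefined ratios (zero denominator) are reported as 0.0; that is a
    convention chosen for this sketch, not something stated on the slides.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity_ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity_ppv": specificity_ppv,
        "mcc": mcc,
    }

# Labeling every test case positive gives 100% sensitivity,
# but the other two measures expose the trick.
print(evaluate(tp=50, fp=50, tn=0, fn=0))
```

This mirrors the warning on the slide: sensitivity alone can be maximized trivially, so it should always be reported together with specificity or a combined measure such as MCC.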

Other Gene Prediction Resources: at ISU
  http://www.bioinformatics.iastate.edu/bioinformatics2go/

Other Gene Prediction Resources: GaTech, MIT, Stanford, etc.
  Current Protocols in Bioinformatics (BCB/ISU owns a copy - currently in my lab!)
  Chapter 4 Finding Genes
    4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
    4.2 Using MZEF To Find Internal Coding Exons
    4.3 Using GENEID to Identify Genes
    4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
    4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
    4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
    4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
    4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
    4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
    4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences
  Lists of Gene Prediction Software
    http://www.bioinformaticsonline.org/links/ch_09_t_1.html
    http://cmgm.stanford.edu/classes/genefind/

Chp 9 - Promoter & Regulatory Element Prediction
  SECTION III GENE AND PROMOTER PREDICTION
  Xiong: Chp 9 Promoter & Regulatory Element Prediction
    - Promoter & Regulatory Elements in Prokaryotes
    - Promoter & Regulatory Elements in Eukaryotes
    - Prediction Algorithms

Eukaryotes vs Prokaryotes: Genomes
  Eukaryotic genomes
    - Are packaged in chromatin & sequestered in a nucleus
    - Are larger and have multiple linear chromosomes
    - Contain mostly non-protein coding DNA (98-99%)
  Prokaryotic genomes
    - DNA is associated with a nucleoid, but no nucleus
    - Much smaller, usually single, circular chromosome
    - Contain mostly protein encoding DNA

Eukaryotes vs Prokaryotes: Gene Structure

Eukaryotes vs Prokaryotes: Genes
  Eukaryotic genes
    - Are larger and more complex than in prokaryotes
    - Contain introns that are "spliced" out to generate mature mRNAs*
    - Often undergo alternative splicing, giving rise to multiple RNAs*
    - Are transcribed by 3 different RNA polymerases (instead of 1, as in prokaryotes)
  * In biology, statements such as this include an implicit "usually" or "often"

Eukaryotes vs Prokaryotes: Levels of Gene Regulation
  Primary level of control?
  Prokaryotes: Transcription initiation
  Eukaryotes: Transcription is also very important, but expression is regulated at multiple levels, many of which are post-transcriptional:
    - RNA processing, transport, stability
    - Translation initiation
    - Protein processing, transport, stability
    - Post-translational modification (PTM)
    - Subcellular localization
  Recent important discoveries: small regulatory RNAs (miRNA, siRNA) are abundant and play very important roles in controlling gene expression in eukaryotes, often at post-transcriptional levels

Eukaryotes vs Prokaryotes: Regulatory Elements
  Prokaryotes:
    - Promoters & operators (for operons) - cis-acting DNA signals
    - Activators & repressors - trans-acting proteins (we won't discuss these)
  Eukaryotes:
    - Promoters & enhancers (for single genes) - cis-acting
    - Transcription factors - trans-acting
  Important difference? What the RNA polymerase actually binds

Prokaryotic Promoters
  RNA polymerase complex recognizes promoter sequences located very close to and on 5' side ("upstream") of transcription initiation site
  Prokaryotic RNA polymerase complex binds directly to promoter, by virtue of its sigma subunit - no requirement for "transcription factors" binding first
  Prokaryotic promoter sequences are highly conserved:
    - -10 region
    - -35 region

Eukaryotic Promoters
  Eukaryotic RNA polymerase complexes do not bind directly to promoter sequences
  Transcription factors must bind first and serve as landmarks recognized by RNA polymerase complexes
  Eukaryotic promoter sequences are less highly conserved, but many promoters (for RNA polymerase II) contain:
    - -30 region "TATA" box
    - -100 region "CCAAT" box

Eukaryotic Promoters vs Enhancers
  Both promoters & enhancers are binding sites for transcription factors (TFs)
  Promoters
    - essential for initiation of transcription
    - located "relatively" close to start site (usually [...] 100 kb)

Eukaryotic genes are transcribed by 3 different RNA polymerases
  (Location of promoter regions, TFBSs & TFs differ, too)
  BIOS Scientific Publishers Ltd, 1999
  Brown Fig 9.18
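
As a rough illustration of "search by signal" applied to the prokaryotic -35 and -10 regions mentioned on the "Prokaryotic Promoters" slide, the sketch below scans a sequence for two hexamers separated by a short spacer. The consensus strings TTGACA (-35) and TATAAT (-10) and the 16-19 bp spacer range are the commonly quoted E. coli sigma-70 values; they are not given on the slides and are assumptions here, as are the function names and the match threshold. Real promoter predictors use weighted models (PSSMs, HMMs) rather than simple match counts.

```python
def count_matches(a, b):
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))

def scan_promoters(seq, box35="TTGACA", box10="TATAAT",
                   spacers=range(16, 20), min_matches=9):
    """Return (pos35, pos10, spacer, score) for candidate promoter sites.

    score = matches to the -35 consensus + matches to the -10 consensus
    (maximum 12). The consensus strings, spacer range and threshold are
    illustrative assumptions, not values taken from the lecture.
    """
    seq = seq.upper()
    n35, n10 = len(box35), len(box10)
    hits = []
    for i in range(len(seq) - n35 + 1):
        m35 = count_matches(seq[i:i + n35], box35)
        for gap in spacers:
            j = i + n35 + gap
            if j + n10 > len(seq):
                break  # longer spacers would run past the end of seq
            score = m35 + count_matches(seq[j:j + n10], box10)
            if score >= min_matches:
                hits.append((i, j, gap, score))
    return hits

# Synthetic test sequence, invented for illustration only.
demo = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
print(scan_promoters(demo))
```

Lowering `min_matches` trades specificity for sensitivity, which connects this kind of signal search back to the evaluation measures discussed earlier in the lecture.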
