Beyond Genomics-Detecting Codes and Signals in the .ppt

Uploaded by: wealthynice100 | Document ID: 378927 | Uploaded: 2018-10-09 | Format: PPT | Pages: 55 | Size: 2.45 MB

Beyond Genomics: Detecting Codes and Signals in the Cellular Transcriptome
Brendan J. Frey, University of Toronto

Purpose of my talk
  - To identify aspects of bioinformatics in which attendees of ISIT may be able to make significant contributions

The Genome

Starting point: Discrete biological sequences
  - Symbols are bases: G, C, A, T
  - Examples of biological sequences: genes, peptides, DNA, RNA, chromosomes, viruses, proteins, HIV

[Figure labels: DNA sequence (GCATTCATGC); sexual cell reproduction; nucleus; chromosomes: inherited DNA sequence; cell replication]

The genome
  - Genome: Chromosomal DNA sequence from an organism or species
  - Examples:

    Genome   Length (bases)
    Human    3,000 million (750 MB)
    Mouse    2,600 million
    Fly      100 million
    Yeast    13 million

Genes
  - A gene is a subsequence of the genome that encodes a functioning bio-molecule
  - The library of known genes
      - Comprises only 1% of genome sequence
      - Increases in diversity every year
      - Is probably far from complete

The Transcriptome

  - Genome: The digital backbone of molecular biology
  - Transcripts: Perform functions encoded in the genome

Traditional genes
[Figure labels: DNA; genome; transcriptome; proteome]

Transcription
[Figure labels: upstream region; regulatory proteins; gene]
  - Code: Set of regulatory codewords
  - Signals: Concentrations of regulatory proteins and the output transcript
  - Codewords in the upstream region bind to corresponding regulatory proteins

Splicing of transcripts
[Figure labels: transcript (RNA); intron; exon; regulatory proteins; TTAGAT; TGGGGT]
  - The intron is spliced out
  - However, splicing may occur quite differently
  - The middle exon is skipped, leading to a different transcript
  - Code: Set of regulatory codewords
  - Signals: Concentrations of regulatory proteins and different spliced transcripts
  - Codewords in the introns and exons bind to corresponding regulatory proteins

The modern transcriptome
[Figure labels: genome; genomic DNA; cell nucleus; TRANSCRIPTION in Liver; TRANSCRIPTION in Brain and Liver; SPLICING Brain; SPLICING Liver; ncRNA; TRANSCR.]
  - ... it turns out to be surprising in many ways
[Figure labels: alternative transcripts; # genes, trans, 60% AS, 18k AS, 20% dis, 10k ncRNA]

The Resources

Your collaborators can do lab work
  - Sequencing: Snag an actual transcript and figure out its sequence
  - Microarrays: Find out if your predicted transcript fragment is expressed in a tissue sample
  - Mass spectrometry: Find out if a protein is present in a sample

Databases
  - Genomes
  - Genome annotations
  - Libraries of observed transcript fragments
  - Microarray datasets containing measured concentrations of transcripts

Measuring transcript concentrations using microarrays
[Figure label: A G C C A G T G T A]
  2. Extract transcripts from cell
  3. Add fluorescent tag
  4. Hybridize tagged sequence to microarray
  5. Excite fluorescent tag with laser and measure intensity

Inkjet printer technology (Hughes et al., Nature Biotech 2001)
  - Print nucleic acid sequences using inkjet printer

Then and now
  - First microarrays (late 1990s)
      - Cancer chips, gene chips
      - 5,000-10,000 probes per slide
      - Noisy
  - Current microarrays
      - Sub-gene resolution
      - 200,000 probes per slide
      - Low noise
      - Multi-chip designs are cost effective

The Case Study: Discovering protein-making transcripts using factor graphs
BJ Frey, ..., TR Hughes, Nature Genetics, September 2005

Controversy about the gene library
  "Despite Frey et al.'s impressive computational reconstruction of gene structure, we argue that this does not prove the complexity of the transcriptome"
    FANTOM/RIKEN Consortium, Science, March 2006
  - How it all started

Research on the transcriptome
[Figure labels: analysis of genome; detection of transcripts; our project; 2001-2005; 1960s-2000; 2001-2006; 2003-2005]

Estimates of number of undiscovered genes
[Figure: axis ticks 2000, 2001, 2002, 2003, 2004, 2005]

Our microarrays
  - Our genome analysis highlighted 1 million possible exons (180,000 already known)
  - We designed one 60-base probe for each possible exon

Our samples (37 tissues): Twelve pools of mouse mRNA

    Pool  Composition (mRNA per array hybridization)
    1     Heart (2 mg), Skeletal muscle (2 mg)
    2     Liver (2 mg)
    3     Whole brain (1.5 mg), Cerebellum (0.48 mg), Olfactory bulb (0.15 mg)
    4     Colon (0.96 mg), Intestine (1.04 mg)
    5     Testis (3 mg), Epididymis (0.4 mg)
    6     Femur (0.9 mg), Knee (0.4 mg), Calvaria (0.06 mg), Teeth+mandible (1.3 mg), Teeth (0.4 mg)
    7     15d Embryo (1.3 mg), 12.5d Embryo (12.5 mg), 9.5d Embryo (0.3 mg), 14.5d Embryo head (0.25 mg), ES cells (0.24 mg)
    8     Digit (1.3 mg), Tongue (0.6 mg), Trachea (0.15 mg)
    9     Pancreas (1 mg), Mammary gland (0.9 mg), Adrenal gland (0.25 mg), Prostate gland (0.25 mg)
    10    Salivary gland (1.26 mg), Lymph node (0.74 mg)
    11    12.5d Placenta (1.15 mg), 9.5d Placenta (0.5 mg), 15d Placenta (0.35 mg)
    12    Lung (1 mg), Kidney (1 mg), Adipose (1 mg), Bladder (0.05 mg)

Signal: The data (small part of the data from Chromosome 4)
  - Example of a transcript
  - Code: A vector repetition code with deletions
  - Each column is an expression profile

The transcript model
  Each transcript is modeled using
  - A prototype expression profile
  - # probes before prototype (e.g., 1)
  - # probes after prototype (e.g., 4)
  - Flag indicating whether each probe corresponds to an exon

The factor graph
  - ONLY 1 FREE PARAMETER: k, probability of starting a transcript
[Figure: factor graph with nodes r1..rn, t1..tn, s1..sn, e1..en, x1..xn; labels: transcription start/stop indicator; relative index of prototype; exon versus non-exon indicator; expression profile (genomic order); probe sensitivity & noise]
  - After expression data (x) is observed, the factor graph becomes a tree
  - Computation: The max-product algorithm performs exact inference and learning.

Summary of results *
  - 10X more sensitive than other transcript-based methods
  - Detected 155,839 exons
  - Predicted 30,000 new exons
  - Reconciled discrepancies in thousands of known transcripts
  * Exon false positive rate: 2.7%

SURPRISE! Revisiting estimates of number of undiscovered genes
[Figure: axis ticks 2000, 2001, 2002, 2003, 2004, 2005]

Contentious results
  "... We discovered new mouse protein-coding transcripts, including 5,154 encoding previously-unidentified proteins"
    FANTOM/RIKEN Consortium, Science, Sep 2005
  - We wondered: Are these really new genes?
  "... we found that 2917 of the FANTOM proteins are in fact splice isoforms of known transcripts"
    Frey et al., Science, March 2006
  "... the number of new protein-coding genes found by us has been revised from 5154 to 2222"
    FANTOM/RIKEN Consortium, Science, March 2006

Last word
  "... the number of completely new protein-coding genes discovered by the FANTOM consortium is at most in the hundreds"
    Frey et al., Science, March 2006

The Closing Remarks

Open problems
  - Producing genome-wide libraries of functioning transcripts, including
      - Alternatively-spliced transcripts
      - Transcripts that don't make proteins
  - Understanding functions of transcripts
  - Developing models of how transcription and alternative splicing are regulated
  - Developing models of gene interactions
      - Genetic networks

Should you work in computational biology?
  - Pluses
      - A major scientific frontier
      - Potential for high impact on society
  - Minuses
      - Mostly a collection of facts
      - Mechanisms are complex and beyond our control
      - Lacking a mathematical framework

Remember, communication theory also once lacked a mathematical framework
  "Ok, Zorg, let's try using a prefix code"

Should you work in computational biology?
  - Minuses
      - Mostly a collection of facts
      - Mechanisms are complex and beyond our control
  - Pluses
      - A major scientific frontier
      - Potential for high impact on society
  - Lacking a mathematical framework

How do you enter this field?
  - Hire a tutor (i.e., student or postdoc)
  - Hire a programmer
  - Get involved in several winner projects
  - Be prepared to drop loser projects
  - Build mutually-beneficial collaborations
  - How long will it take?

For more information
  As of Friday July 14, 2006: http://www.psi.toronto.edu/isit2006.html
  - These slides
  - Pointers to helpful papers, databases, etc.

Acknowledgements
  - Frey Group: Quaid D Morris (postdoc), Leo Lee (postdoc), Yoseph Barash (postdoc), Ofer Shai (PhD), Inmar Givoni (PhD), Jim Huang (PhD), Marc Robinson (programmer)
  - Genomics Collaborators: Hughes Lab, Blencowe's Lab, Emili's Lab, Boone's Lab
  - Medical Collaborators: E Sat, J Rossant, BG Bruneau, JE Aubin
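
Illustrative sketch for the factor graph slides. The talk states that once the expression data x is observed the factor graph becomes a tree, so max-product gives exact inference. The Python below shows max-product on the simplest tree, a chain, where it reduces to the Viterbi algorithm. This is not the model from the talk: it assumes a hypothetical two-state chain (background vs. exon) with Gaussian probe noise, and every name and default value (`max_product_exon_labels`, `stay`, `mean_exon`, `mean_bg`, `sigma`) is invented for illustration. Only `k`, the probability of starting a transcribed region, echoes the slide's single free parameter.

```python
import math


def max_product_exon_labels(intensities, k=0.1, stay=0.8,
                            mean_exon=2.0, mean_bg=0.0, sigma=1.0):
    """Most probable label (0 = background, 1 = exon) for each probe.

    Assumed toy model, not the one in the talk: a two-state Markov chain
    over probes in genomic order, with Gaussian noise on each intensity.
    """
    if not intensities:
        return []

    means = [mean_bg, mean_exon]

    def log_emit(x, state):
        # Gaussian log-likelihood; terms shared by both states are dropped
        # because they do not change the argmax.
        return -0.5 * ((x - means[state]) / sigma) ** 2

    # log_trans[prev][cur]: k starts an exon run, stay continues it.
    log_trans = [
        [math.log(1.0 - k), math.log(k)],
        [math.log(1.0 - stay), math.log(stay)],
    ]

    # Forward pass: best log-score of any labeling ending in each state.
    score = [math.log(1.0 - k) + log_emit(intensities[0], 0),
             math.log(k) + log_emit(intensities[0], 1)]
    back = []  # back[t][cur] = best previous state for probe t + 1
    for x in intensities[1:]:
        new_score, pointers = [], []
        for cur in (0, 1):
            cands = [score[prev] + log_trans[prev][cur] for prev in (0, 1)]
            best_prev = 0 if cands[0] >= cands[1] else 1
            pointers.append(best_prev)
            new_score.append(cands[best_prev] + log_emit(x, cur))
        back.append(pointers)
        score = new_score

    # Backward pass: follow the stored pointers from the best final state.
    state = 0 if score[0] >= score[1] else 1
    labels = [state]
    for pointers in reversed(back):
        state = pointers[state]
        labels.append(state)
    labels.reverse()
    return labels


# Hypothetical log-intensities for seven probes in genomic order.
print(max_product_exon_labels([0.1, -0.2, 2.1, 1.9, 2.3, 0.0, 0.1]))
```

Because the chain has no loops, the forward and backward passes return the single best joint labeling rather than an approximation, which is the property the slide highlights for the full model.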

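
Illustrative sketch for the "Splicing of transcripts" slides. Those slides describe two outcomes for the same transcript: the introns are spliced out, or the middle exon is also skipped, giving a different transcript. The Python below reproduces both outcomes on a made-up sequence. The function name `splice`, the sequence, and the exon coordinates are all hypothetical and chosen only to make the two outcomes visible.

```python
def splice(pre_mrna, exons, skip=()):
    """Join the exon substrings of pre_mrna, omitting exon indices in skip.

    exons is a list of (start, end) half-open intervals in transcript order.
    """
    kept = [pre_mrna[start:end]
            for index, (start, end) in enumerate(exons)
            if index not in skip]
    return "".join(kept)


# Invented pre-mRNA: three exons (upper case) separated by two introns (lower case).
pre_mrna = "AUGGC" + "guaagu" + "CCAUU" + "guuuag" + "GGUAA"
exons = [(0, 5), (11, 16), (22, 27)]

full_length = splice(pre_mrna, exons)              # introns removed, all exons kept
exon_skipped = splice(pre_mrna, exons, skip={1})   # middle exon skipped as well

print(full_length)
print(exon_skipped)
```

In the talk's terms, the choice of `skip` stands in for the effect of the regulatory proteins: the same genomic sequence yields different transcripts depending on which signals are present.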