A Field Guide part 2.ppt

A Field Guide part 2
August 30, 2005
University of Colorado Health Sciences Center

Part 2
  Entrez: text searching
    a GenBank record
    preview/index
  BLAST: sequence searching
    pre-computed searches
    algorithms
    what's new?
  VAST: structure searching
  Example: mapping oligos to a genome

GenBank Records

The Flatfile Format

A Typical GenBank Record
  LOCUS       NM_019570  4279 bp  mRNA  linear  INV 28-OCT-2004
  DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
  ACCESSION   NM_019570
  VERSION     NM_019570.3  GI:50811869
  KEYWORDS    .

GenBank Record: Feature Table
  GenPept identifier

GenBank Record: Feature Table, cont.

GenBank Record: sequence

Indexing for Nucleotide UID 59958365
  Field               Indexed Terms
  primary accession   NM_001012399
  title               Bos taurus hemochromatosis (hfe), mRNA.
  organism            Bos taurus
  sequence length     1168
  modification date   2005/02/19
  properties          biomol mrna
                      gbdiv mam
                      srcdb refseq

Global Entrez Search: HFE
  HFE

Entrez Nucleotide: HFE
  137 records
  Not HFE

Smarter Query
  hfe[title]
  AND human[orgn]

hfe[title] AND human[orgn] (cont)
  Primary data

Preview/Index

Preview/Index: Properties, srcdb
  [Properties]
  AND srcdb refseq[Properties]
  AND srcdb ddbj/embl/genbank[Properties]

Database Queries
  #1  hfe                                     137
  #2  hfe[title] AND human[orgn]               42
  #3  #2 AND srcdb refseq[prop]                11
  #4  #2 AND srcdb ddbj/embl/genbank[prop]     31
  #5  #4 AND gbdiv pri[prop]                   29
  #6  #4 AND gbdiv est[prop]                    2

Molecule Queries
  #1  hfe                                     116
  #2  hfe[title] AND human[orgn]               42
  #3  #2 AND biomol mrna[prop]                 29
  #4  #2 AND biomol genomic[prop]              13

More Queries
  Fields are database-specific
  Other Entrez Databases
  UniSTS: markers on the Genethon map of human chromosome 12
    Genethon[Map Name] AND human[organism] AND 12[chromosome]
  UniGene: rat clusters that have at least one mRNA
    rat[organism] NOT 0[mrna count]
  Structure: structures of bacterial kinases with resolutions below 2 Å
    bacteria[organism] AND kinase AND 000.00:002.00[resolution]
  SNP: uniquely mapped microsatellites on human chr2
    (microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome]) AND human[orgn]

Basic Local Alignment Search Tool

BLAST Web Searches, 2005
  200,000

Precomputed BLAST Services
  Nucleotide or protein: Related Sequences
  BLAST link: BLink
  Transcript clusters: UniGene
  Protein homologs: HomoloGene

Link to Related Sequences

Related Sequences
  Most similar
  Least similar

BLink (BLAST Link)

BLink Output

Global vs Local Alignment
  Seq1: WHEREISWALTERNOW (16 aa)
  Seq2: HEWASHEREBUTNOWISHERE (21 aa)

The Flavors of BLAST
  Standard BLAST
    nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
    traditional “contiguous” word hit
  Megablast
    optimized for large batch searches
    can use discontiguous words
  PSI-BLAST
    constructs PSSMs automatically; uses as query
    very sensitive protein search
  RPS BLAST
    searches a database of PSSMs
    tool for conserved domain searches
  (figure labels: “contiguous”, discontiguous)

Why Is BLAST So Popular?
  Fast - heuristic approach based on Smith-Waterman
  Local alignments
  Statistical significance - Expect value
  Versatile
    - blastn, blastp, blastx, tblastn, tblastx, rps-blast, psi-blast
    - www, standalone, and network clients

How BLAST Works
  Make lookup table of “words” for query
  Scan database for hits
  Ungapped extensions of hits (initial HSPs)
  Gapped extensions (no traceback)
  Gapped extensions (traceback; alignment details)

Nucleotide Words
  Make a lookup table of words
    GTACTGGACAT
     TACTGGACATG
      ACTGGACATGG
       CTGGACATGGA
        TGGACATGGAC
         GGACATGGACC
          GACATGGACCC
           ACATGGACCCT
    . . .

Protein Words
  Make a lookup table of words
    GTQ
     TQI
      QIT
       ITV
        TVE
         VED
          EDL
           DLF
    ...
  -f 11 = blastp default

Minimum Requirements for a Hit
  Nucleotide BLAST requires one exact match
    ATCGCCATGCTTAATTGGGCTT
         CATGCTTAATT
    one exact match
  Protein BLAST requires two neighboring matches within 40 aa
    GTQITVEDLFYNISEI YYN
    neighborhood words
    two matches
    -A 40 = blastp default

BLASTP Summary
  High-scoring pair (HSP)

Scoring Systems - Nucleotides
  Identity matrix (-r 1 -q -3)
         A   G   C   T
     A  +1  -3  -3  -3
     G  -3  +1  -3  -3
     C  -3  -3  +1  -3
     T  -3  -3  -3  +1

     CAGGTAGCAAGCTTGCATGTCA
     || ||||||||||||  |||||
     CACGTAGCAAGCTTG-GTGTCA
     raw score = 19 - 9 = 10

Scoring Systems - Proteins
  Position Independent Matrices
    PAM Matrices (Percent Accepted Mutation)
      Derived from observation; small dataset of alignments
      Implicit model of evolution
      All calculated from PAM1
      PAM250 widely used
    BLOSUM Matrices (BLOck SUbstitution Matrices)
      Derived from observation; large dataset of highly conserved blocks
      Each matrix derived separately from blocks with a defined percent identity cutoff
      BLOSUM62 - default matrix for BLAST
  Position Specific Score
  Matrices (PSSMs)
    PSI- and RPS-BLAST

BLOSUM62
  A  4
  R -1  5
  N -2  0  6
  D -2 -2  1  6
  C  0 -3 -3 -3  9
  Q -1  1  0  0 -3  5
  E -1  0  0  2 -4  2  5
  G  0 -2  0 -1 -3 -2 -2  6
  H -2  0  1 -1 -3  0  0 -2  8
  I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
  L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
  K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
  M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
  F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
  P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
  S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
  T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
  W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
  Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
  V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
  X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
     A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X

Position-Specific Score Matrix
  DAF-1
  Serine/Threonine protein kinases catalytic loop

Position-Specific Score Matrix
  catalytic loop
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
  435 K  -1  0  0 -1 -2  3  0  3  0 -2 -2  1 -1 -1 -1 -1 -1 -1 -1 -2
  436 E   0  1  0  2 -1  0  2 -1  0 -1 -1  0  0  0 -1  0  0 -1 -1 -1
  437 S   0  0 -1  0  1  1  0  1  1  0 -1  0  0  0  2  0 -1 -1  0 -1
  438 N  -1  0 -1 -1  1  0 -1  3  3 -1 -1  1 -1  0  0 -1 -1  1  1 -1
  439 K  -2  1  1 -1 -2  0 -1 -2 -2 -1 -2  5  1 -2 -2 -1 -1 -2 -2 -1
  440 P  -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1  0 -3  7 -1 -2 -3 -1 -1
  441 A   3 -2  1 -2  0 -1  0  1 -2 -2 -2  0 -1 -2  3  1  0 -3 -3  0
  442 M  -3 -4 -4 -4 -3 -4 -4 -5 -4  7  0 -4  1  0 -4 -4 -2 -4 -1  2
  443 A   4 -4 -4 -4  0 -4 -4 -3 -4  4 -1 -4 -2 -3 -4 -1 -2 -4 -3  4
  444 H  -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5  0 -5
  445 R  -4  8 -3 -4  0 -1 -2 -3 -2 -5 -4  0 -3 -2 -4 -3 -3  0 -4 -5
  446 D  -4 -4 -1  8 -6 -2  0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5
  447 I  -4 -5 -6 -6 -3 -4 -5 -6 -5  3  5 -5  1  1 -5 -5 -3 -4 -3  1
  448 K   0  0  1 -3 -5 -1 -1 -3 -3 -5 -5  7 -4 -5 -3 -1 -2 -5 -4 -4
  449 S   0 -3 -2 -3  0 -2 -2 -3 -3 -4 -4 -2 -4 -5  2  6  2 -5 -4 -4
  450 K   0  3  0  1 -5  0  0 -4 -1 -4 -3  4 -3 -2  2  1 -1 -5 -4 -4
  451 N  -4 -3  8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5
  452 I  -3 -5 -5 -6  0 -5 -5 -6 -5  6  2 -5  2 -2 -5 -4 -3 -5 -3  3
  453 M  -4 -4 -6 -6 -3 -4 -5 -6 -5  0  6 -5  1  0 -5 -4 -3 -4 -3  0
  454 V  -3 -3 -5 -6 -3 -4 -5 -6 -5  3  3 -4  2 -2 -5 -4 -3 -5 -3  5
  455 K  -2  1  1  4 -5  0 -1 -2  1 -4 -2  4 -3 -2 -3  0 -1 -5 -2 -3
  456 N   1  1  3  0 -4 -1  1  0 -3 -4 -4  3 -2 -5 -2  2 -2 -5 -4 -4
  457 D  -3 -2  5  5 -1 -1  1 -1  0 -5 -4  0 -2 -5 -1  0 -2 -6 -4 -5
  458 L  -3 -1  0 -3  0 -3 -2  3 -4 -2  3  0  1  1 -2 -2 -3  5 -1 -3

Local Alignment Statistics
  High scores of local alignments between two random sequences follow the Extreme Value Distribution
  Expect Value E = number of database hits you expect to find by chance, ≥ S
  (figure: axes Score (S) and Alignments; labels: S, your score, expected number of random hits)
  More info: www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

Gapped Alignments
  Gapping provides more biologically realistic alignments
  Gapped BLAST parameters are simulated for each scoring matrix
  Affine gap costs = -(a+bk)
    a = gap open penalty
    b = gap extend penalty
  A gap of length 1 receives the score -(a+b)

An Alignment BLAST Cannot Make
    1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

   61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
   61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

  121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
  121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC

  Reason: no contiguous exact match of 7 bp.

An Alignment BLAST Can Make
  Solution: compare protein sequences; BLASTX
  BLAST 2 Sequences (blastx) output:
    Score = 290 bits (741), Expect = 7e-77
    Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)
    Frame = +3

Other BLAST Algorithms
  Megablast
  Discontiguous Megablast
  PSI-BLAST
  PHI-BLAST

Megablast: NCBI's Genome Annotator
  Long alignments of similar DNA sequences
  Greedy algorithm
  Concatenation of query sequences
  Faster than blastn; less sensitive

MegaBLAST & Word Size
  Trade-off: sensitivity vs speed
  Too fast for you?

Discontiguous Megablast
  Uses discontiguous word matches
  Better for cross-species comparisons

Templates for Discontiguous Words
  W = 11, t = 16, coding:     1101101101101101
  W = 11, t = 16, non-coding: 1110010110110111
  W = 12, t = 16, coding:     1111101101101101
  W = 12, t = 16, non-coding: 1110110110110111
  W = 11, t = 18, coding:     101101100101101101
  W = 11, t = 18, non-coding: 111010010110010111
  W = 12, t = 18, coding:     101101101101101101
  W = 12, t = 18, non-coding: 111010110010110111
  W = 11, t = 21, coding:     100101100101100101101
  W = 11, t = 21, non-coding: 111010010100010010111
  W = 12, t = 21, coding:     100101101101100101101
  W = 12, t = 21, non-coding: 111010010110010010111
  Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics March
  2002; 18(3):440-5
  W = word size; # matches in template
  t = template length

Discontiguous (Cross-species) MegaBLAST
  Discontiguous Word Options

MegaBLAST vs Discontiguous MegaBLAST
  NM_017460, Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters)
  vs Drosophila
  MegaBLAST = “No significant similarity found.”
  Discontiguous megaBLAST =

Another Example . . .
  Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA, (3244 bp)
    /note= mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2
  Database: nr (nt), Mammalia[orgn]
  MegaBLAST = “No significant similarity found.”
  Discontiguous megaBLAST = numerous hits . . .

Ex: Discontiguous MegaBLAST

Ex: BLASTN

PSI-BLAST
  Position-specific Iterated BLAST
  Example: Confirming relationships of purine nucleotide metabolism proteins

  gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
  MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF
  VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD
  EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY
  RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA
  VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK

PSI-BLAST
  E value cutoff for PSSM: 0.005

RESULTS: Initial BLASTP
  Same results as protein-protein BLAST; different format

Results of First PSSM Search
  Other purine nucleotide metabolizing enzymes not found by ordinary BLAST

Tenth PSSM Search: Convergence
  Just below threshold, another nucleotide metabolism enzyme

Reverse PSI-BLAST (RPS)-BLAST
  Adenosine/AMP Deaminase Domain
  . . .

PHI-BLAST
  gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
  MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE
  LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV
  IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI
  LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI
  ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK

  [GA]xxxxGK[ST]

Genome BLAST

Genome BLAST via Map Viewer

Example Search Pathways: Hemochromatosis
  Gene
  “hemochromatosis” HFE
  nucleotide sequence

Example: Human Genome BLAST

Human Genome BLAST: Results

Human Genome BLAST: MapViewer

What's New?

BLAST Databases
  Nucleotide
    refseq_rna = NM_*, XM_*
    refseq_genomic = NC_*, NG_*
    env_nt: environmental sample[filter], e.g., 16S rRNA
  Protein
    refseq = NP_*, XP_*
    env_nr

New Formatter
  Select lower case
  Select red
  gray line = same database hit
  hsps color-coded independently

BLAST Output: Alignments & Filter
  low complexity sequence filtered

Advanced Options
  Limit to Organism
    all[filter] NOT ma
  Example Entrez Queries
    all[Filter] NOT mammalia[Organism]
    ray finned fishes[Organism]
    srcdb refseq[Properties]
    Nucleotide only:
      biomol mrna[Properties]
      biomol genomic[Properties]
  Other Advanced
    -e 10000  expect value
    -v 2000   descriptions
    -b 2000   alignments
  -e 10000 -v 2000

Searching by Structure
  Why search for similar structures?
    Find homologs with low sequence similarity
    Explore protein evolution: similar protein folds can support different functions
    Identify conserved core elements to model related proteins of unknown structure

Indexing into MMDB
  Structure
  MMDB: Molecular Modeling Data Base

Structure Summary
  Conserved Domains
  3D Domain Neighbors
  Structure Neighbors

3D Domains
  (figure labels: 1, 3, 2, 4)
  Conserved Domains: SH3, SH2

VAST: Alignment
  For each protein chain,
    locate SSEs (secondary structure elements),
    represent SSEs as individual vectors,
    align the vectors.
  (figure labels: 1, 2, 3, 4, 5, 6; Human IL-4; IL-4 & Leptin)

VAST
  Structure neighbors
  Taq DNA polymerase

VAST Results for the Chain
  Table view

VAST
  Vector Alignment Search Tool
  3D Domain structure neighbors

VAST Results for Domain 1
  Not found with Chain query!

Best way to convert PDB files to MMDB format for viewing with Cn3D!
  submit file to PDB

Example: Mapping Oligos Onto a Genome
  forward  CCATGGCGACCCTGGAAAAGC
  reverse  CAGCAGCGGCTGTGCCTGCGG
  ? ? ?

Map Oligos Onto Genome
  CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG
  -W 7 -e 1000

Genome BLAST Results

Primer Alignments
  forward primer
  reverse primer

MapViewer
  Sequence View (sv)
  forward
  reverse

Service Addresses
  BLAST          blast-help@ncbi.nlm.nih.gov
  General Help   info@ncbi.nlm.nih.gov
  Wayne Matten   matten@ncbi.nlm.nih.gov
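
Worked sketch: word lookup table (slides "How BLAST Works", "Nucleotide Words", "Protein Words")

A minimal Python sketch of the first step in the BLAST outline, "make lookup table of words for query", using the queries shown on the word slides. It enumerates exact overlapping words only; the neighborhood words that blastp also stores (the -f threshold) are not modelled. The names query_words and build_lookup are illustrative and do not belong to any BLAST API.

```python
from typing import Dict, List


def query_words(query: str, word_size: int) -> List[str]:
    """Return every overlapping word of length word_size, left to right."""
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


def build_lookup(query: str, word_size: int) -> Dict[str, List[int]]:
    """Map each word to the 0-based query offsets at which it starts."""
    table: Dict[str, List[int]] = {}
    for offset, word in enumerate(query_words(query, word_size)):
        table.setdefault(word, []).append(offset)
    return table


# Nucleotide query from the slide, word size 11.
for word in query_words("GTACTGGACATGGACCCT", 11):
    print(word)

# Protein query prefix from the slide, word size 3.
print(query_words("GTQITVEDLF", 3))
```

Scanning a database sequence then amounts to sliding the same window over the subject and checking each word against the table, which is why a single exact word match is enough to seed a nucleotide hit.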
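
Worked sketch: raw score under the identity matrix (slide "Scoring Systems - Nucleotides")

The raw score of 10 on the nucleotide scoring slide can be reproduced in a few lines of Python. This sketch assumes the reward and penalty from the slide (-r 1 -q -3) and, to match the slide's arithmetic of 19 - 9, charges the single gap column the same -3 as a mismatch. That treatment of the gap is an assumption made here; gapped BLAST scores gaps with separate open and extend costs.

```python
def raw_score(top: str, bottom: str, reward: int = 1, penalty: int = -3) -> int:
    """Sum per-column scores for two aligned rows of equal length.

    A column scores `reward` when both rows hold the same base and
    `penalty` otherwise (mismatch or gap, by assumption).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == b and a != "-":
            score += reward
        else:
            score += penalty
    return score


top = "CAGGTAGCAAGCTTGCATGTCA"
bottom = "CACGTAGCAAGCTTG-GTGTCA"
print(raw_score(top, bottom))  # 19 matches, 3 non-matching columns
```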
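
Worked sketch: affine gap cost (slide "Gapped Alignments")

The affine gap cost -(a + bk) from the gapped alignment slide, written as a small Python function, where k is the gap length. The values a = 11 and b = 1 in the usage lines are chosen for illustration and are not taken from the slides.

```python
def affine_gap_score(length: int, gap_open: int, gap_extend: int) -> int:
    """Score of one gap of `length` positions: -(a + b*k)."""
    if length < 1:
        raise ValueError("a gap has at least one position")
    return -(gap_open + gap_extend * length)


# A gap of length 1 scores -(a + b); each extra position costs b more.
for k in (1, 2, 5):
    print(k, affine_gap_score(k, gap_open=11, gap_extend=1))
```

Because the opening cost is paid once per gap, one long gap is cheaper than several short gaps of the same total length, which is the behaviour the slide describes as more biologically realistic.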
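
Worked sketch: discontiguous word templates (slide "Templates for Discontiguous Words")

A hedged Python illustration of how a discontiguous template is read: positions marked 1 must match and positions marked 0 are ignored. It uses the W = 11, t = 16 coding template from the slide; the two 16-base sequences are invented for the example and differ only at the template's 0 positions (every third base). The helper names are hypothetical.

```python
CODING_W11_T16 = "1101101101101101"  # template from the slide: 11 ones in 16 positions


def template_word(seq: str, start: int, template: str) -> str:
    """Return the bases of seq under the template's '1' positions."""
    window = seq[start:start + len(template)]
    if len(window) < len(template):
        raise ValueError("sequence too short for template at this offset")
    return "".join(base for base, flag in zip(window, template) if flag == "1")


def seeds_match(a: str, b: str, template: str) -> bool:
    """True when a and b agree at every '1' position of the template."""
    return template_word(a, 0, template) == template_word(b, 0, template)


# Invented sequences that differ at positions 3, 6, 9, 12 and 15 only.
seq_a = "ATGGCTAAAGGTCTGA"
seq_b = "ATAGCCAAGGGCCTAA"

print(template_word(seq_a, 0, CODING_W11_T16))
print(seeds_match(seq_a, seq_b, CODING_W11_T16))  # discontiguous seed found
print(seq_a[:11] == seq_b[:11])                   # contiguous 11-mer differs
```

The example shows why the slides recommend discontiguous words for cross-species comparisons of coding sequence: changes concentrated at one codon position break every contiguous 11-mer but leave the templated word intact.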
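
Worked sketch: locating a PHI-BLAST pattern (slide "PHI-BLAST")

The PHI-BLAST slide pairs the CED-4 protein with a short pattern, read here as [GA]xxxxGK[ST]; the bracketed alternatives are an assumption about how the slide's pattern should be interpreted. The Python sketch below converts that simplified notation (x for any residue, brackets for alternatives) to a regular expression and finds it in a fragment of the CED-4 sequence from the slide. It does not cover full PROSITE syntax, and phi_pattern_to_regex is a hypothetical helper, not part of BLAST.

```python
import re


def phi_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate 'x' wildcards to '.', leaving [..] alternatives as they are."""
    return re.compile(pattern.replace("x", "."))


# Fragment of the CED4_CAEEL sequence shown on the slide.
fragment = "FFLFLHGRAGSGKSVIASQALSKSDQ"

match = phi_pattern_to_regex("[GA]xxxxGK[ST]").search(fragment)
if match:
    print(match.start(), match.group())
```

PHI-BLAST itself uses the pattern occurrence as the seed and then extends alignments around it; this sketch covers only the pattern-location step.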
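
Worked sketch: joining two oligos into one query (slide "Map Oligos Onto Genome")

The oligo-mapping example submits both primers as a single query separated by a run of N, searched with a small word size and a permissive expect value (-W 7 -e 1000 on the slide). A minimal Python sketch of building that query string follows; the default spacer length of 10 and the function name are assumptions made for illustration.

```python
def joined_primer_query(forward: str, reverse: str, spacer_length: int = 10) -> str:
    """Concatenate two oligos with a spacer of N so they align as separate hits."""
    if spacer_length < 0:
        raise ValueError("spacer_length must not be negative")
    return forward + "N" * spacer_length + reverse


forward = "CCATGGCGACCCTGGAAAAGC"
reverse = "CAGCAGCGGCTGTGCCTGCGG"
print(joined_primer_query(forward, reverse))
```

The N spacer keeps an alignment from extending across the junction, so each primer shows up as its own hit and the distance and orientation between the two hits can be read from the genome coordinates.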
