03/09/11, Oracle Life Science Day & User Group Meeting

Integrating Biological Information using Oracle - KEGG at Kyoto University & the PATHWAY database project
Susumu Goto
Bioinformatics Center, Institute for Chemical Research, Kyoto University

Contents
- Introduction to KEGG
  - Kyoto Encyclopedia of Genes and Genomes
  - Integrated database for pathways, chemical reactions, genomes, expression, and more
- Data representation with graphs
  - XML representation of PATHWAY database
  - Comparison between pathways and other data
  - Path computation in pathways
- Application to Oracle 10g Network Model
  - Storing information on PATHWAY as binary relations in Oracle
  - Path computation using Oracle

KEGG: Kyoto Encyclopedia of Genes and Genomes
Integrated Database of Biological Systems Information for Post-genomic era
- Genomes, genes, pathways of completely sequenced organisms
  - Functional annotation for each gene by comparative genomics
  - Pathway reconstruction based on the annotation
- A system for computing and comparing biological networks from molecular interaction data
  - Graph representation and application of graph algorithms
- http://www.genome.ad.jp/kegg/

Databases in KEGG
Information on relations between molecules
- Pathway
- Genomes, Genes
- Expression
- Chemicals and their reactions
- Orthologs
- Sequence similarity

GENES and PATHWAY
- GENES: 400,000 genes from over 100 organisms
  - Parsing GenBank and EMBL for completely sequenced genomes
  - Parsing LocusLink and RefSeq for model organisms such as human and mouse
  - KEGG annotates the function of each gene based on sequence similarity
- PATHWAY: over 100 maps
  - Metabolic pathways, regulatory pathways and protein complexes
  - Manually drawn and classified
  - Information collected from various textbooks, literature and web pages

Metabolic pathway map

Graph Representation of Metabolic Pathways and Chemical Compounds
- Metabolic Pathways
  - Image maps and position of each object on them
  - Graph 1. Node: chemical compounds, Link: enzymatic reactions
  - Graph 2. Node: enzymes, Link: neighborhood relations of enzymes on pathway maps
- Chemical Compounds
  - Graph 1. Node: atoms, Link: bonds between atoms
  - Graph 2 for carbohydrates. Node: sugars, Link: glycosylation bonds

Graph representation of PATHWAY (1)

XML representation of PATHWAY (1)
Graph 1: Compound as a node / Enzyme as a link
- Producing four binary relations from a reaction with two substrates and two products
- Diagram labels: C00036, C00024, C00158, C00010, R00351

Graph representation of PATHWAY (2)

XML representation of PATHWAY (2)
Graph 2: Enzyme as a node / Compound as a link
- Producing binary relations between two enzymes with a compound as a link name
- Diagram labels: C00036, R001325, C00158, R00342, R00351

Pathway Data Definition in Oracle Example
Graph 1 case

Node table

NODE_ID NODE_NAME ACT COSTS SAMPLE_ID           ENTRY_ID
------- --------- --- ----- ------------------- --------
      1 C00022    Y       1 Pyruvate                  49
      2 C00122    Y       1 Fumarate                  50
      3 C00036    Y       1 Oxaloacetate              51
      4 C05379    Y       1 Oxalosuccinate            52
      5 C00074    Y       1 Phosphoenolpyruvate       53
      6 C00024    Y       1 Acetyl-CoA                54
      7 C00149    Y       1 (S)-Malate                55
      8 C00311    Y       1 Isocitrate                56
      9 C00417    Y       1 cis-Aconitate             57
     10 C00042    Y       1 Succinate                 58
      :

Link table

LINK_ID LINK_NAME            START_NODE_ID END_NODE_ID ACT COST SAMPLE_ID
------- -------------------- ------------- ----------- --- ---- ---------------------------------------
      1 1.1.1.42 (rn:R00268)             4          19 Y      1 isocitrate dehydrogenase (NADP)
      2 1.1.1.42 (rn:R00268)            19           4 Y      1 isocitrate dehydrogenase (NADP)
      3 1.1.1.42 (rn:R00268)             4          20 Y      1 isocitrate dehydrogenase (NADP)
      4 1.1.1.42 (rn:R00268)            20           4 Y      1 isocitrate dehydrogenase (NADP)
      5 4.1.1.49 (rn:R00341)             3          19 Y      1 phosphoenolpyruvate carboxykinase (ATP)
      6 4.1.1.49 (rn:R00341)             3           5 Y      1 phosphoenolpyruvate carboxykinase (ATP)
      7 1.1.1.37 (rn:R00342)             3           7 Y      1 malate dehydrogenase
      8 6.4.1.1 (rn:R00344)              1           3 Y      1 pyruvate carboxylase
      :

Supplementary information

Path computation supported by the Oracle Network Model
- Shortest path between two nodes
  - Computing the shortest path between two specified compounds
- All paths search between two nodes
  - Computing all alternative paths between two specified compounds

Overlaying a result of path computation onto the existing pathway map

Hierarchy of similar proteins
2. transferase
  2.3. acyltransferase
    2.3.1. other than aminoacyl groups
      2.3.1.61
      2.3.1.39
      2.3.1.41
    2.3.2. aminoacyltransferase
      2.3.2.2
      2.3.2.6
EC numbers specify the hierarchical classification of enzyme reactions

Results of path computation using query relaxation
- Enzymes that the target organism does not have

Searching alternative pathways between two compounds

Other applications using Oracle
- SSDB: Database of sequence similarities
  - Binary relations for pairs of similar sequences
  - 190,000,000
    relations
  - http://ssdb.genome.ad.jp/
- Annotation tool
  - A tool for functional annotation of genes in GENES
  - Annotation can be done using best hit and bidirectional best hit relations in SSDB
  - Genomic position information
- LIGAND chemical database
  - http://www.genome.ad.jp/ligand/
  - Based on MDL's ISIS database
  - Substructure search

Possible applications for the future
- LinkDB: database of related entries
  - Binary relations between database entries related by cross-references
  - 50 databases and 70 million relations
  - http://www.genome.ad.jp/dbget-bin/www_linkdb
- Extraction of cliques
  - Orthologous groups in SSDB
- Comparison between networks
  - Comparing and extracting correlated clusters from different networks
  - Metabolic pathways and genomic positions
  - Protein interaction networks and expression similarities

Searching similar compounds
- Searching for maximal common subgraphs
- Counting matching atoms
- Calculating weight as Jaccard coefficient

Summary
- KEGG: Kyoto Encyclopedia of Genes and Genomes
  - Integrated database for pathways, reactions, genomes, expression information, and more
- Graph representation
  - Comparisons of pathways
  - Path computation in pathway
- Using Oracle 10g Network Model
  - Efficient and effective pathway analysis will be achieved, especially for pathway computation using graph search algorithms embedded in Oracle.
  - Various types and large amounts of network data, such as networks of protein interactions and of database entries, are expected.
  - It will be much more effective if the graph comparison algorithms can be easily applied.

KEGG project team
- Project Leader: Minoru Kanehisa
- System Development Team: Susumu Goto, Kotaro Shiraishi, Kayo Okamoto, Satoshi Miyazaki, Tomomi Kamiya, Yoko Sato, Akihiro Nakaya, Shuichi Kawashima, Koichiro Tonomura, Junji Fukumoto, Koichi Ohkubo
- Data Entry Team: Miho Furumichi, Junko Yabuzaki, Nobue Takeuchi, Yuriko Matsuura, Masami Hamajima, Rumiko Yamamoto, Tomoko Komeno, Toshi Nakatani, Junko Nishida, Atsuko Tanaka, Megumi Yamaguchi, Tomoko Deno, Ayumi Kirioka, Tomoko Hattori, Kana Matsumoto, Hiroko Shino, Sanae Asanuma, Junko Yamamoto
- Curators: Takaaki Nishioka, Yasushi Okuno, Masahiro Hattori, Toshiaki Katayama, Yoshinobu Igarashi, Keun-joon Park, Akiyasu Yoshizawa, Vachiranee Limviphuvadh
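
Appendix: illustrative sketch of the Graph 1 encoding and path computation

The slides describe turning a reaction into binary compound-to-compound relations (Graph 1) and then computing a shortest path between two compounds. The following is a minimal sketch of that idea in plain Python, not the Oracle Network Model API. The function names reaction_to_relations and shortest_path are hypothetical, the split of R00351 into substrates (C00036, C00024) and products (C00158, C00010) is an assumption based on the slide's "two substrates and two products", and a breadth-first search stands in for whatever algorithm Oracle uses, which is reasonable here only because every link in the example table has cost 1. The link rows are the eight rows shown in the link table.

```python
from collections import deque
from itertools import product


def reaction_to_relations(reaction_id, substrates, products):
    """Graph 1 encoding: one (substrate, product, reaction) triple per pair."""
    return [(s, p, reaction_id) for s, p in product(substrates, products)]


def shortest_path(links, start, goal):
    """Breadth-first search over directed (start_node, end_node, name) links.

    Assumes unit cost on every link. Returns a list of
    (from_node, link_name, to_node) steps, or None if goal is unreachable.
    """
    adjacency = {}
    for src, dst, name in links:
        adjacency.setdefault(src, []).append((dst, name))

    previous = {start: None}  # node -> (predecessor, link_name)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, name in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, name)
                queue.append(nxt)

    if goal not in previous:
        return None

    # Walk back from the goal to the start, then reverse.
    steps = []
    node = goal
    while previous[node] is not None:
        prev, name = previous[node]
        steps.append((prev, name, node))
        node = prev
    steps.reverse()
    return steps


# Assumed substrate/product split for the reaction on the Graph 1 slide.
RELATIONS = reaction_to_relations(
    "R00351", ["C00036", "C00024"], ["C00158", "C00010"]
)

# (START_NODE_ID, END_NODE_ID, LINK_NAME) rows copied from the link table.
LINKS = [
    (4, 19, "1.1.1.42"), (19, 4, "1.1.1.42"),
    (4, 20, "1.1.1.42"), (20, 4, "1.1.1.42"),
    (3, 19, "4.1.1.49"), (3, 5, "4.1.1.49"),
    (3, 7, "1.1.1.37"), (1, 3, "6.4.1.1"),
]

if __name__ == "__main__":
    for relation in RELATIONS:
        print(relation)
    # Node 1 is Pyruvate and node 7 is (S)-Malate in the node table.
    print(shortest_path(LINKS, 1, 7))
```

With the rows above, the search from node 1 (Pyruvate) to node 7 ((S)-Malate) passes through node 3 (Oxaloacetate). Links with unequal costs would need a weighted search such as Dijkstra's algorithm instead of breadth-first search.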