ImageVerifierCode 换一换
格式:PPT , 页数:47 ,大小:858.50KB ,
资源ID:378980      下载积分:2000 积分
快捷下载
登录下载
邮箱/手机:
温馨提示:
如需开发票,请勿充值!快捷下载时,用户名和密码都是您填写的邮箱或者手机号,方便查询和重复下载(系统自动生成)。
如填写123,账号就是123,密码也是123。
特别说明:
请自助下载,系统不会自动发送文件的哦; 如果您已付费,想二次下载,请登录后访问:我的下载记录
支付方式: 支付宝扫码支付 微信扫码支付   
注意:如需开发票,请勿充值!
验证码:   换一换

加入VIP,免费下载
 

温馨提示:由于个人手机设置不同,如果发现不能下载,请复制以下地址【http://www.mydoc123.com/d-378980.html】到电脑端继续下载(重复下载不扣费)。

已注册用户请登录:
账号:
密码:
验证码:   换一换
  忘记密码?
三方登录: 微信登录  

下载须知

1: 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。
2: 试题试卷类文档,如果标题没有明确说明有答案则都视为没有答案,请知晓。
3: 文件的所有权益归上传用户所有。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 本站仅提供交流平台,并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

版权提示 | 免责声明

本文(Biological networksConstruction andAnalysis.ppt)为本站会员(livefirmly316)主动上传,麦多课文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知麦多课文库(发送邮件至master@mydoc123.com或直接QQ联系客服),我们立即给予删除!

Biological networksConstruction andAnalysis.ppt

1、Biological networks Construction and Analysis,Recap,Gene regulatory networks Transcription Factors: special proteins that function as “keys” to the “switches” that determine whether a protein is to be produced Gene regulatory networks try to show this “key-product” relationship and understand the re

2、gulatory mechanisms that govern the cell.We went over a simple algorithm for detecting significant patterns in these networks,Other networks?,Apart from regulation there are other events in a cell that require interaction of biological molecules Other types of molecular interactions that can be obse

3、rved in a cell enzyme ligand enzyme: a protein that catalyzes, or speeds up, a chemical reaction ligand: extracellular substance that binds to receptors metabolic pathways protein protein cell signaling pathways proteins interact physically and form large complexes for cell processes,Pathways are in

4、ter-linked,Signalling pathway,Genetic network,Metabolic pathway,STIMULUS,Interactions Pathways Network,A collection of interactions defines a network Pathways are subsets of networks All pathways are networks of interactions, however not all networks are pathways! Difference in the level of annotati

5、on or understanding We can define a pathway as a biological network that relates to a known physiological process or complete function,The “interactome”,The complete wiring of a proteome. Each vertex represents a protein. Each edge represents an “interaction” between two proteins.,An edge between tw

6、o proteins if.,The proteins interact physically and form large complexes The proteins are enzymes that catalyze two successive chemical reactions in a pathway One of the proteins regulates the expression of the other,Sources for interaction data,Literature: research labs have been conducting small-s

7、cale experiments for many years! Interaction dabases: MIPS (Munich Information center for Protein Sequences) BIND (Biomolecular Network Interaction Database) GRID (General Repository for Interaction Datasets) DIP (Database of Interacting Proteins) Experiments: Y2H (yeast two-hybrid method) APMS (aff

8、inity purification coupled with mass spectrometry),These methods provide the ability to perform genome/proteome-scale experiments. For yeast: 50,000 unique interactions involving 75% of known open reading frames (ORFs) of yeast genome However, for C. elegans they provide relatively small coverage of

9、 the genome with 5600 interactions. Problems with high-throughput experiments: Low quality, false positives, false negatives Fraction of biologically relevant interactions: 30%-50% (Deane et al. 2002),Solution:,User other indirect data sources to create a probabilistic protein network. Other sources

10、 include: Genome data: Existence of genes in multiple organisms Locations of the genes Bio-image data Gene Ontology annotations Microarray experiments Sub-cellular localization data,Probabilistic network approach,Each “interaction” link between two proteins has a posterior probability of existence,

11、based on the quality of supporting evidence.,Bayesian Network approach,Jansen et al. (2003) Science. Lee et al. (2004) Science. Combine individual probabilities of likelihood computed for each data source into a single likelihood (or probability) Naive Bayes: Assume independence of data sources Comb

12、ine likelihoods using simple multiplication,Bayesian Approach,A scalar score for a pair of genes is computed separately for each information source. Using gold positives (known interacting pairs) and gold negatives (known non-interacting pairs) interaction likelihoods for each information source is

13、computed. The product of likelihoods can be used to combine multiple information sources Assumption: A score from a source is independent from a score from another source.,Computing the likelihoods,Partition the pair scores of an information source into bins and provide likelihoods for score-ranges

14、E.g. Using the microarray information source and using Pearson correlation for scoring protein pairs you may get scores between -1 and 1. You want to know what is the likelihood of interaction for a protein pair that gets a Pearson correlation of 0.6.,Partitioning the scores,Computing the likelihood

15、,P(Interaction | Score) / P (Interaction)L = -P(Interaction | Score) / P (Interaction)Example,Protein interaction networks,Large scale (genome wide networks):,ProNet (Asthana et al.)Yeast3,112 nodes12,594 edges,Analyzing Protein Networks,Predict members of a partially known protein complex/pathway.

16、Infer individual genes functions on the basis of linked neighbors. Find strongly connected components, clusters to reveal unknown complexes. Find the best interaction path between a source and a target gene.,Simple analysis,The network can be thresholded to reveal clusters of interacting proteins,Co

17、mplex/Pathway membership problem,E.g., C. elegans cell death (apoptosis) pathway Identified 50 genes involved in the pathway. Are there other genes involved in the pathway? Biologists would like to know: Which genes (out of 15K genes) should be tested in the RNAi screens next?,Complex/pathway member

18、ship problem,Given a a set of proteins identified as the core complex (query), rank the remaining proteins in the network according to the probability that they “connect” to the core complex. This problem is very similar to the “network reliability” problem in communication networks.,Network reliabi

19、lity,Two terminal network reliability problem: Given a graph of connections between terminals: Each connection weighted by the probability that the corresponding wire is functioning at a given time What is the probability that some path of functioning wires connects two terminals at a given time?,Ex

20、act solution: NP-hard Several approximation methods exist,Monte Carlo simulation,Monte Carlo simulation (ProNet: Asthana et al. 2004) Create a sample of N binary networks from the probabilistic network (according to a Bernoulli trial on each edge based on its probability). Use breadth-first search t

21、o determine the existence of a path between the nodes (i.e., the two terminals). The fraction of sampled networks in which there exists a path between the two nodes is an approximation to the exact network reliability.,Parameters,Number of binary networks (samples) to be sampled from the probabilist

22、ic network 1000, 5000, 10000 ? The depth of the breadth-first search: complexity increases as you search for the existence of a path to a distant node. 4, 10, 20 ?,ProNet,Generate 10,000 binary networks from a probabilistic network (according to a Bernoulli trial on each edge based on its probabilit

23、y) Use breadth-first search to determine the existence of a path between two nodes Limit the maximum depth to 4 to reduce computation For each protein i in the network, count the fraction Ci of sampled networks in which there exists a path between i and the core complex. Report proteins ranked by Ci

24、,ProNet: example,Example,Complex nodes: p1 and p2,Example,Sample size: 4, maximum search depth: 3,Example,Sample size: 4, maximum search depth: 3,Cp12 = 0/4 = 0.0,Cp11 = 0/4 = 0.0,Cp10 = 0/4 = 0.0,Cp9 = 2/4 = 0.5,Cp8 = 2/4 = 0.5,Cp7 = 1/4 = 0.25,Cp6 = 0/4 = 0.0,Cp5 = 1/4 = 0.25,Cp4 = 1/4 = 0.25,Cp3

25、= 4/4 = 1.0,Results,Running time vs. sample size,What about accuracy of the technique? Is it able to give a good ranking for the nodes of the network, based on their closeness to the core?,Leave-one-out benchmark,Use known complexes to evaluate the accuracy of the method Leave one member (in turn) f

26、rom each complex/pathway. Use the rest of the complex/pathway as the starting, i.e., query, set. Examine the rank of the left-out protein. What do we expect from a good technique?,Accuracy vs. sample size,How does the sample size effect returned results?,Monte Carlo simulation,Disadvantages: What is

27、 the best choice for the number of samples? What should be the maximum depth for breadth-first search? (Need a cutoff to decrease running time) Scalability issues: May need a lot of computation time for large networks,Random Walks,Random Walks on graphs Googles page rank,Googles PageRank,Assumption:

28、 A link from page A to page B is a recommendation of page B by the author of A (we say B is successor of A) Quality of a page is related to its in-degreeRecursion: Quality of a page is related toits in-degree, and to the quality of pages linking to it PageRank BP 98,Definition of PageRank,Consider t

29、he following infinite random walk (surf): Initially the surfer is at a random page At each step, the surfer proceeds to a randomly chosen web page with probability d to a randomly chosen successor of the current page with probability 1-d The PageRank of a page p is the fraction of steps the surfer s

30、pends at p in the limit.,Random walks with restarts on interaction networks,Consider a random walker that starts on a source node, s. At every time tick, the walker chooses randomly among the available edges (based on edge weights), or goes back to node s with probability c.,s,0.2,0.4,0.4,0.1,0.3,0.

31、6,0.1,0.2,Random walks on graphs,The probability , is defined as the probability of finding the random walker at node v at time t. The steady state probability gives a measure of affinity to node s, and can be computed efficiently using iterative matrix operations.,Computing the steady state p vecto

32、r,Let s be the vector that represents the source nodes (i.e., si=1/n if node i is one the n source nodes, and 0 otherwise).Compute the following until p converges:p = (1-c)Ap + cswhere A is the column normalized adjacency matrix and c is the restart probability.,Same example,Start nodes: p1 and p2,R

33、andom walk results,Restart probability, c = 0.3,Experiments,Conducted complex/pathway membership queries on a probabilistic Yeast network: ConfidentNet (Lee et al., 4,681 nodes, 34,000 edges) Assembled a test set of 27 MIPS complexes and 10 KEGG pathways.,Leave-one-out benchmark,Leave one member (in

34、 turn) from each complex/pathway. Use the rest of the complex/pathway as the starting, i.e., query, set.Examine the rank of the left-out protein.,Leave-one-out on ConfidentNet,MIPS complex queries,Leave-one-out on ConfidentNet,KEGG pathway queries,Running time,Random Walks,Network Reliability by Monte Carlo Sampling,Total time to complete 121 MIPS complex queries,

copyright@ 2008-2019 麦多课文库(www.mydoc123.com)网站版权所有
备案/许可证编号:苏ICP备17064731号-1