Format: PPT, Pages: 47, Size: 811 KB
A Probabilistic Approach to High Throughput Drug Discovery.ppt

A Probabilistic Approach to High Throughput Drug Discovery
  - Introduction and Motivation
  - Probability Modeling in Drug Discovery
  - Representation of Chemical Structures (Descriptors)
  - Focused Combinatorial Library Design
  - Summary and Outlook

2. High Throughput Screening
  - Large-scale automation of biological assays (HTS)
      - Use robotics to perform 10,000 to 100,000 screens per day
      - Brute-force approach to drug discovery: “rapidly screen all compounds”
  - Noteworthy drawbacks to HTS:
      - Economics: $1-$5 per assay (provided large collections are assayed)
      - Logistics: compound formatting, inventory systems and other overhead
      - Precision loss: effectively a “binary” measurement, active/inactive (pass/fail)
      - High error rate: assay, synthesis failure, sample degradation, registration
  - Resulting effects:
      - Quality-for-quantity tradeoff: lots of low-quality data
      - High level of noise (error) in data makes interpretation very difficult
  - HTS has gained acceptance and is routinely used to generate lead compounds for drug discovery projects

3. Sources of Compounds for HTS
  - Initial screening libraries (first libraries used in project)
      - Historical “in-house” collection of compounds augmented with compounds purchased from external suppliers
      - 1 million+ compounds available means initial screening library must be designed (diversity retained using fewer compounds)
      - Receptor-biased initial screening libraries are a possibility
  - Follow-up libraries
      - Parallel synthesis / combinatorial chemistry is an excellent source of large numbers of (new) compounds
      - Synthesis of “all” analogs around a lead structure exhibits poor diversity but is very good for “local” exploration and lead follow-up
  - External screening compound purchasing and in-house combinatorial chemistry efforts have gained acceptance and are routinely used in lead generation and follow-up

4. High Throughput Discovery Cycle
  - Brute-force HTS not practical
      - At least 10 trillion stable drug candidates
      - At 1 billion screens per day, 27 years are needed to screen all 10 trillion
  - A discovery cycle can be used to reduce total screens
      - Use HTS data to affect the selection of compounds to screen next
      - Scale-up of the traditional experimental discovery cycle

5. Required Technology for HTD Cycle
  - High Throughput Screening facility
  - Parallel synthesis and combinatorial chemistry capabilities
  - Methodology for automatically analyzing HTS data
      - Humans find it difficult to interpret large amounts of noisy data
      - Automatic HTS QSAR technology necessary for HTD cycle
  - Methodology for designing focused combinatorial libraries
      - HTS QSAR results are used to bias a combinatorial library towards activity
      - ADME properties and other design criteria should be taken into account
  - Meaningful representation of compounds
      - Collection of molecular descriptors meaningful across projects (avoid time-consuming variable selection procedures)
      - Definition of a “chemistry space” for diversity studies (design of initial screening libraries)

Probability Modeling in Drug Discovery

7. Probabilistic Formalism (Bayesian Inference)
  - Step 1: Write all observables as a joint probability density; e.g., Pr(A,B,C)
  - Step 2: Decompose density using probability theory and Bayes theorem until components are measurable; e.g., Pr(A,B,C) = Pr(B | A,C) Pr(C | A) Pr(A)
  - Step 3: Model each component in product from a database or experimental data set
  - Step 4: Make predictions or estimates using computed model of Pr(A,B,C)

8. Probabilities in Speech Recognition
  - Successful speech recognizers select (predict) an output word sequence from an input waveform by maximizing the joint likelihood Pr(WAVE, WORDS)
      - This is used (in part) to solve the isophonetic word sequence problem; e.g., “imadam” can be “I'm Adam” or “I'm a Dam” or “eye mad am”
  - Pr(WAVE, WORDS) = Pr(WAVE | WORDS) Pr(WORDS)
      - Pr(WORDS) is the prior probability of a word sequence (utterance)
      - Pr(WAVE | WORDS) is used to score the waveform under the assumption or hypothesis that the word sequence is WORDS
  - Build model of Pr(WORDS) by training on, say, 500,000,000 words of newspaper text (the prior knowledge)
  - Pr(WORDS) effectively depresses importance of unlikely utterances in favor of more plausible statements (real phrases)

9. Probabilities in Drug Discovery
  - Notation: Y = active (0/1), D = drugable (0/1), S = structure
  - Decompose: Pr(D, Y | S) = Pr(D | Y, S) Pr(Y | S)
      - Pr(D | Y, S): drugable given active structure (approximated by “is drug-like” efforts)
      - Pr(Y | S): activity assuming structure (probabilistic QSAR efforts)
  - Product of probabilities balances competing goals
      - Classification alone (e.g., RP) is not enough: weighted outcomes needed
      - Methodology similar to “soft” classification problems or fuzzy logic
  - Any method of probability modeling is valid (e.g., histogram, analytic)
  - Approximations introduced can be clearly identified; e.g., Pr(D | Y, S) ≈ Pr(D | S): drugability is independent of activity (!?)
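The point that a product of probabilities gives weighted outcomes, where a hard active/inactive call does not, can be illustrated with a minimal sketch. It assumes the independence approximation Pr(D | Y, S) ≈ Pr(D | S) from the slide; the compound names, the probability values and the helper names (Compound, design_score, rank) are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    p_active: float    # assumed model estimate of Pr(Y=1 | S)
    p_drugable: float  # assumed model estimate of Pr(D=1 | S)


def design_score(c: Compound) -> float:
    # Pr(D, Y | S) ~ Pr(D | S) * Pr(Y | S) under the independence approximation
    return c.p_drugable * c.p_active


def rank(compounds):
    # Highest joint probability of "active and drugable" first
    return sorted(compounds, key=design_score, reverse=True)


# Hypothetical library: all three would pass a hard 0.5 cut on activity alone
library = [
    Compound("cpd-A", p_active=0.90, p_drugable=0.10),
    Compound("cpd-B", p_active=0.60, p_drugable=0.70),
    Compound("cpd-C", p_active=0.55, p_drugable=0.20),
]
ranked = rank(library)
for c in ranked:
    print(f"{c.name}: {design_score(c):.3f}")
```

In this toy data the most active compound (cpd-A) ranks last because its drugability estimate is poor, which is the balancing of competing goals the slide refers to.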

10. Pr(Y|X) via Binary QSAR
  - If Y is “binary activity” and X is a descriptor vector then, by Bayes theorem,
      Pr(Y=1 | X=x) = Pr(X=x | Y=1) Pr(Y=1) / [ Pr(X=x | Y=1) Pr(Y=1) + Pr(X=x | Y=0) Pr(Y=0) ]
  - Pathology of Binary QSAR is reasonable
      - If new structure is outside the training set then Pr(Y=1), the hit rate, is used to make predictions (no other information available)
  [Figure: Bayes Theorem diagram; labels: Active, Inactive, X1, Xk, Xk+1, Xn, Pr(Y), Pr(X|Y), Pr(X), Pr(Y|X)]

11. Distribution Estimates
  - Four distributions in formula are of two types
      - Pr(Y=0), Pr(Y=1): prior probability of inactive/active
      - Pr(X=x | Y=0), Pr(X=x | Y=1): probability of ligand assuming inactive/active
  - Modeling assumption: independent ≈ uncorrelated!
      - Decompose multi-dimensional distribution into a product
      - Estimate 2n+2 distributions instead of original four
  - Binary QSAR Algorithm
      - Compute descriptor vectors d_i
      - De-correlate descriptors: x_i = Q(d_i - u)
      - Estimate distributions from x_i, y_i: Pr(X = x | Y = y)
      - Assemble p(x) ≈ Pr(Y = 1 | X = x)
      - Predict for new descriptors d: p(Q(d - u))

12. Experience with Binary QSAR
  - Fundamental methodology publication (robustness study)
      - Biocomputing: Proceedings of the 1999 Pacific Symposium; World Scientific Publishing, Singapore, 1999
  - Example literature data sets (non-HTS data)
      - Estrogen receptor (Gao et al.; J. Chem. Info. Comput. Sci., 1999, 36)
      - O-acyltransferase (ACAT) (Labute et al.; in press)
  - Example industrial data sets (HTS assays)
      - ArQule: 24,000 cpds., 200 active; 93% on inactives, 60% on actives
      - Pharmacopeia: 24,000 cpds.; 90% on inactives, 90% on actives
      - SmithKline Beecham: 80,000 cpds., 100 active; 90% on actives
  - Best success story: Pharmacia & Upjohn
      - Binary QSAR model used to select building blocks in combi-chem library
      - Improved activity from μM to nM (factor of 1000)

13. Combined Design Model for HTD Cycle
  - Use Binary QSAR method twice, once for activity model and once for drugability model
  - Train drugability model Pr(D | X) on WDI/ACD for drug-like/non-drug-like or on specific data sets (e.g., blood-brain barrier permeability)
  - Complete model of activity and drugability is the product Pr(D | X) Pr(Y | X), which approximates Pr(D, Y | S)
  [Figure: design-model flow diagram; labels: ADME Model, Activity Model, Library Design]
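The Binary QSAR steps listed under Distribution Estimates can be sketched roughly as follows. This is a simplified illustration, not the published implementation: it assumes the descriptors have already been de-correlated (the x = Q(d - u) step is skipped), it estimates each one-dimensional Pr(X_k | Y) with a plain histogram plus a pseudo-count (an assumed smoothing choice), and the names fit_binary_qsar and predict are hypothetical. A descriptor value outside the training range contributes no evidence, so a structure outside the training set falls back to the hit rate Pr(Y=1).

```python
import math


def fit_binary_qsar(X, y, n_bins=4, pseudo=1.0):
    """X: list of (assumed de-correlated) descriptor vectors; y: 0/1 labels.

    Requires at least one active and one inactive so the prior is in (0, 1).
    """
    prior1 = sum(y) / len(y)  # hit rate, Pr(Y=1)
    dims = []
    for k in range(len(X[0])):
        col = [row[k] for row in X]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant column
        counts = {0: [pseudo] * n_bins, 1: [pseudo] * n_bins}
        for v, label in zip(col, y):
            b = min(int((v - lo) / width), n_bins - 1)
            counts[label][b] += 1
        # Normalised histograms: estimates of Pr(X_k in bin | Y = c)
        probs = {c: [n / sum(counts[c]) for n in counts[c]] for c in (0, 1)}
        dims.append((lo, hi, width, probs))
    return prior1, dims


def predict(model, x):
    """Return the estimate p(x) of Pr(Y=1 | X=x) via Bayes theorem."""
    prior1, dims = model
    log1, log0 = math.log(prior1), math.log(1.0 - prior1)
    for v, (lo, hi, width, probs) in zip(x, dims):
        if v < lo or v > hi:
            continue  # outside training range: this dimension adds no evidence
        b = min(int((v - lo) / width), len(probs[1]) - 1)
        log1 += math.log(probs[1][b])
        log0 += math.log(probs[0][b])
    return 1.0 / (1.0 + math.exp(log0 - log1))


# Toy training set (hypothetical): actives have large first descriptor values
X = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0], [0.25, 1.0], [0.35, 0.0],
     [0.9, 1.0], [1.0, 0.0], [0.8, 1.0], [0.7, 0.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model = fit_binary_qsar(X, y)
print(predict(model, [0.95, 1.0]))  # resembles the actives
print(predict(model, [0.15, 0.0]))  # resembles the inactives
print(predict(model, [5.0, 9.0]))   # outside the training range: hit rate
```

For n descriptors the model stores 2n one-dimensional histograms plus the two priors, matching the 2n+2 distributions mentioned on the slide.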

  [Figure: Combined Design Model flow diagram, remaining labels: Binary QSAR, BioAssay, Design Model, Combinatorial Library, HTS Data, Drugability Data (e.g., BBB or drug-like), Binary QSAR]

Representation of Chemical Structures (Descriptors)

15. A Brief History of QSAR
  - Original philosophy (Hansch & Leo): use a fixed set of meaningful molecular properties to describe a wide variety of biological phenomena
      - Linear regression used to determine SAR
      - The determination of linear relationships is basic science
      - Statistical regression framework used to assess significance of SAR
  - Proliferation of descriptors
      - Early successes led to introduction of a vast array of descriptors
      - In principle, any number calculable from a chemical structure can be used as a molecular descriptor for SAR determination
  - Over-determination of SAR
      - Multitude of descriptors leads to need for schemes for variable elimination
      - 3D methods treat each grid-point in field representation as a descriptor

16. Fundamental Notions
  - Use a fixed set of descriptors for diversity and QSAR/QSPR
      - A meaningful chemistry space should not require customization
      - In QSAR/QSPR automatic variable selection can be dangerous
      - Make direct use of Hansch & Leo thinking (build on their experience)
  - Model 3D properties from 2D (connectivity) information
      - 3D information from 2D connectivity = 2 D descriptors
      - HTS QSAR and large-scale diversity require fast calculation times
      - 2D topological descriptors too weak, 3D descriptors too expensive
  - Use approximate atomic surface areas as fundamental representation
      - Complement substructure keys (stay property-based for class-hopping)
  - Intended applications
      - QSAR/QSPR models, linear and nonlinear, early and late in project
      - Chemistry space for library design

17. Exposed Van der Waals Surface Area (VSA)
  - Calculate exposed Van der Waals surface area for each atom by subtracting off surface area inside neighbors
  - Correction factors to sphere formula depend on atomic radii and inter-atomic distances
  [Figure: sphere of radius r with neighbors A and B; exposed areas 4πr², 4πr² - C_A, 4πr² - C_A - C_B]

18. Connection Table VSA Calculation
  - Neglect
      - Non-bonded neighbors (small molecules have little NB contact)
      - Interaction between angles (1-3 interactions)
      - Stretch of bond lengths (use ideal bond length)
  - Parameters
      - Radii: Van der Waals (or solvation)
      - Inter-atomic distances: ideal bond lengths
  - Define V_i to be the exposed VSA of atom i.
  [Figure labels: r, s, d, A]

19. Quality of Approximate VSA Calculation
  - Data set of 1,947 conformations
      - MOE 2D → 3D converter, MMFF94 force field, 0.01 RMS gradient
      - Molecular weights in [300, 1600] range
      - VDW Surface Area 3D dot calculation
  - Accuracy
      - r = 0.9856, r² = 0.9666, 10% error
      - Largest errors on steroids and other fused ring systems

20. Subdivision of VSA by Properties
  - Given an atomic property value P_i for each atom i
    [Figure: example atoms labeled with property values: O2 1.2, C3 4.5, C4 5.9, N7 0.2]
  - Bin P_i by ranges and sum V_i values:
      P_i range:    [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)
      Descriptors:  D1     D2     D3     D4     D5     D6
    [Figure: assignment of V1 ... V8 to the bins; labels as extracted: V1, V2, V3, V7, V4, + V5, V6, + V8]

21. 8 Molar Refractivity Descriptors
  - Wildman & Crippen SMR model of Molar Refractivity
      - Specific attention paid to calculation of atomic contributions
      - Protonation state taken as-is from structure (specific species)
  - Property bins derived from 50,000 structures
      - 8 descriptors result: SMR_VSAk
      - Each bin is approximately equally populated over training set
  Reference: Wildman, S.A., Crippen, G.M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci., 39(5), 868-873 (1999).

22. 10 LogP (octanol/water) Descriptors
  - Wildman & Crippen SlogP model of LogP
      - Specific attention paid to calculation of atomic contributions
      - Protonation state taken as-is from structure (specific species)
  - Property bins derived from 50,000 structures
      - 10 descriptors: SlogP_VSAk
      - Each bin is approximately equally populated over training set
  Reference: Wildman, S.A., Crippen, G.M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci., 39(5), 868-873 (1999).

23. SMR_VSA and SlogP_VSA Inter-correlation
  - Correlation Analysis
      - SMR and SlogP descriptors weakly correlated
      - Test made on 2000 small molecules not used in definition of descriptors
      - Displayed values are r values (not r²)
      - Descriptors encode “orthogonal” molecular properties

24. 14 Partial Charge Descriptors
  - Gasteiger (PEOE) partial charge model
      - Approximation to local pKa
      - Electrostatic interactions
      - Similar to Jurs descriptors
  - 14 descriptors result from uniform interval boundaries
  - Weak correlation
  References: Stanton, D., Jurs, P. Anal. Chem. 62, 2323 (1990). Gasteiger, J., Marsili, M. Iterative Partial Equalization of Orbital Electronegativity - A Rapid Access to Atomic Charges. Tetrahedron, Vol. 36, p. 3219 (1980).

25. Encoding of Traditional Descriptors
  - Traditional descriptors modeled with VSA descriptors
      - 1,932 small organic molecules with weights in (28, 800)
      - SlogP_VSA, SMR_VSA and PEOE_VSA descriptors calculated
      - Principal components regression models for 64 traditional descriptors

      Descriptor  Value   Descriptor  Value   Descriptor  Value   Descriptor  Value
      chi0        0.99    chi0v_C     0.97    b_ar        0.89    b_1rotN     0.78
      Kier1       0.99    KierA1      0.97    Kier2       0.89    b_double    0.77
      vdw_area    0.99    a_hyd       0.96    vsa_pol     0.89    b_rotN      0.77
      vdw_vol     0.99    a_nC        0.96    vsa_acc     0.88    a_ICM       0.73
      vsa_hyd     0.99    a_nH        0.96    diameter    0.87    vsa_don     0.73
      a_count     0.98    a_nO        0.95    VadjEq      0.87    KierFlex    0.69
      a_heavy     0.98    b_heavy     0.95    a_nN        0.86    balabanJ    0.61
      a_IC        0.98    chi1_C      0.95    KierA2      0.86    a_nP        0.60
      apol        0.98    chi1v_C     0.95    radius      0.86    Kier3       0.57
      b_count     0.98    SlogP       0.95    VdistMa     0.86    a_nCl       0.56
      chi0v       0.98    a_acc       0.94    wienPath    0.85    KierA3      0.55
      chi1        0.98    chi1v       0.94    wienPol     0.84    a_nS        0.53
      SMR         0.98    Weight      0.93    VadjMa      0.82    b_1rotR     0.50
      b_single    0.97    a_aro       0.91    VdistEq     0.82    density     0.49
      bpol        0.97    a_don       0.91    vsa_oth     0.82    b_rotR      0.48
      chi0_C      0.97    zagreb      0.91    a_nF        0.80    b_triple    0.46

26. Boiling Point
  - Data set
      - Exp. boiling point (K)
      - 298 small molecules
      - 18 descriptors: SlogP_VSA (10), SMR_VSA (8)
  - PCA regression
      - r² = 0.96, RMSE = 15.53
      - Leave-one-out: r² = 0.94, RMSE = 21.37
      - Random leave-100-out: r² = 0.94

27. Free Energy of Solvation in Water
  - Data set
      - Exp. ΔGs (kcal/mol)
      - 291 small molecules
      - 12 descriptors: PEOE_VSA (3), SlogP_VSA (7), SMR_VSA (2)
  - PCA regression
      - r² = 0.90, RMSE = 0.78
      - Leave-one-out: r² = 0.89, RMSE = 0.82
      - Random leave-100-out: r² = 0.88
  Reference: Viswanadhan, V.N., Ghose, A.K., Singh, U.C., Wendoloski, J.J. Prediction of Solvation Free Energies of Small Organic Molecules: Additive-Constitutive Models Based on Molecular Fingerprints and Atomic Constants. J. Chem. Inf. Comput. Sci., 39, 405-412 (1999).

28. Thermodynamic Solubility in Water
  - Data set
      - Exp. logW at 25 °C
      - 1,438 small molecules
      - 32 descriptors: SlogP_VSA (10), SMR_VSA (8), PEOE_VSA (14)
  - PCA regression
      - r² = 0.75, RMSE = 2.4
      - Leave-one-out: r² = 0.74, RMSE = 2.5
  Source: Syracuse Research Corporation, 6225 Running Ridge Road, North Syracuse, NY 13212. URL: http:/.

29. Vapor Pressure
  - Data set
      - Exp. vapor pressure at 25 °C
      - 1,771 small molecules
      - 32 descriptors: SlogP_VSA (10), SMR_VSA (8), PEOE_VSA (14)
  - PCA regression
      - r² = 0.88, RMSE = 2.1
      - Leave-one-out: r² = 0.87, RMSE = 2.2
  Source: Syracuse Research Corporation, 6225 Running Ridge Road, North Syracuse, NY 13212. URL: http:/.
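The descriptors used in the regression models above come from the construction described under Exposed Van der Waals Surface Area (VSA) and Subdivision of VSA by Properties. A minimal sketch of that construction follows. It assumes the correction for each bonded neighbor is the area of the spherical cap of atom i that lies inside the neighbor's sphere, which is standard geometry but is not confirmed by the slide text as the exact correction used. The function names and the example areas are hypothetical; the property values are the ones shown on the slide.

```python
import math


def exposed_vsa(radii, bonds):
    """Approximate exposed VSA per atom from a connection table.

    radii: {atom: van der Waals radius}
    bonds: [(atom_a, atom_b, ideal_bond_length)]
    Only bonded neighbors are considered, as on the slide.
    """
    area = {a: 4.0 * math.pi * r * r for a, r in radii.items()}
    for a, b, d in bonds:
        for i, j in ((a, b), (b, a)):
            ri, rj = radii[i], radii[j]
            # Area of the cap of sphere i that lies inside sphere j
            cap = math.pi * ri * (rj * rj - (ri - d) ** 2) / d
            area[i] -= max(cap, 0.0)  # no overlap gives no correction
    # Clamp at zero: overlapping caps are not corrected for (1-3 terms neglected)
    return {a: max(v, 0.0) for a, v in area.items()}


def vsa_descriptors(vsa, prop, edges):
    """Sum V_i over atoms whose property P_i lies in [edges[k], edges[k+1])."""
    desc = [0.0] * (len(edges) - 1)
    for atom, v in vsa.items():
        for k in range(len(desc)):
            if edges[k] <= prop[atom] < edges[k + 1]:
                desc[k] += v
                break
    return desc


# Property values from the slide; the areas V_i are made-up illustration values
prop = {"O2": 1.2, "C3": 4.5, "C4": 5.9, "N7": 0.2}
vsa = {"O2": 10.0, "C3": 5.0, "C4": 7.0, "N7": 8.0}
edges = [0, 1, 2, 3, 4, 5, 6]  # bins [0,1) ... [5,6) giving D1 ... D6
desc = vsa_descriptors(vsa, prop, edges)
print(desc)  # [8.0, 10.0, 0.0, 0.0, 5.0, 7.0]

# Two bonded atoms of radius 1.7 at distance 1.5 (illustrative numbers)
pair = exposed_vsa({"A": 1.7, "B": 1.7}, [("A", "B", 1.5)])
```

Because every atom's area lands in exactly one bin, the descriptors for one property sum to the total approximate surface area whenever all property values fall inside the bin range.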

30. Compound Classification with Binary QSAR
  - Can Binary QSAR separate inhibitor classes using SlogP_VSAk and SMR_VSAk descriptors?
  - Data: 455 compounds active against one of 7 targets
  - Results (classification accuracy)

      Class  Accuracy  p      Target class
      1      98.7%     0.003  Serotonin receptor ligands
      2      96.7%     0.043  Benzodiazepine receptor ligands
      3      96.5%     0.290  Carbonic anhydrase II inhibitors
      4      98.7%     0.001  Cyclooxygenase-2 (Cox-2) inhibitors
      5      98.7%     0.014  H3 antagonists
      6      98.7%     0.012  HIV protease inhibitors
      7      99.1%     0.002  Tyrosine Kinase inhibitors

  Reference: Labute, P. Binary QSAR: A New Method for Quantitative Structure Activity Relationships. Proceedings of the 1999 Pacific Symposium, World Scientific Publishing, Singapore (1999).

31. Compound Classification with CART
  - Learning set for CART (recursive partitioning)
      - 455 compounds active against one of 7 targets
      - 1,942 “random” organic compounds
      - SlogP_VSA, SMR_VSA descriptors
  - Classification accuracy (32 node tree, depth 5)

      Class  Accuracy  p     Target class
      1      84.5%     0.07  Serotonin receptor ligands
      2      49.1%     0.30  Benzodiazepine receptor ligands
      3      92.5%     0.27  Carbonic anhydrase II inhibitors
      4      96.8%     0.01  Cyclooxygenase-2 (Cox-2) inhibitors
      5      82.7%     0.03  H3 antagonists
      6      85.4%     0.02  HIV protease inhibitors
      7      91.4%     0.01  Tyrosine Kinase inhibitors
