A Hidden Markov Model for Protein Secondary Structure Prediction

Wei-Mou Zheng
Institute of Theoretical Physics, Academia Sinica
PO Box 2735, Beijing 100080

Outline
Protein structure
A brief review of secondary structure prediction
Hidden Markov model: simple-minded
Hidden Markov model: realistic
Discussion
References

Protein structure
Protein sequences are written in 20 letters (the 20 naturally occurring amino acid residues):
AVCDE FGHIW KLMNY PQRST
[Figure labels: Hydrophobic, Charged (+, -), Polar; Cis-, Trans-]
Residues form a directed chain.
[Figure: RasMol ribbon diagram of GB1. Helix (pink), sheets (yellow) and coil (grey). Hydrogen-bond network.]
3D structure → secondary structure, written in three letters: H, E, C.
H : E : C = 34.9 : 21.8 : 43.3

Bayes formula
[Slide fragment: "Count of"]
Generally, $P(x, y) = P(x|y)\,P(y)$.

Protein sequence $A = \{a_i\}$, $i = 1, 2, \ldots, n$
Secondary structure sequence $S = \{s_i\}$, $i = 1, 2, \ldots, n$
Secondary structure prediction: 1D amino acid sequence → 1D secondary structure sequence.
An old problem, studied for more than 30 years.
Inference of $S$ from $A$: $P(S|A)$

1. Simple Chou-Fasman approach
Chou-Fasman propensity of an amino acid for a conformational state, plus an independence approximation.

Parameter training
Propensities $q(a,s)$
Counts (20 × 3) from a database: $N(a,s)$
Sum over $a$: $N(s)$; sum over $s$: $N(a)$; sum over $a$ and $s$: $N$.
$$q(a,s) = \frac{N(a,s)\,N}{N(a)\,N(s)}$$

2. Garnier-Osguthorpe-Robson (GOR) window version
Conditional independence
Weight matrix (20 × 17) × 3: $P(W|s)$

3. Improved GOR (20 × 20 × 16 × 3, to include pair correlation)

Hidden Markov Model (HMM): simple-minded
Bayesian formula: $P(S|A) = P(S,A)/P(A)$, with $P(S,A) = P(A|S)\,P(S)$.
Simple version: a Markov chain for the hidden sequence, emitting $a_i$ at $s_i$ according to $P(a|s)$.
Forward and backward functions
[Diagram: hidden states $s_1, s_2, s_3$ emitting $a_1, a_2, a_3$]
Initial conditions and recursion relations
Partition function
Linear algorithm: dynamic programming
Baum-Welch (sum) and Viterbi (max)
$$\mathrm{Prob}(s_i = s,\ s_{i+1} = s') = \frac{A_i(s)\, t_{ss'}\, P(a_{i+1}|s')\, B_{i+1}(s')}{Z}$$
$\mathrm{Prob}(s_{i:j})$

Hidden Markov Model: realistic
1) Strong correlation in conformational states: at least two consecutive E and three consecutive H; refined conformational states (243 → 75)
2) Emission probabilities: improved window scores
Proportion of accurately predicted sites: 70% (compared with 65% for prediction based on a single sequence)
No post-prediction filtering
Integrated (overall) estimation of refined conformational states
Measure of prediction confidence

Discussion
An HMM using refined conformational states and window scores is efficient for protein secondary structure prediction.
A better scoring system should cover more of the correlation between conformation and sequence.
Combining homologous information will improve the prediction accuracy.
From secondary structure to 3D structure (structure codes: discretized 3D conformational states).

References
Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (1989) 257-286.
Burkhard Rost, Protein Secondary Structure Prediction Continues to Rise, Journal of Structural Biology 134 (2001) 204-218.

The End

[Figure: amino acid property classes. Labels: Small, Tiny, Hydrophobic, Polar, Aromatic, Aliphatic, Positive, Negative. Residues: P, V, I, F, Y, W, T, L, H, K, R, D, G, A, C, S, N, E, Q, M.]
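The propensity formula q(a,s) = N(a,s) N / (N(a) N(s)) from the parameter-training slide can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `propensities` and the toy residue/state strings are hypothetical and chosen for illustration, not taken from the talk or from any real database.

```python
from collections import Counter

def propensities(pairs):
    """Return q(a, s) = N(a, s) * N / (N(a) * N(s)) for observed (a, s) pairs.

    `pairs` is an iterable of (residue, state) observations.
    Pairs that never occur are simply absent (their propensity would be 0).
    """
    n_as = Counter(pairs)          # joint counts N(a, s)
    n_a, n_s = Counter(), Counter()
    for (a, s), c in n_as.items():
        n_a[a] += c                # N(a): sum over s
        n_s[s] += c                # N(s): sum over a
    n = sum(n_as.values())         # N: sum over a and s
    return {(a, s): c * n / (n_a[a] * n_s[s]) for (a, s), c in n_as.items()}

# Hypothetical toy data: one residue string and its aligned H/E/C string.
sequence = "AAVAGGAVGV"
structure = "HHEHCCEECC"
q = propensities(zip(sequence, structure))

for (a, s), value in sorted(q.items()):
    print(a, s, round(value, 3))
```

A value above 1 means the residue occurs in that state more often than expected if residue and state were independent; a value below 1 means it occurs less often.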
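The forward and backward functions, the partition function Z and the pair posterior Prob(s_i = s, s_{i+1} = s') of the simple-minded HMM can be sketched as follows. The slides name the initial conditions and recursion relations without spelling them out, so the sketch assumes the standard textbook forms: A_1(s) = π(s) P(a_1|s), B_n(s) = 1, and Z = Σ_s A_n(s). The initial distribution `INIT`, the three-letter toy alphabet and every numerical value are hypothetical.

```python
STATES = "HEC"

# Hypothetical toy parameters (not from the slides).
INIT = {"H": 0.3, "E": 0.2, "C": 0.5}                      # assumed pi(s)
TRANS = {"H": {"H": 0.80, "E": 0.05, "C": 0.15},            # t_{ss'}
         "E": {"H": 0.05, "E": 0.75, "C": 0.20},
         "C": {"H": 0.15, "E": 0.15, "C": 0.70}}
EMIT = {"H": {"A": 0.60, "V": 0.20, "G": 0.20},             # P(a|s)
        "E": {"A": 0.20, "V": 0.60, "G": 0.20},
        "C": {"A": 0.25, "V": 0.25, "G": 0.50}}

def forward_backward(seq, init, trans, emit):
    """Return forward table A, backward table B and partition function Z."""
    n = len(seq)
    # Forward: A[i][s] = P(a_1..a_i, s_i = s)
    A = [{s: init[s] * emit[s][seq[0]] for s in STATES}]
    for i in range(1, n):
        A.append({t: emit[t][seq[i]] * sum(A[i - 1][s] * trans[s][t] for s in STATES)
                  for t in STATES})
    # Backward: B[i][s] = P(a_{i+1}..a_n | s_i = s), with B at the last site set to 1
    B = [None] * n
    B[n - 1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        B[i] = {s: sum(trans[s][t] * emit[t][seq[i + 1]] * B[i + 1][t] for t in STATES)
                for s in STATES}
    Z = sum(A[n - 1].values())
    return A, B, Z

def site_posteriors(A, B, Z):
    """Prob(s_i = s) = A_i(s) B_i(s) / Z for every site i."""
    return [{s: A[i][s] * B[i][s] / Z for s in STATES} for i in range(len(A))]

def pair_posterior(i, seq, A, B, Z, trans, emit):
    """Prob(s_i = s, s_{i+1} = s') as on the slide (i is 0-based here)."""
    return {(s, t): A[i][s] * trans[s][t] * emit[t][seq[i + 1]] * B[i + 1][t] / Z
            for s in STATES for t in STATES}

def viterbi(seq, init, trans, emit):
    """Most probable state path: same recursion with max in place of sum."""
    V = [{s: init[s] * emit[s][seq[0]] for s in STATES}]
    back = []
    for i in range(1, len(seq)):
        row, ptr = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: V[-1][s] * trans[s][t])
            ptr[t] = best
            row[t] = V[-1][best] * trans[best][t] * emit[t][seq[i]]
        V.append(row)
        back.append(ptr)
    path = [max(STATES, key=lambda s: V[-1][s])]
    for ptr in reversed(back):          # trace back from the last site
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

seq = "AAVVGG"
A, B, Z = forward_backward(seq, INIT, TRANS, EMIT)
post = site_posteriors(A, B, Z)
print(viterbi(seq, INIT, TRANS, EMIT))
print([max(p, key=p.get) for p in post])
```

Both passes cost time linear in the sequence length. For realistic sequence lengths the raw products underflow, so a practical version would rescale each row or work in log space.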