Applying the Hidden Markov Model Methodology for Unsupervised Learning of Temporal Data


AIDA01-1: Applying the Hidden Markov Model Methodology for Unsupervised Learning of Temporal Data
Cen Li (1) and Gautam Biswas (2)
(1) Department of Computer Science, Middle Tennessee State University
(2) Department of EECS, Vanderbilt University, Nashville, TN 37235, USA
biswas@vuse.vanderbilt.edu, http://www.vuse.vanderbilt.edu/biswas
June 21, 2001

AIDA01-2: Problem Description
- Most real-world systems are dynamic, e.g., physical plants and engineering systems, human physiology, economic systems.
- Systems are complex: hard to understand, model, and analyze, but our ability to collect data on these systems has increased tremendously.
- Task: use the available data to automatically build models, extend incomplete models, and verify and validate existing models.
- Why models? A model is a formal, abstract representation of a phenomenon or process that enables systematic analysis and prediction.
- Our goal: build models, i.e., create structure from data using exploratory techniques.
- Challenge: systematic and useful clustering algorithms for temporal data.

AIDA01-3: Outline of Talk
- Example problem
- Related work on temporal data clustering
- Motivation for using the Hidden Markov Model (HMM) representation
- Bayesian HMM clustering methodology
- Experimental results: synthetic data, real-world ecological data
- Conclusions and future work

AIDA01-4: Example of a Dynamic System
ARDS patients' response to respiratory therapy:
[Figure: time-series plots of FiO2, RR, MAP, and PaO2 for one patient]
- Temporal features: FiO2, RR, MAP, PaO2, PIP, PEEP, PaCO2, TV, ...
- Goal: construct profiles of patient response patterns for a better understanding of the effects of various therapies.

AIDA01-5: Problem Description
Unsupervised classification (clustering): given data objects described by multiple temporal features,
(1) assign category information to individual objects by objectively partitioning the objects into homogeneous groups, such that within-group object similarity and between-group object dissimilarity are maximized;
(2) form a succinct description for each derived category.

AIDA01-6: Related Work on Temporal Data Clustering
[Table: survey of related temporal data clustering approaches; not recovered in this extraction]

AIDA01-7: Motivation for Using the HMM Representation
- What are our modeling objectives? Why do we choose the HMM representation?
  - The hidden states capture valid stages of a dynamic process.
  - Direct probabilistic links capture probabilistic transitions among the different stages.
- Mathematically, an HMM is made up of three components: π, the initial state probabilities; A, the transition matrix; and B, the emission matrix (a concrete sketch follows).
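To make the (π, A, B) notation concrete, here is a minimal Python sketch of a discrete-emission HMM and ancestral sampling from it. The 3-state, 4-symbol numbers are invented for illustration; the talk itself works with continuous-density HMMs:

    import numpy as np

    pi = np.array([0.6, 0.3, 0.1])          # initial state probabilities
    A = np.array([[0.80, 0.15, 0.05],       # A[i, j] = P(state j at t+1 | state i at t)
                  [0.10, 0.80, 0.10],
                  [0.05, 0.15, 0.80]])
    B = np.array([[0.70, 0.20, 0.05, 0.05], # B[i, k] = P(symbol k | state i)
                  [0.10, 0.60, 0.20, 0.10],
                  [0.05, 0.10, 0.25, 0.60]])

    def sample_sequence(pi, A, B, length, seed=0):
        """Ancestral sampling: the hidden state follows a Markov chain;
        each observation depends only on the current hidden state."""
        rng = np.random.default_rng(seed)
        states, obs = [], []
        s = rng.choice(len(pi), p=pi)
        for _ in range(length):
            states.append(s)
            obs.append(rng.choice(B.shape[1], p=B[s]))
            s = rng.choice(A.shape[0], p=A[s])
        return states, obs

    states, obs = sample_sequence(pi, A, B, length=10)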

AIDA01-8: Continuous Speech Signal (Rabiner and Huang, 1993)
- Spectral sequence -> feature extraction -> statistical pattern recognition.
- Isolated word recognition: one HMM per word, 2-10 states per HMM; direct continuous-density HMMs produce the best results.
- Continuous word recognition: number of words (?), utterance boundaries (?); word boundaries are fuzzy/non-unique; matching word reference patterns to words is an exponential problem.
- Whole-word models are typically not used for continuous speech recognition.

AIDA01-9: Continuous Speech Recognizer Using Sub-words
[Figure (Rabiner and Huang, 1993): a recognized sentence Sw is decomposed into words W1, W2, ... separated by silence models; each word, e.g., W1, is in turn decomposed into sub-word units U1(W1), U2(W1), ..., UL(W1)(W1) built from phone-like units (PLUs)]

AIDA01-10: Continuous Speech Recognition Characteristics
- A lot of background knowledge is available, e.g.:
  - phonemes and syllables for isolated word recognition;
  - sub-word units (phone-like units (PLUs), syllable-like units, acoustic units) for word segmentation;
  - grammar and semantics for language models.
- As a result, the HMM structure is usually well defined, and the recognition task depends on learning the model parameters.
- Clustering techniques are employed to derive an efficient state representation.
- Single-speaker versus speaker-independent recognition.
- Question: what about domains where the structure is not well known and the data are not as well defined?

AIDA01-11: Past Work on HMM Clustering
- Rabiner et al., 1989: HMM likelihood clustering and HMM threshold clustering; Baum-Welch and Viterbi methods for parameter estimation.
- Lee, 1990: agglomerative procedure.
- Bahl et al., 1986, and Normandin et al., 1994: HMM parameter estimation by Maximum Mutual Information (MMI).
- Kosaka et al., 1995: HMnet composition and clustering (CCL); Bhattacharyya distance measure.
- Dermatas and Kokkinakis, 1996: extended Rabiner et al.'s work with a more sophisticated clustering structure and clustering by recognition error.
- Smyth, 1997: clustering with finite mixture models.
- All of the above were applied to speech recognition. Limitations: no objective criterion for cluster partition evaluation and selection; a uniform, pre-defined HMM model size is applied for clustering.

AIDA01-12: Original Work on HMM Parameter Estimation (Rabiner et al., 1989)
- Maximum likelihood method (Baum-Welch): parameter estimation by the Expectation-Maximization (EM) method.
  - Assign initial parameters.
  - E-step: compute the expected values of the necessary statistics.
  - M-step: update the model parameter values to maximize the likelihood.
  - Use the forward-backward procedure to cut down on the computational requirements.
- The Baum-Welch method is very sensitive to initialization: use the Viterbi method to find the most likely path, and use that to initialize the parameters.
- Extensions to the method (to estimate model size): state splitting, state merging, etc. (a fitting sketch follows).
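As a concrete counterpart to the Baum-Welch description above, here is a minimal sketch that fits a continuous-density (Gaussian-emission) HMM by EM and scores data by log-likelihood. It assumes the third-party hmmlearn package, not the authors' code; hmmlearn's default k-means-based initialization of the Gaussian means plays a role similar to the initialization heuristics discussed on the slide:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # third-party package, assumed available

    # Two toy 2-feature sequences, stacked row-wise; `lengths` marks where
    # one sequence ends and the next begins.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)),
                   rng.normal(2.0, 1.0, size=(40, 2))])
    lengths = [60, 40]

    # EM (Baum-Welch) fit; n_iter caps the number of E/M iterations.
    model = GaussianHMM(n_components=4, covariance_type="diag",
                        n_iter=50, random_state=0)
    model.fit(X, lengths)

    # Log-likelihood of the data under the fitted model, computed with
    # the forward procedure.
    print("log P(X | model) =", model.score(X, lengths))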

AIDA01-13: [slide content not recovered in this extraction]

AIDA01-14: Our Approach to HMM Clustering
Four nested search loops:
- Loop 1: search for the optimal number of clusters in the partition.
- Loop 2: search for the optimal distribution of objects to the clusters in the partition.
- Loop 3: search for the optimal HMM model size for each cluster in the partition.
- Loop 4: search for the optimal model parameter configuration for each model.
Search space: [figure not recovered]

AIDA01-15: Introducing Heuristics to HMM Clustering
The same four nested levels of search, annotated with the techniques that address them: Bayesian model selection and HMM learning, detailed on the following slides. A schematic code sketch of the four loops follows.
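The four loops above can be read as the following schematic Python skeleton. All helpers (initial_partition, fit_hmm, bic_score, reassign_objects, partition_bic) are hypothetical stand-ins for EM training and the BIC-approximated Bayesian criteria introduced next; the talk's sequential searches also stop early once the score starts to drop, whereas this sketch searches exhaustively for brevity:

    def bayesian_hmm_clustering(sequences, max_clusters, max_states):
        """Schematic control flow of the four nested search loops;
        a sketch, not the authors' implementation."""
        best_partition, best_score = None, float("-inf")
        for k in range(1, max_clusters + 1):                  # Loop 1: number of clusters
            clusters = initial_partition(sequences, k)
            while True:                                       # Loop 2: object redistribution
                models = []
                for cluster in clusters:
                    best_m, best_m_score = None, float("-inf")
                    for n in range(1, max_states + 1):        # Loop 3: HMM model size
                        m = fit_hmm(cluster, n)               # Loop 4: EM parameter estimation
                        s = bic_score(m, cluster)
                        if s > best_m_score:
                            best_m, best_m_score = m, s
                    models.append(best_m)
                new_clusters = reassign_objects(sequences, models)
                if new_clusters == clusters:                  # fixed point: stop redistributing
                    break
                clusters = new_clusters
            s = partition_bic(models, clusters)
            if s > best_score:
                best_partition, best_score = (clusters, models), s
        return best_partition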

AIDA01-16: Bayesian Model Selection
- Model posterior probability: P(M|D) = P(D|M) P(M) / P(D), where the marginal likelihood P(D|M) = ∫ P(D|θ, M) P(θ|M) dθ integrates the likelihood over the model parameters θ.
- Goal: select the model M that maximizes P(M|D).

AIDA01-17: Computing Marginal Likelihoods
- Issue of complete versus incomplete data.
- Approximation techniques:
  - Monte Carlo methods (Gibbs sampling).
  - Candidate method (a variation of Gibbs sampling; Chickering).
  - Laplace approximation (multivariate Gaussian, quadratic Taylor series approximation).
  - Bayesian Information Criterion (BIC), derived from the Laplace approximation; very efficient.
  - Cheeseman-Stutz (CS) approximation.

AIDA01-18: Marginal Likelihood Approximations
- Bayesian Information Criterion (BIC): log P(D|M) ≈ log P(D|θ̂, M) − (d/2) log N, a likelihood term plus a model complexity penalty, where θ̂ is the maximum-likelihood parameter estimate, d the number of free parameters, and N the number of data points.
- Cheeseman-Stutz (CS) approximation: log P(D|M) ≈ log P(D′|M) + log P(D|θ̂, M) − log P(D′|θ̂, M), where D′ is the data completed with their expected sufficient statistics; the slide decomposes the score into a likelihood term and a model prior probability term. (A computational sketch follows.)
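The BIC term is simple enough to compute directly. A small sketch; the free-parameter count d for a diagonal-covariance Gaussian HMM is an illustrative convention, not taken from the talk:

    import math

    def hmm_free_parameters(n_states, n_features):
        """Free-parameter count d for an HMM with diagonal Gaussian
        emissions: (n_states - 1) initial probabilities,
        n_states * (n_states - 1) transition probabilities, and
        2 * n_states * n_features emission means and variances."""
        return (n_states - 1) + n_states * (n_states - 1) + 2 * n_states * n_features

    def bic(log_likelihood, n_params, n_obs):
        """BIC approximation of the log marginal likelihood:
        log P(D|M) ~ log P(D | theta_hat, M) - (d/2) * log N."""
        return log_likelihood - 0.5 * n_params * math.log(n_obs)

    # Example: a 4-state, 2-feature HMM scored on 1000 observation vectors.
    d = hmm_free_parameters(4, 2)   # 3 + 12 + 16 = 31
    print(bic(log_likelihood=-2500.0, n_params=d, n_obs=1000))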

AIDA01-19: Introducing Heuristics to HMM Clustering
(Recap of the four nested levels of search, Steps 1-4, highlighting where Bayesian model selection applies.)

AIDA01-20: Bayesian Model Selection for HMM Model Size Selection
Goal: select the optimal number of states, K, for an HMM, i.e., the K that maximizes the (approximated) posterior P(M|D).

AIDA01-21: BIC and CS for HMM Model Size Selection
[Figure: (a) the BIC score and (b) the CS score plotted against the number of states (2 through 10), each decomposed into a likelihood term and a penalty/prior term]

AIDA01-22: HMM Model Size Selection Process
- Sequential search for the HMM model size with a heuristic stopping criterion, the Bayesian model selection criterion approximated by BIC and CS (a sketch follows).
- State initialization by K-means clustering.
- Assumptions: the data are complete and sufficient; otherwise, a suboptimal structure results.
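The sequential size search with the heuristic stopping rule might look like the following sketch, again assuming hmmlearn (its k-means-based default initialization of the Gaussian means stands in for the slide's K-means state initialization). This is an interpretation of the slide, not the authors' implementation:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # third-party package, assumed available

    def select_hmm_size(X, lengths, max_states=10):
        """Grow the model one state at a time and stop as soon as the
        BIC score drops (the heuristic stopping criterion)."""
        n_obs, n_features = X.shape
        best_model, best_bic = None, float("-inf")
        for k in range(1, max_states + 1):
            model = GaussianHMM(n_components=k, covariance_type="diag",
                                n_iter=50, random_state=0).fit(X, lengths)
            d = (k - 1) + k * (k - 1) + 2 * k * n_features  # free parameters
            score = model.score(X, lengths) - 0.5 * d * np.log(n_obs)
            if score < best_bic:          # score started to drop: stop
                break
            best_model, best_bic = model, score
        return best_model, best_bic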

AIDA01-23: Introducing Heuristics to HMM Clustering
(Recap of the four nested levels of search, Steps 1-4, highlighting where Bayesian model selection applies.)

AIDA01-24: Cluster Partition Selection: Bayesian Model Selection for a Mixture of HMMs
- Goal: select the partition with the optimal number of clusters, K, i.e., the partition that maximizes P(M|D), the Partition Posterior Probability (PPP).
- Apply the BIC approximation.

AIDA01-25: BIC and CS for Cluster Partition Selection
[Figure: (a) the BIC score and (b) the CS score plotted against the number of clusters (1 through 10), each decomposed into a likelihood term and a penalty/prior term]

AIDA01-26: Bayesian Model Selection for Partition Evaluation and Selection
- Sequential search for the clustering partition model using a heuristic stopping criterion, the Bayesian model selection criterion approximated by BIC and CS (a sketch follows).
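The outer, partition-level search can use the same stopping rule, growing the number of clusters K until the BIC-approximated partition posterior starts to drop. In this sketch, cluster_sequences (a fixed-K clustering routine such as the hard-partitioning sketch after AIDA01-27) and cluster_bic (a per-cluster score like the earlier BIC sketch) are hypothetical helpers:

    def select_partition(sequences, max_clusters=10):
        """Sequential search over the number of clusters K with the
        heuristic stopping criterion; a sketch, not the authors' code."""
        best, best_score = None, float("-inf")
        for k in range(1, max_clusters + 1):
            clusters, models = cluster_sequences(sequences, k)
            score = sum(cluster_bic(m, c) for m, c in zip(models, clusters))
            if score < best_score:        # partition score dropped: stop
                break
            best, best_score = (clusters, models), score
        return best, best_score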

AIDA01-27: Fixed-Size Cluster Partition Formation
- Goal: maximize the BIC criterion for a partition with a fixed number of clusters. [objective formula not recovered]
- Hard data partitioning: each object is assigned to exactly one cluster (see the sketch below).
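One plausible reading of "EM with hard data partitioning" is to alternate between assigning each sequence to the cluster whose HMM gives it the highest log-likelihood and refitting each cluster's HMM on its current members. A sketch under that assumption, with fit_hmm and loglik as hypothetical helpers:

    def form_partition(sequences, models, max_rounds=20):
        """Hard data partitioning: each sequence belongs to exactly one
        cluster, the one whose HMM scores it highest; then each cluster's
        HMM is refit on its members."""
        assignment = None
        for _ in range(max_rounds):
            new_assignment = [max(range(len(models)),
                                  key=lambda j: loglik(models[j], seq))
                              for seq in sequences]
            if new_assignment == assignment:   # converged to a fixed point
                break
            assignment = new_assignment
            for j in range(len(models)):
                members = [s for s, a in zip(sequences, assignment) if a == j]
                if members:                    # refit only non-empty clusters
                    models[j] = fit_hmm(members)
        return assignment, models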

AIDA01-28: Introducing Heuristics to HMM Clustering
(Recap of the four nested levels of search, Steps 1-4, now fully annotated: Bayesian model selection; EM with hard data partitioning and heuristic model initialization; clustering-based HMM initialization.)

AIDA01-29: Bayesian HMM Clustering (BHMMC) with Component Model Size Selection
Complexity: O(K N Mmax L).

AIDA01-30: Experiments
- Exp 1: study the HMM model size selection method.
- Exp 2: compare HMM clustering with varying HMM sizes against HMM clustering with fixed HMM sizes.
- Exp 3: clustering robustness to data skewness.
- Exp 4: clustering sensitivity to seed selection.

AIDA01-31: Experiment 1: Modeling with HMM
[Figure: synthetic data generation in the x-y plane; four HMM states S1-S4 with means m1-m4 placed along the perpendicular lines y = kx + b and y = k'x + b with k' = -1/k; the inter-state distances are functions f and g of two separation parameters, d1 and d2]

AIDA01-32: Experiment 1: Modeling with HMM (continued)
[Table: mean HMM model size rediscovered for data generated from 4-state HMMs at different d1 and d2 values (d1 ranging over 0.1 to 3, d2 = 0.5); exact entries not recovered]

AIDA01-33: Experiment 2: Clustering with Varying vs. Fixed HMM Model Size
Partition misclassification count, mean (standard deviation). [table not recovered]

AIDA01-34: Experiment 2 (continued)
Between-partition similarity for the varying-size method and for fixed sizes 3, 8, and 15. [figure not recovered]

AIDA01-35: Experiment 2 (continued)
Partition posterior probability for the varying-size method and for fixed sizes 3, 8, and 15. [figure not recovered]

AIDA01-36: Experiment 3: BHMMC with Regular vs. Skewed Data
Average misclassification count as the cluster 1 / cluster 2 proportion varies, at model distance levels 1 and 2. [figure not recovered]

AIDA01-37: Experiment 4: Clustering Sensitivity to Seed Selection
[Figure: two synthetic datasets, Data A and Data B, each built from sequences generated by component models l1, l2, and l3 arranged in different orders]

AIDA01-38: Experiment 4 (continued)
Averages over 10 runs: variance on PPP (mean) and variance on misclassification count (mean), by level, for Data A and Data B. [figure not recovered]

AIDA01-39: Experiments with Ecological Data
Collaborators: Dr. Mike Dale and Dr. Pat Dale, School of Environmental Studies, Griffith University, Australia.
- Study the ecological effects of mosquito control by runneling, i.e., changing the drainage patterns in an area.
- Data were collected from about 30 sites in a salt marsh area south of Brisbane, Australia.
- At each site there are four measures of species performance: the height and density of the grass Sporobolous (marine couch), and the height and density of the succulent Sarcocornia (samphire).
- 11 other environmental variables were measured (with missing values), but they were not used for cluster generation.
- Data were collected at 3-month intervals starting in mid-1985; each temporal data sequence is 56 points long.

AIDA01-40: Results of Experiments (1)
Experiment 1: generate HMM clusters with all four plant variables.
Result: a single dynamic process (a one-cluster partition) with 4 states that form a cycle:
- Sporobolous dense & tall; Sarcocornia sparse
- Sporobolous sparse & moderate; Sarcocornia short & sparse
- Sporobolous sparse; Sarcocornia sparse & tall
- Sporobolous almost absent; Sarcocornia dense & moderate
Conclusions: the mosquito control treatment has no significant effect on the plant ecology; some parts of the marsh got wetter and shifted more toward Sarcocornia.

AIDA01-41: Results of Experiments (2)
Experiment 2: isolate and study the effects on the Sporobolous plant.
Result: a two-cluster partition:
- Cluster 1: 5 states with low transition probabilities (more or less static).
- Cluster 2: two interacting dynamic cycles.
[State diagram for cluster 2: S1 Sporobolous dense & tall, S2 Sporobolous medium & dense, S3 Sporobolous sparse & medium, S4 Sporobolous almost bare, S5 Sporobolous short & dense; transitions with probabilities 0.9, 0.45, 0.5, 0.33, and 0.1 form the two interacting loops, Loop 1 and Loop 2]

AIDA01-42: Results of Experiments (3)
Summary of results (Experiment 2):
- There are two paths that take Sporobolous to bare ground: in Loop 1 the grass recovers through state 2; in Loop 2 the process may stay in state 3 (medium height), with equal chances of going bare (state 4) or becoming short and dense (state 5).
- This shows modifications (new states) to the overall process.
Other observations:
- Ecological systems usually have more memory than 3 months; this may require higher-order Markov processes.
- Feature values may not be independent (consequences?).
- There was a drought in the middle of the observation period.
- The amount of data available for parameter estimation is limited, which may affect the statistical validity of the estimated parameters.
- How do we incorporate the environmental variables into the model? Some are discrete, they are at a different temporal granularity, and they have missing values.

AIDA01-43: Conclusions
- Applied the Bayesian clustering methodology to mixtures of continuous-density HMMs (CDHMMs).
- Solved the HMM model size selection problem in the framework of Bayesian model selection.
- Improved the quality of the cluster partition and of the individual cluster models by introducing the HMM model size selection procedure into the clustering process.
- Proposed a more effective HMM parameter initialization method.
- Identified problems that may be common to other clustering tasks involving graphical models of undetermined sizes.

AIDA01-44: Future Work
- Improve clustering stability.
- Evaluate partition quality in terms of the predictiveness of the cluster models.
- Investigate incremental Bayesian HMM clustering approaches.
- Enhance HMMs of dynamic processes (higher-order models, non-stationary processes).
- Apply this methodology to real data and perform knowledge-based model interpretation and analysis: ecological data, stock data, therapy response data.
