Tagging with Hidden Markov Models
Tagging with Hidden Markov Models
CMPT 882 Final Project, Chris Demwell, Simon Fraser University

The Tagging Task
- Identification of the part of speech of each word of a corpus
- Supervised: a training corpus is provided, consisting of correctly tagged text
- Unsupervised: uses only plain text

Hidden Markov Models 1
- Observable states (corpus text) are generated by hidden states (tags)
- Generative model

Hidden Markov Models 2
- Model: λ = (A, B, π)
- A: state transition probability matrix; a_{i,j} = probability of moving from state i to state j
- B: emission probability matrix; b_{j,k} = probability that state (tag) j emits symbol (word) k
- π: initial state probabilities; π_i = probability of starting in state i

Hidden Markov Models 3
Terms in this presentation:
- N: number of hidden states in each column of the trellis (distinct tags)
- T: number of columns in the trellis (time ticks)
- M: number of symbols (distinct words)
- O: the observation (the untagged text)
- b_j(O_t): the probability of emitting the symbol observed at tick t, given state j
- α_{t,i} and β_{t,i}: the probability of arriving at state i at time tick t, given the observation before and after tick t (respectively)

Hidden Markov Models 4
- A is an N × N matrix
- B is an N × M matrix (one row per tag, one column per distinct word)
- π is a vector of size N
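Concretely, λ = (A, B, π) can be held as three arrays. A minimal sketch in Python with NumPy; the sizes (N = 2 tags, M = 3 words) and all probabilities below are invented for illustration:

```python
import numpy as np

# Toy model with N = 2 tags and M = 3 distinct words (numbers invented):
A = np.array([[0.7, 0.3],            # A[i, j] = p(tag j follows tag i); N x N
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # B[j, k] = p(tag j emits word k); N x M
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])            # pi[i] = p(starting in tag i); size N

# Each row of A and B, and pi itself, is a probability distribution:
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```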
Forward Algorithm
- Used to calculate the likelihood quickly
- α_{t,i}: the probability of arriving at trellis node (t, i), given the observation seen "so far"
- Initialization: α_{1,i} = π_i · b_i(O_1)
- Induction: α_{t+1,j} = [ Σ_i α_{t,i} · a_{i,j} ] · b_j(O_{t+1})

Backward Algorithm
- Symmetrical to the Forward Algorithm
- Initialization: β_{T,i} = 1 for all i
- Induction: β_{t,i} = Σ_j a_{i,j} · b_j(O_{t+1}) · β_{t+1,j}

Baum-Welch Re-estimation
- Calculate two matrices of intermediate probabilities, γ and ξ
- Calculate new A, B, π given these probabilities
- Recalculate α and β, and p(O | λ)
- Repeat until p(O | λ) no longer changes much

HMM Tagging 1
Training methods:
- Supervised: relative frequency
- Supervised: relative frequency with further maximum-likelihood training
- Unsupervised: maximum-likelihood training with a random start

HMM Tagging 2
- Read the corpus, take counts, and build translation tables
- Train the HMM using Baum-Welch, or compute it using relative frequency
- Compute the most likely hidden state sequence
- Determine the POS role that each state most likely plays

HMM Tagging: Pitfalls 1
- Monolithic HMM: relatively opaque to debugging strategies; difficult to modularize; significant time/space efficiency concerns; varied techniques across prior implementations
- Numerical stability: very small probabilities are likely to underflow; use log likelihoods
- Text chunking: sentences? fixed-size chunks? a stream?

HMM Tagging: Pitfalls 2
- State role identification: a lexicon giving p(tag | word) from the supervised corpus; unseen words; equally likely tags for multiple states
- Local maxima: HMM training is not guaranteed to converge on the correct model
- Initial conditions: random, trained, or degenerate

HMM Tagging: Prior Work 1
- Cutting et al.
- Elaborate reduction of complexity (ambiguity classes)
- Integration of bias for tuning (lexicon choice, initial forward-backward values)
- Fixed-size text chunks, with model averaging between chunks for the final model
- 500,000 words of the Brown corpus: 96% accurate after eight iterations
