A Multi-span Language Modeling Framework for Speech Recognition


Jimmy Wang, Speech Lab, NTU

Outline
1. Introduction
2. N-gram Language Modeling
3. Smoothing and Clustering of N-gram Language Models
4. LSA Modeling
5. Hybrid LSA+N-gram Language Model
6. Conclusion

INTRODUCTION
Homophonous Mandarin sentences that acoustics alone cannot distinguish:
- 劉邦友血案抓到一對象 ("a suspect was caught in the Liu Pang-yu murder case") vs. 劉邦友血案抓到一隊象 ("a herd of elephants was caught in the Liu Pang-yu murder case")
- 水餃一碗多少錢 ("how much for a bowl of dumplings?") vs. 睡覺一晚多少錢 ("how much for one night of sleep?")

INTRODUCTION
Stochastic modeling of speech recognition: choose the word string that is most probable given the acoustic evidence (decision rule below).
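The decision-rule formula on this slide did not survive extraction; below is the standard Bayes decision rule for ASR, which is what "stochastic modeling of speech recognition" conventionally denotes, with W a word string and A the acoustic observations:

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```

P(A | W) is the acoustic model; P(W) is the language model, which the rest of this deck is about.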

INTRODUCTION
N-gram language modeling has been the formalism of choice for ASR because of its reliability, but it can only impose local constraints. For global constraints, parsing and rule-based grammars have been successful only in small-vocabulary applications.

INTRODUCTION
N-gram+LSA (Latent Semantic Analysis) language models integrate local constraints via the N-gram component and global constraints through the LSA component.

N-gram Language Model
Assume each word depends only on the previous N-1 words (N words in total); an N-gram model is an order-(N-1) Markov model. For a trigram, P(象 | 抓到一隊) is computed as P(象 | 抓到, 一隊), conditioning only on the last two words.
Perplexity on test text w_1 ... w_T: PP = P(w_1, ..., w_T)^(-1/T) (lower is better).

N-gram Language Model
N-gram training from a text corpus; corpus sizes range from hundreds of megabytes to several gigabytes.
Maximum likelihood approach: P("the" | "nothing but") ≈ C("nothing but the") / C("nothing but") (a runnable sketch follows).
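A minimal sketch of the maximum-likelihood estimate above, assuming a whitespace-tokenized corpus; the function and the toy corpus are illustrative additions, not from the deck.

```python
from collections import Counter

def ml_trigram_prob(tokens, w1, w2, w3):
    """Maximum-likelihood estimate: P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2)."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))
    if bigrams[(w1, w2)] == 0:
        return 0.0  # unseen history; a real system needs smoothing here
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

# P("the" | "nothing but") ~= C("nothing but the") / C("nothing but")
tokens = "there is nothing but the truth and nothing but lies".split()
print(ml_trigram_prob(tokens, "nothing", "but", "the"))  # 0.5: one of the two
                                                         # "nothing but" bigrams
                                                         # is followed by "the"
```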

Smoothing and Clustering
ML estimates are terrible on test data: if C(xyz) = 0, the model assigns probability 0. The fix is to interpolate with lower-order estimates, e.g. P(z | x y) = λ P_ML(z | x y) + (1 - λ) P_ML(z | y), and to find 0 < λ < 1 by optimizing on "held-out" data.
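A sketch of that interpolation with the held-out search for λ; the grid search is an illustrative stand-in for the EM-style optimization a real toolkit would use, and it assumes the lower-order probability is nonzero so the log-likelihood stays defined.

```python
import math

def interp(lam, p_tri, p_bi):
    """Interpolated estimate: lam * P_ML(z | x y) + (1 - lam) * P_ML(z | y)."""
    return lam * p_tri + (1.0 - lam) * p_bi

def pick_lambda(held_out):
    """held_out: (p_tri, p_bi) ML-probability pairs, one per held-out word.
    Returns the lambda in (0, 1) maximizing held-out log-likelihood."""
    best_lam, best_ll = 0.5, -math.inf
    for step in range(1, 100):
        lam = step / 100.0
        ll = sum(math.log(interp(lam, pt, pb)) for pt, pb in held_out)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

# zero trigram estimates on held-out words push lambda below 1
print(pick_lambda([(0.5, 0.1), (0.0, 0.2), (0.3, 0.4)]))
```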

Smoothing and Clustering
CLUSTERING = classes of similar things:
P(Tuesday | party on) or P(Tuesday | celebration on) ≈ P(WEEKDAY | EVENT)
Put words into clusters: WEEKDAY = {Sunday, Monday, Tuesday, ...}, EVENT = {party, celebration, birthday, ...}
Clustering may give good results with very little training data (see the decomposition below).
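The P(WEEKDAY | EVENT) shorthand corresponds to the standard class-based n-gram decomposition (Brown-style), stated here explicitly as an editorial addition, with C(w) denoting the class of word w:

```latex
P(w_i \mid w_{i-1}) \approx P\bigl(C(w_i) \mid C(w_{i-1})\bigr)\, P\bigl(w_i \mid C(w_i)\bigr)
```

So, for the slide's example, P(Tuesday | party) ≈ P(WEEKDAY | EVENT) · P(Tuesday | WEEKDAY).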

Smoothing and Clustering
Word clustering methods:
1. Build the clusters by hand.
2. Use Part-of-Speech (POS) tags.
3. Automatic clustering: swap words between clusters to minimize perplexity.
Automatic clustering comes in two styles: top-down splitting (decision tree), which is fast, and bottom-up merging, which is accurate.

LSA MODELING
Word co-occurrence matrix W:
V = vocabulary of size M, M = 40,000-80,000
T = training corpus of N documents, N = 80,000-100,000
C_ij = number of times word w_i occurs in document d_j
N_j = total number of words in d_j
E_i = normalized entropy of w_i in the corpus T
(A sketch of the weighted matrix construction follows.)
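The slide's cell formula did not survive extraction; the sketch below uses the standard entropy-weighted cell W_ij = (1 - E_i) * C_ij / N_j from Bellegarda's LSA language-modeling work, which matches the definitions above but should be treated as an assumption.

```python
import numpy as np

def lsa_matrix(C):
    """C: (M x N) counts, C[i, j] = occurrences of word i in document j.
    Returns W with W[i, j] = (1 - E_i) * C[i, j] / N_j."""
    C = np.asarray(C, dtype=float)
    N = C.shape[1]
    t = C.sum(axis=1, keepdims=True)                    # total count of each word
    p = np.divide(C, t, out=np.zeros_like(C), where=t > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    E = -plogp.sum(axis=1) / np.log(N)                  # normalized entropy in [0, 1]
    Nj = C.sum(axis=0)                                  # words per document (assumed > 0)
    return (1.0 - E)[:, None] * C / Nj[None, :]
```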

LSA MODELING
Vector representation: SVD (Singular Value Decomposition) of W, W ≈ U S V^T:
U is M x R; its rows u_i represent words.
S is R x R, the diagonal matrix of singular values.
V is N x R; its rows v_j represent documents.
Experiments with different values showed R = 100-300 to be an adequate balance.
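A rank-R truncation sketched with dense numpy SVD; at M around 40,000-80,000 words a sparse solver (e.g. scikit-learn's TruncatedSVD) would be used in practice, but the dense version makes the shapes explicit.

```python
import numpy as np

def truncated_svd(W, R=200):
    """W ~= U_R @ S_R @ V_R.T, with U_R: M x R (word space), V_R: N x R (document space)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :R], np.diag(s[:R]), Vt[:R, :].T

# word i is represented by U_R[i] @ S_R, document j by V_R[j] @ S_R
```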

LSA MODELING
Language modeling: let H_{q-1} be the overall history of the current document; LSA supplies P(w_q | H_{q-1}).
Word-clustered LSA model: this clustering uses the global context and hence captures more semantic information.

LSA+N-gram Language Model
Integration with N-grams via maximum entropy estimation, where H_{q-1} is now the overall history of both the n-gram component and the LSA component (one concrete combination form is sketched below).
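The slide names maximum entropy estimation without giving the formula; as an illustrative assumption, one widely used realization of the same integration (a Bellegarda-style product combination, renormalized over the vocabulary V) is:

```latex
P(w_q \mid H_{q-1}) =
\frac{P(w_q \mid w_{q-1}, \ldots, w_{q-n+1}) \; P(w_q \mid \tilde{d}_{q-1})}
     {\sum_{w \in V} P(w \mid w_{q-1}, \ldots, w_{q-n+1}) \; P(w \mid \tilde{d}_{q-1})}
```

The first factor is the n-gram probability; \tilde{d}_{q-1} is the pseudo-document built from the document history so far.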

LSA+N-gram Language Model
Context scope selection: in real use the prior probability changes over time, so we must define what counts as the current document history, or limit the size of the history considered.
Exponential forgetting: down-weight older words with a factor 0 < λ ≤ 1 (λ = 1 means no forgetting).
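A sketch of the pseudo-document update under exponential forgetting; the exact incremental rule is an assumption about how the deck realizes "discounting obsolete data", with λ as on the slide.

```python
import numpy as np

def update_pseudo_doc(v_prev, word_vec, lam=0.98):
    """One step of exponential forgetting: a word seen q positions ago
    contributes with weight lam**q (lam = 1 recovers no forgetting).
    v_prev: pseudo-document in LSA space; word_vec: LSA vector of the new word."""
    return lam * v_prev + word_vec

# v0 choices from the next slide: zeros, or the centroid of training documents
v = np.zeros(200)                 # option 1: zero vector
# v = doc_vectors.mean(axis=0)    # option 2: centroid (doc_vectors: N x R)
```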

LSA+N-gram Language Model
Initialization of v_0: at the start we may represent the pseudo-document v_0 as:
1. The zero vector.
2. The centroid vector of all training documents.
3. If the domain is known, the centroid of the corresponding region of the LSA space.

CONCLUSION
The hybrid N-gram+LSA model performs much better than a traditional N-gram model: roughly 25% lower perplexity and 14% lower WER.
LSA performs better on within-domain test data, and not as well on cross-domain test data.
Discounting obsolete data via exponential forgetting works better when the topics change incrementally.

CONCLUSION
LSA modeling is much more sensitive to "content words" than to "function words", which makes it a complement to N-gram modeling.
Given a suitable domain-adaptation framework, the hybrid LSA+N-gram model should improve perplexity and recognition rate further.
