Bayesian Probabilities
Recall that Bayes' theorem gives us a simple way to compute likelihoods for hypotheses, and thus is useful for problems like diagnosis, recognition, and interpretation:

P(h | E) = P(E | h) * P(h) / P(E), usually applied in the proportional form P(h | E) ∝ P(E | h) * P(h), since we only need to compare hypotheses.

As we discussed, there are several problems when applying Bayes' theorem: it works only when we have independent events, we need too many probabilities to account for all of the combinations of evidence in E, and the probabilities are derived from statistics which might include some bias. We can get around the first two problems if we assume independence (sometimes known as Naive Bayesian probabilities) or if we can construct a suitable Bayesian network. We can also employ a form of learning so that the probabilities better suit the problem at hand.
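As a small illustration (not from the slides), the sketch below scores two hypothetical hypotheses with the proportional form of Bayes' rule; the hypothesis names, priors, and likelihoods are made-up values:

    # Minimal sketch: comparing hypotheses with Bayes' rule in proportional form.
    # All numbers below are illustrative assumptions, not real statistics.
    priors = {"flu": 0.10, "cold": 0.90}        # P(h)
    likelihoods = {"flu": 0.80, "cold": 0.20}   # P(E | h) for some evidence E

    # Unnormalized posterior: P(h | E) is proportional to P(E | h) * P(h).
    scores = {h: likelihoods[h] * priors[h] for h in priors}
    best = max(scores, key=scores.get)

    print(scores)                            # flu: 0.08, cold: 0.18
    print("most likely hypothesis:", best)   # cold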
Co-dependent Events

Recall our "sidewalk is wet" example. The two causes were that it rained or that we ran the sprinkler. While the two events do not seem to be dependent, in fact we might not run the sprinkler if it is supposed to rain, or we might run the sprinkler because we postponed running it when rain was expected and it didn't rain. Therefore, we cannot treat rain and sprinkler as independent events, and we need to resolve this by using what is known as the probability chain rule:

P(A & B & C) = P(A) * P(B | A) * P(C | A & B)

Here the & means "each item in the list happens". If we have 10 co-dependent events, say A1, ..., A10, then the last probability in the product is P(A10 | A1 & A2 & ... & A9).

More on the Chain Rule

In order to apply the chain rule, I will need a large number of probabilities. Assume that I have events A, B, C, D, E:

P(A & B & C & D & E) = P(A) * P(B | A) * P(C | A & B) * P(D | A & B & C) * P(E | A & B & C & D)
P(A & B & D & E) = P(A) * P(B | A) * P(D | A & B) * P(E | A & B & D)
P(A & C & D) = P(A) * P(C | A) * P(D | A & C)
and so on.

So I will need P(event i | some combination of the other events j) for every event and every combination. If I have 5 co-dependent events, I need about 2^5 = 32 conditional probabilities (along with the 5 prior probabilities).
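The sketch below works through the chain rule for three events and shows why the number of probabilities grows exponentially; the event names and probability values are assumptions for illustration only:

    # Sketch of the chain rule P(A & B & C) = P(A) * P(B | A) * P(C | A & B).
    # The probability values are made-up illustrative numbers.
    p_A = 0.30           # P(A)
    p_B_given_A = 0.50   # P(B | A)
    p_C_given_AB = 0.20  # P(C | A & B)

    p_ABC = p_A * p_B_given_A * p_C_given_AB
    print("P(A & B & C) =", p_ABC)   # 0.03

    # With n co-dependent binary events we need on the order of 2**n probabilities
    # to cover every combination of conditioning events.
    n = 5
    print("probabilities needed for", n, "events: about", 2 ** n)   # 32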
Independence

Because of the problem of co-dependence, we might want to see if two events are independent. Two events are independent if the occurrence of one does not impact the probability of the other. For instance, if you roll a 6 on a die, the probability of rolling a 6 on the same or another die does not change from 1 in 6. Likewise, drawing a red card from a deck of cards and replacing it (putting it back into the deck) does not change the probability that the next card drawn is also red. However, if you do not replace the first red card, then the probability of drawing a second red card does change. We say that two events A and B are independent if and only if:

P(A & B) = P(A) * P(B)

Continued

If we have independent events, then it simplifies what we need in order to compute our probabilities. When reasoning about independent events A and B, we do not need P(A), P(B) and P(A & B); P(A) and P(B) suffice. Similarly, if events A and B are independent, then P(A | B) = P(A), because B being present or absent has no impact on A. With the independence assumption in place, our previous equation of the form

P(H | e1 & e2 & e3 & ...) = P(H) * P(e1 & e2 & e3 & ... | H)
                          = P(H) * P(e1 | H) * P(e2 | e1 & H) * P(e3 | e1 & e2 & H) * ...

can be rewritten as

P(H | e1 & e2 & e3 & ...) = P(H) * P(e1 | H) * P(e2 | H) * P(e3 | H) * ...

Thus, independence gets us past the problem of needing an exponential number of probabilities.
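To make the definition concrete, here is a small simulation sketch (an illustration, not from the slides) that estimates P(A & B) for two independent dice rolls and compares it with P(A) * P(B):

    import random

    # Estimate P(A & B) for two independent events and compare with P(A) * P(B).
    # A = "first die shows 6", B = "second die shows 6"; 1/6 * 1/6 = 1/36, about 0.0278.
    random.seed(0)
    trials = 100_000
    both = 0
    for _ in range(trials):
        a = random.randint(1, 6)
        b = random.randint(1, 6)
        if a == 6 and b == 6:
            both += 1

    print("estimated P(A & B):", both / trials)       # close to 0.0278
    print("P(A) * P(B):       ", (1 / 6) * (1 / 6))   # 0.0277...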
Naive Bayesian Classifier

Assume we have q data sets, where each data set comprises some elements of the set {d1, d2, ..., dn}, and each set qi has been classified into one of m categories c1, c2, ..., cm. Given a new case = {dj, dk, ...}, we compute the likelihood that the new case is in each of the categories as follows:

P(ci | case) = P(ci) * P(dj | ci) * P(dk | ci) * ... for each i from 1 to m

where P(ci) is simply the fraction of the q data sets classified as ci, and P(dj | ci) is the fraction of the data sets classified as ci in which datum dj occurred. P(ci | case) is a Naive Bayesian classifier (NBC) for category i. This only works if the data making up any one case are independent; that assumption is not necessarily true, thus the word "naive". The good news? This is easy!
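The sketch below is a minimal Naive Bayesian classifier along these lines. The tiny training set, the category names, and the new case are made up for illustration, and no smoothing is applied, so an unseen datum zeroes out a category's score:

    from collections import Counter, defaultdict

    # Toy training data: each case is (set of data items, category). Illustrative only.
    training = [
        ({"fever", "cough"}, "flu"),
        ({"fever", "rash"}, "measles"),
        ({"cough", "sneeze"}, "flu"),
        ({"rash", "fever"}, "measles"),
    ]

    q = len(training)
    category_counts = Counter(cat for _, cat in training)   # times each ci occurs
    item_counts = defaultdict(Counter)                      # times each dj seen under ci
    for items, cat in training:
        for d in items:
            item_counts[cat][d] += 1

    def nbc_score(case, category):
        """P(ci) times the product of P(dj | ci) for each datum in the case (the NBC score)."""
        score = category_counts[category] / q
        for d in case:
            score *= item_counts[category][d] / category_counts[category]
        return score

    new_case = {"fever", "cough"}
    scores = {c: nbc_score(new_case, c) for c in category_counts}
    print(scores)
    print("predicted category:", max(scores, key=scores.get))   # flu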
Chain Rule vs. Naive Bayes

Let's consider an example. We want to determine the probability that a person will have spoken or typed a particular phrase, say "the man bit a dog". We will compute the probability of this by examining several hundred or thousand training sentences.

Using Naive Bayes: P("the man bit a dog") = P("the") * P("man") * P("bit") * P("a") * P("dog"). We compute this just by counting the number of times each word occurs in the training sentences.

Using the chain rule: P("the man bit a dog") = P("the") * P("man" | "the") * P("bit" | "the man") * P("a" | "the man bit") * P("dog" | "the man bit a"), where P("bit" | "the man") is estimated from the number of times the word "bit" followed "the man" in all of our training sentences.

The probability computed by the chain rule is far smaller but also much more realistic, so Naive Bayes should be used only with caution and with the foreknowledge that we can make an independence assumption.
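The sketch below shows how such counts turn into estimates on a tiny made-up corpus: the unigram product used by Naive Bayes, and one chain-rule factor, P("bit" | "the man"), counted exactly as described above. The corpus and phrase are illustrative assumptions.

    # Toy training sentences (made up) used to estimate word probabilities.
    corpus = [
        "the man bit a dog",
        "the dog bit a man",
        "the man saw a dog",
        "a man fed the dog",
    ]
    tokenized = [s.split() for s in corpus]
    words = [w for sent in tokenized for w in sent]

    # Naive Bayes estimate: product of unigram probabilities P(w).
    phrase = "the man bit a dog".split()
    naive = 1.0
    for w in phrase:
        naive *= words.count(w) / len(words)
    print("Naive Bayes estimate of the phrase:", naive)

    def count_seq(seq):
        """Number of times the word sequence appears contiguously in the corpus."""
        n = len(seq)
        return sum(1 for sent in tokenized
                   for i in range(len(sent) - n + 1)
                   if sent[i:i + n] == seq)

    # One chain-rule factor, P("bit" | "the man"): how often "bit" follows "the man".
    p_bit_given_the_man = count_seq(["the", "man", "bit"]) / count_seq(["the", "man"])
    print('P("bit" | "the man") =', p_bit_given_the_man)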
Spam Filters

One of the most common uses of an NBC is to construct a spam filter. The spam filter works by learning a "bag of words", that is, the words that are typically associated with spam. We take all of the words of every email message and discard any common words (I, of, the, is, etc.). Now we "train" our spam filter by computing these probabilities:

P(spam): the fraction of emails in the training set that were spam
P(!spam): the fraction of emails in the training set that were not spam
P(word1 | spam): the fraction of spam emails in which word1 appeared
P(word1 | !spam): the fraction of non-spam emails in which word1 appeared

and so forth for every non-common word.
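A minimal training sketch along these lines follows; the tiny labeled email set and the stop-word list are made-up assumptions, and word probabilities are estimated by simple counting, with no smoothing:

    from collections import Counter

    # Tiny labeled training set (made up): (email text, is_spam).
    emails = [
        ("win money now", True),
        ("cheap money offer", True),
        ("meeting agenda attached", False),
        ("lunch plans for friday", False),
    ]
    common = {"the", "of", "is", "i", "for", "now"}   # assumed common-word list

    spam = [set(text.split()) - common for text, s in emails if s]
    ham = [set(text.split()) - common for text, s in emails if not s]

    p_spam = len(spam) / len(emails)   # P(spam)
    p_ham = len(ham) / len(emails)     # P(!spam)

    def word_probs(docs):
        """P(word | class): fraction of the class's emails containing the word."""
        counts = Counter(w for doc in docs for w in doc)
        return {w: c / len(docs) for w, c in counts.items()}

    p_word_spam = word_probs(spam)
    p_word_ham = word_probs(ham)

    print("P(spam) =", p_spam, " P(!spam) =", p_ham)
    print('P("money" | spam)  =', p_word_spam.get("money", 0.0))
    print('P("money" | !spam) =', p_word_ham.get("money", 0.0))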
Using Our Spam Filter

A new email comes in. Discard all common words, then compute P(spam | words) and P(!spam | words):

P(spam | word1 & word2 & ... & wordn) = P(spam) * P(word1 | spam) * P(word2 | spam) * ... * P(wordn | spam)
P(!spam | word1 & word2 & ... & wordn) = P(!spam) * P(word1 | !spam) * P(word2 | !spam) * ... * P(wordn | !spam)

Whichever probability is higher gives you your answer. Without the naive assumption, the computation becomes:

P(spam | word1 & word2 & ... & wordn) = P(spam) * P(word1 & word2 & ... & wordn | spam)
                                      = P(spam) * P(word1 | spam) * P(word2 | word1 & spam) * ... * P(wordn | word1 & ... & wordn-1 & spam)

English has well over 100,000 words, but many are common, so a spam filter may only deal with, say, 5,000 words; even so, that would require on the order of 2^5000 probabilities if we did not use naive Bayes!
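Classification of a new email could then look like the sketch below. The probability tables are written out by hand here so the snippet stands alone (in practice they would come from a training step), and unseen words are given a small assumed probability instead of zero:

    # Trained probabilities (illustrative, hand-written values).
    p_spam, p_ham = 0.5, 0.5
    p_word_spam = {"money": 1.0, "win": 0.5, "cheap": 0.5, "offer": 0.5}
    p_word_ham = {"meeting": 0.5, "agenda": 0.5, "lunch": 0.5, "money": 0.1}
    common = {"the", "of", "is", "i", "for", "now"}

    def score(words, prior, p_word):
        """P(class | words) up to a constant: prior times the product of per-word probabilities.
        Unseen words get a small assumed probability (0.01) rather than zeroing the score."""
        s = prior
        for w in words:
            s *= p_word.get(w, 0.01)
        return s

    new_email = "win free money now"
    words = set(new_email.split()) - common

    spam_score = score(words, p_spam, p_word_spam)
    ham_score = score(words, p_ham, p_word_ham)
    print("spam score:", spam_score, " non-spam score:", ham_score)
    print("classified as:", "spam" if spam_score > ham_score else "not spam")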
Classifier Example: Clustering

Clustering is used in data mining to infer boundaries between classes. Here, we assume that we have already clustered a set of data into two classes, and we want to use an NBC to determine whether a new datum, which lies in between the two clusters, belongs to one class or the other. We have two categories, which we will call red and green:

P(red) = # of red entries / # of total entries = 20 / 60
P(green) = # of green entries / # of total entries = 40 / 60

We add a new datum x (shown in white in the figure) and identify which class it is most likely a part of:

P(x | green) = # of green entries nearby / # of green entries = 1 / 40
P(x | red) = # of red entries nearby / # of red entries = 3 / 20

P(x is green | green entries) = P(green) * P(x | green) = 40/60 * 1/40 = 1/60
P(x is red | red entries) = P(red) * P(x | red) = 20/60 * 3/20 = 1/20

Since 1/20 > 1/60, the new datum is classified as red.
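The same comparison in code, using only the arithmetic from the example (the nearby counts are taken as given):

    from fractions import Fraction as F

    # Counts from the clustering example: 20 red and 40 green entries out of 60,
    # with 3 red and 1 green entries near the new datum x.
    total, red, green = 60, 20, 40
    near_red, near_green = 3, 1

    score_red = F(red, total) * F(near_red, red)          # P(red) * P(x | red)     = 1/20
    score_green = F(green, total) * F(near_green, green)  # P(green) * P(x | green) = 1/60

    print("red score:  ", score_red)
    print("green score:", score_green)
    print("x is classified as", "red" if score_red > score_green else "green")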