Bayesian Probabilities
Recall that Bayes' theorem gives us a simple way to compute likelihoods for hypotheses, and thus is useful for problems like diagnosis, recognition, and interpretation:

P(h | E) = P(E | h) * P(h) / P(E), usually applied in the proportional form P(h | E) ∝ P(E | h) * P(h), since we only need to compare hypotheses.

As we discussed, there are several problems when applying Bayes' theorem: it works only when we have independent events, we need too many probabilities to account for all of the combinations of evidence in E, and the probabilities are derived from statistics which might include some bias. We can get around the first two problems if we assume independence (sometimes known as Naive Bayesian probabilities) or if we can construct a suitable Bayesian network. We can also employ a form of learning so that the probabilities better suit the problem at hand.
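As a small illustration (not from the slides), the sketch below scores two hypothetical hypotheses with the proportional form of Bayes' rule; the hypothesis names, priors, and likelihoods are made-up values:

    # Minimal sketch: comparing hypotheses with Bayes' rule in proportional form.
    # All numbers below are illustrative assumptions, not real statistics.
    priors = {"flu": 0.10, "cold": 0.90}        # P(h)
    likelihoods = {"flu": 0.80, "cold": 0.20}   # P(E | h) for some evidence E

    # Unnormalized posterior: P(h | E) is proportional to P(E | h) * P(h).
    scores = {h: likelihoods[h] * priors[h] for h in priors}
    best = max(scores, key=scores.get)

    print(scores)                            # flu: 0.08, cold: 0.18
    print("most likely hypothesis:", best)   # cold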
Co-dependent Events

Recall our "sidewalk is wet" example. The two causes were that it rained or that we ran the sprinkler. While the two events do not seem to be dependent, in fact we might not run the sprinkler if it is supposed to rain, or we might run the sprinkler because we postponed running it when rain was expected and it didn't rain. Therefore, we cannot treat rain and sprinkler as independent events, and we need to resolve this by using what is known as the probability chain rule:

P(A & B & C) = P(A) * P(B | A) * P(C | A & B)

Here the & means "each item in the list happens". If we have 10 co-dependent events, say A1, ..., A10, then the last probability in the product is P(A10 | A1 & A2 & ... & A9).

More on the Chain Rule

In order to apply the chain rule, I will need a large number of probabilities. Assume that I have events A, B, C, D, E:

P(A & B & C & D & E) = P(A) * P(B | A) * P(C | A & B) * P(D | A & B & C) * P(E | A & B & C & D)
P(A & B & D & E) = P(A) * P(B | A) * P(D | A & B) * P(E | A & B & D)
P(A & C & D) = P(A) * P(C | A) * P(D | A & C)
and so on.

So I will need P(event i | some combination of the other events j) for every event and every combination. If I have 5 co-dependent events, I need about 2^5 = 32 conditional probabilities (along with the 5 prior probabilities).
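The sketch below works through the chain rule for three events and shows why the number of probabilities grows exponentially; the event names and probability values are assumptions for illustration only:

    # Sketch of the chain rule P(A & B & C) = P(A) * P(B | A) * P(C | A & B).
    # The probability values are made-up illustrative numbers.
    p_A = 0.30           # P(A)
    p_B_given_A = 0.50   # P(B | A)
    p_C_given_AB = 0.20  # P(C | A & B)

    p_ABC = p_A * p_B_given_A * p_C_given_AB
    print("P(A & B & C) =", p_ABC)   # 0.03

    # With n co-dependent binary events we need on the order of 2**n probabilities
    # to cover every combination of conditioning events.
    n = 5
    print("probabilities needed for", n, "events: about", 2 ** n)   # 32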
Independence

Because of the problem of co-dependence, we might want to see if two events are independent. Two events are independent if the occurrence of one does not impact the probability of the other. For instance, if you roll a 6 on a die, the probability of rolling a 6 on the same or another die does not change from 1 in 6. Likewise, drawing a red card from a deck of cards and replacing it (putting it back into the deck) does not change the probability that the next card drawn is also red. However, if you do not replace the first red card, then the probability of drawing a second red card does change. We say that two events A and B are independent if and only if:

P(A & B) = P(A) * P(B)

Continued

If we have independent events, then it simplifies what we need in order to compute our probabilities. When reasoning about independent events A and B, we do not need P(A), P(B) and P(A & B); P(A) and P(B) suffice. Similarly, if events A and B are independent, then P(A | B) = P(A), because B being present or absent has no impact on A. With the independence assumption in place, our previous equation of the form

P(H | e1 & e2 & e3 & ...) = P(H) * P(e1 & e2 & e3 & ... | H)
                          = P(H) * P(e1 | H) * P(e2 | e1 & H) * P(e3 | e1 & e2 & H) * ...

can be rewritten as

P(H | e1 & e2 & e3 & ...) = P(H) * P(e1 | H) * P(e2 | H) * P(e3 | H) * ...

Thus, independence gets us past the problem of needing an exponential number of probabilities.
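To make the definition concrete, here is a small simulation sketch (an illustration, not from the slides) that estimates P(A & B) for two independent dice rolls and compares it with P(A) * P(B):

    import random

    # Estimate P(A & B) for two independent events and compare with P(A) * P(B).
    # A = "first die shows 6", B = "second die shows 6"; 1/6 * 1/6 = 1/36, about 0.0278.
    random.seed(0)
    trials = 100_000
    both = 0
    for _ in range(trials):
        a = random.randint(1, 6)
        b = random.randint(1, 6)
        if a == 6 and b == 6:
            both += 1

    print("estimated P(A & B):", both / trials)       # close to 0.0278
    print("P(A) * P(B):       ", (1 / 6) * (1 / 6))   # 0.0277...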
Naive Bayesian Classifier

Assume we have q data sets, where each data set comprises some elements of the set {d1, d2, ..., dn}, and each set qi has been classified into one of m categories c1, c2, ..., cm. Given a new case = {dj, dk, ...}, we compute the likelihood that the new case is in each of the categories as follows:

P(ci | case) = P(ci) * P(dj | ci) * P(dk | ci) * ... for each i from 1 to m

where P(ci) is simply the fraction of the q data sets classified as ci, and P(dj | ci) is the fraction of the data sets classified as ci in which datum dj occurred. P(ci | case) is a Naive Bayesian classifier (NBC) for category i. This only works if the data making up any one case are independent; that assumption is not necessarily true, thus the word "naive". The good news? This is easy!
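The sketch below is a minimal Naive Bayesian classifier along these lines. The tiny training set, the category names, and the new case are made up for illustration, and no smoothing is applied, so an unseen datum zeroes out a category's score:

    from collections import Counter, defaultdict

    # Toy training data: each case is (set of data items, category). Illustrative only.
    training = [
        ({"fever", "cough"}, "flu"),
        ({"fever", "rash"}, "measles"),
        ({"cough", "sneeze"}, "flu"),
        ({"rash", "fever"}, "measles"),
    ]

    q = len(training)
    category_counts = Counter(cat for _, cat in training)   # times each ci occurs
    item_counts = defaultdict(Counter)                      # times each dj seen under ci
    for items, cat in training:
        for d in items:
            item_counts[cat][d] += 1

    def nbc_score(case, category):
        """P(ci) times the product of P(dj | ci) for each datum in the case (the NBC score)."""
        score = category_counts[category] / q
        for d in case:
            score *= item_counts[category][d] / category_counts[category]
        return score

    new_case = {"fever", "cough"}
    scores = {c: nbc_score(new_case, c) for c in category_counts}
    print(scores)
    print("predicted category:", max(scores, key=scores.get))   # flu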
Chain Rule vs. Naive Bayes

Let's consider an example. We want to determine the probability that a person will have spoken or typed a particular phrase, say "the man bit a dog". We will compute the probability of this by examining several hundred or thousand training sentences.

Using Naive Bayes: P("the man bit a dog") = P("the") * P("man") * P("bit") * P("a") * P("dog"). We compute this just by counting the number of times each word occurs in the training sentences.

Using the chain rule: P("the man bit a dog") = P("the") * P("man" | "the") * P("bit" | "the man") * P("a" | "the man bit") * P("dog" | "the man bit a"), where P("bit" | "the man") is estimated from the number of times the word "bit" followed "the man" in all of our training sentences.

The probability computed by the chain rule is far smaller but also much more realistic, so Naive Bayes should be used only with caution and with the foreknowledge that we can make an independence assumption.
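The sketch below shows how such counts turn into estimates on a tiny made-up corpus: the unigram product used by Naive Bayes, and one chain-rule factor, P("bit" | "the man"), counted exactly as described above. The corpus and phrase are illustrative assumptions.

    # Toy training sentences (made up) used to estimate word probabilities.
    corpus = [
        "the man bit a dog",
        "the dog bit a man",
        "the man saw a dog",
        "a man fed the dog",
    ]
    tokenized = [s.split() for s in corpus]
    words = [w for sent in tokenized for w in sent]

    # Naive Bayes estimate: product of unigram probabilities P(w).
    phrase = "the man bit a dog".split()
    naive = 1.0
    for w in phrase:
        naive *= words.count(w) / len(words)
    print("Naive Bayes estimate of the phrase:", naive)

    def count_seq(seq):
        """Number of times the word sequence appears contiguously in the corpus."""
        n = len(seq)
        return sum(1 for sent in tokenized
                   for i in range(len(sent) - n + 1)
                   if sent[i:i + n] == seq)

    # One chain-rule factor, P("bit" | "the man"): how often "bit" follows "the man".
    p_bit_given_the_man = count_seq(["the", "man", "bit"]) / count_seq(["the", "man"])
    print('P("bit" | "the man") =', p_bit_given_the_man)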
Spam Filters

One of the most common uses of an NBC is to construct a spam filter. The spam filter works by learning a "bag of words", that is, the words that are typically associated with spam. We take all of the words of every email message and discard any common words (I, of, the, is, etc.). Now we "train" our spam filter by computing these probabilities:

P(spam): the fraction of emails in the training set that were spam
P(!spam): the fraction of emails in the training set that were not spam
P(word1 | spam): the fraction of spam emails in which word1 appeared
P(word1 | !spam): the fraction of non-spam emails in which word1 appeared

and so forth for every non-common word.
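A minimal training sketch along these lines follows; the tiny labeled email set and the stop-word list are made-up assumptions, and word probabilities are estimated by simple counting, with no smoothing:

    from collections import Counter

    # Tiny labeled training set (made up): (email text, is_spam).
    emails = [
        ("win money now", True),
        ("cheap money offer", True),
        ("meeting agenda attached", False),
        ("lunch plans for friday", False),
    ]
    common = {"the", "of", "is", "i", "for", "now"}   # assumed common-word list

    spam = [set(text.split()) - common for text, s in emails if s]
    ham = [set(text.split()) - common for text, s in emails if not s]

    p_spam = len(spam) / len(emails)   # P(spam)
    p_ham = len(ham) / len(emails)     # P(!spam)

    def word_probs(docs):
        """P(word | class): fraction of the class's emails containing the word."""
        counts = Counter(w for doc in docs for w in doc)
        return {w: c / len(docs) for w, c in counts.items()}

    p_word_spam = word_probs(spam)
    p_word_ham = word_probs(ham)

    print("P(spam) =", p_spam, " P(!spam) =", p_ham)
    print('P("money" | spam)  =', p_word_spam.get("money", 0.0))
    print('P("money" | !spam) =', p_word_ham.get("money", 0.0))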
Using Our Spam Filter

A new email comes in. Discard all common words, then compute P(spam | words) and P(!spam | words):

P(spam | word1 & word2 & ... & wordn) = P(spam) * P(word1 | spam) * P(word2 | spam) * ... * P(wordn | spam)
P(!spam | word1 & word2 & ... & wordn) = P(!spam) * P(word1 | !spam) * P(word2 | !spam) * ... * P(wordn | !spam)

Whichever probability is higher gives you your answer. Without the naive assumption, the computation becomes:

P(spam | word1 & word2 & ... & wordn) = P(spam) * P(word1 & word2 & ... & wordn | spam)
                                      = P(spam) * P(word1 | spam) * P(word2 | word1 & spam) * ... * P(wordn | word1 & ... & wordn-1 & spam)

English has well over 100,000 words, but many are common, so a spam filter may only deal with, say, 5,000 words; even so, that would require on the order of 2^5000 probabilities if we did not use naive Bayes!
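Classification of a new email could then look like the sketch below. The probability tables are written out by hand here so the snippet stands alone (in practice they would come from a training step), and unseen words are given a small assumed probability instead of zero:

    # Trained probabilities (illustrative, hand-written values).
    p_spam, p_ham = 0.5, 0.5
    p_word_spam = {"money": 1.0, "win": 0.5, "cheap": 0.5, "offer": 0.5}
    p_word_ham = {"meeting": 0.5, "agenda": 0.5, "lunch": 0.5, "money": 0.1}
    common = {"the", "of", "is", "i", "for", "now"}

    def score(words, prior, p_word):
        """P(class | words) up to a constant: prior times the product of per-word probabilities.
        Unseen words get a small assumed probability (0.01) rather than zeroing the score."""
        s = prior
        for w in words:
            s *= p_word.get(w, 0.01)
        return s

    new_email = "win free money now"
    words = set(new_email.split()) - common

    spam_score = score(words, p_spam, p_word_spam)
    ham_score = score(words, p_ham, p_word_ham)
    print("spam score:", spam_score, " non-spam score:", ham_score)
    print("classified as:", "spam" if spam_score > ham_score else "not spam")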
Classifier Example: Clustering

Clustering is used in data mining to infer boundaries between classes. Here, we assume that we have already clustered a set of data into two classes, and we want to use an NBC to determine whether a new datum, which lies in between the two clusters, belongs to one class or the other. We have two categories, which we will call red and green:

P(red) = # of red entries / # of total entries = 20 / 60
P(green) = # of green entries / # of total entries = 40 / 60

We add a new datum x (shown in white in the figure) and identify which class it is most likely a part of:

P(x | green) = # of green entries nearby / # of green entries = 1 / 40
P(x | red) = # of red entries nearby / # of red entries = 3 / 20

P(x is green | green entries) = P(green) * P(x | green) = 40/60 * 1/40 = 1/60
P(x is red | red entries) = P(red) * P(x | red) = 20/60 * 3/20 = 1/20

Since 1/20 > 1/60, the new datum is classified as red.
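The same comparison in code, using only the arithmetic from the example (the nearby counts are taken as given):

    from fractions import Fraction as F

    # Counts from the clustering example: 20 red and 40 green entries out of 60,
    # with 3 red and 1 green entries near the new datum x.
    total, red, green = 60, 20, 40
    near_red, near_green = 3, 1

    score_red = F(red, total) * F(near_red, red)          # P(red) * P(x | red)     = 1/20
    score_green = F(green, total) * F(near_green, green)  # P(green) * P(x | green) = 1/60

    print("red score:  ", score_red)
    print("green score:", score_green)
    print("x is classified as", "red" if score_red > score_green else "green")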