Bayesian models of inductive learning
Tom Griffiths (UC Berkeley), Josh Tenenbaum (MIT), Charles Kemp (CMU)

Outline
Morning
  9:00-10:30: Introduction: Why Bayes?; Basics of Bayesian inference (Josh)
  11:00-12:30: How to build a Bayesian cognitive model (Tom)
Afternoon
  1:30-3:00: Hierarchical Bayesian models and learning structured representations (Charles)
  3:30-5:00: Monte Carlo methods and nonparametric Bayesian models (Tom)

What you will get out of this tutorial
- Our view of what Bayesian models have to offer cognitive science
- In-depth examples of basic and advanced models: how the math works and what it buys you
- A sense for how to go about the process of building Bayesian models
- Some (not extensive) comparison to other approaches
- Opportunities to ask questions

The big question
How does the mind get so much out of so little? Our minds build rich models of the world and make strong generalizations from input data that is sparse, noisy, and ambiguous, and in many ways far too limited to support the inferences we make. How do we do it?

Learning words for objects
[Figure slides: examples of learning words for objects]

The big question
How does the mind get so much out of so little?
- Perceiving the world from sense data
- Learning about kinds of objects and their properties
- Inferring causal relations
- Learning and using words, phrases, and sentences
- Learning and using intuitive theories of physics, psychology, biology, ...
- Learning social structures, conventions, and rules
The goal: a general-purpose computational framework for understanding how people make these inferences, and how they can be successful.
The problem of induction
Abstract knowledge (constraints / inductive bias / priors)

The problems of induction
1. How does abstract knowledge guide inductive learning, inference, and decision-making from sparse, noisy, or ambiguous data?
2. What is the form and content of our abstract knowledge of the world?
3. What are the origins of our abstract knowledge? To what extent can it be acquired from experience?
4. How do our mental models grow over a lifetime, balancing simplicity versus data fit (Occam), accommodation versus assimilation (Piaget)?
5. How can learning and inference proceed efficiently and accurately, even in the presence of complex hypothesis spaces?

A toolkit for reverse-engineering induction
- Bayesian inference in probabilistic generative models
- Probabilities defined on a range of structured representations: spaces, graphs, grammars, predicate logic, schemas, programs
- Hierarchical probabilistic models, with inference at all levels of abstraction
- Models of unbounded complexity (“nonparametric Bayes” or “infinite models”), which can grow in complexity or change form as the observed data dictate
- Approximate methods of learning and inference, such as belief propagation, expectation-maximization (EM), Markov chain Monte Carlo (MCMC), and sequential Monte Carlo (particle filtering)

[Figure slide: probabilistic parsing. A grammar G generates a phrase structure S with probability P(S | G); S generates an utterance U with probability P(U | S). Bottom-up and top-down information combine via P(S | U, G) ∝ P(U | S) x P(S | G).]

[Figure slide: the hierarchy from speech signal to utterance to phrase structure to grammar to “Universal Grammar”, e.g., hierarchical phrase structure grammars (CFG, HPSG, TAG).]

[Figure slide: vision as probabilistic parsing (Han and Zhu, 2006).]

[Figure slide: learning word meanings, as inference linking principles (whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias) to structure to data.]

[Figure slide: causal learning and reasoning, as inference linking principles to structure to data.]

[Figure slide: goal-directed action, production and comprehension (Wolpert et al., 2003).]
Why Bayesian models of cognition?
- A framework for understanding how the mind can solve fundamental problems of induction
- Strong, principled quantitative models of human cognition
- Tools for studying people's implicit knowledge of the world
- Beyond classic limiting dichotomies: “rules vs. statistics”, “nature vs. nurture”, “domain-general vs. domain-specific”
- A unifying mathematical language for all of the cognitive sciences: AI, machine learning and statistics, psychology, neuroscience, philosophy, linguistics
- A bridge between engineering and “reverse-engineering”
Why now? Much recent progress in computational resources, theoretical tools, and interdisciplinary connections.

Outline
Morning: Introduction: Why Bayes? (Josh); Basics of Bayesian inference (Josh); How to build a Bayesian cognitive model (Tom)
Afternoon: Hierarchical Bayesian models and learning structured representations (Charles); Monte Carlo methods and nonparametric Bayesian models (Tom)

Bayes' rule
For any hypothesis h and data d, the posterior P(h | d) is proportional to the likelihood P(d | h) times the prior P(h), normalized by a sum over the space of alternative hypotheses.
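Written out, with the denominator summing over the space of alternative hypotheses H (a standard statement of Bayes' rule, consistent with the slide's description):

```latex
P(h \mid d) = \frac{P(d \mid h)\, P(h)}{\sum_{h' \in H} P(d \mid h')\, P(h')}
```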
Bayesian inference
Bayes' rule: an example
- Data: John is coughing
- Some hypotheses:
  1. John has a cold
  2. John has lung cancer
  3. John has a stomach flu
- Prior P(h) favors 1 and 3 over 2
- Likelihood P(d | h) favors 1 and 2 over 3
- Posterior P(h | d) favors 1 over 2 and 3
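A minimal numerical sketch of this example. The specific priors and likelihoods below are illustrative assumptions, chosen only to respect the qualitative orderings on the slide:

```python
# Hypothetical numbers: prior favors cold and stomach flu over lung cancer;
# likelihood of coughing favors cold and lung cancer over stomach flu.
hypotheses = ["cold", "lung cancer", "stomach flu"]
prior      = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.49}
likelihood = {"cold": 0.80, "lung cancer": 0.90, "stomach flu": 0.10}  # P(coughing | h)

# Bayes' rule: posterior is proportional to likelihood x prior,
# normalized by summing over the space of alternative hypotheses.
unnormalized = {h: likelihood[h] * prior[h] for h in hypotheses}
evidence = sum(unnormalized.values())
posterior = {h: unnormalized[h] / evidence for h in hypotheses}

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")
# The posterior favors "cold" over both alternatives, as on the slide.
```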
Plan for this lecture
- Some basic aspects of Bayesian statistics: comparing two hypotheses, model fitting, model selection
- Two (very brief) case studies in modeling human inductive learning: causal learning, concept learning

Coin flipping
- Basic Bayes: data = HHTHT or HHHHH; compare two hypotheses, P(H) = 0.5 vs. P(H) = 1.0
- Parameter estimation (model fitting): compare many hypotheses in a parameterized family, P(H) = q; infer q (see the sketch below)
- Model selection: compare qualitatively different hypotheses, often varying in complexity: P(H) = 0.5 vs. P(H) = q
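To make the "infer q" item above concrete, here is a minimal sketch that scores a grid of values of q against the sequence HHTHT. The uniform prior over the grid is an illustrative assumption, not something the slides specify:

```python
import numpy as np

# Data: HHTHT has 3 heads out of 5 flips.
n_heads, n_flips = 3, 5

# A parameterized family of hypotheses: P(H) = q, for q on a grid in [0, 1].
q = np.linspace(0, 1, 101)
prior = np.ones_like(q) / len(q)          # illustrative uniform prior over q

# Likelihood of the observed sequence under each q (independent flips).
likelihood = q**n_heads * (1 - q)**(n_flips - n_heads)

# Posterior over q: likelihood x prior, normalized over the grid.
posterior = likelihood * prior
posterior /= posterior.sum()

print("Posterior mode of q:", q[np.argmax(posterior)])   # ~0.6 for 3 heads in 5
```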
Coin flipping
HHTHT
HHHHH
What process produced these sequences?
Comparing two hypotheses
- Contrast simple hypotheses:
  h1: “fair coin”, P(H) = 0.5
  h2: “always heads”, P(H) = 1.0
- Bayes' rule: with two hypotheses, use the odds form,
  P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] x [P(H1) / P(H2)]

Comparing two hypotheses
D: HHTHT
H1, H2: “fair coin”, “always heads”
P(D | H1) = 1/2^5, P(H1) = ?
P(D | H2) = 0, P(H2) = 1 - ?

Comparing two hypotheses
D: HHTHT
H1, H2: “fair coin”, “always heads”
P(D | H1) = 1/2^5, P(H1) = 999/1000
P(D | H2) = 0, P(H2) = 1/1000

Comparing two hypotheses
D: HHHHH
H1, H2: “fair coin”, “always heads”
P(D | H1) = 1/2^5, P(H1) = 999/1000
P(D | H2) = 1, P(H2) = 1/1000

Comparing two hypotheses
D: HHHHHHHHHH
H1, H2: “fair coin”, “always heads”
P(D | H1) = 1/2^10, P(H1) = 999/1000
P(D | H2) = 1, P(H2) = 1/1000
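A minimal sketch of these three comparisons using the odds form, with prior odds of 999:1 in favor of the fair coin as on the slides:

```python
def posterior_odds(sequence, prior_h1=999/1000, prior_h2=1/1000):
    """Posterior odds P(H1|D) / P(H2|D) for H1 = fair coin, H2 = always heads."""
    n = len(sequence)
    lik_h1 = 0.5 ** n                                   # fair coin: every flip has prob 1/2
    lik_h2 = 1.0 if set(sequence) == {"H"} else 0.0     # always heads: any tail rules it out
    if lik_h2 == 0.0:
        return float("inf")                             # data impossible under H2
    return (lik_h1 / lik_h2) * (prior_h1 / prior_h2)

for d in ["HHTHT", "HHHHH", "HHHHHHHHHH"]:
    print(d, posterior_odds(d))
# HHTHT: infinite odds (a single tail falsifies "always heads")
# HHHHH: (1/2^5) * 999 ~= 31, still clearly favoring the fair coin
# HHHHHHHHHH: (1/2^10) * 999 ~= 0.98, now roughly even
```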
Measuring prior knowledge
1. The fact that HHHHH looks like a “mere coincidence”, without making us suspicious that the coin is unfair, while HHHHHHHHHH does begin to make us suspicious, measures the strength of our prior belief that the coin is fair. If q is the threshold for suspicion in the posterior odds, and D* is the shortest suspicious sequence, the prior odds in favor of a fair coin are roughly q / P(D* | “fair coin”). If q ≈ 1 and D* is between 10 and 20 heads, the implied prior probability of an unfair coin is roughly between 1/1,000 and 1/1,000,000.
2. The fact that HHTHT looks representative of a fair coin, and HHHHH does not, reflects our prior knowledge about possible causal mechanisms in the world. It is easy to imagine how a trick all-heads coin could work: low (but not negligible) prior probability. It is hard to imagine how a trick “HHTHT” coin could work: extremely low (negligible) prior probability.
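A quick back-of-the-envelope version of point 1, assuming a suspicion threshold q of 1 (an illustrative choice):

```python
# Prior odds for a fair coin implied by the length of the shortest suspicious
# all-heads sequence D*: prior_odds ~= q / P(D* | fair coin) = q * 2**n.
q_threshold = 1.0   # posterior odds at which we start to suspect the coin

for n_heads in (10, 15, 20):
    prior_odds_fair = q_threshold / (0.5 ** n_heads)
    print(f"D* = {n_heads} heads -> prior odds for a fair coin ~= {prior_odds_fair:,.0f}")
# 10 heads -> ~1,000; 20 heads -> ~1,000,000,
# i.e., an implied prior probability of an unfair coin between about 1/1,000 and 1/1,000,000.
```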
Plan for this lecture
- Some basic aspects of Bayesian statistics: comparing two hypotheses, model fitting, model selection
- Two (very brief) case studies in modeling human inductive learning: causal learning, concept learning

Coin flipping
- Basic Bayes: data = HHTHT or HHHHH; compare two hypotheses, P(H) = 0.5 vs. P(H) = 1.0