Introduction to Bayesian inference and computation for social science data analysis
Nicky Best, Imperial College, London
www.bias-project.org.uk

Outline
- Overview of Bayesian methods
- Illustration of conjugate Bayesian inference
- MCMC methods
- Examples illustrating:
  - Analysis using informative priors
  - Hierarchical priors, meta-analysis and evidence synthesis
  - Adjusting for data quality
  - Model uncertainty
- Discussion

Overview of Bayesian inference and computation

Overview of Bayesian methods
- Bayesian methods have been widely applied in many areas:
  - medicine / epidemiology / genetics
  - ecology / environmental sciences
  - finance
  - archaeology
  - political and social sciences, ...
- Motivations for adopting a Bayesian approach vary:
  - natural and coherent way of thinking about science and learning
  - pragmatic choice that is suitable for the problem in hand

- Medical context: FDA draft guidance (www.fda.gov/cdrh/meetings/072706-bayesian.html): "Bayesian statistics ... provides a coherent method for learning from evidence as it accumulates"
- Evidence can accumulate in various ways:
  - Sequentially
  - Measurement of many "similar" units (individuals, centres, sub-groups, areas, periods, ...)
  - Measurement of different aspects of a problem
- Evidence can take different forms:
  - Data
  - Expert judgement

- Bayesian approach also provides a formal framework for propagating uncertainty
  - Well suited to building complex models by linking together multiple sub-models
  - Can obtain estimates and uncertainty intervals for any parameter, function of parameters or predictive quantity of interest
- Bayesian inference doesn't rely on asymptotics or analytic approximations
  - Arbitrarily wide range of models can be handled using the same inferential framework
  - Focus on specifying realistic models, not on choosing an analytically tractable approximation

Bayesian inference
- Distinguish between
  - $x$: known quantities (data)
  - $\theta$: unknown quantities (e.g. regression coefficients, future outcomes, missing observations)
- Fundamental idea: use probability distributions to represent uncertainty about unknowns
- Likelihood model for the data: $p(x \mid \theta)$
- Prior distribution representing current uncertainty about unknowns: $p(\theta)$
- Applying Bayes' theorem gives the posterior distribution:
  $p(\theta \mid x) = \dfrac{p(\theta)\, p(x \mid \theta)}{\int p(\theta)\, p(x \mid \theta)\, d\theta} \propto p(\theta)\, p(x \mid \theta)$

Conjugate Bayesian inference
Example: election poll (from Franklin, 2004*)
- Imagine an election campaign where (for simplicity) we have just a Government/Opposition vote choice.
- We enter the campaign with a prior distribution for the proportion supporting Government. This is $p(\theta)$.
- As the campaign begins, we get polling data. How should we change our estimate of Government's support?
* Adapted from Charles Franklin's Essex Summer School course slides: http://www.polisci.wisc.edu/users/franklin/Content/Essex/Lecs/BayesLec01p6up.pdf

Data and likelihood
- Each poll consists of $n$ voters, $x$ of whom say they will vote for Government and $n - x$ will vote for the Opposition.
- If we assume we have no information to distinguish voters in their probability of supporting Government, then we have a binomial distribution for $x$:
  $p(x \mid \theta) = \binom{n}{x}\, \theta^{x} (1-\theta)^{n-x} \propto \theta^{x} (1-\theta)^{n-x}$
- This binomial distribution is the likelihood $p(x \mid \theta)$

Prior
- We need to specify a prior that
  - expresses our uncertainty about the election (before it begins)
  - conforms to the nature of the $\theta$ parameter, i.e. is continuous but bounded between 0 and 1
- A convenient choice is the Beta distribution: $p(\theta) = \mathrm{Beta}(a, b) \propto \theta^{a-1} (1-\theta)^{b-1}$
- The Beta($a$, $b$) distribution can take a variety of shapes depending on its two parameters $a$ and $b$
- Mean of Beta($a$, $b$) distribution $= \dfrac{a}{a+b}$
- Variance of Beta($a$, $b$) distribution $= \dfrac{ab}{(a+b)^2 (a+b+1)}$

Posterior
- Combining a beta prior with the binomial likelihood gives a posterior distribution:
  $p(\theta \mid x, n) \propto \theta^{x+a-1} (1-\theta)^{n-x+b-1}$, i.e. Beta($x + a$, $n - x + b$)
- When prior and posterior come from the same family, the prior is said to be conjugate to the likelihood
  - Occurs when prior and likelihood have the same "kernel"

- Suppose I believe that Government only has the support of half the population, and I think that estimate has a standard deviation of about 0.05
  - This is approximately a Beta(50, 50) distribution
- We observe a poll with 200 respondents, 120 of whom (60%) say they will vote for Government
- This produces a posterior which is a Beta(120+50, 80+50) = Beta(170, 130) distribution

- Prior mean, $E(\theta) = 50/100 = 0.5$
- Posterior mean, $E(\theta \mid x, n) = 170/300 = 0.57$
- Posterior SD, $\sqrt{\mathrm{Var}(\theta \mid x, n)} = 0.029$
- Frequentist estimate is based only on the data: $\hat{\theta} = x/n = 120/200 = 0.6$

A harder problem
- What is the probability that Government wins?
  - It is not 0.57 or 0.60. Those are expected vote shares, not the probability of winning. How to answer this?
- Frequentists have a hard time with this one. They can obtain a p-value for testing $H_0: \theta \le 0.5$, but this isn't the same as the probability that Government wins (it's actually the probability of observing data more extreme than 120 out of 200 if $H_0$ is true)
- Easy from a Bayesian perspective: calculate $\Pr(\theta > 0.5 \mid x, n)$, the posterior probability that $\theta > 0.5$

Bayesian computation
- All Bayesian inference is based on the posterior distribution
- Summarising posterior distributions involves integration
- Except for conjugate models, integrals are usually analytically intractable
- Use Monte Carlo (simulation) integration (MCMC)

- Suppose we didn't know how to analytically integrate the Beta(170, 130) posterior...
- ...but we do know how to simulate from a Beta
- Can also use samples to estimate posterior tail area probabilities, percentiles, variances etc.
- Difficult to generate independent samples when posterior is complex and high dimensional
- Instead, generate dependent samples from a Markov chain having $p(\theta \mid x)$ as its stationary distribution: Markov chain Monte Carlo (MCMC)

Illustrative Examples

Borrowing strength
- Bayesian learning: borrowing "strength" (precision) from other sources of information
- Informative prior is one such source
  - "today's posterior is tomorrow's prior"
  - relevance of prior information to current study must be justified

Informative priors
Example 1: Western and Jackman (1994)*
- Example of regression analysis in comparative research
- What explains cross-national variation in union density?
- Union density is defined as the percentage of the work force who belong to a labour union
- Two issues
  - Philosophical: data represent all available observations from a population, so conventional (frequentist) analysis based on long-run behaviour of a repeatable data mechanism is not appropriate
  - Practical: small, collinear dataset yields imprecise estimates of regression effects
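The election-poll calculation from the conjugate-inference and Bayesian-computation slides can be checked numerically. The sketch below is an illustration, not part of the original deck: it applies the Beta-binomial update to the Beta(50, 50) prior and the 120-of-200 poll, then uses independent draws from the Beta posterior (via Python's standard `random.betavariate`) to estimate the posterior mean, the posterior SD and Pr(θ > 0.5 | x, n). The seed, the number of draws and the helper names `beta_mean` and `beta_sd` are arbitrary choices made for this example.

```python
import random
import statistics

# Prior Beta(a, b) and poll data as given in the slides.
a_prior, b_prior = 50, 50
n, x = 200, 120

# Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + x, b + n - x).
a_post = a_prior + x
b_post = b_prior + (n - x)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5


# Monte Carlo integration: independent draws from the posterior.
# The seed and sample size are arbitrary illustration choices.
rng = random.Random(2024)
draws = [rng.betavariate(a_post, b_post) for _ in range(100_000)]

mc_mean = statistics.fmean(draws)
mc_sd = (sum((d - mc_mean) ** 2 for d in draws) / len(draws)) ** 0.5
# The share of draws above 0.5 estimates Pr(theta > 0.5 | x, n).
pr_win = sum(d > 0.5 for d in draws) / len(draws)

print(f"posterior: Beta({a_post}, {b_post})")
print(f"exact mean {beta_mean(a_post, b_post):.3f}, exact SD {beta_sd(a_post, b_post):.3f}")
print(f"Monte Carlo mean {mc_mean:.3f}, Monte Carlo SD {mc_sd:.3f}")
print(f"estimated Pr(theta > 0.5 | x, n): {pr_win:.3f}")
```

The exact and simulated summaries should agree closely, and the last line addresses the "probability that Government wins" question directly from the posterior draws.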

* Example 1 slides adapted from Jeff Grynaviski: http://home.uchicago.edu/grynav/bayes/abs03.htm

Competing theories
- Wallerstein: union density depends on the size of the civilian labour force (LabF)
- Stephens: union density depends on industrial concentration (IndC)
- Note: These two predictors correlate at -0.92.
- Control variable: presence of a left-wing government (LeftG)
- Sample: n = 20 countries with a continuous history of democracy since World War II
- Fit linear regression model to compare theories:
  $\text{union density}_i \sim N(\mu_i, \sigma^2)$
  $\mu_i = \beta_0 + \beta_1\,\mathrm{LeftG} + \beta_2\,\mathrm{LabF} + \beta_3\,\mathrm{IndC}$

- Results with non-informative priors on regression coefficients (numerically equivalent to OLS analysis)
[Figure: point estimates with 95% CIs]

Motivation for Bayesian approach with informative priors
- Because of small sample size and multicollinear variables, not able to adjudicate between theories
- Data tend to favour Wallerstein (union density depends on labour force size), but neither coefficient estimated very precisely
- Other historical data are available that could provide further relevant information
- Incorporation of prior information provides additional structure to the data, which helps to uniquely identify the two coefficients

Prior distributions for regression coefficients
Wallerstein
- Believes in negative labour force effect
- Comparison of Sweden and Norway in 1950: doubling of labour force corresponds to 3.5-4% drop in union density
  - on log scale, labour force effect size $\approx -3.5/\log(2) \approx -5$
- Confidence in direction of effect represented by prior SD giving 95% interval that excludes 0: $\beta_2 \sim N(-5, 2.5^2)$

Stephens
- Believes in positive industrial concentration effect
- Decline in industrial concentration in UK in 1980s: drop of 0.3 in industrial concentration corresponds to about 3% drop in union density
  - industrial concentration effect size $\approx 3/0.3 = 10$
- Confidence in direction of effect represented by prior SD giving 95% interval that excludes 0: $\beta_3 \sim N(10, 5^2)$

Wallerstein and Stephens
- Both believe left-wing govts assist union growth
- Assuming 1 year of left-wing govt increases union density by about 1% translates to effect size of 0.3
- Confidence in direction of effect represented by prior SD giving 95% interval that excludes 0: $\beta_1 \sim N(0.3, 0.15^2)$
- Vague prior $\beta_0 \sim N(0, 100^2)$ assumed for intercept

[Figure: panels Ind Conc, Lab Force, Left Govt]

- Effects of LabF and IndC estimated more precisely
- Both sets of prior beliefs support inference that labour-force size decreases union density
- Only Stephens' prior supports conclusion that industrial concentration increases union density
- Choice of prior is subjective: if no consensus, can we be satisfied that data have been interpreted "fairly"?
- Sensitivity analysis
  - Sensitivity to priors (e.g. repeat analysis using priors with increasing variance)
  - Sensitivity to data (e.g. residuals, influence diagnostics)

Hierarchical priors
- Hierarchical priors are another widely used approach for borrowing strength
- Useful when data available on many "similar" units (individuals, areas, studies, subgroups, ...)
- Data $x_i$ and parameters $\theta_i$ for each unit $i = 1, \ldots, N$
- Three different assumptions:
  - Independent parameters: units are unrelated, and each $\theta_i$ is estimated separately using data $x_i$ alone
  - Identical parameters: observations treated as coming from same unit, with common parameter $\theta$
  - Exchangeable parameters: units are "similar" (labels convey no information); mathematically equivalent to assuming the $\theta_i$ are drawn from a common probability distribution with unknown parameters

Meta-analysis
Example 2: Meta-analysis (Spiegelhalter et al 2004)
- 8 small RCTs of IV magnesium sulphate following acute myocardial infarction
- Data: $x_{ig}$ = deaths, $n_{ig}$ = patients in trial $i$, treatment group $g$ (0 = control, 1 = magnesium)
- Model (likelihood):
  $x_{ig} \sim \mathrm{Binomial}(p_{ig}, n_{ig})$
  $\mathrm{logit}(p_{ig}) = \phi_i + \theta_i\, g$
  - $\theta_i$ is log odds ratio for treatment effect
- If not willing to believe trials are identical, but no reason to believe they are systematically different, assume the $\theta_i$ are exchangeable with hierarchical prior $\theta_i \sim \mathrm{Normal}(\mu, \sigma^2)$
  - $\mu$, $\sigma^2$ also treated as unknown with (vague) priors

[Figure: Estimates and 95% intervals for treatment effect from independent MLE and hierarchical Bayesian analysis]

Effective sample size
- $n$ = sample size of trial
- $V_1$ = variance of $\theta_i$ without borrowing (var of MLE)
- $V_2$ = variance of $\theta_i$ with borrowing (posterior variance of $\theta_i$)

Example 3: Meta-analysis of effect of class size on educational achievement (Goldstein et al, 2000)
- 8 studies: 1 RCT, 3 matched, 2 experimental, 2 observational
- Goldstein et al use maximum likelihood, with bootstrap CI due to small sample size

- Under-estimates uncertainty relative to Bayesian intervals
- Note that 95% CI for Bayesian estimate of effect of class size includes 0

Accounting for data quality
- Bayesian approach also provides formal framework for propagating uncertainty about different quantities in a model
- Natural tool for explicitly modelling different aspects of data quality
  - Measurement error
  - Missing data

Example 4: Accounting for population errors in small-area estimation and disease mapping (Best and Wakefield, 1999)
- Context: Mapping geographical variations in risk of breast cancer by electoral ward in SE England, 1981-1991
- Typical model: $y_i \sim \mathrm{Poisson}(\lambda_i N_i)$
  - $y_i$ = number of breast cancer cases in area $i$
  - $\lambda_i$ is the area-specific rate of breast cancer: parameter of interest
  - $N_i = \sum_t N_{it}$ = population-years at risk in area $i$

- $N_i$ usually assumed to be known
  - Ignores uncertainty in small-area age/sex population counts in inter-census years
- B&W make use of additional data on Registrar General's mid-year district age/sex population totals $N_{dt}$
- Model A: $N_{it} = N_{dt}\, p_{it}$, where $p_{it}$ is proportion of annual district population in particular age group of interest living in ward $i$
  - $p_{it}$ estimated by interpolating 1981 and 1991 census counts
- Model B: Allow for sampling variability in $N_{it}$: $N_{it} \sim \mathrm{Multinomial}(N_{dt}, [p_{1t}, \ldots, p_{Kt}])$
- Model C: Allow for uncertainty in proportions $p_{it}$: $p_{it} \sim$ informative Dirichlet prior distribution

[Model diagram: priors; plate "ward i"]
- Random effects Poisson regression: $\log \lambda_i = \alpha_i + \beta X_i$
  - $X_i$ = deprivation score for ward $i$
[Model diagram: priors; plates "ward i" and "year t"]
[Figure: Area-specific RR estimates]

Model uncertainty
- Model uncertainty can be large for observational data studies
- In regression models:
  - What is the best set of predictors for response of interest?
  - Which confounders to control for?
  - Which interactions to include?
  - What functional form to use (linear, non-linear, ...)?

Example 5: Predictors of crime rates in US States (adapted from Raftery et al, 1997)
- Ehrlich (1973) developed and tested theory that decision to commit crime is rational choice based on costs and benefits
  - Costs of crime related to probability of imprisonment and average length of time served in prison
  - Benefits of crime related to income inequalities and aggregate wealth of community
  - Net benefits of other (legitimate) activities related to employment rate and education levels in community
- Ehrlich analysed data from 47 US states in 1960, focusing on relationship between crime rate and the 2 prison variables
- Up to 13 candidate control variables also considered

- $y$ = log crime rate in 1960 in each of 47 US states
- $Z_1, Z_2$ = log prob. of prison, log av. time in prison
- $X_1, \ldots, X_{13}$ = candidate control variables
- Fit Normal linear regression model
- Results sensitive to choice of control variables
[Table adapted from Table 2 in Raftery et al (1997)]

- Using Bayesian approach, can let set of control variables be an unknown parameter of the model, $\theta$
- Don't know (a priori) no. of covariates in best model
  - $\theta$ has unknown dimension: assign prior distribution
- Can handle such "trans-dimensional" (TD) models using "reversible jump" MCMC algorithms
- Normal linear regression model: $y_i \sim \mathrm{Normal}(\mu_i, \sigma^2)$, $i = 1, \ldots, 47$
- Variable selection model:
[Model diagram: nodes $y_i$, $k$, $\sigma^2$, $\beta$, $\theta$, $\mu_i$, $\gamma$; plate "state i"]
[Figures: model uncertainty results; one plot has a probability axis running from 0 to 0.8]
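To give a sense of the model space behind the variable selection problem in Example 5, the sketch below counts the candidate regressions. It rests on assumptions that the slides do not state explicitly: that Z1 and Z2 are always included and that any subset of the 13 candidate controls may enter the model. The names `n_candidates`, `models_by_size` and `total_models` are hypothetical and used only for this illustration.

```python
from math import comb

# Number of candidate control variables X1..X13, as stated in the slides.
n_candidates = 13

# Assumption for illustration: every subset of the controls is a valid model,
# with the two prison variables Z1 and Z2 always included.
# models_by_size[k] = number of models that use exactly k control variables.
models_by_size = {k: comb(n_candidates, k) for k in range(n_candidates + 1)}
total_models = sum(models_by_size.values())

for k, count in models_by_size.items():
    print(f"{k:2d} controls: {count:5d} models")
print(f"total candidate models: {total_models}")
```

Under that assumption the total is 2^13 = 8192 regressions of differing dimension, which is the kind of space a trans-dimensional sampler such as reversible jump MCMC moves over.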
