Introduction to Bayesian Statistics
Christiana Kartsonaki
February 11th, 2015

Introduction

Bayes’ theorem is an elementary result in probability theory which relates the conditional probability P(A given B) to P(B given A), for two events A and B.

Bayes’ theorem

- P(A): probability of event A
- P(A | B): conditional probability of event A given that event B has occurred

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
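As a minimal sketch of the formula in use, the snippet below computes P(A | B) for a diagnostic-test setting. The function name and all numbers (prevalence, sensitivity, false-positive rate) are hypothetical and are not taken from the slides; the denominator P(B) is obtained from the law of total probability.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # Law of total probability gives the denominator P(B).
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 10% false-positive rate.
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.10)
print(round(posterior, 4))
```

With these assumed inputs the posterior probability stays below 10% even after a positive result, because the prior P(A) is small.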

Bayesian inference

$$P(\text{conclusion} \mid \text{data}) \propto P(\text{data} \mid \text{conclusion})\, P(\text{conclusion})$$

- P(conclusion | data): posterior
- P(data | conclusion): likelihood
- P(conclusion): prior

Example

- buses arrive in a random pattern (Poisson process), mean time interval unknown
  → data likelihood
- you know something about the interval (e.g. likely not every 1 hour, but not 1 minute either)
  → prior distribution
- the two sources of information are synthesized as a probability distribution
  → posterior distribution

⇒ merge information from data with ‘external’ information

Statistical inference

- parameter θ
- data x
- model f(x, θ)

Bayes’ theorem

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{p(x)}$$

- π(θ): prior distribution
- p(x | θ): likelihood
- p(θ | x): posterior distribution
- p(x): predictive probability of x (normalizing factor)

posterior ∝ likelihood × prior

Bayesian inference

Any quantity that does not depend on θ cancels out from the denominator and numerator of Bayes’ theorem.

So if we can recognise which density is proportional to the product of the likelihood and the prior, regarded solely as a function of θ, we know the posterior density of θ.
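The cancellation can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a hypothetical exponential model with an exponential prior, made-up data, and a simple grid approximation on (0, 20). It multiplies likelihood times prior by two different constants and shows that both give the same normalized posterior.

```python
import math

# Hypothetical inputs, chosen only for illustration.
data = [1.2, 0.4, 2.1]
lam = 1.0
step = 0.001
grid = [step * (i + 0.5) for i in range(20000)]  # midpoints covering (0, 20)


def unnormalized(theta, constant):
    """constant * likelihood * prior, viewed as a function of theta."""
    likelihood = theta ** len(data) * math.exp(-theta * sum(data))
    prior = lam * math.exp(-lam * theta)
    return constant * likelihood * prior


def normalize(weights, width):
    """Scale grid values so that they integrate to 1 (midpoint rule)."""
    total = sum(weights) * width
    return [w / total for w in weights]


post_a = normalize([unnormalized(t, 1.0) for t in grid], step)
post_b = normalize([unnormalized(t, 123.0) for t in grid], step)

# Largest pointwise difference between the two normalized posteriors.
max_diff = max(abs(a - b) for a, b in zip(post_a, post_b))
print(max_diff)
```

The constant 123.0 stands in for any factor that does not involve θ; after normalization it leaves no trace beyond floating-point rounding.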

Frequentist and Bayesian statistics

- Frequentist approaches typically treat θ as an unknown constant
- Bayesian approaches treat it as a random variable

Likelihood

- Likelihood used in most approaches to formal statistical inference.
- Describes the data generating process.

Prior and posterior distribution

- prior distribution: represents information about the parameters other than that supplied by the data under analysis
- posterior distribution: probability distribution as revised in the light of the data, determined by applying standard rules of probability theory

Sometimes the prior distribution is ‘flat’ over a particular scale, intended to represent the absence of initial information.

In complex problems with many nuisance parameters the use of flat prior distributions is suspect and, at the very least, needs careful study using sensitivity analyses.

Example

$X_1, \ldots, X_n$: random sample from an exponential distribution with density

$$f(x \mid \theta) = \theta e^{-\theta x}, \quad x > 0$$

prior $\pi(\theta) = \lambda e^{-\lambda\theta}$, $\theta > 0$, for some known value of λ

Then the likelihood is

$$f(x_1, \ldots, x_n \mid \theta) = \theta^n e^{-\theta \sum_{i=1}^{n} x_i}.$$

So the posterior distribution is

$$p(\theta \mid x) \propto \pi(\theta)\, f(x \mid \theta) = \lambda e^{-\lambda\theta}\, \theta^n e^{-\theta \sum x_i} \propto \theta^n e^{-\theta(\lambda + \sum x_i)},$$

which is the Gamma$(n + 1, \lambda + \sum x_i)$ density.
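The Gamma posterior can be turned into a short calculation. The following is a sketch rather than a definitive implementation: the data values, the choice λ = 1, and the function name `gamma_posterior` are hypothetical. It computes the posterior shape n + 1 and rate λ + Σxᵢ, then checks the closed-form posterior mean against simulated draws. Note that `random.gammavariate` takes a scale parameter, so the rate is inverted.

```python
import random


def gamma_posterior(data, lam):
    """Posterior Gamma(shape, rate) for exponential data with an Exp(lam) prior."""
    shape = len(data) + 1
    rate = lam + sum(data)
    return shape, rate


data = [1.2, 0.4, 2.1, 0.7, 1.6]  # hypothetical observations
lam = 1.0                         # assumed known prior parameter

shape, rate = gamma_posterior(data, lam)
post_mean = shape / rate          # mean of Gamma(shape, rate)
post_var = shape / rate ** 2      # variance of Gamma(shape, rate)

# Monte Carlo check; gammavariate(alpha, beta) uses beta as a scale, i.e. 1 / rate.
rng = random.Random(0)
draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)

print(shape, rate, post_mean, mc_mean)
```

Because the prior is conjugate here, no numerical integration is needed; the simulation only confirms the algebra.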

History

- ‘inverse probability’
- prior distribution intended to represent initial ignorance used systematically in statistical analysis by Gauss and especially Laplace (circa 1800)
- the approach was criticized during the 19th century
- in the middle of the 20th century attention shifted to a personalistic view of probability: individual belief as expressed by individual choice in a decision-making context

Prior distribution

Controversial aspects concern prior. What does it mean?

- What is the prior probability that treatment and control have identical effect?
- What is the prior probability that the difference between two groups is between 5 and 15 units?

prior distribution must be specified explicitly, i.e. in effect numerically

Prior distribution

Broadly three approaches:

1. Summary of other data. Empirical Bayes.
2. Prior measures personalistic opinion of investigator about conclusions. Not useful for ‘public’ transmission of knowledge.
3. Objective degree of uncertainty.
   1. Agreed measure of uncertainty.
   2. Ignorance, reference, flat prior for interval estimate. Laplace’s principle of indifference.

Empirical Bayes

- ‘empirical’: frequency interpretation implied
- e.g. an unknown parameter representing a mean of some measurement, likely to vary under different circumstances
- can be represented by a widely dispersed distribution
- leading to a posterior distribution with a frequency interpretation

Example: variances of gene expression for different probes on a microarray may be assumed to be a sample from a distribution with a common parameter

Personalistic prior

- reflects the investigator’s subjective beliefs
- prior distribution is based on relatively informally recalled experience of a field, for example on data that have been seen only informally

Flat prior

- a prior which aims to insert as little new information as possible
- for relatively simple problems often limiting forms of the prior reproduce approximately or exactly posterior intervals equivalent to confidence intervals

Priors

- Is the prior distribution a positive insertion of evidence? If so, what is its basis and has the consistency of that evidence with the current data been checked?
- If flat/ignorance/reference priors have been used, how have they been chosen? Has there been a sensitivity analysis? If the number of parameters over which a prior distribution is defined is appreciable then the choice of a flat prior distribution could be misleading.
- Each of a substantial number of individuals may have been allocated a value of an unknown parameter, the values having a stable frequency distribution across individuals (empirical Bayes).

Posterior distribution

Conclusions can be summarized using for example

- posterior mean
- posterior variance
- credible intervals

Credible intervals

a region $C_\alpha(x)$ is a $100(1 - \alpha)\%$ credible region for θ if

$$\int_{C_\alpha(x)} p(\theta \mid x)\, d\theta = 1 - \alpha$$

- there is posterior probability $1 - \alpha$ that θ is in $C_\alpha(x)$
- credible interval: special case of credible region
- analogous to (frequentist) confidence intervals, different interpretation
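One simple way to obtain such a region in practice is an equal-tailed interval computed from posterior draws. The sketch below assumes a hypothetical Gamma posterior with shape 6 and rate 7 and a helper named `equal_tailed_interval`; neither comes from the slides. Equal tails are only one possible choice of $C_\alpha(x)$; a highest-posterior-density region is another.

```python
import random


def equal_tailed_interval(draws, alpha=0.05):
    """Interval cutting off roughly alpha / 2 of the draws in each tail."""
    ordered = sorted(draws)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


# Hypothetical posterior: Gamma with shape 6 and rate 7 (scale 1 / 7).
rng = random.Random(1)
draws = [rng.gammavariate(6, 1.0 / 7.0) for _ in range(50_000)]

lower, upper = equal_tailed_interval(draws, alpha=0.05)

# Fraction of posterior draws inside the interval, close to 1 - alpha.
coverage = sum(lower <= d <= upper for d in draws) / len(draws)
print(lower, upper, coverage)
```

The reported coverage is a posterior probability for θ given the observed data, which is the interpretive difference from a confidence interval.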

Hypothesis testing

- Frequentist approach to hypothesis testing compares a null hypothesis $H_0$ with an alternative $H_1$ through a test statistic T that tends to be larger under $H_1$ than under $H_0$ and rejects $H_0$ for small p-values $p = P_{H_0}(T \geq t_{\text{obs}})$, where $t_{\text{obs}}$ is the value of T actually observed and the probability is computed as if $H_0$ were true
- Bayesian approach attaches prior probabilities to models corresponding to $H_0$ and $H_1$ and compares their posterior probabilities using the Bayes factor

$$B_{10} = \frac{P(x \mid H_1)}{P(x \mid H_0)}$$
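To make the Bayes factor concrete, the sketch below works through a hypothetical binomial setting that is not part of the slides: k successes in n trials, with $H_0$: θ = 0.5 and $H_1$: θ uniform on (0, 1). Under that assumed uniform prior the marginal probability of the data is 1 / (n + 1), so $B_{10}$ has a closed form.

```python
import math


def bayes_factor_10(n, k):
    """B10 for k successes in n trials: H1 theta ~ Uniform(0, 1) vs H0 theta = 0.5."""
    # P(x | H1): binomial likelihood integrated over the uniform prior, equal to 1 / (n + 1).
    marginal_h1 = 1.0 / (n + 1)
    # P(x | H0): binomial probability evaluated at theta = 0.5.
    marginal_h0 = math.comb(n, k) * 0.5 ** n
    return marginal_h1 / marginal_h0


# Hypothetical data sets: 15 of 20 successes, then 10 of 20.
print(bayes_factor_10(20, 15))
print(bayes_factor_10(20, 10))
```

Values above 1 favour $H_1$ and values below 1 favour $H_0$; the result depends on the prior placed on θ under $H_1$, which is an assumption of this example.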

Computation

- conjugate prior: when the prior and the posterior are from the same family of distributions (for example normal prior and normal likelihood)
- makes calculations easier
- however, often unrealistic, so posterior distributions need to be evaluated numerically

Markov chain Monte Carlo (MCMC)

- Markov chain Monte Carlo (MCMC): a stochastic simulation technique which is used for computing inferential quantities which cannot be obtained analytically
- MCMC simulates a discrete-time Markov chain
- it produces a dependent sequence of random variables $\{\theta^{(1)}, \ldots, \theta^{(M)}\}$ with approximate distribution the posterior distribution of interest
- MCMC is an iterative procedure, such that given the current state of the chain, $\theta^{(i)}$, the algorithm makes a probabilistic update to $\theta^{(i+1)}$
- Markov chains can automatically be constructed to match any posterior density
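The iterative update can be sketched with a random-walk Metropolis sampler, one common MCMC scheme. Everything specific here is an assumption for illustration: the target is a hypothetical unnormalized density θ⁵ e^(−7θ) on θ > 0, and the step size, chain length, burn-in and seed are arbitrary choices rather than recommendations.

```python
import math
import random


def log_target(theta, n=5, rate=7.0):
    """Log of the unnormalized density theta^n * exp(-rate * theta) for theta > 0."""
    if theta <= 0:
        return -math.inf
    return n * math.log(theta) - rate * theta


def metropolis(log_density, start, n_iter, step, seed=0):
    """Random-walk Metropolis: propose theta + noise, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    theta = start
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        log_ratio = log_density(proposal) - log_density(theta)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta = proposal                     # accept the move
        chain.append(theta)                      # otherwise the current state repeats
    return chain


chain = metropolis(log_target, start=1.0, n_iter=60_000, step=0.5)
kept = chain[10_000:]                            # discard an assumed burn-in period
mc_mean = sum(kept) / len(kept)
print(mc_mean)
```

Only ratios of the target density enter the acceptance step, so the normalizing constant is never needed. For this assumed target, which is a Gamma density with shape 6 and rate 7, the chain average should settle near 6/7.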

MCMC

Two of the most general procedures for MCMC simulation from a target distribution:

- Metropolis–Hastings algorithm
- Gibbs sampler

Software:

- WinBUGS: a Windows version of BUGS (Bayesian inference Using Gibbs Sampling)
- CODA: a collection of convergence diagnostics and sample output analysis programs
- JAGS (Just Another Gibbs Sampler)

MCMC: priors

- MCMC mostly uses flat priors.
- Flat for θ not same as flat for e.g. log(θ).
- For models with fairly few parameters and reasonable data gives confidence level.
- For large number of parameters may give very bad answer. No general theory known.
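The point about scale can be seen in a small simulation. The sketch below assumes, purely for illustration, a flat prior for θ on the interval (1, 100) and looks at how the same draws are spread on the log(θ) scale.

```python
import math
import random

rng = random.Random(2)

# Flat prior on theta over an assumed range (1, 100).
thetas = [rng.uniform(1.0, 100.0) for _ in range(100_000)]
phis = [math.log(t) for t in thetas]

# Middle of the range of log(theta), which corresponds to theta = 10.
midpoint = math.log(100.0) / 2

# If the prior were also flat on the log scale, this share would be about 0.5.
share_upper = sum(p > midpoint for p in phis) / len(phis)
print(share_upper)
```

About 90/99 of the draws fall in the upper half of the log scale, so a prior that is flat for θ is strongly informative for log(θ).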

Discussion

- Bayesian inference based on Bayes’ theorem
- differences in interpretation between Bayesian and frequentist inference
- choice of prior controversial
- computation usually done numerically; MCMC useful but to be used with caution

Further reading

- Cox, D. R. (2006). Frequentist and Bayesian Statistics: A Critique (Keynote Address). In Statistical Problems in Particle Physics, Astrophysics and Cosmology (Vol. 1, p. 3).
- Cox, D. R. and Donnelly, C. A. (2011). Principles of Applied Statistics. Cambridge University Press.
- Davison, A. C. (2003). Statistical Models (Vol. 11). Cambridge University Press.
- Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2003). Bayesian Data Analysis. Chapman and Hall/CRC Texts in [...]
- [...]c.uk/software/bugs/
