CHAPTER 13  Bayesian Estimation in Hierarchical Models


John K. Kruschke and Wolf Vanpaemel

Kruschke, J. K. and Vanpaemel, W. (2015). Bayesian estimation in hierarchical models. In: J. R. Busemeyer, Z. Wang, J. T. Townsend, and A. Eidels (Eds.), The Oxford Handbook of Computational and Mathematical Psychology, pp. 279-299. Oxford, UK: Oxford University Press.

Abstract

Bayesian data analysis involves describing data by meaningful mathematical models, and allocating credibility to parameter values that are consistent with the data and with prior knowledge. The Bayesian approach is ideally suited for constructing hierarchical models, which are useful for data structures with multiple levels, such as data from individuals who are members of groups which in turn are in higher-level organizations. Hierarchical models have parameters that meaningfully describe the data at their multiple levels and connect information within and across levels. Bayesian methods are very flexible and straightforward for estimating parameters of complex hierarchical models (and simpler models too). We provide an introduction to the ideas of hierarchical models and to the Bayesian estimation of their parameters, illustrated with two extended examples. One example considers baseball batting averages of individual players grouped by fielding position. A second example uses a hierarchical extension of a cognitive process model to examine individual differences in attention allocation of people who have eating disorders. We conclude by discussing Bayesian model comparison as a case of hierarchical modeling.

Key Words: Bayesian statistics, Bayesian data analysis, Bayesian modeling, hierarchical model, model comparison, Markov chain Monte Carlo, shrinkage of estimates, multiple comparisons, individual differences, cognitive psychometrics, attention allocation

The Ideas of Hierarchical Bayesian Estimation

Bayesian reasoning formalizes the reallocation of credibility over possibilities in consideration of new data. Bayesian reasoning occurs routinely in everyday life. Consider the logic of the fictional detective Sherlock Holmes, who famously said that when a person has eliminated the impossible, then whatever remains, no matter how improbable, must be the truth (Doyle, 1890). His reasoning began with a set of candidate possibilities, some of which had low credibility a priori. Then he collected evidence through detective work, which ruled out some possibilities. Logically, he then reallocated credibility to the remaining possibilities. The complementary logic of judicial exoneration is also commonplace. Suppose there are several unaffiliated suspects for a crime. If evidence implicates one of them, then the other suspects are exonerated. Thus, the initial allocation of credibility (i.e., culpability) across the suspects was reallocated in response to new data.

In data analysis, the space of possibilities consists of parameter values in a descriptive model. For example, consider a set of data measured on a continuous scale, such as the weights of a group of 10-year-old children. We might want to describe the set of data in terms of a mathematical normal distribution, which has two parameters, namely the mean and the standard deviation.

Before collecting the data, the possible means and standard deviations have some prior credibility, about which we might be very uncertain or highly informed. After collecting the data, we reallocate credibility to values of the mean and standard deviation that are reasonably consistent with the data and with our prior beliefs. The reallocated credibilities constitute the posterior distribution over the parameter values.

We care about parameter values in formal models because the parameter values carry meaning. When we say that the mean weight is 32 kilograms and the standard deviation is 3.2 kilograms, we have a clear sense of how the data are distributed (according to the model). As another example, suppose we want to describe children's growth with a simple linear function, which has a slope parameter. When we say that the slope is 5 kilograms per year, we have a clear sense of how weight changes through time (according to the model). The central goal of Bayesian estimation, and a major goal of data analysis generally, is deriving the most credible parameter values for a chosen descriptive model, because the parameter values are meaningful in the context of the model.

Bayesian estimation provides an entire distribution of credibility over the space of parameter values, not merely a single "best" value. The distribution precisely captures our uncertainty about the parameter estimate. The essence of Bayesian estimation is to formally describe how uncertainty changes when new data are taken into account.

Hierarchical Models Have Parameters with Hierarchical Meaning

In many situations, the parameters of a model have meaningful dependencies on each other. As a simplistic example, suppose we want to estimate the probability that a type of trick coin, manufactured by the Acme Toy Company, comes up heads. We know that different coins of that type have somewhat different underlying biases to come up heads, but there is a central tendency in the bias imposed by the manufacturing process. Thus, when we flip several coins of that type, each several times, we can estimate the underlying biases in each coin and the typical bias and consistency of the manufacturing process. In this situation, the observed heads of a coin depend only on the bias in the individual coin, but the bias in the coin depends on the manufacturing parameters. This chain of dependencies among parameters exemplifies a hierarchical model (Kruschke, 2015, Ch. 9).

As another example, consider research into childhood obesity. The researchers measure weights of children in a number of different schools that have different school lunch programs, and from a number of different school districts that may have different but unknown socioeconomic statuses. In this case, a child's weight might be modeled as dependent on his or her school lunch program. The school lunch program is characterized by parameters that indicate the central tendency and variability of weights that it tends to produce. The parameters of the school lunch program are, in turn, dependent on the school's district, which is described by parameters indicating the central tendency and variability of school-lunch parameters across schools in the district. This chain of dependencies among parameters again exemplifies a hierarchical model.

In general, a model is hierarchical if the probability of one parameter can be conceived to depend on the value of another parameter. Expressed formally, suppose the observed data, denoted D, are described by a model with two parameters, denoted α and β. The probability of the data is a mathematical function of the parameter values, denoted by p(D | α, β), which is called the likelihood function of the parameters.
The prior probability of the parameters is denoted p(α, β). Notice that the likelihood and prior are expressed, so far, in terms of combinations of α and β in the joint parameter space. The probability of the data, weighted by the probability of the parameter values, is the product, p(D | α, β) p(α, β). The model is hierarchical if that product can be factored as a chain of dependencies among parameters, such as

    p(D | α, β) p(α, β) = p(D | α) p(α | β) p(β).

Many models can be reparameterized, and conditional dependencies can be revealed or obscured under different parameterizations. The notion of hierarchical has to do with a particular meaningful definition of a model structure that expresses dependencies among parameters in a meaningful way. In other words, it is the semantics of the parameters, when factored in the corresponding way, that makes a model hierarchical. Ultimately, any multiparameter model merely has parameters in a joint space, whether that joint space is conceived as hierarchical or not. Many realistic situations involve natural hierarchical meaning, as illustrated by the two major examples that will be described at length in this chapter.
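The chain of dependencies can be made concrete by simulating the trick-coin example above in the generative direction: a factory-level parameter sets the typical bias, each coin's bias is drawn around that value, and each flip depends only on its own coin's bias. The sketch below is our own minimal illustration, not code from the chapter; the variable names and the particular constants (the factory bias, its consistency, the numbers of coins and flips) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Factory level: typical bias (omega) and consistency (kappa) of the minting
# process. These constants are illustrative assumptions, not chapter values.
omega, kappa = 0.65, 30.0

# Coin level: each coin's bias theta_j is drawn from a beta distribution
# whose mean is omega and whose concentration is kappa.
n_coins = 5
a, b = omega * kappa, (1.0 - omega) * kappa
theta = rng.beta(a, b, size=n_coins)

# Data level: the observed heads for each coin depend only on that coin's
# bias theta_j, exemplifying the factorization p(D|theta) p(theta|omega) p(omega).
flips_per_coin = 20
heads = rng.binomial(flips_per_coin, theta)

print("coin biases:", np.round(theta, 3))
print("observed heads out of", flips_per_coin, ":", heads)
```

Bayesian estimation runs this logic in reverse: given only the observed heads, it reallocates credibility jointly across the coin-level and factory-level parameters.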

One of the primary applications of hierarchical models is describing data from individuals within groups. A hierarchical model may have parameters for each individual that describe each individual's tendencies, and the distribution of individual parameters within a group is modeled by a higher-level distribution with its own parameters that describe the tendency of the group. The individual-level and group-level parameters are estimated simultaneously. Therefore, the estimate of each individual-level parameter is informed by all the other individuals via the estimate of the group-level distribution, and the group-level parameters are more precisely estimated by the jointly constrained individual-level parameters. The hierarchical approach is better than treating each individual independently, because the data from different individuals meaningfully inform one another. And the hierarchical approach is better than collapsing all the individual data together, because collapsed data may blur or obscure trends within each individual.

Advantages of the Bayesian Approach

Bayesian methods provide tremendous flexibility in designing models that are appropriate for describing the data at hand, and Bayesian methods provide a complete representation of parameter uncertainty (i.e., the posterior distribution) that can be directly interpreted. Unlike the frequentist interpretation of parameters, there is no construction of sampling distributions from auxiliary null hypotheses. In a frequentist approach, although it may be possible to find a maximum-likelihood estimate (MLE) of parameter values in a hierarchical nonlinear model, the subsequent task of interpreting the uncertainty of the MLE can be very difficult. To decide whether an estimated parameter value is significantly different from a null value, frequentist methods demand construction of sampling distributions of arbitrarily defined deviation statistics, generated from arbitrarily defined null hypotheses, from which p values are determined for testing null hypotheses. When there are multiple tests, frequentist decision rules must adjust the p values. Moreover, frequentist methods are unwieldy for constructing confidence intervals on parameters, especially for complex hierarchical nonlinear models that are often the primary interest for cognitive scientists. Furthermore, confidence intervals change when the researcher's intention changes (e.g., Kruschke, 2013). Frequentist methods for measuring uncertainty (as confidence intervals from sampling distributions) are fickle and difficult, whereas Bayesian methods are inherently designed to provide clear representations of uncertainty. A thorough critique of frequentist methods such as p values would take us too far afield; interested readers may consult many other references, such as articles by Kruschke (2013) or Wagenmakers (2007).

Some Mathematics and Mechanics of Bayesian Estimation

The mathematically correct reallocation of credibility over parameter values is specified by Bayes' rule (Bayes & Price, 1763):

    p(α | D) = p(D | α) p(α) / p(D),    (1)

in which p(α | D) is the posterior, p(D | α) is the likelihood, and p(α) is the prior, and where

    p(D) = ∫ dα p(D | α) p(α)    (2)

is called the "marginal likelihood" or "evidence." The formula in Eq. 1 is a simple consequence of the definition of conditional probability (e.g., Kruschke, 2015), but it has huge ramifications when applied to meaningful, complex models.

In some simple situations, the mathematical form of the posterior distribution can be analytically derived. These cases demand that the integral in Eq. 2 can be derived mathematically in conjunction with the product of terms in the numerator of Bayes' rule.
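Equations 1 and 2 can be made concrete with a small numerical illustration. The sketch below is ours, not the chapter's: it approximates the posterior for a single success-rate parameter θ on a dense grid, so that the integral in Eq. 2 becomes a simple sum. The prior constants and the data values are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import beta, binom

# Grid of candidate parameter values for a success rate theta.
theta = np.linspace(0.001, 0.999, 999)

# Prior p(theta): a mildly informed beta prior (illustrative constants).
prior = beta.pdf(theta, 2, 2)
prior /= prior.sum()                           # normalize over the grid

# Data: 7 successes in 20 trials (arbitrary example numbers).
hits, trials = 7, 20
likelihood = binom.pmf(hits, trials, theta)    # p(D | theta)

# Bayes' rule (Eq. 1): posterior is proportional to likelihood times prior;
# the denominator p(D) is the sum over the grid, a discrete stand-in for Eq. 2.
evidence = np.sum(likelihood * prior)
posterior = likelihood * prior / evidence

print("posterior mean of theta:", np.sum(theta * posterior))
```

In this conjugate case the exact posterior is itself a beta distribution, an instance of the analytical solutions discussed next.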
When an analytical solution can be derived, the result is especially pleasing because an explicit, simple formula for the posterior distribution is obtained. Analytical solutions for Bayes' rule, however, can rarely be achieved for realistically complex models. Fortunately, the posterior distribution can instead be approximated, to arbitrarily high accuracy, by generating a huge random sample of representative parameter values from the posterior distribution. A large class of algorithms for generating a representative random sample from a distribution is called Markov chain Monte Carlo (MCMC) methods. Regardless of which particular sampler from the class is used, in the long run they all converge to an accurate representation of the posterior distribution. The bigger the MCMC sample, the finer-resolution picture we have of the posterior distribution. Because the sampling process uses a Markov chain, the random sample produced by the MCMC process is often called a chain.
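The following sketch shows the idea of MCMC in its simplest form: a random-walk Metropolis sampler for the same one-parameter posterior used in the grid example above. It is a toy illustration of ours, with arbitrary data and tuning constants; real analyses would use software such as JAGS or Stan together with the diagnostics described in Box 1.

```python
import numpy as np
from scipy.stats import beta, binom

rng = np.random.default_rng(seed=2)
hits, trials = 7, 20                        # arbitrary example data

def log_posterior(theta):
    """Unnormalized log posterior: log likelihood plus log beta(2, 2) prior."""
    if not (0.0 < theta < 1.0):
        return -np.inf                      # zero density outside (0, 1)
    return binom.logpmf(hits, trials, theta) + beta.logpdf(theta, 2, 2)

n_steps, proposal_sd = 20_000, 0.05         # tuning constants are assumptions
chain = np.empty(n_steps)
current = 0.5                               # arbitrary starting value
for step in range(n_steps):
    proposal = current + rng.normal(0.0, proposal_sd)
    # Accept the proposal with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(current):
        current = proposal
    chain[step] = current

burned = chain[1000:]                       # discard initial burn-in steps
print("posterior mean ~", burned.mean())
print("central 95% of chain ~", np.percentile(burned, [2.5, 97.5]))
```

The resulting chain is a large representative sample of θ values; its histogram approximates the same posterior that the grid computation produced directly.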

Box 1: MCMC Details

Because the MCMC sampling is a random walk through parameter space, we would like some assurance that it successfully explored the posterior distribution without getting stuck, oversampling, or undersampling zones of the posterior. Mathematically, the samplers will be accurate in the long run, but we do not know in advance exactly how long is long enough to produce a reasonably good sample.

There are various diagnostics for assessing MCMC chains. It is beyond the scope of this chapter to review their details, but the ideas are straightforward. One type of diagnostic assesses how "clumpy" the chain is, by using a descriptive statistic called the autocorrelation of the chain. If a chain is strongly autocorrelated, successive steps in the chain are near each other, thereby producing a clumpy chain that takes a long time to smooth out. We want a smooth sample to be sure that the posterior distribution is accurately represented in all regions of the parameter space. To achieve stable estimates of the tails of the posterior distribution, one heuristic is that we need about 10,000 independent representative parameter values (Kruschke, 2015, Section 7.5.2). Stable estimates of central tendencies can be achieved with smaller numbers of independent values. A statistic called the effective sample size (ESS) takes into account the autocorrelation of the chain and suggests what would be an equivalently sized sample of independent values.

Another diagnostic assesses whether the MCMC chain has gotten stuck in a subset of the posterior distribution, rather than exploring the entire posterior parameter space. This diagnostic takes advantage of running two or more distinct chains and assessing the extent to which the chains overlap. If several different chains thoroughly overlap, we have evidence that the MCMC samples have converged to a representative sample.

It is important to understand that the MCMC "sample" or "chain" is a huge representative sample of parameter values from the posterior distribution. The MCMC sample is not to be confused with the sample of data. For any particular analysis, there is a single fixed sample of data, and there is a single underlying mathematical posterior distribution that is inferred from the sample of data. The MCMC chain typically uses tens of thousands of representative parameter values from the posterior distribution to represent the posterior distribution. Box 1 provides more details about assessing when an MCMC chain is a good representation of the underlying posterior distribution.

Contemporary MCMC software works seamlessly for complex hierarchical models involving nonlinear relationships between variables and nonnormal distributions at multiple levels. Model specification languages such as BUGS (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2013; Lunn, Thomas, Best, & Spiegelhalter, 2000), JAGS (Plummer, 2003), and Stan (Stan, 2013) allow the user to specify descriptive models to satisfy theoretical and empirical demands.

Example: Shrinkage and Multiple Comparisons of Baseball Batting Abilities

American baseball is a sport in which one person, called a pitcher, throws a small ball as quickly as possible over a small patch of earth, called home plate, next to which is standing another person holding a stick, called a bat, who tries to hit the ball with the bat. If the ball is hit appropriately into the field, the batter attempts to run to other marked patches of earth arranged in a diamond shape.
The batter tries to arrive at the first patch of earth, called first base, before the other players, called fielders, can retrieve the ball and throw it to a teammate attending first base.

One of the crucial abilities of baseball players is, therefore, the ability to hit a very fast ball (sometimes thrown more than 90 miles [145 kilometers] per hour) with the bat. An important goal for enthusiasts of baseball is estimating each player's ability to bat the ball. Ability cannot be assessed directly but can only be estimated by observing how many times a player was able to hit the ball in all his opportunities at bat, or by observing hits and at-bats from other similar players.

There are nine players in the field at once, who specialize in different positions. These include the pitcher, the catcher, the first base man, the second base man, the third base man, the shortstop, the left fielder, the center fielder, and the right fielder. When one team is in the field, the other team is at bat. The teams alternate being at bat and being in the field. Under some rules, the pitcher does not have to bat when his team is at bat.

Because different positions emphasize different skills while on the field, not all players are prized for their batting ability alone.

In particular, pitchers and catchers have specialized skills that are crucial for team success. Therefore, based on the structure of the game, we know that players with different primary positions are likely to have different batting abilities.

The Data

The data consist of records from 948 players in the 2012 regular season of Major League Baseball who had at least one at-bat.2 For player i, we have his number of opportunities at bat, ABi, his number of hits, Hi, and his primary position when in the field, pp(i). In the data, there were 324 pitchers with a median of 4.0 at-bats, 103 catchers with a median of 170.0 at-bats, and 60 right fielders with a median of 340.5 at-bats, along with 461 players in six other positions.

The Descriptive Model with Its Meaningful Parameters

We want to estimate, for each player, his underlying probability θi of hitting the ball when at bat. The primary data to inform our estimate of θi are the player's number of hits, Hi, and his number of opportunities at bat, ABi. But the estimate will also be informed by our knowledge of the player's primary position, pp(i), and by the data from all the other players (i.e., their hits, at-bats, and positions). For example, if we know that player i is a pitcher, and we know that pitchers tend to have θ values around 0.13 (because of all the other data), then our estimate of θi should be anchored near 0.13 and adjusted by the specific hits and at-bats of the individual player. We will construct a hierarchical model that rationally shares information across players within positions, and across positions within all major league players.3

We denote the ith player's underlying probability of getting a hit as θi. (See Box 2 for discussion of assumptions in modeling.) Then the number of hits Hi out of ABi at-bats is a random draw from a binomial distribution that has success rate θi, as illustrated at the bottom of Figure 13.1. The arrow pointing to Hi is labeled with a "∼" symbol to indicate that the number of hits is a random variable distributed as a binomial distribution.

Box 2: Model Assumptions

For the analysis of batting abilities, we assume that a player's batting ability, θi, is constant for all at-bats, and that the outcome of any at-bat is independent of other at-bats. These assumptions may be false, but the notion of a constant underlying batting ability is a meaningful construct for our present purposes. Assumptions must be made for any statistical analysis, whether Bayesian or not, and the conclusions from any statistical analysis are conditional on its assumptions. An advantage of Bayesian analysis is that, relative to 20th-century frequentist techniques, there is greater flexibility to make assumptions that are appropriate to the situation. For example, if we wanted to build a more elaborate analysis, we could incorporate data about when in the season the at-bats occurred, and estimate temporal trends in ability due to practice or fatigue. Or, we could incorporate data about which pitcher was being faced in each at-bat, and we could estimate pitcher difficulties simultaneously with batter abilities. But these elaborations, although possible in the Bayesian framework, would go far beyond our purposes in this chapter.
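To fix the data format and the bottom level of the model, here is a tiny illustration of how a few player records might be stored and how the binomial likelihood of Hi given ABi and a candidate θi is evaluated. The records and the candidate θ values are made up by us for illustration; they are not the actual 2012 data.

```python
from scipy.stats import binom

# Hypothetical player records: (name, hits H_i, at-bats AB_i, primary position pp(i)).
records = [
    ("player A",  1,   4, "pitcher"),      # few at-bats: observed average is unstable
    ("player B", 45, 170, "catcher"),
    ("player C", 95, 340, "right field"),
]

for name, hits, at_bats, position in records:
    print(f"{name} ({position}): observed average {hits / at_bats:.3f}")
    # Binomial likelihood of the observed hits under two candidate abilities theta_i.
    for theta in (0.13, 0.25):
        print(f"   p(H={hits} | AB={at_bats}, theta={theta}) = "
              f"{binom.pmf(hits, at_bats, theta):.4f}")
```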
To formally express our prior belief that different primary positions emphasize different skills and hence have different batting abilities, we assume that the player abilities θi come from distributions specific to each position. Thus, the θi's for the 324 pitchers are assumed to come from a distribution specific to pitchers, which might have a different central tendency and dispersion than the distribution of abilities for the 103 catchers, and so on for the other positions. We model the distribution of θi's for a position as a beta distribution, which is a natural distribution for describing values that fall between zero and one, and is often used in this sort of application (e.g., Kruschke, 2015). The mean of the beta distribution for primary position pp is denoted μpp, and the narrowness of the distribution is denoted κpp. The value of μpp represents the typical batting ability of players in primary position pp, and the value of κpp represents how tightly clustered the abilities are across players in primary position pp. The κ parameter is sometimes called the concentration or precision of the beta distribution.4 Thus, an individual player whose primary position is pp(i) is assumed to have a batting ability θi that comes from a beta distribution with mean μpp(i) and precision κpp(i). The values of μpp and κpp are estimated simultaneously with all the θi. Figure 13.1 illustrates this aspect of the model by showing an arrow pointing to θi from a beta distribution.

The arrow is labeled with "∼ ... i" to indicate that the θi have credibilities distributed as a beta distribution, for each of the individuals. The diagram shows beta distributions as they are conventionally parameterized by two shape parameters, denoted app and bpp, that can be algebraically redescribed in terms of the mean μpp and precision κpp of the distribution: app = μpp κpp and bpp = (1 − μpp) κpp.

To formally express our prior knowledge that all players, from all positions, are professionals in major league baseball, and, therefore, should mutually inform each other's estimates, we assume that the nine position abilities μpp come from an overarching beta distribution with mean μμpp and precision κμpp. This structure is illustrated in the upper part of Figure 13.1 by the split arrow, labeled with "∼ ... pp", pointing to μpp from a beta distribution. The value of μμpp in the overarching distribution represents our estimate of the batting ability of major league players generally, and the value of κμpp represents how tightly clustered the abilities are across the nine positions. These across-position parameters are estimated from the data, along with all the other parameters.

The precisions of the nine distributions are also estimated from the data. The precisions of the position distributions, κpp, are assumed to come from an overarching gamma distribution, as illustrated in Figure 13.1 by the split arrow, labeled with "∼ ... pp", pointing to κpp from a gamma distribution. A gamma distribution is a generic and natural distribution for describing non-negative values such as precisions (e.g., Kruschke, 2015). A gamma distribution is conventionally parameterized by shape and rate values, denoted in Figure 13.1 as sκpp and rκpp. We assume that the precisions of each position can mutually inform each other; that is, if the batting abilities of catchers are tightly clustered, then the batting abilities of shortstops should probably also be tightly clustered, and so forth. Therefore the shape and rate parameters of the gamma distribution are themselves estimated.

Fig. 13.1 The hierarchical descriptive model for baseball batting ability. The diagram should be scanned from the bottom up. At the bottom, the number of hits by the ith player, Hi, are assumed to come from a binomial distribution with maximum value being the at-bats, ABi, and probability of getting a hit being θi. See text for further details.
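To show how the layers in Figure 13.1 fit together, here is a small generative simulation of the model structure, written top-down: position-level means and precisions are drawn first, then each player's ability, then the hit counts. This is our own sketch; the top-level constants, the number of players per position, and the at-bat counts are illustrative assumptions, not values estimated in the chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
positions = ["pitcher", "catcher", "first base", "second base", "third base",
             "shortstop", "left field", "center field", "right field"]

# Top level: overall batting ability across positions, and the spread of the
# position precisions (illustrative constants only).
mu_overall, kappa_overall = 0.23, 100.0
shape_kappa, rate_kappa = 10.0, 0.1

players = []
for pp in positions:
    # Position level: mean ability mu_pp from a beta distribution, and
    # precision kappa_pp from a gamma distribution.
    mu_pp = rng.beta(mu_overall * kappa_overall,
                     (1.0 - mu_overall) * kappa_overall)
    kappa_pp = rng.gamma(shape_kappa, 1.0 / rate_kappa)   # numpy uses scale = 1/rate

    # Player level: each theta_i comes from the position's beta distribution,
    # reparameterized by mean and precision: a = mu*kappa, b = (1 - mu)*kappa.
    n_players = 30                                          # arbitrary
    theta = rng.beta(mu_pp * kappa_pp, (1.0 - mu_pp) * kappa_pp, size=n_players)

    # Data level: hits out of at-bats are binomial draws with rate theta_i.
    at_bats = rng.integers(5, 400, size=n_players)
    hits = rng.binomial(at_bats, theta)
    players.append((pp, mu_pp, hits, at_bats))

for pp, mu_pp, hits, at_bats in players:
    print(f"{pp:>12}: position mean {mu_pp:.3f}, "
          f"observed average {hits.sum() / at_bats.sum():.3f}")
```

Bayesian estimation runs this generative logic in reverse: given the observed hits and at-bats, MCMC yields jointly credible values for every θi, μpp, and κpp.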

At the top level in Figure 13.1 we incorporate any prior knowledge we might have about general properties of batting abilities for players in the major leagues, such as evidence from previous seasons of play. Baseball aficionados may have extensive prior knowledge that could be usefully implemented in a Bayesian model. Unlike baseball experts, we have no additional background knowledge, and, therefore, we will use very vague and noncommittal top-level prior distributions. Thus, the top-level beta distribution on the overall batting ability is given parameter values A = 1 and B = 1, which make it uniform over all possible batting abilities from zero to one. The top-level gamma distributions (on precision, shape, and rate) are given parameter values that make them extremely broad and noncommittal, such that the data dominate the estimates, with minimal influence from the top-level prior.

There are 970 parameters in the model altogether: 948 individual θi, plus μpp and κpp for each of nine primary positions, plus μμ and κμ across positions, plus sκ and rκ. The Bayesian analysis yields credible combinations of the parameters in the 970-dimensional joint parameter space.

We care about the parameter values because they are meaningful. Our primary interest is in the estimates of individual batting abilities, θi, and in the position-specific batting abilities, μpp. We are also able to examine the relative precisions of abilities across positions to address questions such as, Are batting abilities of catchers as variable as batting abilities of shortstops? We will not do so here, however.

Results: Interpreting the Posterior Distribution

We used MCMC chains with a total saved length of 15,000 after adaptation of 1,000 steps and burn-in of 1,000 steps, using 3 parallel chains called from the runjags package (Denwood, 2013), thinned by 30 merely to keep a modest file size for the saved chain. The diagnostics (see Box 1) assured us that the chains were adequate to provide an accurate and high-resolution representation of the posterior distribution. The effective sample size (ESS) for all the reported parameters and differences exceeded 6,000, with nearly all exceeding 10,000.

Check of Robustness Against Changes in Top-Level Prior Constants

Because we wanted the top-level prior distribution to be noncommittal and have minimal influence on the posterior distribution, we checked whether the choice of prior had any notable effect on the posterior. We conducted the analysis with different constants in the top-level gamma distributions, to check whether they had any notable influence on the resulting posterior distribution. Whether all gamma distributions used shape and rate constants of 0.1 and 0.1, or 0.001 and 0.001, the results were essentially identical. The results reported here are for gamma constants of 0.001 and 0.001.

Comparisons of Positions

We first consider the estimates of hitting ability for different positions. Figure 13.2, left side, shows the marginal posterior distributions for the μpp parameters for the positions of catcher and pitcher. The distributions show the credible values of the parameters generated by the MCMC chain. These marginal distributions collapse across all other parameters in the high-dimensional joint parameter space. The lower-left panel in Figure 13.2 shows the distribution of differences between catchers and pitchers. At every step in the MCMC chain, the difference between the credible values of μcatcher and μpitcher was computed, to produce a credible value for the difference.
The result is 15,000 credible differences (one for each step in the MCMC chain).

For each marginal posterior distribution, we provide two summaries: its approximate mode, displayed on top, and its 95% highest density interval (HDI), shown as a black horizontal bar. A parameter value inside the HDI has higher probability density (i.e., higher credibility) than a parameter value outside the HDI. The total probability of parameter values within the 95% HDI is 95%. The 95% HDI thus indicates which parameter values are most credible.
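The summaries just described are easy to compute directly from the MCMC draws. The sketch below is our illustration: it uses simulated stand-ins for the chains of μcatcher and μpitcher (not the chapter's actual posterior samples), computes the step-by-step difference, and finds a 95% HDI by the standard shortest-interval method.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the sampled values (the HDI)."""
    sorted_vals = np.sort(samples)
    n_in = int(np.ceil(mass * len(sorted_vals)))
    # Width of every candidate interval that holds n_in consecutive points.
    widths = sorted_vals[n_in - 1:] - sorted_vals[:len(sorted_vals) - n_in + 1]
    start = int(np.argmin(widths))
    return sorted_vals[start], sorted_vals[start + n_in - 1]

# Stand-ins for 15,000 MCMC draws of the two position means (illustrative only;
# the central values and spreads are assumptions, not the chapter's results).
mu_catcher = rng.beta(0.24 * 2000, (1 - 0.24) * 2000, size=15_000)
mu_pitcher = rng.beta(0.13 * 2000, (1 - 0.13) * 2000, size=15_000)

# One credible difference per MCMC step, exactly as described in the text.
diff = mu_catcher - mu_pitcher

low, high = hdi(diff)
print(f"median difference: {np.median(diff):.3f}")
print(f"95% HDI of difference: [{low:.3f}, {high:.3f}]")
print("proportion of credible differences > 0:", np.mean(diff > 0))
```

Because every step of the chain yields one credible difference, questions about comparisons (such as whether catchers hit better than pitchers) are answered by summarizing the distribution of those differences, with no separate sampling distribution or correction procedure required.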

