Bayesian Inference And Generative Models - ETH Zurich


Bayesian inference and generative models
Klaas Enno Stephan

Lecture as part of "Methods & Models for fMRI data analysis", University of Zurich & ETH Zurich, 22 November 2016. Slides with a yellow title were not covered in detail in the lecture and will not be part of the exam. With slides from and many thanks to: Kay Brodersen, Will Penny, Sudhir Shankar Raman.

Why should I know about Bayesian inference?
Because Bayesian principles are fundamental for:
- statistical inference in general
- system identification
- translational neuromodeling ("computational assays")
  – computational psychiatry
  – computational neurology
- contemporary theories of brain function (the "Bayesian brain")
  – predictive coding
  – free energy principle
  – active inference

Bayes' Theorem

posterior = likelihood × prior / evidence:
\( p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \)

Reverend Thomas Bayes (1702–1761)
"Bayes' Theorem describes how an ideally rational person processes information." (Wikipedia)
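A minimal numerical sketch of Bayes' theorem (all numbers are hypothetical, chosen only for illustration): updating the probability of a disease after a positive diagnostic test.

```python
def posterior(prior, likelihood, marginal):
    """Bayes' theorem: p(theta | y) = p(y | theta) * p(theta) / p(y)."""
    return likelihood * prior / marginal

prior = 0.01             # p(disease), hypothetical base rate
sens = 0.95              # p(positive | disease)
false_pos = 0.05         # p(positive | no disease)
# evidence p(positive), by marginalizing over both hypotheses
evidence = sens * prior + false_pos * (1 - prior)
post = posterior(prior, sens, evidence)
# post ≈ 0.161: a positive test raises the probability, but the low prior dominates
```

Note how the prior tempers the likelihood: despite a 95%-sensitive test, the posterior stays well below 1.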

Bayesian inference: an animation

Generative models
- specify a joint probability distribution over all variables (observations and parameters)
- require a likelihood function and a prior:
  \( p(y, \theta \mid m) = p(y \mid \theta, m)\, p(\theta \mid m) \)
- can be used to randomly generate synthetic data (observations) by sampling from the prior
  – we can check in advance whether the model can explain certain phenomena at all
- model comparison based on the model evidence:
  \( p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta \)
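The "generate synthetic data by sampling from the prior" point can be sketched for a toy generative model (model and numbers hypothetical): parameters from the prior, then observations from the likelihood.

```python
import random

# Toy generative model (illustrative only):
# theta ~ Normal(0, 1) (prior), y | theta ~ Normal(theta, 0.5) (likelihood).
random.seed(0)

def sample_synthetic_data(n):
    """Generate synthetic observations by first sampling parameters from the prior."""
    data = []
    for _ in range(n):
        theta = random.gauss(0.0, 1.0)   # draw parameters from the prior
        y = random.gauss(theta, 0.5)     # draw an observation given theta
        data.append(y)
    return data

ys = sample_synthetic_data(1000)
mean = sum(ys) / len(ys)
var = sum((v - mean) ** 2 for v in ys) / len(ys)
# prior predictive distribution: mean 0, variance 1^2 + 0.5^2 = 1.25
```

Inspecting such prior-predictive samples is exactly the advance check the slide mentions: if the model cannot even generate data resembling the phenomenon, no inversion will rescue it.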

Principles of Bayesian inference
- Formulation of a generative model:
  likelihood function \( p(y \mid \theta) \) and prior distribution \( p(\theta) \)
- Observation of measurement data \( y \)
- Model inversion – updating one's beliefs:
  \( p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \)
  → maximum a posteriori (MAP) estimates, model evidence

Priors
Priors can be of different sorts, e.g.:
- empirical (previous data)
- empirical (estimated from current data using a hierarchical model, "empirical Bayes")
- uninformed
- principled (e.g., positivity constraints)
- shrinkage
[Figure: example of a shrinkage prior]

Advantages of generative models
- describe how observed data \( y \) were generated by hidden mechanisms (mechanism 1 … mechanism N)
- we can check in advance whether a model can explain certain phenomena at all
- force us to think mechanistically and be explicit about pathophysiological theories
- formal framework for differential diagnosis: statistical comparison of competing generative models, each of which provides a different explanation of measured brain activity or clinical symptoms

A generative modelling framework for fMRI & EEG: Dynamic causal modeling (DCM)
[Figure: dwMRI, EEG/MEG, fMRI]
Forward model – predicting measured activity:
\( y = g(x, \theta) \)
Model inversion – estimating neuronal mechanisms:
\( dx/dt = f(x, u, \theta) \)
Friston et al. 2003, NeuroImage; Stephan et al. 2009, NeuroImage

[Figure: driving input \( u_1(t) \), modulatory input \( u_2(t) \), neural states \( x_i(t) \), BOLD signal \( y \)]
Neural state equation:
\( \dot{x} = \Big( A + \sum_j u_j B^{(j)} \Big) x + C u \)
with
\( A = \frac{\partial \dot{x}}{\partial x} \) (endogenous connectivity),
\( B^{(j)} = \frac{\partial^2 \dot{x}}{\partial x\, \partial u_j} \) (modulation of connectivity),
\( C = \frac{\partial \dot{x}}{\partial u} \) (direct inputs)

[Figure: driving input \( u_1(t) \), modulatory input \( u_2(t) \), neuronal states \( x_i(t) \)]
Neuronal state equation:
\( \dot{x} = \Big( A + \sum_j u_j B^{(j)} \Big) x + C u \)
with endogenous connectivity \( A = \frac{\partial \dot{x}}{\partial x} \), modulation of connectivity \( B^{(j)} = \frac{\partial^2 \dot{x}}{\partial x\, \partial u_j} \), direct inputs \( C = \frac{\partial \dot{x}}{\partial u} \).

Hemodynamic model – local hemodynamic state equations (vasodilatory signal and flow induction, rCBF):
\( \dot{s} = x - \kappa s - \gamma (f - 1) \)
\( \dot{f} = s \)

Balloon model – changes in volume \( \nu \) and dHb \( q \):
\( \tau \dot{\nu} = f - \nu^{1/\alpha} \)
\( \tau \dot{q} = f\, E(f, E_0)/E_0 - \nu^{1/\alpha} q/\nu \)

BOLD signal change equation:
\( y = V_0 \big[ k_1 (1 - q) + k_2 (1 - q/\nu) + k_3 (1 - \nu) \big] \)
with \( k_1 = 4.3\, \vartheta_0 E_0\, TE \), \( k_2 = \varepsilon r_0 E_0\, TE \), \( k_3 = 1 - \varepsilon \).

Stephan et al. 2015, Neuron

Nonlinear Dynamic Causal Model for fMRI
\( \frac{dx}{dt} = \Big( A + \sum_{i=1}^{m} u_i B^{(i)} + \sum_{j=1}^{n} x_j D^{(j)} \Big) x + C u \)
[Figure: neural population activity and BOLD signal change (%) over time]
Stephan et al. 2008, NeuroImage

Why should I know about Bayesian inference?
Because Bayesian principles are fundamental for:
- statistical inference in general
- system identification
- translational neuromodeling ("computational assays")
  – computational psychiatry
  – computational neurology
- contemporary theories of brain function (the "Bayesian brain")
  – predictive coding
  – free energy principle
  – active inference

Generative models as "computational assays"
forward model (likelihood × prior): \( p(y \mid \theta, m)\, p(\theta \mid m) \)
model inversion: \( p(\theta \mid y, m) \)

Differential diagnosis based on generative models of disease symptoms
SYMPTOM (behaviour or physiology): \( y \)
HYPOTHETICAL MECHANISMS: \( m_1 \ldots m_k \ldots m_K \)
\( p(y \mid m_k) \) and
\( p(m_k \mid y) = \frac{p(y \mid m_k)\, p(m_k)}{\sum_k p(y \mid m_k)\, p(m_k)} \)
Stephan et al. 2016, NeuroImage

Computational assays: Translational Neuromodeling
- Models of disease mechanisms: \( dx/dt = f(x, u, \theta) \)
- Application to brain activity and behaviour of individual patients
- Detecting physiological subgroups (based on inferred mechanisms): disease mechanism A, B, C
- Individual treatment prediction
Stephan et al. 2015, Neuron

Perception = inversion of a hierarchical generative model
[Figure: environmental states, others' mental states, bodily states → neuronal states]
forward model: \( p(y \mid x, m)\, p(x \mid m) \)
perception: \( p(x \mid y, m) \)

Example: free-energy principle and active inference
Prediction error = sensations – predictions
Action: change sensory input. Perception: change predictions.
Maximizing the evidence (of the brain's generative model) = minimizing the surprise about the data (sensory inputs).
Friston et al. 2006, J Physiol Paris

How is the posterior computed = how is a generative model inverted?
Bayesian inference:
- analytical solutions
- approximate inference: variational Bayes, MCMC sampling

How is the posterior computed = how is a generative model inverted?
- compute the posterior analytically
  – requires conjugate priors
  – even then often difficult to derive an analytical solution
- variational Bayes (VB)
  – often hard work to derive, but fast to compute
  – caveat: local minima, potentially inaccurate approximations
- sampling methods (MCMC)
  – guaranteed to be accurate in theory (for infinite computation time)
  – but may require very long run time in practice
  – convergence difficult to prove

Conjugate priors
If the posterior \( p(\theta \mid y) \) is in the same family as the prior \( p(\theta) \), the prior and posterior are called "conjugate distributions", and the prior is called a "conjugate prior" for the likelihood function.
\( p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \) – posterior has the same form as the prior
→ analytical expression for the posterior
Examples (likelihood–prior): Normal–Normal, Normal–inverse Gamma, Binomial–Beta, Multinomial–Dirichlet
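For the Binomial–Beta pair listed above, the conjugate update is just parameter addition. A minimal sketch (counts and prior parameters are hypothetical):

```python
def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + Binomial data -> Beta(a+k, b+(n-k)) posterior."""
    return a + successes, b + failures

# Hypothetical example: Beta(2, 2) prior, 7 successes and 3 failures observed.
a_post, b_post = beta_binomial_update(2, 2, 7, 3)
post_mean = a_post / (a_post + b_post)   # mean of Beta(9, 5) = 9/14
```

This is why conjugacy matters: the posterior is available in closed form, with no iterative inversion needed.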

Posterior mean & variance of univariate Gaussians
Likelihood & prior:
\( p(y \mid \theta) = N(y;\, \theta, \sigma_e^2) \)
\( p(\theta) = N(\theta;\, \mu_p, \sigma_p^2) \)
Posterior: \( p(\theta \mid y) = N(\theta;\, \mu, \sigma^2) \) with
\( \frac{1}{\sigma^2} = \frac{1}{\sigma_e^2} + \frac{1}{\sigma_p^2} \)
\( \mu = \sigma^2 \left( \frac{\mu_p}{\sigma_p^2} + \frac{y}{\sigma_e^2} \right) \)
Posterior mean = variance-weighted combination of prior mean and data mean.
[Figure: prior, likelihood, and posterior densities over \( \theta \)]
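The Gaussian update above in a few lines (numbers hypothetical): posterior precision is the sum of precisions, and the posterior mean is the precision-weighted combination.

```python
def gaussian_posterior(mu_p, var_p, y, var_e):
    """Posterior mean and variance for a univariate Gaussian likelihood and prior."""
    prec = 1 / var_p + 1 / var_e               # posterior precision = sum of precisions
    mu = (mu_p / var_p + y / var_e) / prec     # variance-weighted mean
    return mu, 1 / prec

# Equal precisions: the posterior mean lands halfway between prior mean and datum.
mu, var = gaussian_posterior(mu_p=0.0, var_p=1.0, y=2.0, var_e=1.0)
# mu = 1.0, var = 0.5
```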

Same thing – but expressed as precision weighting
Likelihood & prior:
\( p(y \mid \theta) = N(y;\, \theta, \lambda_e^{-1}) \)
\( p(\theta) = N(\theta;\, \mu_p, \lambda_p^{-1}) \)
Posterior: \( p(\theta \mid y) = N(\theta;\, \mu, \lambda^{-1}) \) with
\( \lambda = \lambda_e + \lambda_p \)
\( \mu = \frac{\lambda_e}{\lambda}\, y + \frac{\lambda_p}{\lambda}\, \mu_p \)
Relative precision weighting.
[Figure: prior, likelihood, and posterior densities over \( \theta \)]

Variational Bayes (VB)
Idea: find an approximate density \( q(\theta) \) that is maximally similar to the true posterior \( p(\theta \mid y) \).
This is often done by assuming a particular form for \( q \) (fixed-form VB) and then optimizing its sufficient statistics.
[Figure: hypothesis class of densities; divergence \( KL[q \| p] \) between the true posterior \( p(\theta \mid y) \) and the best proxy \( q(\theta) \)]

Kullback–Leibler (KL) divergence
- non-symmetric measure of the difference between two probability distributions P and Q
- \( D_{KL}(P \| Q) \): a measure of the information lost when Q is used to approximate P: the expected number of extra bits required to code samples from P when using a code optimized for Q, rather than using the true code optimized for P
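A small numerical sketch of the discrete KL divergence, including the asymmetry noted above (the two distributions are hypothetical):

```python
import math

def kl(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]
Q = [0.9, 0.1]
# kl(P, Q) ≈ 0.511 nats, kl(Q, P) ≈ 0.368 nats: the divergence is not symmetric,
# and kl(P, P) = 0: no information is lost when a distribution codes itself.
```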

Variational calculus
Standard calculus (Newton, Leibniz, and others):
- functions \( f: x \mapsto f(x) \)
- derivatives \( \frac{df}{dx} \)
- Example: maximize the likelihood expression \( p(y \mid \theta) \) w.r.t. \( \theta \)
Variational calculus (Euler, Lagrange, and others):
- functionals \( F: f \mapsto F[f] \)
- derivatives \( \frac{dF}{df} \)
- Example: maximize the entropy \( H[p] \) w.r.t. a probability distribution \( p(x) \)
Leonhard Euler (1707–1783), Swiss mathematician, 'Elementa Calculi Variationum'

Variational Bayes
\( \ln p(y) = KL[q \| p] + F(q, y) \)
Since the divergence \( KL[q \| p] \ge 0 \), the negative free energy \( F(q, y) \) is a lower bound on the log evidence: \( \ln p(y) \) is unknown, but \( F(q, y) \) is easy to evaluate for a given \( q \).
\( F \) is a functional of the approximate posterior \( q(\theta) \).
Maximizing \( F(q, y) \) is equivalent to:
- minimizing \( KL[q \| p] \)
- tightening \( F(q, y) \) as a lower bound to the log model evidence
When \( F(q, y) \) is maximized, \( q(\theta) \) is our best estimate of the posterior.
[Figure: decomposition of \( \ln p(y) \) into \( KL[q \| p] \) and \( F(q, y) \), from initialization to convergence]

Derivation of the (negative) free energy approximation
See whiteboard! (or the Appendix to Stephan et al. 2007, NeuroImage 38: 387-401)

Mean field assumption
Factorize the approximate posterior \( q(\theta) \) into independent partitions:
\( q(\theta) = q(\theta_1)\, q(\theta_2) \cdots = \prod_i q_i(\theta_i) \)
where \( q_i(\theta_i) \) is the approximate posterior for the \( i \)-th subset of parameters.
For example, split parameters and hyperparameters:
\( p(\theta, \lambda \mid y) \approx q(\theta, \lambda) = q(\theta)\, q(\lambda) \)
Jean Daunizeau, jdaunize/presentations/Bayes2.pdf

VB in a nutshell (under mean-field approximation)
- Negative free-energy approximation to the model evidence:
  \( \ln p(y \mid m) = F + KL[\, q(\theta, \lambda) \,\|\, p(\theta, \lambda \mid y) \,] \)
  \( F = \langle \ln p(y \mid \theta, \lambda) \rangle_q - KL[\, q(\theta, \lambda) \,\|\, p(\theta, \lambda \mid m) \,] \)
- Mean field approximation:
  \( p(\theta, \lambda \mid y) \approx q(\theta, \lambda) = q(\theta)\, q(\lambda) \)
- Maximise negative free energy w.r.t. \( q \) = minimise divergence, by maximising the variational energies:
  \( q(\theta) \propto \exp\big( I(\theta) \big) = \exp\big( \langle \ln p(y, \theta, \lambda) \rangle_{q(\lambda)} \big) \)
  \( q(\lambda) \propto \exp\big( I(\lambda) \big) = \exp\big( \langle \ln p(y, \theta, \lambda) \rangle_{q(\theta)} \big) \)
- Iterative updating of sufficient statistics of the approximate posteriors by gradient ascent.

VB (under mean-field assumption) in more detail

Model comparison and selection
Given competing hypotheses on structure & functional mechanisms of a system, which model is the best?
Which model represents the best balance between model fit and model complexity?
For which model m does \( p(y \mid m) \) become maximal?
Pitt & Myung (2002) TICS

Bayesian model selection (BMS)
Model evidence (marginal likelihood):
\( p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta \)
accounts for both accuracy and complexity of the model
[Figure: \( p(y \mid m) \) over all possible datasets \( y \); Ghahramani, 2004]
"If I randomly sampled from my prior and plugged the resulting value into the likelihood function, how close would the predicted data be – on average – to my observed data?"
Various approximations, e.g.: negative free energy, AIC, BIC
MacKay 1992, Neural Comput.; Penny et al. 2004a, NeuroImage
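The quote above can be made literal with Monte Carlo: sample θ from the prior, average the likelihood. A sketch for a toy conjugate model with hypothetical numbers, so the estimate can be checked analytically:

```python
import math
import random

# Toy model (illustrative only): theta ~ Normal(0, 1), y | theta ~ Normal(theta, 1),
# observed y = 1.5. Then p(y|m) = ∫ p(y|theta) p(theta) dtheta.
random.seed(1)
y = 1.5

def likelihood(theta):
    """p(y | theta): Normal density with mean theta and unit variance."""
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

thetas = [random.gauss(0.0, 1.0) for _ in range(200_000)]      # prior draws
evidence = sum(likelihood(t) for t in thetas) / len(thetas)    # average likelihood
# analytic check: marginally y ~ Normal(0, 2), so p(y) = N(1.5; 0, 2) ≈ 0.161
```

This naive estimator is only practical for toy problems; the slide's point is the interpretation, with the free energy, AIC, or BIC used in practice.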

Model space (hypothesis set) M
Model space M is defined by a prior on models. Usual choice: a flat prior over a small set of models:
\( p(m) = \begin{cases} 1/|M| & \text{if } m \in M \\ 0 & \text{if } m \notin M \end{cases} \)
In this case, the posterior probability of model \( i \) is:
\( p(m_i \mid y) = \frac{p(y \mid m_i)\, p(m_i)}{\sum_{j=1}^{|M|} p(y \mid m_j)\, p(m_j)} = \frac{p(y \mid m_i)}{\sum_{j=1}^{|M|} p(y \mid m_j)} \)
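Under the flat prior above, posterior model probabilities reduce to a normalized softmax of the log evidences. A short sketch (log evidences are hypothetical), with the usual max-subtraction for numerical stability:

```python
import math

def model_posteriors(log_evidences):
    """Posterior model probabilities under a flat prior over the model set."""
    m = max(log_evidences)                         # subtract max to avoid underflow
    w = [math.exp(le - m) for le in log_evidences]
    z = sum(w)
    return [wi / z for wi in w]

post = model_posteriors([-100.0, -103.0, -110.0])
# model 1 dominates: 1 / (1 + e^-3 + e^-10) ≈ 0.953
```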

Differential diagnosis based on generative models of disease symptoms
SYMPTOM (behaviour or physiology): \( y \)
HYPOTHETICAL MECHANISMS: \( m_1 \ldots m_k \ldots m_K \)
\( p(y \mid m_k) \) and
\( p(m_k \mid y) = \frac{p(y \mid m_k)\, p(m_k)}{\sum_k p(y \mid m_k)\, p(m_k)} \)
Stephan et al. 2016, NeuroImage

Approximations to the model evidence
The logarithm is a monotonic function: maximizing the log model evidence = maximizing the model evidence.
Log model evidence = balance between fit and complexity:
\( \log p(y \mid m) = \text{accuracy}(m) - \text{complexity}(m) \approx \log p(y \mid \hat{\theta}, m) - \text{complexity}(m) \)
Akaike Information Criterion:
\( AIC = \log p(y \mid \hat{\theta}, m) - p \)  (p = no. of parameters)
Bayesian Information Criterion:
\( BIC = \log p(y \mid \hat{\theta}, m) - \frac{p}{2} \log N \)  (N = no. of data points)
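The AIC/BIC definitions above in code, using the slide's sign convention (larger is better); the log likelihood, parameter count, and sample size are hypothetical:

```python
import math

def aic(log_lik, p):
    """AIC as defined above: log likelihood minus number of parameters."""
    return log_lik - p

def bic(log_lik, p, n):
    """BIC as defined above: log likelihood minus (p/2) * log(N)."""
    return log_lik - 0.5 * p * math.log(n)

log_lik, p, n = -120.0, 10, 200   # hypothetical fit, parameters, data points
a, b = aic(log_lik, p), bic(log_lik, p, n)
# BIC penalizes parameters more than AIC whenever log(N)/2 > 1, i.e. N > e^2 ≈ 7.4
```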

The (negative) free energy approximation F
F is a lower bound on the log model evidence:
\( \log p(y \mid m) = F + KL[\, q(\theta) \,\|\, p(\theta \mid y, m) \,] \)
Like AIC/BIC, F is an accuracy/complexity tradeoff:
\( F = \underbrace{\langle \log p(y \mid \theta, m) \rangle_q}_{\text{accuracy}} - \underbrace{KL[\, q(\theta) \,\|\, p(\theta \mid m) \,]}_{\text{complexity}} \)

The (negative) free energy approximation
The log evidence is thus the expected log likelihood (w.r.t. q) plus two KL terms:
\( \log p(y \mid m) = \langle \log p(y \mid \theta, m) \rangle_q - KL[\, q(\theta) \,\|\, p(\theta \mid m) \,] + KL[\, q(\theta) \,\|\, p(\theta \mid y, m) \,] \)
\( F = \log p(y \mid m) - KL[\, q(\theta) \,\|\, p(\theta \mid y, m) \,] = \underbrace{\langle \log p(y \mid \theta, m) \rangle_q}_{\text{accuracy}} - \underbrace{KL[\, q(\theta) \,\|\, p(\theta \mid m) \,]}_{\text{complexity}} \)

The complexity term in F
In contrast to AIC & BIC, the complexity term of the negative free energy F accounts for parameter interdependencies. Under Gaussian assumptions about the posterior (Laplace approximation):
\( KL[\, q(\theta) \,\|\, p(\theta \mid m) \,] = \frac{1}{2} \ln |C_\theta| - \frac{1}{2} \ln |C_{\theta \mid y}| + \frac{1}{2} \big( \mu_{\theta \mid y} - \mu_\theta \big)^T C_\theta^{-1} \big( \mu_{\theta \mid y} - \mu_\theta \big) \)
The complexity term of F is higher:
- the more independent the prior parameters (↑ effective DFs)
- the more dependent the posterior parameters
- the more the posterior mean deviates from the prior mean

Bayes factors
To compare two models, we could just compare their log evidences. But: the log evidence is just some number – not very intuitive! A more intuitive interpretation of model comparisons is made possible by Bayes factors:
\( B_{12} = \frac{p(y \mid m_1)}{p(y \mid m_2)} \)  (positive value, in [0; ∞[)
Kass & Raftery classification (Kass & Raftery 1995, J. Am. Stat. Assoc.):

B12       | p(m1|y) | Evidence
1 to 3    | 50-75%  | weak
3 to 20   | 75-95%  | positive
20 to 150 | 95-99%  | strong
≥ 150     | ≥ 99%   | very strong
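Since evidences are usually available on the log scale, a Bayes factor is just the exponentiated difference of log evidences. A one-liner with hypothetical values, read against the Kass & Raftery table above:

```python
import math

def bayes_factor(log_ev1, log_ev2):
    """Bayes factor B12 from log model evidences."""
    return math.exp(log_ev1 - log_ev2)

bf = bayes_factor(-100.0, -103.0)
# e^3 ≈ 20.1: just past the threshold for "strong" evidence in favour of model 1
```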

Fixed effects BMS at group level
Group Bayes factor (GBF) for subjects \( k = 1 \ldots K \):
\( GBF_{ij} = \prod_k BF_{ij}^{(k)} \)
Average Bayes factor (ABF):
\( ABF_{ij} = \sqrt[K]{\prod_k BF_{ij}^{(k)}} \)
Problems:
- blind with regard to group heterogeneity
- sensitive to outliers
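In log space the GBF is just a sum, and the ABF a geometric mean. A sketch with hypothetical subject-wise log Bayes factors, which also hints at the outlier sensitivity mentioned above (one strongly discrepant subject shifts the sum arbitrarily):

```python
import math

# Hypothetical subject-wise log Bayes factors for model i vs. model j.
log_bfs = [1.2, 0.8, -0.3, 2.1]

log_gbf = sum(log_bfs)                      # log GBF = sum of log BFs
abf = math.exp(log_gbf / len(log_bfs))      # ABF = geometric mean of BFs
# log_gbf = 3.8; abf = e^0.95 ≈ 2.59
```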

Random effects BMS for heterogeneous groups
[Hierarchical model:]
- Dirichlet distribution of model probabilities r: \( r \sim Dir(r; \alpha) \); the Dirichlet parameters \( \alpha \) = "occurrences" of models in the population
- Multinomial distribution of model labels m: \( m \sim Mult(m; 1, r) \), i.e. \( m_k \sim p(m_k \mid p) \)
- Measured data y: \( y_n \sim p(y_n \mid m_n) \); model inversion by Variational Bayes or MCMC
Stephan et al. 2009, NeuroImage

Random effects BMS
\( p(r \mid \alpha) = Dir(r; \alpha) = \frac{1}{Z(\alpha)} \prod_k r_k^{\alpha_k - 1}, \quad Z(\alpha) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma\!\big(\sum_k \alpha_k\big)} \)
\( p(m_n \mid r) = \prod_k r_k^{m_{nk}} \)
\( p(y_n \mid m_{nk}) = \int p(y_n \mid \theta)\, p(\theta \mid m_{nk})\, d\theta \)
Stephan et al. 2009, NeuroImage

Write down the joint probability and take the log:
\( p(y, r, m) = p(y \mid m)\, p(m \mid r)\, p(r \mid \alpha_0) \)
\( = p(r \mid \alpha_0) \prod_n p(y_n \mid m_n)\, p(m_n \mid r) \)
\( = \frac{1}{Z(\alpha_0)} \prod_k r_k^{\alpha_{0k} - 1} \prod_n \prod_k \big[ p(y_n \mid m_{nk})\, r_k \big]^{m_{nk}} \)
\( \ln p(y, r, m) = -\ln Z(\alpha_0) + \sum_k (\alpha_{0k} - 1) \ln r_k + \sum_n \sum_k m_{nk} \big[ \log p(y_n \mid m_{nk}) + \ln r_k \big] \)

Mean field approximation: \( q(r, m) = q(r)\, q(m) \)
Maximise the negative free energy w.r.t. q = minimise divergence, by maximising the variational energies:
\( q(r) \propto \exp\big( I(r) \big), \quad I(r) = \langle \log p(y, r, m) \rangle_{q(m)} \)
\( q(m) \propto \exp\big( I(m) \big), \quad I(m) = \langle \log p(y, r, m) \rangle_{q(r)} \)

Iterative updating of sufficient statistics of the approximate posteriors:
\( \alpha = \alpha_0 = [1, \ldots, 1] \)
Until convergence:
  \( u_{nk} = \exp\big( \ln p(y_n \mid m_{nk}) + \Psi(\alpha_k) - \Psi(\textstyle\sum_{k'} \alpha_{k'}) \big) \)
  \( g_{nk} = \frac{u_{nk}}{\sum_{k'} u_{nk'}} \) — our (normalized) posterior belief that model k generated the data from subject n
  \( \beta_k = \sum_n g_{nk} \) — expected number of subjects whose data we believe were generated by model k
  \( \alpha = \alpha_0 + \beta \)
end
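A numerical sketch of this iterative scheme (following Stephan et al. 2009; the subject-wise log evidences below are hypothetical, and the digamma helper is a crude stand-in for a library routine such as SciPy's):

```python
import math

def digamma(x):
    """Crude digamma via central difference of lgamma (sketch only; use scipy.special.digamma in practice)."""
    h = 1e-6
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def rfx_bms(log_ev, max_iter=200, tol=1e-8):
    """Variational RFX-BMS. log_ev[n][k] = log p(y_n | m_k). Returns (alpha, expected frequencies r)."""
    n_subj, n_mod = len(log_ev), len(log_ev[0])
    alpha0 = [1.0] * n_mod                      # flat Dirichlet prior
    alpha = list(alpha0)
    for _ in range(max_iter):
        # g[n][k]: normalized posterior belief that model k generated subject n's data
        g = []
        for n in range(n_subj):
            u = [math.exp(log_ev[n][k] + digamma(alpha[k]) - digamma(sum(alpha)))
                 for k in range(n_mod)]
            s = sum(u)
            g.append([uk / s for uk in u])
        # beta_k: expected number of subjects assigned to model k
        beta = [sum(g[n][k] for n in range(n_subj)) for k in range(n_mod)]
        new_alpha = [alpha0[k] + beta[k] for k in range(n_mod)]
        if max(abs(a - b) for a, b in zip(alpha, new_alpha)) < tol:
            alpha = new_alpha
            break
        alpha = new_alpha
    r = [a / sum(alpha) for a in alpha]         # expected model frequencies
    return alpha, r

# Three hypothetical subjects, two models: subjects 1-2 favour model 1, subject 3 model 2.
alpha, r = rfx_bms([[0.0, -3.0], [0.0, -2.0], [-1.0, 0.0]])
# alphas always sum to N + K = 5; here model 1 gets the larger expected frequency
```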

Four equivalent options for reporting model ranking by random effects BMS:
1. Dirichlet parameter estimates \( \alpha_k \) (\( k = 1 \ldots K \))
2. expected posterior probability of obtaining the k-th model for any randomly selected subject: \( r_k = \alpha_k / \sum_j \alpha_j \)
3. exceedance probability that a particular model k is more likely than any other model (of the K models tested), given the group data: \( \forall k \in \{1..K\},\; \forall j \in \{1..K \mid j \neq k\}: \varphi_k = p(r_k > r_j \mid y; \alpha) \)
4. protected exceedance probability: see below

Example: hemispheric interactions during vision
[Figure: competing models m1 (LD|LVF) and m2 (LD) over regions MOG, FG, and LG, with LVF and RVF stimuli and letter-decision (LD) inputs; bar plot of log model evidence differences]
Stephan et al. 2003, Science; Stephan et al. 2007, J. Neurosci.

[Figure: posterior density of model frequencies for m1 vs. m2: \( p(r_1 > 0.5 \mid y) = 0.997 \), i.e. \( p(r_1 > r_2) = 99.7\% \); expected frequencies \( r_1 = 84.3\% \) (\( \alpha_1 = 11.8 \)) and \( r_2 = 15.7\% \)]
Stephan et al. 2009a, NeuroImage

Example: Synaesthesia
- "projectors" experience color externally, colocalized with a presented grapheme
- "associators" report an internally evoked association
- across all subjects: no evidence for either model
- but BMS results map precisely onto projectors (bottom-up mechanisms) and associators (top-down)
van Leeuwen et al. 2011, J. Neurosci.

Overfitting at the level of models
- ↑ number of models ⇒ ↑ risk of overfitting
- solutions:
  – regularisation: definition of model space = choosing priors p(m)
  – family-level BMS
  – Bayesian model averaging (BMA)
posterior model probability:
\( p(m \mid y) = \frac{p(y \mid m)\, p(m)}{\sum_m p(y \mid m)\, p(m)} \)
BMA:
\( p(\theta \mid y) = \sum_m p(\theta \mid y, m)\, p(m \mid y) \)

Model space partitioning: comparing model families
- partitioning model space into K subsets or families: \( M = \{ f_1, \ldots, f_K \} \)
- pooling information over all models in these subsets allows one to compute the probability of a model family, given the data
- effectively removes uncertainty about any aspect of model structure, other than the attribute of interest (which defines the partition)
Stephan et al. 2009, NeuroImage; Penny et al. 2010, PLoS Comput. Biol.

Family-level inference: fixed effects
- We wish to have a uniform prior at the family level: \( p(f_k) = \frac{1}{K} \)
- This is related to the model level via the sum of the priors on models: \( p(f_k) = \sum_{m \in f_k} p(m) \)
- Hence the uniform prior at the family level is obtained by setting: \( \forall m \in f_k: p(m) = \frac{1}{K \cdot |f_k|} \)
- The probability of each family is then obtained by summing the posterior probabilities of the models it includes: \( p(f_k \mid y_{1..N}) = \sum_{m \in f_k} p(m \mid y_{1..N}) \)
Penny et al. 2010, PLoS Comput. Biol.

Family-level inference: random effects
- The frequency of a family in the population is given by: \( s_k = \sum_{m \in f_k} r_m \)
- In RFX-BMS, this follows a Dirichlet distribution, with a uniform prior on the parameters (see above): \( p(s) = Dir(\alpha) \)
- A uniform prior over family probabilities can be obtained by setting: \( \forall m \in f_k: \alpha_{prior}(m) = \frac{1}{|f_k|} \)
Stephan et al. 2009, NeuroImage; Penny et al. 2010, PLoS Comput. Biol.

Family-level inference: random effects – a special case
When the families are of equal size, one can simply sum the posterior model probabilities within families by exploiting the agglomerative property of the Dirichlet distribution:
\( (r_1, r_2, \ldots, r_K) \sim Dir(\alpha_1, \alpha_2, \ldots, \alpha_K) \)
\( \Big( r_1^* = \sum_{k \in N_1} r_k,\;\; r_2^* = \sum_{k \in N_2} r_k,\;\; \ldots,\;\; r_J^* = \sum_{k \in N_J} r_k \Big) \sim Dir\Big( \sum_{k \in N_1} \alpha_k,\; \sum_{k \in N_2} \alpha_k,\; \ldots,\; \sum_{k \in N_J} \alpha_k \Big) \)
Stephan et al. 2009, NeuroImage
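The agglomerative property above means family-level inference is literally a sum over Dirichlet parameters. A sketch with hypothetical alphas and a hypothetical equal-size partition:

```python
# Hypothetical Dirichlet parameters for 4 models, and a hypothetical
# equal-size partition into two families (indices into alpha).
alpha = [3.2, 1.1, 4.0, 0.7]
families = {"nonlinear": [0, 1], "linear": [2, 3]}

# Summing parameters within a family gives the family-level Dirichlet parameter.
family_alpha = {f: sum(alpha[i] for i in ix) for f, ix in families.items()}
# Expected family frequencies follow directly: alpha_family / sum(alpha).
family_freq = {f: a / sum(alpha) for f, a in family_alpha.items()}
```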

Model space partitioning: comparing model families
[Figure: FFX summed log evidences (relative to RBML) and RFX Dirichlet parameter estimates for the models CBMN, CBMN(ε), RBMN, RBMN(ε), CBML, CBML(ε), RBML, RBML(ε)]
After partitioning into nonlinear vs. linear families by summing \( \alpha_k \) within each family: \( p(r_1 > 0.5 \mid y) = 0.986 \), i.e. \( p(r_1 > r_2) = 98.6\% \), with expected frequency \( r_1 = 73.5\% \) for the nonlinear models.
Stephan et al. 2009, NeuroImage

Mismatch negativity (MMN) elicited by surprising stimuli (scales with unpredictability)
[Figure: MMN as a prediction-error response; IFG; responses to deviants in schizophrenic patients]
- classical interpretations:
  – pre-attentive change detection
  – neuronal adaptation
- current theories:
  – reflection of (hierarchical) Bayesian inference
Garrido et al. 2009, Clin. Neurophysiol.

Mismatch negativity (MMN)
[Figure: ERPs to standards and deviants in schizophrenic patients; MMN under placebo vs. ketamine]
Highly relevant for computational assays of glutamatergic and cholinergic transmission:
- NMDAR
- ACh (nicotinic & muscarinic)
- 5HT
- DA
Schmidt, Diaconescu et al. 2013, Cereb. Cortex

Modelling Trial-by-Trial Changes of the Mismatch Negativity (MMN)
Lieder et al. 2013, PLoS Comput. Biol.

MMN model comparison at multiple levels:
- comparing MMN theories
- comparing individual models
- comparing modeling frameworks
Lieder et al. 2013, PLoS Comput. Biol.

Bayesian Model Averaging (BMA)
- abandons dependence of parameter inference on a single model and takes into account model uncertainty
- uses the entire model space considered (or an optimal family of models)
- averages parameter estimates, weighted by posterior model probabilities
- represents a particularly useful alternative
  – when none of the models (or model subspaces) considered clearly outperforms all others
  – when comparing groups for which the optimal model differs
single-subject BMA:
\( p(\theta \mid y) = \sum_m p(\theta \mid y, m)\, p(m \mid y) \)
group-level BMA:
\( p(\theta_n \mid y_{1..N}) = \sum_m p(\theta_n \mid y_n, m)\, p(m \mid y_{1..N}) \)
NB: \( p(m \mid y_{1..N}) \) can be obtained by either FFX or RFX BMS
Penny et al. 2010, PLoS Comput. Biol.

Prefrontal-parietal connectivity during working memory in schizophrenia
- 17 at-risk mental state (ARMS) individuals
- 21 first-episode patients (13 non-treated)
- 20 controls
Schmidt et al. 2013, JAMA Psychiatry

BMS results for all groups
Schmidt et al. 2013, JAMA Psychiatry

BMA results: PFC → PPC connectivity
17 ARMS, 21 first-episode (13 non-treated), 20 controls
Schmidt et al. 2013, JAMA Psychiatry

Protected exceedance probability: using BMA to protect against chance findings
- EPs express our confidence that the posterior probabilities of models are different – under the hypothesis H1 that models differ in probability: \( r_k \neq 1/K \)
- this does not account for the possibility of the "null hypothesis" H0: \( r_k = 1/K \)
- Bayesian omnibus risk (BOR): the risk of wrongly accepting H1 over H0
- protected EP: Bayesian model averaging over H0 and H1:
  \( \tilde{\varphi}_k = \varphi_k \, (1 - BOR) + \frac{BOR}{K} \)
Rigoux et al. 2014, NeuroImage

Definition of model space, then:
- Inference on model structure or inference on model parameters?
- Inference on model structure:
  – on individual models or on a model space partition?
  – partition: comparison of model families using FFX or RFX BMS
  – individual models: optimal model structure assumed to be identical across subjects? yes → FFX BMS; no → RFX BMS
- Inference on model parameters:
  – parameters of an optimal model or parameters of all models?
  – all models: BMA
  – optimal model: optimal model structure assumed to be identical across subjects? yes → FFX analysis of parameter estimates (e.g. BPA); no → RFX analysis of parameter estimates (e.g. t-test, ANOVA)
Stephan et al. 2010, NeuroImage

Further reading
- Penny WD, Stephan KE, Mechelli A, Friston KJ (2004) Comparing dynamic causal models. NeuroImage 22: 1157-1172.
- Penny WD, Stephan KE, Daunizeau J, Joao M, Friston K, Schofield T, Leff AP (2010) Comparing Families of Dynamic Causal Models. PLoS Computational Biology 6: e1000709.
- Penny WD (2012) Comparing dynamic causal models using AIC, BIC and free energy. NeuroImage 59: 319-330.
- Rigoux L, Stephan KE, Friston KJ, Daunizeau J (2014) Bayesian model selection for group studies – revisited. NeuroImage 84: 971-985.
- Stephan KE, Weiskopf N, Drysdale PM, Robinson PA, Friston KJ (2007) Comparing hemodynamic models with DCM. NeuroImage 38: 387-401.
- Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009) Bayesian model selection for group studies. NeuroImage 46: 1004-1017.
- Stephan KE, Penny WD, Moran RJ, den Ouden HEM, Daunizeau J, Friston KJ (2010) Ten simple rules for Dynamic Causal Modelling. NeuroImage 49: 3099-3109.
- Stephan KE, Iglesias S, Heinzle J, Diaconescu AO (2015) Translational Perspectives for Computational Neuroimaging. Neuron 87: 716-732.
- Stephan KE, Schlagenhauf F, Huys QJM, Raman S, Aponte EA, Brodersen KH, Rigoux L, Moran RJ, Daunizeau J, Dolan RJ, Friston KJ, Heinz A (2016) Computational Neuroimaging Strategies for Single Patient Predictions. NeuroImage, in press. DOI: 10.1016/j.neuroimage.2016.06.038

Thank you
