
Machine Learning
B. Supervised Learning: Nonlinear Models
B.5 A First Look at Bayesian and Markov Networks

Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)
Institute for Computer Science, University of Hildesheim, Germany

Syllabus

Fri. 25.10. (1)   0. Introduction

A. Supervised Learning: Linear Models & Fundamentals
Fri. 1.11.  (2)   A.1 Linear Regression
Fri. 8.11.  (3)   A.2 Linear Classification
Fri. 15.11. (4)   A.3 Regularization
Fri. 22.11. (5)   A.4 High-dimensional Data

B. Supervised Learning: Nonlinear Models
Fri. 29.11. (6)   B.1 Nearest-Neighbor Models
Fri. 6.12.  (7)   B.2 Neural Networks
Fri. 13.12. (8)   B.3 Decision Trees
Fri. 20.12. (9)   B.4 Support Vector Machines
                  (Christmas Break)
Fri. 10.1.  (10)  B.5 A First Look at Bayesian and Markov Networks

C. Unsupervised Learning
Fri. 17.1.  (11)  C.1 Clustering
Fri. 24.1.  (12)  C.2 Dimensionality Reduction
Fri. 31.1.  (13)  C.3 Frequent Pattern Mining
Fri. 7.2.   (14)  Q&A

Outline
1. Introduction
2. Examples
3. Inference
4. Learning

1. Introduction / Joint Distribution

x_1: the sun shines
  p(x_1 = false) = 0.25,  p(x_1 = true) = 0.75,  i.e., p(x_1) = (0.25, 0.75)

x_2: it rains
  p(x_2 = false) = 0.67,  p(x_2 = true) = 0.33,  i.e., p(x_2) = (0.67, 0.33)

joint distribution:
  p(x_1 = false, x_2 = false) = 0.07
  p(x_1 = false, x_2 = true)  = 0.18
  p(x_1 = true,  x_2 = false) = 0.60
  p(x_1 = true,  x_2 = true)  = 0.15

  p(x_1, x_2)      x_2 = false   x_2 = true
  x_1 = false         0.07          0.18
  x_1 = true          0.60          0.15

1. Introduction / Independence

for two variables:
  x ⊥ y  :⇔  p(x, y) = p(x) · p(y)

for two variable subsets:
  p(x_1, x_2, ..., x_M) = p(x_I) · p(x_J),   I, J ⊆ {1, ..., M},  I ∩ J = ∅,  I ∪ J = {1, ..., M}

Examples (rows: x_1, columns: x_2):

  0.07  0.18     not independent
  0.60  0.15

  0.17  0.08     independent
  0.50  0.25

Note: x_I := {x_{m_1}, x_{m_2}, ..., x_{m_K}} for I = {m_1, m_2, ..., m_K}.
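To make the independence test concrete, here is a small numerical sketch (the function name and the rounding tolerance are choices of this transcription, not part of the slides): it compares a 2x2 joint table against the product of its marginals and reproduces the verdict for the two example tables above.

```python
import numpy as np

def is_independent(joint, tol=0.01):
    """Check whether a 2D joint table factorizes into the product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)   # marginal over rows (x_1)
    p_y = joint.sum(axis=0, keepdims=True)   # marginal over columns (x_2)
    return np.allclose(joint, p_x * p_y, atol=tol)

# tables from the slide: rows = x_1 in (false, true), columns = x_2 in (false, true)
print(is_independent([[0.07, 0.18], [0.60, 0.15]]))  # False: not independent
print(is_independent([[0.17, 0.08], [0.50, 0.25]]))  # True: independent up to rounding
```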

1. Introduction / Chain Rule

p(x_1, x_2, ..., x_M) = p(x_1) · p(x_2 | x_1) · p(x_3 | x_1, x_2) · ... · p(x_M | x_1, x_2, ..., x_{M-1})

Examples (the right factor is the conditional table p(x_2 | x_1), rows indexed by x_1, columns by x_2):

  0.07  0.18   =   (0.25, 0.75) ·   0.28  0.72
  0.60  0.15                        0.80  0.20

  0.17  0.08   =   (0.25, 0.75) ·   0.67  0.33
  0.50  0.25                        0.67  0.33

(In the second, independent example both rows of p(x_2 | x_1) coincide with p(x_2) = (0.67, 0.33).)
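The two-variable chain rule can be checked numerically on the first example table; the following sketch (variable names are illustrative) recovers the conditional table p(x_2 | x_1) from the joint and verifies that their product gives back the joint.

```python
import numpy as np

joint = np.array([[0.07, 0.18],
                  [0.60, 0.15]])          # rows: x_1, columns: x_2

p_x1 = joint.sum(axis=1)                  # (0.25, 0.75)
p_x2_given_x1 = joint / p_x1[:, None]     # conditional table, rows sum to 1

# chain rule: the joint is recovered as p(x_1) * p(x_2 | x_1)
reconstructed = p_x1[:, None] * p_x2_given_x1
print(p_x2_given_x1)                      # [[0.28 0.72], [0.80 0.20]]
print(np.allclose(reconstructed, joint))  # True
```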

1. Introduction / Conditional Independence

two variables x, y are independent conditionally on variable z:

  x ⊥ y | z  :⇔  p(x, y | z) = p(x | z) · p(y | z)

two variable sets are independent conditionally on variables z_1, ..., z_K:

  {x_1, ..., x_I} ⊥ {y_1, ..., y_J} | {z_1, ..., z_K}  :⇔
    p(x_1, ..., x_I, y_1, ..., y_J | z_1, ..., z_K)
      = p(x_1, ..., x_I | z_1, ..., z_K) · p(y_1, ..., y_J | z_1, ..., z_K)

1. Introduction / Conditional Independence / Example

Example: x_n ⊥ {x_1, ..., x_{n-2}} | x_{n-1}   for all n   (Markov property)

  p(x_1, ..., x_M) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) · · · p(x_M | x_{M-1})

1. Introduction / Graphical Models

- represent joint distributions of variables by graphs:
  - by directed graphs: Bayesian networks
  - by undirected graphs: Markov networks
  - by mixed directed/undirected graphs
- nodes represent random variables
- absent edges represent conditional independence

1. Introduction / Directed Graph Terminology

- directed graph: G := (V, E), E ⊆ V × V
  - V is a set, called nodes / vertices
  - E is called edges; (v, w) ∈ E is an edge from v to w
- adjacency matrix A ∈ {0, 1}^{N×N}, A_{v,w} := δ((v, w) ∈ E), v, w ∈ {1, ..., N}, N := |V|
- parents: pa(v) := {w ∈ V | (w, v) ∈ E}
- children: ch(v) := {w ∈ V | (v, w) ∈ E}
- neighbors: nbr(v) := pa(v) ∪ ch(v)
- family: fam(v) := pa(v) ∪ {v}
- root: v without parents
- leaf: v without children

Note: δ(P) := 1 if proposition P is true, := 0 otherwise.

[Figure: example DAG on nodes 1-5; Murphy, 2012, fig. 10.1(a)]

1. Introduction / Directed Graph Terminology (2)

- path: p = (p_1, ..., p_M) ∈ V*, with p_m ∈ V and (p_m, p_{m+1}) ∈ E for all m
  - length |p| := M
  - starts at p_1, ends at p_M
  - paths(G) := {p ∈ V* | (p_m, p_{m+1}) ∈ E for all m ∈ {1, ..., |p| - 1}}
  - v ⇝ w: there exists a path from v to w, i.e., ∃ p ∈ paths(G): p_1 = v, p_{|p|} = w
- ancestors: anc(v) := {w ∈ V | w ⇝ v}
- descendants: desc(v) := {w ∈ V | v ⇝ w}
- in-degree |pa(v)|, out-degree |ch(v)|, degree |nbr(v)|

Note: V* := ∪_{M ∈ ℕ} V^M, the finite sequences over V.

[Figure: example DAG on nodes 1-5; Murphy, 2012, fig. 10.1(a)]

1. Introduction / Directed Graph Terminology (3)

- cycle/loop at v: v ⇝ v
- self loop: (v, v) ∈ E
- directed acyclic graph / DAG: directed graph without cycles
- topological ordering:
  - numbering of the nodes s.t. all nodes have a lower number than their children
  - exists for DAGs

[Figure: example DAG on nodes 1-5; Murphy, 2012, fig. 10.1(a)]

1. Introduction / Bayesian Networks / Directed Graphical Models

A Bayesian network (aka directed graphical model) is a set of conditional probability distributions/densities (CPDs)

  p(x_m | x_{ctxt(m)}),   m ∈ {1, ..., M}

s.t. the graph defined by

  V := {1, ..., M},
  E := {(n, m) | m ∈ V, n ∈ ctxt(m)},   i.e., pa(m) := ctxt(m),

is a DAG.

A Bayesian network defines a factorization of the joint distribution:

  p(x_1, ..., x_M) = ∏_{m=1}^{M} p(x_m | x_{pa(m)})

1. Introduction / Bayesian Networks / Example

For the DAG below,

  p(x_1, x_2, x_3, x_4, x_5) = p(x_1) p(x_2 | x_1) p(x_3 | x_1) p(x_4 | x_2, x_3) p(x_5 | x_3)

If all variables are binary and all CPDs are given as conditional probability tables (CPTs), then the BN is defined by 5 CPTs: a single-row table for p(x_1), two-row tables for p(x_2 | x_1), p(x_3 | x_1), and p(x_5 | x_3) (one row per value of the parent), and a four-row table for p(x_4 | x_2, x_3) (one row per combination of parent values). The table entries themselves are left unspecified on the slide.

[Figure: example DAG on nodes 1-5; Murphy, 2012, fig. 10.1(a)]
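As an illustration of how such a BN could be stored and evaluated, here is a minimal sketch in Python. Only the graph structure is taken from the example; the CPT numbers are arbitrary illustrative values, since the slide leaves the table entries blank.

```python
# A minimal sketch of the example network as Python dictionaries.
# The CPT entries below are arbitrary illustrative numbers (the slide leaves them blank);
# each maps a tuple of parent values to p(x_m = 1 | parents).
parents = {1: (), 2: (1,), 3: (1,), 4: (2, 3), 5: (3,)}
cpt = {
    1: {(): 0.6},
    2: {(0,): 0.3, (1,): 0.7},
    3: {(0,): 0.5, (1,): 0.2},
    4: {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9},
    5: {(0,): 0.2, (1,): 0.8},
}

def joint_prob(x):
    """p(x_1, ..., x_5) = prod_m p(x_m | x_pa(m)) for a full binary assignment x."""
    p = 1.0
    for m, pa in parents.items():
        p1 = cpt[m][tuple(x[n] for n in pa)]
        p *= p1 if x[m] == 1 else 1.0 - p1
    return p

print(joint_prob({1: 1, 2: 0, 3: 1, 4: 1, 5: 0}))
```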

2. Examples / Naive Bayes Classifier

[Figure: naive Bayes graph, class y with children x_1, ..., x_5]

  p(x_1, ..., x_M, y) = p(y) p(x_1 | y) p(x_2 | y) · · · p(x_M | y) = p(y) ∏_{m=1}^{M} p(x_m | y)

more powerful generalization: tree-augmented naive Bayes

[Figure: tree-augmented naive Bayes graph over y and x_1, ..., x_5]

2. Examples / Medical Diagnosis

[Figure: bipartite graph, diseases / causes y_1, y_2, y_3 above, symptoms x_1, ..., x_5 below]

  p(x_1, ..., x_M, y_1, ..., y_T) = ∏_{t=1}^{T} p(y_t) ∏_{m=1}^{M} p(x_m | y_{pa(m)})

- bipartite graph
- predictor variables x_1, ..., x_M (symptoms)
- target variables y_1, ..., y_T (diseases / causes)
  - multi-label (vs. naive Bayes: single-label)
  - the y's also could be hidden

2. Examples / Markov Models

first order:

  p(x_1, ..., x_M) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) · · · p(x_M | x_{M-1}) = p(x_1) ∏_{m=1}^{M-1} p(x_{m+1} | x_m)

[Figure: chain x_1 → x_2 → x_3 → · · ·; Murphy, 2012, fig. 10.3(a)]
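A small sketch of the first-order factorization with an illustrative 3-state chain (the initial distribution and transition matrix are made-up values, not from the slides):

```python
import numpy as np

p_init = np.array([0.5, 0.3, 0.2])            # p(x_1)
T = np.array([[0.7, 0.2, 0.1],                # T[i, j] = p(x_{m+1} = j | x_m = i)
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

def chain_prob(states):
    """p(x_1, ..., x_M) = p(x_1) * prod_m p(x_{m+1} | x_m)."""
    p = p_init[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= T[prev, nxt]
    return p

print(chain_prob([0, 0, 1, 1, 2]))            # probability of one trajectory
```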

2. Examples / Markov Models / Second Order

second order:

  p(x_1, ..., x_M) = p(x_1, x_2) p(x_3 | x_1, x_2) p(x_4 | x_2, x_3) · · · p(x_M | x_{M-2}, x_{M-1})
                   = p(x_1, x_2) ∏_{m=2}^{M-1} p(x_{m+1} | x_{m-1}, x_m)

[Figure: second-order chain over x_1, x_2, x_3, x_4, ...; Murphy, 2012, fig. 10.3(b)]

2. Examples / Hidden Markov Models

- observed variables x_1, ..., x_M
- hidden variables z_1, ..., z_M

  p(x_1, ..., x_M, z_1, ..., z_M) = p(z_1) ∏_{m=1}^{M-1} p(z_{m+1} | z_m) ∏_{m=1}^{M} p(x_m | z_m)

- transition model p(z_{m+1} | z_m)
- observation model p(x_m | z_m)

[Figure: HMM graph, hidden chain z_1 → z_2 → · · · → z_T with emissions x_1, x_2, ..., x_T; Murphy, 2012, fig. 10.4]
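The HMM factorization can be written out directly. The following sketch uses illustrative parameters for two hidden states and binary observations (all numbers and names are assumptions, not from the slides):

```python
import numpy as np

p_z1 = np.array([0.6, 0.4])                   # initial hidden-state distribution p(z_1)
A = np.array([[0.7, 0.3],                     # A[i, j] = p(z_{m+1} = j | z_m = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                     # B[i, k] = p(x_m = k | z_m = i)
              [0.2, 0.8]])

def hmm_joint(z, x):
    """p(x_1..M, z_1..M) = p(z_1) * prod p(z_{m+1} | z_m) * prod p(x_m | z_m)."""
    p = p_z1[z[0]]
    for prev, nxt in zip(z, z[1:]):
        p *= A[prev, nxt]
    for zm, xm in zip(z, x):
        p *= B[zm, xm]
    return p

print(hmm_joint(z=[0, 0, 1], x=[0, 1, 1]))
```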

3. Inference / The Probabilistic Inference Problem

Given
- a Bayesian network model θ over G = (V, E),
- a query consisting of
  - a set X := {x_1, ..., x_M} ⊆ V of predictor variables (aka observed, visible variables),
    with a value v_m for each x_m (m = 1, ..., M), and
  - a set Y := {y_1, ..., y_T} ⊆ V of target variables (aka query variables), with X ∩ Y = ∅,

compute

  p(Y | X = v; θ) := p(y_1, ..., y_T | x_1 = v_1, x_2 = v_2, ..., x_M = v_M; θ)
                   = ( p(y_1 = w_1, ..., y_T = w_T | x_1 = v_1, ..., x_M = v_M; θ) )_{w_1, ..., w_T}

Variables that are neither predictor variables nor target variables are called nuisance variables.

3. Inference / Inference Without Nuisance Variables

Without nuisance variables: V = X ∪ Y.

  p(Y | X = v; θ)  :=  p(X = v, Y; θ) / p(X = v; θ)  =  p(X = v, Y; θ) / ∑_w p(X = v, Y = w; θ)

- first, clamp the predictors X to their observed values v,
- then, normalize p(X = v, Y; θ) to sum to 1 (over Y).
- p(X = v; θ), the likelihood of the data / probability of the evidence, is a constant.

Note: Summation over w is over all possible values of the variables Y.
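A naive clamp-and-normalize sketch over an explicit joint table (feasible only for tiny models; the random table and the axis convention are illustrative choices of this transcription):

```python
import numpy as np

# joint[x1, x2, x3] holds p(x1, x2, x3); values are illustrative.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

def query(joint, observed):
    """p(unobserved | observed): clamp the observed axes, then renormalize."""
    idx = tuple(observed.get(axis, slice(None)) for axis in range(joint.ndim))
    clamped = joint[idx]
    return clamped / clamped.sum()

# p(x2, x3 | x1 = 1)
print(query(joint, observed={0: 1}))
```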

3. Inference / Inference With Nuisance Variables

Nuisance variables: Z := {z_1, ..., z_K} := V \ (X ∪ Y).

1. add them to the target variables,
2. answer the resulting query without nuisance variables: p(Y, Z | X),
3. marginalize out the nuisance variables (marginalization):

  p(Y | X = v; θ) = ∑_u p(Y, Z = u | X = v; θ)

Caveat: This is a naive algorithm never used in practice. See the BN lecture for practically useful BN inference algorithms.

Note: Summation over u is over all possible values of the variables Z.
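Extending the previous sketch, nuisance variables are summed out before renormalizing. This is exactly the naive enumeration procedure warned about above, shown only to make the three steps concrete (the axis/variable layout is an illustrative choice):

```python
import numpy as np

def query_with_nuisance(joint, observed, targets):
    """p(targets | observed): clamp observed axes, sum out nuisance axes, renormalize."""
    idx = tuple(observed.get(axis, slice(None)) for axis in range(joint.ndim))
    clamped = joint[idx]
    # remaining axes of `clamped` correspond to the non-observed variables, in order
    remaining = [a for a in range(joint.ndim) if a not in observed]
    nuisance_axes = tuple(i for i, a in enumerate(remaining) if a not in targets)
    marginal = clamped.sum(axis=nuisance_axes)
    return marginal / marginal.sum()

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))            # illustrative joint table p(x0, x1, x2)
joint /= joint.sum()
# p(x2 | x0 = 1), with x1 as nuisance variable
print(query_with_nuisance(joint, observed={0: 1}, targets={2}))
```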

3. Inference / Complexity of Inference

- for simplicity assume
  - all M predictor variables are nominal with L levels,
  - all K nuisance variables are nominal with L levels,
  - a single target variable: Y = {y}, T = 1, also nominal with L levels.
- without (conditional) independencies:
  - the full table p requires L^{M+K+1} - 1 cells of storage,
  - inference requires O(L^{K+1}) operations
    (for each Y = w, sum over all L^K many Z = u).
- with (conditional) independencies / a Bayesian network:
  - the CPDs p require O((M + K + 1) L^{max indegree + 1}) cells of storage,
  - inference requires O((K + 1) L^{treewidth + 1}) operations,
  - treewidth = 1 for a chain!

Note: See the Bayesian networks lecture for BN inference algorithms.

4. Learning / Learning Bayesian Networks

- parameter learning: given
  - the structure of the network (graph G),
  - a regularization penalty Reg(θ) for the parameters θ of the CPTs, and
  - data x_1, ..., x_N,
  learn the CPTs p:

    θ̂ := arg max_θ  ∑_{n=1}^{N} log p(x_n; θ) - Reg(θ)

- structure learning: given
  - data,
  learn the structure G and the CPTs p.

4. Learning / Bayesian Approach

- in the Bayesian approach, parameters are also considered to be random variables; thus,
  - learning is just a special type of inference (with the parameters as targets),
  - information about the distribution of the parameters before seeing the data is required (prior distribution p(θ)).
- parameter learning: given
  - the structure of the network (graph G),
  - a prior distribution p(θ) of the parameters, and
  - data x_1, ..., x_N,
  learn the CPTs p:

    θ̂ := arg max_θ  ∑_{n=1}^{N} log p(x_n; θ) + log p(θ)

4. Learning / Plate Notation

- variables on plates are duplicated
  - the number of copies is given in the lower right corner,
  - an index is used to differentiate copies of the same variable.
- variables in several plates are duplicated for every combination, i.e., have several indices.
- for clarity, the index should be added to the plate (but often it is omitted).

Example 1: data x_1, ..., x_N is independently identically distributed (iid)

[Figure: θ with children X_1, ..., X_N, and the equivalent plate diagram θ → X_i with a plate over i = 1, ..., N; Murphy, 2012, fig. 10.7]

Example 2: Naive Bayes classifier

[Figure: plate diagrams for the naive Bayes classifier with class prior π, labels Y_i, features X_{ij}, and class-conditional parameters θ_{jc}; plates over i = 1, ..., N, j = 1, ..., D, c = 1, ..., C; Murphy, 2012, fig. 10.8]

4. Learning / Learning from Complete Data

The likelihood decomposes w.r.t. the graph structure:

  p(D | θ) := ∏_{n=1}^{N} p(x_n | θ)
            = ∏_{n=1}^{N} ∏_{m=1}^{M} p(x_{n,m} | x_{n,pa(m)}, θ_m)
            = ∏_{m=1}^{M} ∏_{n=1}^{N} p(x_{n,m} | x_{n,pa(m)}, θ_m)
            = ∏_{m=1}^{M} p(D_m | θ_m)

where θ_m are the parameters of p(x_m | x_{pa(m)}).

Note: In Bayesian contexts, often p(... | θ) is used instead of p(...; θ).

4. Learning / Learning from Complete Data (2)

If the prior also factorizes,

  p(θ) = ∏_{m=1}^{M} p(θ_m),

then the posterior factorizes as well,

  p(θ | D) ∝ p(D | θ) p(θ) = ∏_{m=1}^{M} p(D_m | θ_m) p(θ_m),

and the parameters θ_m of each CPT can be estimated independently.

Note: In Bayesian contexts, often p(... | θ) is used instead of p(...; θ).

4. Learning / Learning from Complete Data / Dirichlet Prior

If
- all variables are nominal,
- variable m has L_m levels (m = 1, ..., M), and
- the parameters θ of the CPTs are

    p(x_m = l | x_{pa(m)} = c) = θ_{m,c,l},   with c := x_{pa(m)}, l := x_m,
    ∑_{l=1}^{L_m} θ_{m,c,l} = 1   for all m, c,

then a Dirichlet distribution for each row of the CPT,

  θ_{m,c,·} ~ Dir(α_{m,c}),   α_{m,c} ∈ (ℝ_{≥0})^{L_m},

is a useful prior.

4. Learning / Learning from Complete Data / Dirichlet Prior (2)

Then the posterior p(θ_{m,c,·} | D) is also Dirichlet:

  θ_{m,c,·} | D ~ Dir(α_{m,c} + N_{m,c}),
  N_{m,c,l} := ∑_{n=1}^{N} δ(x_{n,m} = l, x_{n,pa(m)} = c),

with mean

  θ̄_{m,c,l} = (N_{m,c,l} + α_{m,c,l}) / ∑_{l'=1}^{L_m} (N_{m,c,l'} + α_{m,c,l'})
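The posterior-mean formula in code form (a minimal sketch; the function name is a choice of this transcription):

```python
import numpy as np

def posterior_mean(counts, alpha):
    """Posterior mean of one CPT row under a Dirichlet prior:
    (N_{m,c,l} + alpha_{m,c,l}) / sum_l' (N_{m,c,l'} + alpha_{m,c,l'})."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts + alpha).sum()

# row c = (x2 = 1, x3 = 1) of the CPT for x4 on the next slide:
# counts (N_{4,c,1}, N_{4,c,0}) = (2, 1), prior Dir(1, 1)
print(posterior_mean([2, 1], [1, 1]))   # [0.6 0.4] = (3/5, 2/5) for x4 = 1 and x4 = 0
```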

4. Learning / Learning from Complete Data / Example

graph structure: [Figure: example DAG on nodes 1-5; Murphy, 2012, fig. 10.1(a)]

data (N = 5):

  x1 x2 x3 x4 x5
   0  0  1  0  0
   0  1  1  1  1
   1  1  0  1  0
   0  1  1  0  0
   0  1  1  1  0

learned parameters for the CPT of x4 (m = 4), prior p(θ_{m,c}) := Dir(1, 1) for all m, c:

  c = x_{pa(4)} = (x2, x3)   N_{4,c,1}   N_{4,c,0}   θ̄_{4,c,1}   θ̄_{4,c,0}
  x2 = 0, x3 = 0                 0           0          1/2          1/2
  x2 = 1, x3 = 0                 1           0          2/3          1/3
  x2 = 0, x3 = 1                 0           1          1/3          2/3
  x2 = 1, x3 = 1                 2           1          3/5          2/5
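These numbers can be reproduced directly from the data table and the Dir(1, 1) prior; the following sketch counts the (x2, x3, x4) configurations and normalizes (the array layout is a choice of this transcription):

```python
import numpy as np

# Reproducing the table above: count x4 given (x2, x3) in the 5 data rows
# and add the Dir(1, 1) pseudo-counts.
data = np.array([[0, 0, 1, 0, 0],
                 [0, 1, 1, 1, 1],
                 [1, 1, 0, 1, 0],
                 [0, 1, 1, 0, 0],
                 [0, 1, 1, 1, 0]])   # columns: x1, x2, x3, x4, x5

alpha = np.ones((2, 2, 2))           # Dir(1, 1) prior, indexed [x2, x3, x4]
counts = np.zeros((2, 2, 2))
for x1, x2, x3, x4, x5 in data:
    counts[x2, x3, x4] += 1

theta = (counts + alpha) / (counts + alpha).sum(axis=-1, keepdims=True)
print(theta[1, 1])                   # p(x4 | x2=1, x3=1) = [0.4 0.6], i.e., (2/5, 3/5)
```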

4. Learning / Learning BN from Complete Data / Algorithm

  learn-bn-params(D_train := {x_1, ..., x_N} ⊆ X_1 × · · · × X_M,  G,  α):
      for n := 1, ..., N:
          for m := 1, ..., M:
              α_{m, x_{n,m}, x_{n,pa(m)}} += 1
      return α

where
- X_m := {1, ..., L_m} are the discrete domains of the variables X_m (having L_m different levels),
- G is a DAG on {1, ..., M},
- α = (α_{m,l,c})_{m=1:M, l=1:L_m, c ∈ ∏_{c' ∈ pa(m)} {1,...,L_{c'}}}, with entries in ℝ_{≥0}, is the Dirichlet prior of the parameters.
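A runnable version of this counting algorithm, applied to the example from the previous slide (the data layout as tuples, 0-based values, and the dictionary representation of the prior are choices of this transcription, not fixed by the pseudocode):

```python
# Add one pseudo-count per observed family configuration onto the Dirichlet prior.
def learn_bn_params(data, parents, alpha):
    """data: list of value tuples (x_1, ..., x_M); parents: {m: tuple of parent indices};
    alpha: {m: {parent_config: [pseudo-counts per level of x_m]}} (updated in place)."""
    for x in data:
        for m, pa in parents.items():
            c = tuple(x[p] for p in pa)
            alpha[m][c][x[m]] += 1
    return alpha

# CPT of x4 from the example: parents (x2, x3), Dir(1, 1) prior per parent configuration
data = [(0, 0, 1, 0, 0), (0, 1, 1, 1, 1), (1, 1, 0, 1, 0), (0, 1, 1, 0, 0), (0, 1, 1, 1, 0)]
parents = {3: (1, 2)}                      # 0-based: x4 has parents x2, x3
alpha = {3: {(a, b): [1, 1] for a in (0, 1) for b in (0, 1)}}
post = learn_bn_params(data, parents, alpha)
print(post[3][(1, 1)])                     # [2, 3]: Dir(alpha + N) with counts (1, 2) for x2=1, x3=1
```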

4. Learning / Learning with Missing and/or Hidden Variables

Learning with
- missing values or
- hidden variables
is more complicated, as
- the likelihood no longer factorizes and
- it is no longer convex.

Use iterative approximation algorithms to find a local MAP or ML optimum.

4. Learning / Summary (1/2)

- Bayesian networks define a joint probability distribution by a factorization into conditional probability distributions (CPDs) p(x_n | pa(x_n)).
  - The conditioning sets pa(m) form a DAG.
  - For nominal variables, all CPDs can be represented as tables (CPTs).
  - Storage complexity is O(L^{max indegree + 1}) (instead of O(L^M)).
- Many model classes essentially are Bayesian networks:
  - Naive Bayes classifier, Markov models, hidden Markov models.
- Inference in a BN means to compute the (marginal joint) distribution of target variables given observed evidence for some predictor variables.
  - A Bayesian network can answer queries for arbitrary targets (not just a predefined one, as most predictive models do).
  - Nuisance variables (for a query) are variables neither observed nor used as targets.
  - Inference with nuisance variables can be done efficiently for DAGs with low treewidth (e.g., chains).

4. Learning / Summary (2/2)

- Learning BNs has to distinguish between
  - parameter learning: learn just the CPDs for a given graph, vs.
  - structure learning: learn both the graph and the CPDs.
- Parameter learning of the maximum a posteriori (MAP) estimate for BNs with CPTs and a Dirichlet prior can be done simply by counting the frequencies of families in the data.

Further Readings

- [Murphy, 2012, chapter 10].

References

Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
