Graphical Models
Pradeep Ravikumar
Department of Computer Science
The University of Texas at Austin

Useful References
• Graphical Models, Exponential Families, and Variational Inference. M. J. Wainwright and M. I. Jordan. Foundations and Trends in Machine Learning, Vol. 1, Nos. 1-2, pp. 1-305, December 2008. (Available for free online.)
• Graphical Models with R. Søren Højsgaard, David Edwards, and Steffen Lauritzen.
• Probabilistic Graphical Models: Principles and Techniques. D. Koller and N. Friedman.

Applications of Graphical Models
• Computer vision and graphics
• Natural language processing
• Information retrieval
• Robotic control
• Computational biology
• Medical diagnosis
• Finance and economics
• Error-control codes
• Decision making under uncertainty
• …

Speech Recognition

Bioinformatics

Computer Vision

Abstraction Layers and Interfaces
• Internet Protocol layers
• Computer architecture layers
• Interface layer for machine learning? Graphical Models!

Graphical Models: The Why
• A compact way of representing distributions over random variables
• A visually appealing way (accessible to domain experts) of representing distributions over random variables
• They represent distributions over random variables: probability theory
• They use graphs to do so: graph theory
• A marriage of graph theory and probability theory

Graphical Models: Representation

Joint Distributions
• Consider a set of random variables $\{X_1, X_2, \ldots, X_n\}$.
• Let $x_i$ be a realization of random variable $X_i$; $x_i$ may be real-valued, discrete, or a vector in a vector space; for now assume the variables are discrete.
• Probability mass function: $p(x_1, \ldots, x_n) := P(X_1 = x_1, \ldots, X_n = x_n)$.
• Shorthand: $X = (X_1, \ldots, X_n)$, $x = (x_1, \ldots, x_n)$, so that $p(x) = P(X = x)$.

Directed Graphs
• Let $V = \{1, 2, \ldots, n\}$. Given a set of non-negative functions $\{f_i(x_i, x_{\pi_i}) : i \in V\}$, where $\pi_i$ denotes the parents of node $i$, each of which sums to one as a function of its first argument, we define a joint probability distribution:
$$p(x_1, \ldots, x_n) := \prod_{i=1}^{n} f_i(x_i, x_{\pi_i}).$$
• Exercise: Calculate $p(x_i \mid x_{\pi_i})$. It can be shown that $p(x_i \mid x_{\pi_i}) = f_i(x_i, x_{\pi_i})$.
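To make the factorization concrete, here is a minimal sketch (an assumed toy example, not from the slides): a binary chain $X_1 \to X_2 \to X_3$ with hand-specified conditional probability tables, where the joint is the product of the local factors $f_i$.

```python
from itertools import product

# Assumed toy model: binary chain X1 -> X2 -> X3.
# f1(x1) = P(X1 = x1); f2(x2, x1) = P(X2 = x2 | X1 = x1); similarly f3.
f1 = {0: 0.6, 1: 0.4}
f2 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (x2, x1) -> prob
f3 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # (x3, x2) -> prob

def joint(x1, x2, x3):
    """p(x1, x2, x3) = f1(x1) * f2(x2, x1) * f3(x3, x2)."""
    return f1[x1] * f2[(x2, x1)] * f3[(x3, x2)]

# A product of locally normalized factors is automatically normalized:
total = sum(joint(*x) for x in product([0, 1], repeat=3))
print(f"sum over all configurations = {total:.6f}")   # 1.000000

# And p(x2 | x1) recovered from the joint equals f2, as the exercise claims:
p_x2_given_x1 = sum(joint(0, 1, x3) for x3 in [0, 1]) / f1[0]
print(f"{p_x2_given_x1:.2f} == {f2[(1, 0)]}")
```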

Representational Economy
• Directed graphical model distributions can thus be represented and stored much more cheaply than a general distribution. For example, a full joint table over $n$ binary variables needs $2^n - 1$ numbers, while a directed model in which each node has at most $k$ parents needs at most $n \cdot 2^k$.
‣ What's the catch?
‣ The graphical model distributions satisfy certain conditional independence assumptions.

Graphical Models and Conditional Independences
• A graphical model is a family of probability distributions: given a graph, specify any set of conditional distributions of nodes given their parents.
• Any such distribution satisfies certain conditional independences.
‣ We will now provide an algorithm (Bayes-Ball / d-separation) to list all conditional independence statements satisfied by the given family of graphical model distributions.

Three Canonical Graphs

Is there a graph-theoretic way to write down all conditional independence statements, given the graph?

"Blocking" Rules I
[Figure: panels (a) and (b), each showing nodes X, Y, and Z, illustrating when a path from X to Y through Z is blocked.]
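The blocking rules can be operationalized. Below is a minimal sketch (not from the slides) that tests d-separation using the standard moralization criterion rather than Bayes-Ball: restrict to the ancestral set of $A \cup B \cup C$, marry co-parents and drop edge directions, delete the conditioning set, and check graph separation. The graph encoding (a dict from each node to its set of parents) is an assumed representation.

```python
def ancestral_set(parents, nodes):
    """`nodes` plus all of their ancestors in the DAG (node -> parent set)."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, A, B, C):
    """Test X_A independent of X_B given X_C via the moralization criterion:
    take the ancestral graph of A + B + C, moralize it (marry co-parents,
    drop directions), delete C, and check that no path joins A to B."""
    keep = ancestral_set(parents, set(A) | set(B) | set(C))
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:                           # undirected parent-child edges
            adj[v].add(p)
            adj[p].add(v)
        for i in range(len(ps)):               # marry each pair of co-parents
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j])
                adj[ps[j]].add(ps[i])
    blocked, seen = set(C), set()
    stack = [a for a in A if a not in blocked]
    while stack:                               # search for an A-B path
        v = stack.pop()
        if v in seen:
            continue
        if v in set(B):
            return False                       # active path found
        seen.add(v)
        stack.extend(adj[v] - blocked)
    return True

# Head-to-head (collider) X -> Z <- Y, one of the three canonical graphs:
parents = {"X": set(), "Y": set(), "Z": {"X", "Y"}}
print(d_separated(parents, {"X"}, {"Y"}, set()))   # True: path blocked
print(d_separated(parents, {"X"}, {"Y"}, {"Z"}))   # False: observing Z unblocks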

Undirected Graphical Models

Undirected Graphical Models
• In the directed case we started with a factorized form before discussing conditional independence assumptions, because the former was more natural (conditional probabilities vs. the "Bayes-Ball" algorithm).
• In the undirected case, the conditional independence assertions are much more natural and intuitive:
‣ They depend only on classical graph separation (contrast with the blocking rules).
‣ $X_A$ is conditionally independent of $X_C$ given $X_B$ if the nodes in $A$ are separated from the nodes in $C$ by the nodes in $B$.
‣ Remove the nodes in $B$: there is then no path from the nodes in $A$ to those in $C$.
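A minimal sketch of the separation test the last bullet describes, under an assumed adjacency-list representation: delete $B$, then check reachability.

```python
def separated(adj, A, B, C):
    """True if every path from A to C in the undirected graph `adj`
    (node -> set of neighbors) passes through B: delete B, then check
    that no node of C is reachable from A."""
    blocked, seen = set(B), set()
    stack = [a for a in A if a not in blocked]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adj[v] - blocked)
    return not (seen & set(C))

# A 4-node path a - b - c - d: {a} and {d} are separated by {b}.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(separated(adj, {"a"}, {"b"}, {"d"}))  # True
print(separated(adj, {"a"}, set(), {"d"}))  # False: the path a-b-c-d is open
```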

Undirected Graphical Models: Parameterization
• Given an undirected graph, we can define the set of conditional independence assertions using graph separation as above.
‣ The set of distributions that satisfy all these assertions is the undirected graphical model family given the graph.
• We would like to obtain a "local" parameterization of an undirected graphical model.
‣ Why? For a compact representation of the joint distribution!
• For directed graphs, the parameterization was based on local conditional probabilities of nodes given their parents.
‣ "Local" had the interpretation of a set consisting of a node and its parents.

Undirected Graphical Models: Parameterization
• How do we use conditional probabilities in the undirected case (where there are no parents)?
• Conditional probabilities of nodes given their neighbors?
‣ It is hard to ensure consistency of such conditional probabilities across nodes, so we would not be able to choose such functions independently for each node.
‣ Their product need not be a valid joint distribution.
‣ Better not to use conditional probabilities, but a product of some other "local" functions that can be chosen independently.

Conditional Independence and Factorization
• If $X_A \perp X_C \mid X_B$, then $P(X_A \mid X_B, X_C) = P(X_A \mid X_B)$, so that
$$P(X_A, X_B, X_C) = P(X_A \mid X_B)\, P(X_B, X_C).$$
• The joint distribution thus factors as $P(X_A, X_B, X_C) = f(X_A, X_B)\, g(X_B, X_C)$ for some functions $f$ and $g$. Conditional independence entails factorization!
• If there is no edge between $X_i$ and $X_j$, then the joint distribution cannot have a factor such as $\psi(x_i, x_j, x_k)$.
• Factors of the joint distribution can depend only on variables within a fully connected subgraph, also called a clique.
• Without loss of generality, assume factors only over maximal cliques.

Undirected Graphical Models: Example
[Figure: a small undirected graph with tables of clique potential values; the numeric entries did not survive transcription.]
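Since the figure's tables are not recoverable, here is a hedged stand-in with made-up potentials on the cliques $\{x_1, x_2\}$ and $\{x_2, x_3\}$ of a chain: it builds the joint as a normalized product of clique potentials and numerically confirms the conditional independence $x_1 \perp x_3 \mid x_2$ implied by graph separation.

```python
import numpy as np

# Assumed toy example: chain graph x1 - x2 - x3 with maximal
# cliques {x1, x2} and {x2, x3} and arbitrary positive potentials
# ("agreement" scores, not probabilities).
psi12 = np.array([[30.0, 5.0],
                  [1.0, 10.0]])   # psi12[x1, x2]
psi23 = np.array([[100.0, 1.0],
                  [1.0, 100.0]])  # psi23[x2, x3]

# Joint: p(x1, x2, x3) = (1/Z) * psi12[x1, x2] * psi23[x2, x3].
unnorm = psi12[:, :, None] * psi23[None, :, :]
Z = unnorm.sum()                  # normalization constant
p = unnorm / Z

# Check the separation-implied independence x1 ⟂ x3 | x2:
for x2 in (0, 1):
    p_joint = p[:, x2, :] / p[:, x2, :].sum()          # p(x1, x3 | x2)
    p_prod = np.outer(p_joint.sum(1), p_joint.sum(0))  # p(x1|x2) p(x3|x2)
    assert np.allclose(p_joint, p_prod)
print(f"Z = {Z}, and x1 is independent of x3 given x2")
```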

What do potential functions mean?
• We just discussed how local conditional probabilities are not satisfactory as potential functions.
• Why not replace the potential functions $\psi_C(x_C)$ with marginal probabilities $p(x_C)$?

What do potential functions mean?
• Potential functions are neither conditional nor marginal probabilities, and do not have a local probabilistic interpretation.
• But they often do have a "pre-probabilistic" interpretation as "agreement," "constraint," or "energy."
• A potential function favors certain configurations by assigning them larger values.

Dual Characterization of Undirected Graphical Models
• Characterization I: Given a graph G, all distributions expressed as a product of potential functions over maximal cliques.
• Characterization II: Given a graph G, all distributions that satisfy all conditional independence statements specified by graph separation.
• Hammersley-Clifford Theorem: the two characterizations are equivalent!
• As in the directed case, a profound link between probability theory and graph theory.

Beyond Directed/Undirected Graphical Models

Factorization
• "Factorization" is a richer concept than "conditional independence."
• There exist families of distributions defined in terms of factorizations that cannot be captured solely within the (un)directed graphical model formalism.
• From the point of view of graphical model inference algorithms this is not a problem: they work for any distribution within any sub-family defined on the graph.
• But by missing out on algebraic structure, we might not have the most efficient inference algorithm possible.
• We will soon see factor graphs, a graphical representation that makes such reduced parameterizations explicit, and in particular can capture the factorized family example earlier.
• Factor graphs provide nothing new in terms of exploiting conditional independences, but they provide a way to exploit algebraic relationships.

Factor Graphs
• Given a set of variables $\{x_1, \ldots, x_n\}$, let $\mathcal{C}$ denote a set of subsets of $\{1, \ldots, n\}$. Example: $\mathcal{C} = \{\{1,3\}, \{3,4\}, \{2,4,5\}, \{1,3\}\}$ (the repeated subset $\{1,3\}$ is deliberate: two distinct factors can share a scope).
• Denote $\mathcal{C} = \{C_s : s \in F\}$. To each index $s \in F$, associate a factor $f_s(x_{C_s})$, a function of the variables indexed by $C_s$.
• Example above: with index set $F = \{a, b, c, d\}$, the factors are $f_a(x_1, x_3)$, $f_b(x_3, x_4)$, $f_c(x_2, x_4, x_5)$, $f_d(x_1, x_3)$.
• Note that $\mathcal{C}$ is an arbitrary collection of subsets of indices; it need not correspond to the cliques of some graph (note that we have not even specified any graph at this point!).

Factor Graphs
• Such factorized forms occur in many areas of mathematics and have applications outside probability theory.
‣ Our interest, of course, is in the application to probability distributions.
• Both undirected and directed graphical models can also be written as a product of factors.
• We have not yet defined the factor graph:
‣ a graphical representation of the factored distribution above.

Factor Graph
• A bipartite graph $G(V, F, E)$, where the vertices $V$ index the variables and the vertices $F$ index the factors.
• The edges $E$: each factor node $s \in F$ is linked to all variable nodes in $C_s$; these are the only edges.
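The factor graph is just this bipartite structure made explicit. A minimal sketch, using the running example's scopes (the dict-based representation is an assumption):

```python
# The running example: factor index set F = {a, b, c, d} with scopes C_s.
# (Note the two distinct factors, f_a and f_d, over the same variables {1, 3}.)
scopes = {"a": {1, 3}, "b": {3, 4}, "c": {2, 4, 5}, "d": {1, 3}}

# Variable nodes V, factor nodes F, and edges E of the bipartite graph:
# each factor node s is linked to exactly the variable nodes in C_s.
V = sorted(set().union(*scopes.values()))
F = sorted(scopes)
E = [(s, v) for s in F for v in sorted(scopes[s])]

print("variables:", V)   # [1, 2, 3, 4, 5]
print("factors:  ", F)   # ['a', 'b', 'c', 'd']
print("edges:    ", E)   # ('a', 1), ('a', 3), ('b', 3), ...
```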

Graphical Models: Inference

Probabilistic Inference
• Let $E$ and $F$ be disjoint subsets of the nodes of a graphical model, with $X_E$ and $X_F$ the corresponding disjoint subsets of variables.
• We want to calculate $P(X_F \mid X_E)$ for arbitrary subsets $E$ and $F$.
‣ This is the general probabilistic inference problem.
‣ We will focus initially on a single "query node" $F$, and on directed graphical models.

Simplest Case
[Figure: panels (a)-(c) showing a two-node graph on X and Y.]
• $P(Y \mid X)$ is specified directly by the conditional probability table.

Inference Algorithms
• Exact algorithms: solve the inference problem exactly.
‣ Caveat: the time taken is in general exponential in the "tree-width" of the graph (which is typically not constant, and scales poorly with the number of nodes).
‣ Examples: variable elimination, sum-product (for tree-structured graphs).
• Approximate algorithms: solve the inference problem approximately.
‣ While the time taken is linear or quadratic in the number of nodes, even approximate inference with approximation guarantees can be shown to be NP-hard.
‣ Example: sum-product (for general graphs).

Sum-Product
• Message-Passing Protocol: a node can send a message to a neighboring node when (and only when) it has received messages from all of its remaining neighbors.
• Two ways to implement this protocol (the second is sketched in code after the figure below):
‣ Parallel.
‣ From the leaves to the root, and from the root to the leaves (as in the previous diagram; this provides messages along both directions of every edge).

Sum-Product: Parallel
[Figure: panels (a)-(d) illustrating the parallel message-passing schedule.]
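To make the protocol concrete, here is a hedged sketch (an assumed toy model, not from the slides) of the leaves-to-root / root-to-leaves schedule on a three-node chain with unary and pairwise potentials; the resulting beliefs match brute-force marginals exactly, as they must on a tree.

```python
import numpy as np

# Assumed toy model: chain x1 - x2 - x3, pairwise potentials psi12, psi23
# and unary potentials phi. On a tree, the leaves-to-root / root-to-leaves
# schedule yields exact marginals.
np.random.seed(0)
K = 2                                   # each variable takes K states
psi12 = np.random.rand(K, K) + 0.1      # psi12[x1, x2]
psi23 = np.random.rand(K, K) + 0.1      # psi23[x2, x3]
phi = np.random.rand(3, K) + 0.1        # unary potentials phi[i, x_i]

# Leaves-to-root (root = node 2): messages from the leaves 1 and 3.
m1_to_2 = (phi[0][:, None] * psi12).sum(axis=0)      # sum over x1
m3_to_2 = (phi[2][None, :] * psi23).sum(axis=1)      # sum over x3

# Root-to-leaves: messages from node 2 back out.
m2_to_1 = (phi[1] * m3_to_2) @ psi12.T               # sum over x2
m2_to_3 = (phi[1] * m1_to_2) @ psi23                 # sum over x2

# Beliefs: product of the local potential and all incoming messages.
b1 = phi[0] * m2_to_1
b2 = phi[1] * m1_to_2 * m3_to_2
b3 = phi[2] * m2_to_3

# Brute-force check against the explicit joint.
joint = np.einsum("a,b,c,ab,bc->abc", phi[0], phi[1], phi[2], psi12, psi23)
assert np.allclose(b1 / b1.sum(), joint.sum((1, 2)) / joint.sum())
assert np.allclose(b2 / b2.sum(), joint.sum((0, 2)) / joint.sum())
assert np.allclose(b3 / b3.sum(), joint.sum((0, 1)) / joint.sum())
print("chain marginals from message passing match brute force")
```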

Graphical Models: Learning

We will focus on pairwise graphical models
$$p(x; \theta, G) = \frac{1}{Z(\theta)} \exp\Big( \sum_{(s,t) \in E(G)} \theta_{st}\, \phi_{st}(x_s, x_t) \Big)$$
• $\phi_{st}$: arbitrary potential functions
• Ising: $\phi_{st}(x_s, x_t) = x_s x_t$
• Potts: $\phi_{st}(x_s, x_t) = I(x_s = x_t)$
• Indicator: $\phi_{st;jk}(x_s, x_t) = I(x_s = j, x_t = k)$
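As a sanity check of the Ising case, a small brute-force sketch (the graph, parameters, and sizes are illustrative assumptions): it enumerates all spin configurations to compute $Z(\theta)$ and the resulting probabilities, which is feasible only for tiny $n$.

```python
import numpy as np
from itertools import product

# Assumed example: an Ising model on n = 4 spins arranged in a 4-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # edge set E(G)
theta = {e: 0.5 for e in edges}                   # one theta_st per edge

def unnormalized(x):
    """exp(sum over edges of theta_st * x_s * x_t): Ising sufficient stats."""
    return np.exp(sum(theta[(s, t)] * x[s] * x[t] for s, t in edges))

states = list(product([-1, +1], repeat=4))        # all 2^n configurations
Z = sum(unnormalized(x) for x in states)          # partition function Z(theta)
p = lambda x: unnormalized(x) / Z

print(f"Z = {Z:.4f}")
print(f"p(+1,+1,+1,+1) = {p((1, 1, 1, 1)):.4f}")  # aligned spins are favored
assert abs(sum(p(x) for x in states) - 1.0) < 1e-12
```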

Learning Graphical Models
• Two-step procedures:
‣ 1. Model selection: estimate the graph structure.
‣ 2. Parameter inference given the graph structure.
• Score-based approaches: search over the space of graphs, with a score for each graph based on parameter inference.
• Constraint-based approaches: estimate individual edges by hypothesis tests for conditional independences.
• Caveats: (a) it is difficult to provide guarantees for these estimators; (b) computing the estimators is NP-hard.

Learning Graphical Models
• State-of-the-art methods are based on estimating neighborhoods:
‣ via high-dimensional statistical model estimation;
‣ via high-dimensional hypothesis tests.
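A hedged sketch of the neighborhood-estimation idea for binary data, in the spirit of $\ell_1$-regularized (logistic-regression) neighborhood selection: regress each node on all the others with an $\ell_1$ penalty and read its neighborhood off the nonzero coefficients. The data-generating step and the regularization level `C` are illustrative assumptions, not the slides' prescription.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed synthetic data: p = 5 nodes with +/-1 values; node 1 is coupled
# to node 0 (agreeing 85% of the time), the rest are independent noise.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.choice([-1.0, 1.0], size=(n, p))
X[:, 1] = np.where(rng.random(n) < 0.85, X[:, 0], -X[:, 0])

def neighborhood(X, s, C=0.1):
    """Estimated neighbors of node s: nonzero l1-logistic coefficients."""
    y = X[:, s]
    others = np.delete(np.arange(X.shape[1]), s)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X[:, others], y)
    return [int(j) for j, w in zip(others, clf.coef_[0]) if abs(w) > 1e-6]

for s in range(p):
    print(f"node {s}: estimated neighbors {neighborhood(X, s)}")
# Combining per-node neighborhoods (e.g. by AND/OR rules) gives a graph estimate.
```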
