Probabilistic Graphical Models
By Young Xu
1. Foundations
1.1 Probability Theory
1.1.1 Probability Distributions
1.1.2 Basic Concepts in Probability
1.1.3 Random Variables and Joint Distributions
1.1.4 Independence and Conditional Independence
1.1.5 Querying a Distribution
1.1.6 Continuous Space
1.1.7 Expectation and Variance
1.2 Graphs
1.2.1 Nodes and Edges
1.2.2 Subgraphs
1.2.3 Paths and Trails
1.2.4 Cycles and Loops
Representation
2. The Bayesian Network Representation
2.1 Exploiting Independence Properties
2.1.1 Independent Random Variables
2.1.2 The Conditional Parameterization
2.1.3 The Naive Bayes Model
2.2 Bayesian Network
2.2.1 The Student Example Revisited
2.2.2 Basic Independencies in Bayesian Networks
2.2.3 Graphs and Distributions
2.3 Independencies in Graphs
2.3.1 D-separation
2.3.2 Soundness and Completeness
2.3.3 An Algorithm for d-Separation
2.3.4 I-Equivalence
2.4 From Distributions to Graphs
2.4.1 Minimal I-Maps
2.4.2 Perfect Maps
2.4.3 Finding Perfect Maps (#)
3. Undirected Graphical Models
3.1 The Misconception Example
3.2 Parameterization
3.2.1 Factors
3.2.2 Gibbs Distributions and Markov Networks
3.2.3 Reduced Markov Networks
3.3 Markov Network Independencies
3.3.1 Basic Independencies
3.3.2 Independencies Revisited
3.3.3 From Distributions to Graphs
3.4 Parameterization Revisited
3.4.1 Finer-Grained Parameterization
3.4.2 Overparameterization
3.5 Bayesian Networks and Markov Networks
3.5.1 From Bayesian Networks to Markov Networks
3.5.2 From Markov Networks to Bayesian Networks
3.5.3 Chordal Graphs
3.6 Partially Directed Models
3.6.1 Conditional Random Fields
3.6.2 Chain Graph Models (#)
4. Local Probabilistic Models
4.1 Tabular CPDs
4.2 Deterministic CPDs
4.3 Context-Specific CPDs
4.4 Independence of Causal Influence
4.4.1 The Noisy-Or Model
4.4.2 Generalized Linear Models
4.5 Continuous Variables
4.5.1 Hybrid Models
4.6 Conditional Bayesian Networks
5. Template-Based Representations
5.1 Temporal Models
5.1.1 Basic Assumptions
5.1.2 Dynamic Bayesian Networks
5.1.3 State-Observation Models
5.2 Template Variables and Template Factors
5.3 Directed Probabilistic Models for Object-Relational Domains
5.3.1 Plate Models
5.3.2 Probabilistic Relational Models
5.4 Undirected Representation
5.5 Structural Uncertainty (#)
5.5.1 Relational Uncertainty
5.5.2 Object Uncertainty
6. Gaussian Network Models
6.1 Multivariate Gaussians
6.1.1 Basic Parameterization
6.1.2 Operations on Gaussians
6.1.3 Independencies in Gaussians
6.2 Gaussian Bayesian Networks
6.3 Gaussian Markov Random Fields
7. The Exponential Family
7.1 Exponential Families
7.1.1 Linear Exponential Families
7.2 Factored Exponential Families
7.2.1 Product Distributions
7.2.2 Bayesian Networks
7.3 Entropy and Relative Entropy
7.3.1 Entropy
7.3.2 Relative Entropy
Inference
8. Exact Inference: Variable Elimination
8.1 Analysis of Complexity
8.1.1 Analysis of Exact Inference
8.1.2 Analysis of Approximate Inference
8.2 Variable Elimination: The Basic Ideas
8.3 Variable Elimination
8.3.1 Basic Elimination
8.3.2 Dealing with Evidence
8.4 Complexity and Graph Structure: Variable Elimination
8.4.1 Simple Analysis
8.4.2 Graph-Theoretic Analysis
8.4.3 Finding Elimination Orderings (#)
8.5 Conditioning (#)
8.5.1 The Conditioning Algorithm
8.5.2 Conditioning and Variable Elimination
8.5.3 Graph-Theoretic Analysis
8.5.4 Improved Conditioning
8.6 Inference with Structured CPDs (#)
8.6.1 Independence of Causal Influence
8.6.2 Context-Specific Independence
9. Exact Inference: Clique Trees
9.1 Variable Elimination and Clique Trees
9.1.1 Cluster Graphs
9.1.2 Clique Trees
9.2 Message Passing: Sum Product
9.2.1 Variable Elimination in a Clique Tree
9.2.2 Clique Tree Calibration
9.2.3 A Calibrated Clique Tree as a Distribution
9.3 Message Passing: Belief Update
9.3.1 Message Passing with Division
9.3.2 Equivalence of Sum-Product and Belief Update Messages
9.3.3 Answering Queries
9.4 Constructing a Clique Tree
9.4.1 Clique Trees from Variable Elimination
10. Inference as Optimization
10.1 Introduction
10.1.1 Exact Inference Revisited (#)
10.1.2 The Energy Functional
10.1.3 Optimizing the Energy Functional
10.2 Exact Inference as Optimization
10.2.1 Fixed-Point Characterization
10.2.2 Inference as Optimization
10.3 Propagation-Based Approximation
10.3.1 A Simple Example
10.3.2 Cluster-Graph Belief Propagation
10.3.3 Properties of Cluster-Graph Belief Propagation
10.3.4 Analyzing Convergence (#)
10.3.5 Constructing Cluster Graphs
10.3.6 Variatioanal Analysis
10.3.7 Other Entropy Approximations (#)
10.4 Propagation with Approximate Messages (#)
10.4.1 Factorized Messages
10.4.2 Approximate Message Computation
10.4.3 Inference with Approximate Messages
10.4.4 Expectation Propagation
10.4.5 Variational Analysis
10.5 Structured Variational Approximations
10.5.1 The Mean Field Approximation
10.5.2 Structured Approximations
10.5.3 Local Variational Methods (#)
11. Particle-Based Approximate Inference
11.1 Forward Sampling
11.1.1 Sampling from a Bayesian Network
11.1.2 Analysis of Error
11.1.3 Conditional Probability Queries
11.2 Likelihood Weighting and Importance Sampling
11.2.1 Likelihood Weighting: Intuition
11.2.2 Importance Sampling
11.2.3 Importance Sampling for Bayesian Networks
11.2.4 Importance Sampling Revisited
11.3 Markov Chain Monte Carlo Methods
11.3.1 Gibbs Sampling Algorithm
11.3.2 Markov Chains
11.3.3 Gibbs Sampling Revisited
11.3.4 A Broader Class of Markov Chains (#)
11.3.5 Using a Markov Chain
11.4 Collapsed Particles
11.4.1 Collapsed Likelihood Weighting (#)
11.4.2 Collapsed MCMC
11.5 Deterministic Search Methods (#)
12. MAP Inference
12.1 Overview
12.1.1 Computational Complexity
12.1.2 Overview of Solution Methods
12.2 Variable Elimination for (Marginal) MAP
12.2.1 Max-Product Variable Elimination
12.2.2 Finding the Most Probable Assignment
12.2.3 Variable Elimination for Marginal MAP (#)
12.3 Max-Product in Clique Trees
12.3.1 Computing Max-Marginals
12.3.2 Message Passing as Reparameterization
12.3.3 Decoding Max-Marginals
12.4 Max-Product Belief Propagation in Loopy Cluster Graphs
12.4.1 Standard Max-Product Message Passing
12.4.2 Max-Product BP with Counting Numbers (#)
12.5 MAP as a Linear Optimization Problem
12.5.1 The Integer Program Formulation
12.5.2 Linear Programming Relaxation
12.5.3 Low-Temperature Limits
12.6 Using Graph Cuts for MAP
12.6.1 Inference Using Graph Cuts
12.6.2 Nonbinary Variables
12.7 Local Search Algorithms
13. Inference in Hybrid Networks
13.1 Variable Elimination in Gaussian Networks
13.1.2 Sum-Product Algorithms
13.1.3 Gaussian Belief Propagation
13.2 Hybrid Networks
13.2.1 The Difficulties
13.2.2 Factor Operations for Hybrid Gaussian Networks
13.2.3 EP for CLG Networks
13.2.4 An “Exact” CLG Algorithm (#)
13.3 Nonlinear Dependencies
13.3.1 Linearization
13.3.2 Expectation Propagation with Gaussian Approximation
13.4 Particle-Based Approximation Methods
13.4.1 Sampling in Continuous Spaces
13.4.2 Forward Sampling in Bayesian Networks
13.4.3 MCMC Methods
13.4.4 Collapsed Particles
13.4.5 Nonparametric Message Passing
14. Inference in Temporal Models
14.1 Exact Inference
14.1.1 Filtering in State-Observation Models
14.1.2 Filtering as Clique Tree Propagation
14.1.3 Clique Tree Inference in DBNs
14.1.4 Entanglement
14.2 Approximate Inference
14.2.1 Key Ideas
14.2.2 Factored Belief State Methods
14.2.3 Particle Filtering
14.2.4 Deterministic Search Techniques
14.3 Hybrid DBNs
14.3.1 Continuous Models
14.3.2 Hybrid Models
15. Learning Graphical Models: Overview
15.1 Goal of Learning
15.1.1 Density Estimation
15.1.2 Specific Prediction Tasks
15.1.3 Knowledge Discovery
15.2 Learning as Optimization
15.2.1 Empirical Risk and Overfitting
15.2.2 Discriminative versus Generative Training
15.3 Learning Tasks
15.3.1 Model Constraints
15.3.2 Data Observability
15.3.3 Taxonomy of Learning Tasks
16. Parameter Estimation
16.1 Maximum Likelihood Estimation
16.1.1 The Thumbtack Example
16.1.2 The Maximum Likelihood Principle
16.2 MLE for Bayesian Networks
16.2.1 A Simple Example
16.2.2 Global Likelihood Decomposition
16.2.3 Table-CPDs
16.2.4 Gaussian Bayesian Networks (#)
16.2.5 Maximum Likelihood Estimation as M-Projection
16.3 Bayesian Parameter Estimation
16.3.1 Parameter Independence and Global Decomposition
16.3.2 Local Decomposition
16.3.3 Priors for Bayesian Network Learning
16.3.4 MAP Estimation (#)
16.4 Learning Models with Shared Parameters
16.4.1 Global Parameter Sharing
16.4.2 Local Parameter Sharing
16.4.3 Bayesian Inference with Shared Parameters
16.4.4 Hierarchical Priors (#)
16.5 Generalization Analysis
16.5.1 Asymptotic Analysis
16.5.2 PAC-Bounds
17. Structure Learning in Bayesian Networks
17.1 Constraint-Based Approaches
17.1.1 General Framework
17.1.2 Independence Tests
17.2 Structure Scores
17.2.1 Likelihood Scores
17.2.2 Bayesian Score
17.2.3 Marginal Likelihood for a Single Variable
17.2.4 Bayesian Score for Bayesian Networks
17.2.5 Understanding the Bayesian Score
17.2.6 Priors
17.2.7 Score Equivalence (#)
17.3 Structure Search
17.3.1 Learning Tree-Structured Networks
17.3.2 Known Order
17.3.3 General Graphs
17.3.4 Learning with Equivalence Classes (#)
17.4 Bayesian Model Averaging (#)
17.4.1 Basic Theory
17.4.2 Model Averaging Given an Order
17.4.3 The General Case
17.5 Learning Models with Additional Structure
17.5.1 Learning with Local Structure
17.5.2 Learning Template Models
18. Partially Observed Data
18.1 Foundations
18.1.1 Likelihood of Data and Observation Models
18.1.2 Decoupling of Observation Mechanism
18.1.3 The Likelihood Function
18.1.4 Identifiability
18.2 Parameter Estimation
18.2.1 Gradient Ascent
18.2.2 Expectation Maximization (EM)
18.2.3 Comparison: Gradient Ascent versus EM
18.2.4 Approximate Inference (#)
18.3 Bayesian Learning with Incomplete Data (#)
18.3.1 Overview
18.3.2 MCMC Sampling
18.3.3 Variational Bayesian Learning
18.4 Structure Learning
18.4.1 Scoring Structures
18.4.2 Structure Search
18.4.3 Structural EM
18.5 Learning Models with Hidden Variables
18.5.1 Information Content of Hidden Variables
18.5.2 Determining the Cardinality
18.5.3 Introducing Hidden Variables
19. Learning Undirected Models
19.1 Overview
19.2 The Likelihood Function
19.2.1 An Example
19.2.2 Form of the Likelihood Function
19.2.3 Properties of the Likelihood Function
19.3 Maximum (Conditiaonal) Likelihood Parameter Estimation
19.3.1 Maximum Likelihood Estimation
19.3.2 Conditionally Trained Models
19.3.3 Learning with Missing Data
19.3.4 Maximum Entropy and Maximum Likelihood (#)
19.4 Parameter Priors and Regularization
19.4.1 Local Priors
19.4.2 Global Priors
19.5 Learning with Approximate Inference
19.5.1 Belief Propagation
19.5.2 MAP-Based Learning (#)
19.6 Alternative Objectives
19.6.1 Pseudolikelihood and Its Generalizations
19.6.2 Contrastive Optimization Criteria
19.7 Structure Learning
19.7.1 Structure Learning Using Independence Tests
19.7.2 Score-Based Learning: Hypothesis Spaces
19.7.3 Objective Functions
19.7.4 Optimization Task
19.7.5 Evaluating Changes to the Model
20. Causality
20.1 Motivation and Overview
20.1.1 Conditioning and Intervention
20.1.2 Correlation and Causation
20.2 Causal Models
20.3 Structural Causal Identifiability
20.3.1 Query Simplification Rules
20.3.2 Iterated Query Simplification
20.4 Mechanisms and Response Variables (#)
20.5 Partial Identifiability in Functional Causal Models (#)
20.6 Counterfactual Queries (#)
20.6.1 Twinned Networks
20.6.2 Bounds on Counterfactual Queries
20.7 Learning Causal Models
20.7.1 Learning Causal Models without Confounding Factors
20.7.2 Learning from Interventional Data
20.7.3 Dealing with Latent Variables (#)
20.7.4 Learning Functional Causal Models (#)
21. Utilities and Decisions
21.1 Foundations: Maximizing Expected Utility
21.1.1 Decision Making Under Uncertainty
21.1.2 Theoretical Justification (#)
21.2 Utility Curves
21.2.1 Utility of Money
21.2.2 Attitudes Toward Risk
21.2.3 Rationality
21.3 Utility Elicitation
21.3.1 Utility Elicitation Procedures
21.3.2 Utility of Human Life
21.4 Utilities of Complex Outcomes
21.4.1 Preference and Utility Independence (#)
21.4.2 Additive Independence Properties
22. Structured Decision Problems
22.1 Decision Trees
22.1.1 Representation
22.1.2 Backward Induction Algorithm
22.2 Influence Diagrams
22.2.1 Basic Representation
22.2.2 Decision Rules
22.2.3 Semantics and Optimality Criterion
22.3 Backward Induction in Influence Diagrams
22.3.1 Decision Trees for Influence Diagrams
22.3.2 Sum-Max-Sum Rule
22.4 Computing Expected Utilities
22.4.1 Simple Variable Elimination
22.4.2 Multiple Utility Variables: Simple Approaches
22.4.3 Generalized Variable Elimination (#)
22.5 Optimization in Influence Diagrams
22.5.1 Optimizing a Single Decision Rule
22.5.2 Iterated Optimization Algorithm
22.5.3 Strategic Relevance and Global Optimality
22.7 The Value of Information
22.7.1 Single Observations
22.7.2 Multiple Observations