Probabilistic Graphical Models

1. Foundations

1.1 Probability Theory

1.1.1 Probability Distributions

1.1.2 Basic Concepts in Probability

1.1.3 Random Variables and Joint Distributions

1.1.4 Independence and Conditional Independence

1.1.5 Querying a Distribution

1.1.6 Continuous Space

1.1.7 Expectation and Variance
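
The two quantities above have direct one-line definitions for a discrete random variable: E[X] = Σ_x x·P(x) and Var[X] = E[X²] − (E[X])². A minimal Python sketch (the fair-die distribution is a made-up example, not from the text):

```python
def expectation(dist):
    """E[X] for a dict mapping values to probabilities."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = E[X^2] - (E[X])^2."""
    mean = expectation(dist)
    return sum((x ** 2) * p for x, p in dist.items()) - mean ** 2

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die
print(expectation(die))  # 3.5
print(variance(die))     # 35/12, roughly 2.9167
```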

1.2 Graphs

1.2.1 Nodes and Edges

1.2.2 Subgraphs

1.2.3 Paths and Trails

1.2.4 Cycles and Loops

Representation

2. The Bayesian Network Representation

2.1 Exploiting Independence Properties

2.1.1 Independent Random Variables

2.1.2 The Conditional Parameterization

2.1.3 The Naive Bayes Model

2.2 Bayesian Networks

2.2.1 The Student Example Revisited

2.2.2 Basic Independencies in Bayesian Networks

2.2.3 Graphs and Distributions

2.3 Independencies in Graphs

2.3.1 D-separation

2.3.2 Soundness and Completeness

2.3.3 An Algorithm for d-Separation

2.3.4 I-Equivalence

2.4 From Distributions to Graphs

2.4.1 Minimal I-Maps

2.4.2 Perfect Maps

2.4.3 Finding Perfect Maps (#)

3. Undirected Graphical Models

3.1 The Misconception Example

3.2 Parameterization

3.2.1 Factors

3.2.2 Gibbs Distributions and Markov Networks

3.2.3 Reduced Markov Networks

3.3 Markov Network Independencies

3.3.1 Basic Independencies

3.3.2 Independencies Revisited

3.3.3 From Distributions to Graphs

3.4 Parameterization Revisited

3.4.1 Finer-Grained Parameterization

3.4.2 Overparameterization

3.5 Bayesian Networks and Markov Networks

3.5.1 From Bayesian Networks to Markov Networks

3.5.2 From Markov Networks to Bayesian Networks

3.5.3 Chordal Graphs

3.6 Partially Directed Models

3.6.1 Conditional Random Fields

3.6.2 Chain Graph Models (#)

4. Local Probabilistic Models

4.1 Tabular CPDs

4.2 Deterministic CPDs

4.3 Context-Specific CPDs

4.4 Independence of Causal Influence

4.4.1 The Noisy-Or Model
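
The noisy-OR CPD has a simple closed form: the effect fails to fire only if the leak and every active cause all independently fail, so P(Y=1 | x₁,…,x_k) = 1 − (1−λ₀)·Π_{i: x_i=1}(1−λ_i). A small sketch (the leak and parameter values below are made-up illustrations):

```python
def noisy_or(leak, params, parents):
    """P(Y=1 | parents) under a noisy-OR CPD.
    leak:    probability Y fires with no active parent (lambda_0);
    params:  per-parent activation probabilities lambda_i;
    parents: binary parent assignment, same length as params."""
    fail = 1.0 - leak  # probability the leak does not fire
    for lam, x in zip(params, parents):
        if x:  # only active parents can contribute
            fail *= 1.0 - lam
    return 1.0 - fail

# two causes with strengths 0.8 and 0.5, leak 0.1
p = noisy_or(0.1, [0.8, 0.5], [1, 1])
print(p)  # 1 - 0.9 * 0.2 * 0.5, roughly 0.91
```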

4.4.2 Generalized Linear Models

4.4.3 The General Formulation

4.5 Continuous Variables

4.5.1 Hybrid Models

4.6 Conditional Bayesian Networks

5. Template-Based Representations

5.1 Temporal Models

5.1.1 Basic Assumptions

5.1.2 Dynamic Bayesian Networks

5.1.3 State-Observation Models

5.2 Template Variables and Template Factors

5.3 Directed Probabilistic Models for Object-Relational Domains

5.3.1 Plate Models

5.3.2 Probabilistic Relational Models

5.4 Undirected Representation

5.5 Structural Uncertainty (#)

5.5.1 Relational Uncertainty

5.5.2 Object Uncertainty

6. Gaussian Network Models

6.1 Multivariate Gaussians

6.1.1 Basic Parameterization

6.1.2 Operations on Gaussians

6.1.3 Independencies in Gaussians

6.2 Gaussian Bayesian Networks

6.3 Gaussian Markov Random Fields

7. The Exponential Family

7.1 Exponential Families

7.1.1 Linear Exponential Families

7.2 Factored Exponential Families

7.2.1 Product Distributions

7.2.2 Bayesian Networks

7.3 Entropy and Relative Entropy

7.3.1 Entropy

7.3.2 Relative Entropy

Inference

8. Exact Inference: Variable Elimination

8.1 Analysis of Complexity

8.1.1 Analysis of Exact Inference

8.1.2 Analysis of Approximate Inference

8.2 Variable Elimination: The Basic Ideas
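
The basic idea — repeatedly multiplying factors and summing out one variable at a time — can be shown with a minimal sketch over a made-up chain A → B → C with binary variables (the CPD numbers are illustrative assumptions):

```python
import itertools

# Factors are dicts from assignment tuples to values; each factor
# carries a scope (list of variable names). All variables are binary.

def multiply(f1, s1, f2, s2):
    """Factor product over the union of the two scopes."""
    scope = s1 + [v for v in s2 if v not in s1]
    out = {}
    for assign in itertools.product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, assign))
        out[assign] = f1[tuple(a[v] for v in s1)] * f2[tuple(a[v] for v in s2)]
    return out, scope

def sum_out(f, scope, var):
    """Marginalize a variable out of a factor."""
    i = scope.index(var)
    out = {}
    for assign, val in f.items():
        key = assign[:i] + assign[i + 1:]
        out[key] = out.get(key, 0.0) + val
    return out, scope[:i] + scope[i + 1:]

# made-up CPDs: P(A), P(B|A), P(C|B)
pA   = {(0,): 0.6, (1,): 0.4}
pBgA = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}
pCgB = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}

f, s = multiply(pA, ["A"], pBgA, ["A", "B"])
f, s = sum_out(f, s, "A")                    # eliminate A, giving P(B)
f, s = multiply(f, s, pCgB, ["B", "C"])
f, s = sum_out(f, s, "B")                    # eliminate B, giving P(C)
print(f)  # P(C): approximately {(0,): 0.45, (1,): 0.55}
```

The elimination ordering (here A, then B) keeps every intermediate factor small; the later sections analyze how the ordering governs cost.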

8.3 Variable Elimination

8.3.1 Basic Elimination

8.3.2 Dealing with Evidence

8.4 Complexity and Graph Structure: Variable Elimination

8.4.1 Simple Analysis

8.4.2 Graph-Theoretic Analysis

8.4.3 Finding Elimination Orderings (#)

8.5 Conditioning (#)

8.5.1 The Conditioning Algorithm

8.5.2 Conditioning and Variable Elimination

8.5.3 Graph-Theoretic Analysis

8.5.4 Improved Conditioning

8.6 Inference with Structured CPDs (#)

8.6.1 Independence of Causal Influence

8.6.2 Context-Specific Independence

9. Exact Inference: Clique Trees

9.1 Variable Elimination and Clique Trees

9.1.1 Cluster Graphs

9.1.2 Clique Trees

9.2 Message Passing: Sum Product

9.2.1 Variable Elimination in a Clique Tree

9.2.2 Clique Tree Calibration

9.2.3 A Calibrated Clique Tree as a Distribution

9.3 Message Passing: Belief Update

9.3.1 Message Passing with Division

9.3.2 Equivalence of Sum-Product and Belief Update Messages

9.3.3 Answering Queries

9.4 Constructing a Clique Tree

9.4.1 Clique Trees from Variable Elimination

9.4.2 Clique Trees from Chordal Graphs

10. Inference as Optimization

10.1 Introduction

10.1.1 Exact Inference Revisited (#)

10.1.2 The Energy Functional

10.1.3 Optimizing the Energy Functional

10.2 Exact Inference as Optimization

10.2.1 Fixed-Point Characterization

10.2.2 Inference as Optimization

10.3 Propagation-Based Approximation

10.3.1 A Simple Example

10.3.2 Cluster-Graph Belief Propagation

10.3.3 Properties of Cluster-Graph Belief Propagation

10.3.4 Analyzing Convergence (#)

10.3.5 Constructing Cluster Graphs

10.3.6 Variational Analysis

10.3.7 Other Entropy Approximations (#)

10.4 Propagation with Approximate Messages (#)

10.4.1 Factorized Messages

10.4.2 Approximate Message Computation

10.4.3 Inference with Approximate Messages

10.4.4 Expectation Propagation

10.4.5 Variational Analysis

10.5 Structured Variational Approximations

10.5.1 The Mean Field Approximation

10.5.2 Structured Approximations

10.5.3 Local Variational Methods (#)

11. Particle-Based Approximate Inference

11.1 Forward Sampling

11.1.1 Sampling from a Bayesian Network

11.1.2 Analysis of Error

11.1.3 Conditional Probability Queries

11.2 Likelihood Weighting and Importance Sampling

11.2.1 Likelihood Weighting: Intuition

11.2.2 Importance Sampling

11.2.3 Importance Sampling for Bayesian Networks

11.2.4 Importance Sampling Revisited

11.3 Markov Chain Monte Carlo Methods

11.3.1 Gibbs Sampling Algorithm
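
As a concrete illustration, here is a hedged sketch of Gibbs sampling for a two-variable model specified by a made-up unnormalized joint: the sampler alternately draws each variable from its full conditional and estimates a marginal from the resulting chain (all numbers below are illustrative assumptions):

```python
import random

# Unnormalized joint weights over two binary variables (made-up numbers).
# Exact marginal: P(X=1) = (3 + 4) / (1 + 2 + 3 + 4) = 0.7.
w = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}

def gibbs(n_samples, burn_in=1000, seed=0):
    """Estimate P(X=1) by Gibbs sampling from the joint defined by w."""
    rng = random.Random(seed)
    x, y = 0, 0
    count_x1 = 0
    for t in range(burn_in + n_samples):
        # resample x from P(X | Y=y), then y from P(Y | X=x)
        px1 = w[(1, y)] / (w[(0, y)] + w[(1, y)])
        x = 1 if rng.random() < px1 else 0
        py1 = w[(x, 1)] / (w[(x, 0)] + w[(x, 1)])
        y = 1 if rng.random() < py1 else 0
        if t >= burn_in:  # discard burn-in samples
            count_x1 += x
    return count_x1 / n_samples

print(gibbs(50000))  # close to the exact marginal P(X=1) = 0.7
```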

11.3.2 Markov Chains

11.3.3 Gibbs Sampling Revisited

11.3.4 A Broader Class of Markov Chains (#)

11.3.5 Using a Markov Chain

11.4 Collapsed Particles

11.4.1 Collapsed Likelihood Weighting (#)

11.4.2 Collapsed MCMC

11.5 Deterministic Search Methods (#)

12. MAP Inference

12.1 Overview

12.1.1 Computational Complexity

12.1.2 Overview of Solution Methods

12.2 Variable Elimination for (Marginal) MAP

12.2.1 Max-Product Variable Elimination

12.2.2 Finding the Most Probable Assignment

12.2.3 Variable Elimination for Marginal MAP (#)

12.3 Max-Product in Clique Trees

12.3.1 Computing Max-Marginals

12.3.2 Message Passing as Reparameterization

12.3.3 Decoding Max-Marginals

12.4 Max-Product Belief Propagation in Loopy Cluster Graphs

12.4.1 Standard Max-Product Message Passing

12.4.2 Max-Product BP with Counting Numbers (#)

12.5 MAP as a Linear Optimization Problem

12.5.1 The Integer Program Formulation

12.5.2 Linear Programming Relaxation

12.5.3 Low-Temperature Limits

12.6 Using Graph Cuts for MAP

12.6.1 Inference Using Graph Cuts

12.6.2 Nonbinary Variables

12.7 Local Search Algorithms

13. Inference in Hybrid Networks

13.1 Variable Elimination in Gaussian Networks

13.1.1 Canonical Forms

13.1.2 Sum-Product Algorithms

13.1.3 Gaussian Belief Propagation

13.2 Hybrid Networks

13.2.1 The Difficulties

13.2.2 Factor Operations for Hybrid Gaussian Networks

13.2.3 EP for CLG Networks

13.2.4 An “Exact” CLG Algorithm (#)

13.3 Nonlinear Dependencies

13.3.1 Linearization

13.3.2 Expectation Propagation with Gaussian Approximation

13.4 Particle-Based Approximation Methods

13.4.1 Sampling in Continuous Spaces

13.4.2 Forward Sampling in Bayesian Networks

13.4.3 MCMC Methods

13.4.4 Collapsed Particles

13.4.5 Nonparametric Message Passing

14. Inference in Temporal Models

14.1 Exact Inference

14.1.1 Filtering in State-Observation Models

14.1.2 Filtering as Clique Tree Propagation

14.1.3 Clique Tree Inference in DBNs

14.1.4 Entanglement

14.2 Approximate Inference

14.2.1 Key Ideas

14.2.2 Factored Belief State Methods

14.2.3 Particle Filtering

14.2.4 Deterministic Search Techniques

14.3 Hybrid DBNs

14.3.1 Continuous Models

14.3.2 Hybrid Models

15. Learning Graphical Models: Overview

15.1 Goal of Learning

15.1.1 Density Estimation

15.1.2 Specific Prediction Tasks

15.1.3 Knowledge Discovery

15.2 Learning as Optimization

15.2.1 Empirical Risk and Overfitting

15.2.2 Discriminative versus Generative Training

15.3 Learning Tasks

15.3.1 Model Constraints

15.3.2 Data Observability

15.3.3 Taxonomy of Learning Tasks

16. Parameter Estimation

16.1 Maximum Likelihood Estimation

16.1.1 The Thumbtack Example
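
The thumbtack example yields the familiar closed form: for M[1] heads and M[0] tails, the likelihood L(θ : D) = θ^{M[1]} (1−θ)^{M[0]} is maximized at θ* = M[1] / (M[1] + M[0]). A short sketch (the counts are made up):

```python
import math

def bernoulli_mle(heads, tails):
    """MLE for the thumbtack parameter: theta* = M[1] / (M[1] + M[0])."""
    return heads / (heads + tails)

def log_likelihood(theta, heads, tails):
    """log L(theta : D) = M[1] log theta + M[0] log (1 - theta)."""
    return heads * math.log(theta) + tails * math.log(1 - theta)

theta = bernoulli_mle(3, 7)  # e.g. 3 heads out of 10 tosses
print(theta)  # 0.3
# the MLE beats any other parameter value on this data set:
assert log_likelihood(theta, 3, 7) > log_likelihood(0.5, 3, 7)
```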

16.1.2 The Maximum Likelihood Principle

16.2 MLE for Bayesian Networks

16.2.1 A Simple Example

16.2.2 Global Likelihood Decomposition

16.2.3 Table-CPDs

16.2.4 Gaussian Bayesian Networks (#)

16.2.5 Maximum Likelihood Estimation as M-Projection

16.3 Bayesian Parameter Estimation

16.3.1 Parameter Independence and Global Decomposition

16.3.2 Local Decomposition

16.3.3 Priors for Bayesian Network Learning

16.3.4 MAP Estimation (#)

16.4 Learning Models with Shared Parameters

16.4.1 Global Parameter Sharing

16.4.2 Local Parameter Sharing

16.4.3 Bayesian Inference with Shared Parameters

16.4.4 Hierarchical Priors (#)

16.5 Generalization Analysis

16.5.1 Asymptotic Analysis

16.5.2 PAC-Bounds

17. Structure Learning in Bayesian Networks

17.1 Constraint-Based Approaches

17.1.1 General Framework

17.1.2 Independence Tests

17.2 Structure Scores

17.2.1 Likelihood Scores

17.2.2 Bayesian Score

17.2.3 Marginal Likelihood for a Single Variable

17.2.4 Bayesian Score for Bayesian Networks

17.2.5 Understanding the Bayesian Score

17.2.6 Priors

17.2.7 Score Equivalence (#)

17.3 Structure Search

17.3.1 Learning Tree-Structured Networks

17.3.2 Known Order

17.3.3 General Graphs

17.3.4 Learning with Equivalence Classes (#)

17.4 Bayesian Model Averaging (#)

17.4.1 Basic Theory

17.4.2 Model Averaging Given an Order

17.4.3 The General Case

17.5 Learning Models with Additional Structure

17.5.1 Learning with Local Structure

17.5.2 Learning Template Models

18. Partially Observed Data

18.1 Foundations

18.1.1 Likelihood of Data and Observation Models

18.1.2 Decoupling of Observation Mechanism

18.1.3 The Likelihood Function

18.1.4 Identifiability

18.2 Parameter Estimation

18.2.1 Gradient Ascent

18.2.2 Expectation Maximization (EM)

18.2.3 Comparison: Gradient Ascent versus EM

18.2.4 Approximate Inference (#)

18.3 Bayesian Learning with Incomplete Data (#)

18.3.1 Overview

18.3.2 MCMC Sampling

18.3.3 Variational Bayesian Learning

18.4 Structure Learning

18.4.1 Scoring Structures

18.4.2 Structure Search

18.4.3 Structural EM

18.5 Learning Models with Hidden Variables

18.5.1 Information Content of Hidden Variables

18.5.2 Determining the Cardinality

18.5.3 Introducing Hidden Variables

19. Learning Undirected Models

19.1 Overview

19.2 The Likelihood Function

19.2.1 An Example

19.2.2 Form of the Likelihood Function

19.2.3 Properties of the Likelihood Function

19.3 Maximum (Conditional) Likelihood Parameter Estimation

19.3.1 Maximum Likelihood Estimation

19.3.2 Conditionally Trained Models

19.3.3 Learning with Missing Data

19.3.4 Maximum Entropy and Maximum Likelihood (#)

19.4 Parameter Priors and Regularization

19.4.1 Local Priors

19.4.2 Global Priors

19.5 Learning with Approximate Inference

19.5.1 Belief Propagation

19.5.2 MAP-Based Learning (#)

19.6 Alternative Objectives

19.6.1 Pseudolikelihood and Its Generalizations

19.6.2 Contrastive Optimization Criteria

19.7 Structure Learning

19.7.1 Structure Learning Using Independence Tests

19.7.2 Score-Based Learning: Hypothesis Spaces

19.7.3 Objective Functions

19.7.4 Optimization Task

19.7.5 Evaluating Changes to the Model

20. Causality

20.1 Motivation and Overview

20.1.1 Conditioning and Intervention

20.1.2 Correlation and Causation

20.2 Causal Models

20.3 Structural Causal Identifiability

20.3.1 Query Simplification Rules

20.3.2 Iterated Query Simplification

20.4 Mechanisms and Response Variables (#)

20.5 Partial Identifiability in Functional Causal Models (#)

20.6 Counterfactual Queries (#)

20.6.1 Twinned Networks

20.6.2 Bounds on Counterfactual Queries

20.7 Learning Causal Models

20.7.1 Learning Causal Models without Confounding Factors

20.7.2 Learning from Interventional Data

20.7.3 Dealing with Latent Variables (#)

20.7.4 Learning Functional Causal Models (#)

21. Utilities and Decisions

21.1 Foundations: Maximizing Expected Utility

21.1.1 Decision Making Under Uncertainty

21.1.2 Theoretical Justification (#)

21.2 Utility Curves

21.2.1 Utility of Money

21.2.2 Attitudes Toward Risk

21.2.3 Rationality

21.3 Utility Elicitation

21.3.1 Utility Elicitation Procedures

21.3.2 Utility of Human Life

21.4 Utilities of Complex Outcomes

21.4.1 Preference and Utility Independence (#)

21.4.2 Additive Independence Properties

22. Structured Decision Problems

22.1 Decision Trees

22.1.1 Representation

22.1.2 Backward Induction Algorithm

22.2 Influence Diagrams

22.2.1 Basic Representation

22.2.2 Decision Rules

22.2.3 Semantics and Optimality Criterion

22.3 Backward Induction in Influence Diagrams

22.3.1 Decision Trees for Influence Diagrams

22.3.2 Sum-Max-Sum Rule

22.4 Computing Expected Utilities

22.4.1 Simple Variable Elimination

22.4.2 Multiple Utility Variables: Simple Approaches

22.4.3 Generalized Variable Elimination (#)

22.5 Optimization in Influence Diagrams

22.5.1 Optimizing a Single Decision Rule

22.5.2 Iterated Optimization Algorithm

22.5.3 Strategic Relevance and Global Optimality

22.6 Ignoring Irrelevant Information

22.7 Value of Information

22.7.1 Single Observations

22.7.2 Multiple Observations
