
London Mathematical Society Durham Symposium
Mathematical Aspects of Graphical Models
Monday 30th June - Thursday 10th July 2008
List of abstracts
David Barber (UCL, London)Saturday 5th July 17:20
Graph decomposition for community identification and covariance constraints
We introduce Clique Matrices as an alternative representation of undirected graphs, being a generalisation of the incidence matrix representation. Here we use clique matrices to decompose a graph into a set of possibly overlapping clusters, defined as well-connected subsets of vertices. The decomposition is based on a statistical description which encourages clusters to be well connected and few in number. Inference is carried out using a variational approximation. Clique matrices also play a natural role in parameterising positive definite matrices under zero constraints on elements of the matrix. We show that clique matrices can parameterise all positive definite matrices restricted according to a decomposable graph and form a structured Factor Analysis approximation in the non-decomposable case. Example applications are given in relational machine learning and max clique.
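As a small illustration of the representation (a sketch of the idea only, not Barber's implementation; the function name is hypothetical), a clique matrix Z with one row per vertex and one column per cluster recovers the graph's adjacency structure from the off-diagonal support of ZZ^T:

```python
import numpy as np

def adjacency_from_clique_matrix(Z):
    """A binary clique matrix Z (vertices x clusters) represents the
    graph in which edge (i, j) is present iff vertices i and j share
    at least one cluster, i.e. the off-diagonal support of Z Z^T."""
    A = (Z @ Z.T > 0).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A
```

With a single column Z = (1, 1, 1, 0)^T this yields a triangle on the first three vertices plus an isolated fourth vertex.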

Sophie Barthel (Department of Biostatistics)Friday 4th July 11:15
Predicting survival using gene expression data and graphical models
F M-S Barthel, V. Didelez, J. Hein

Graphical models are frequently used to predict gene networks from microarray data (Markowetz & Spang, 2007). Our aim is to extend these methods to allow prediction of survival times (or time to other disease related events) from the graphical model.

An extensive literature search surrounding graphical models and their use for microarray data was conducted and results of this are presented. Currently, there are no methods available in the literature formally combining survival prediction and graphical models based on gene expression data.

In a first step we used data from the Van de Vijver et al. (2002) breast cancer study, in which a previously derived 70-gene signature classifying patients into good and poor prognosis groups was further evaluated. We created a graphical model of the data based on Schäfer et al. (2006) and explored differences in this model between good and poor prognosis patients. Extensions of this work are discussed.


F. Markowetz and R. Spang. Inferring cellular networks – a review. BMC Bioinformatics. 8:S5, 2007.

M. Van de Vijver, Y. He, L. Van’t Veer et al. A gene expression signature as a predictor of survival in breast cancer. NEJM. 347:1999-2009, 2002.

J. Schäfer, R. Opgen-Rhein and K. Strimmer. Reverse engineering genetic networks using the genenet package. R News, 6:50-53, 2006.

Julian Besag (Universities of Bath and Washington)Saturday 5th July 09:30
Some statistical applications of constrained Monte Carlo
Continuum limits of Gaussian Markov fields resolving the conflicts with geostatistics
(Joint work with Debashis Mondal (Universities of Chicago and Washington))
For more than 30 years, Markov random fields (equivalently, graphical models with undirected edges) have been used with some success to account for spatial variation. This talk concentrates on Gaussian intrinsic versions on regular arrays and provides some examples from a Bayesian perspective. Recent limiting results by the authors (Biometrika, 2005) provide a surprising reconciliation with the continuum geostatistical approach, via the conformally invariant de Wijs process, with realizations that are region-based. Convergence to de Wijs is extremely rapid, which is important in practice. The de Wijs process is also shown to be a 2-d extension of Brownian motion.

Charles Blundell (University of York)Friday 4th July 11:15
Learning Causal Bayesian Networks via Time Orderings

Robert Cowell (City, London)Tuesday 1st July 16:00
A pot-pourri of Bayesian network learning methods
I present a brief summary of some approaches (mostly unpublished) I have worked on, that are directed towards learning the structure of Bayesian networks from a (large) dataset.

Most of the talk will concentrate on learning discrete networks, but I will finish with a speculative method for testing for conditional independence for application to learning continuous Bayesian networks.

David Cox (University of Oxford)Wednesday 9th July 11:15
Derived variables and graphical models
In the usual representation each node in a graph represents a variable, either observed or latent. There are a number of different reasons why it may be helpful statistically to define new variables, implying new nodes, that are deterministic transformations of previously existing variables. A number of such situations are outlined; the objective typically is to simplify the resulting graphical structure.

James Cussens (University of York)Monday 7th July 17:20
Bayesian network learning by compiling to weighted MAX-SAT
The problem of learning discrete Bayesian networks from data is encoded as a weighted MAX-SAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents - family scores - are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a soft weighted single-literal clause. Two approaches to enforcing acyclicity are considered: either by encoding the ancestor relation or by attaching a total order to each graph and encoding that. The latter approach gives better results. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs. The largest dataset has 10,000 datapoints and 60 variables producing (for the ancestor encoding) a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the true BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search.
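The order-based encoding above can be mimicked in miniature: fixing a total order makes acyclicity automatic, and each variable then simply takes its best-scoring permissible parent set. A hedged sketch (exhaustive over orders rather than MaxWalkSat, with a hypothetical toy family-score table; not the author's code):

```python
from itertools import permutations

def best_network(family_scores, variables):
    """Order-based search in miniature: for every total order, each
    variable picks its best-scoring parent set among the variables
    preceding it (so the graph is acyclic by construction); the
    highest-scoring combination over all orders wins."""
    best_score, best_parents = float("-inf"), None
    for order in permutations(variables):
        seen, total, parents = set(), 0.0, {}
        for v in order:
            # permissible parent sets: those contained in the predecessors
            cands = [(s, ps) for (u, ps), s in family_scores.items()
                     if u == v and ps <= seen]
            score, ps = max(cands, key=lambda t: t[0])
            total += score
            parents[v] = ps
            seen.add(v)
        if total > best_score:
            best_score, best_parents = total, parents
    return best_score, best_parents
```

Family scores here play the role of the soft single-literal clause weights; MaxWalkSat replaces the exhaustive enumeration when the variable count makes it infeasible.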

Philip Dawid (University of Cambridge)Wednesday 2nd July 09:30
Using influence diagrams for causal inference
Directed acyclic graphs have long been used to represent conditional independence relationships between random variables. Even though these relationships are not directional, the arrows in a DAG clearly are, and they are often interpreted, implicitly or explicitly, rightly or wrongly, as representing asymmetric causal relationships. However, there is more than one way to make these revised semantics explicit. I shall review these, and argue for the value of representations by means of influence diagrams. Specific applications treated may include (un)confounding, identification of optimal dynamic treatment strategies, and the interpretation of "the effect of treatment on the treated".

Vanessa Didelez (University of Bristol)Wednesday 2nd July 11:15
G-Formula, Inverse Probability of Treatment Weighting and Optimal Sequential Treatments
Inverse probability weighting is a common method when dealing with missing values. It is also useful for causal inference, which can be regarded as a missing data problem because the outcome of an individual had he/she received a different treatment from the one he/she actually received is missing. I will show how inverse probability weighting for causal inference can be motivated from a different point of view, based on a graphical approach and a notion of causality based on interventions. Particular attention will be paid to the case of sequential treatments.

Adrian Dobra (University of Washington)Tuesday 8th July 11:15
Bayesian structural learning and estimation in Gaussian graphical models and hierarchical log-linear models
(Joint work with Hélène Massam)
We describe a novel stochastic search algorithm for rapidly identifying regions of high posterior probability in large spaces of candidate models. This approach relies on the existence of a method to accurately and efficiently approximate the marginal likelihood associated with a model when it cannot be computed in closed form. To this end, we develop a new Laplace approximation method to the normalizing constant of a G-Wishart distribution associated with a Gaussian graphical model. We propose a similar method for computing the marginal likelihood of a hierarchical log-linear model based on the Diaconis-Ylvisaker conjugate prior for log-linear parameters. We demonstrate the efficiency of our methodology by comparison with Markov chain Monte Carlo techniques and with other stochastic search algorithms proposed in the literature.

Mathias Drton (University of Chicago)Tuesday 1st July 11:15
Graphical methods for efficient likelihood inference in Gaussian covariance models
In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts.

Michael Eichler (University of Maastricht)Monday 7th July 09:30
Learning causal structures in multivariate time series
In time series analysis, inference about cause-effect relationships among multiple time series is commonly based on the concept of Granger causality, which exploits temporal structure to achieve causal ordering of dependent variables. One major and well known problem in the application of Granger causality for the identification of causal relationships is the possible presence of latent variables that affect the measured components and thus lead to so-called spurious causalities. In this paper, we present a new graphical approach for describing and analysing Granger-causal relationships in multivariate time series that are possibly affected by latent variables. It is based on mixed graphs in which directed edges represent direct influences among the variables while dashed edges---directed or undirected---indicate associations that are induced by latent variables. We show how such representations can be used for inductive causal learning from time series and discuss the underlying assumptions and their implications for causal learning.

Jonathan Forster (Southampton University)Monday 7th July 11:15
Bayesian (conditionally) conjugate inference for discrete data models
In this talk, I will discuss Bayesian inference for graphical and other related models for categorical data. In particular I will focus on the conditional Dirichlet distribution, a special case of the general conjugate family considered by Massam et al (2008). Properties of this distribution will be discussed, and computational issues addressed.

Michael Goldstein (Durham University)Wednesday 9th July 16:00
Bayes linear graphical models and computer simulators for complex physical systems
This talk summarises some of the basic aspects of the Bayes linear formulation for graphical models, and describes a variety of roles for such graphical structures in uncertainty analysis for complex physical systems, based on the use of computer simulators for such systems.

Soren Hojsgaard (Aarhus University)Tuesday 8th July 16:40
RCOX models: Graphical Gaussian models with edge and vertex symmetries
RCOX models were originally motivated by the analysis of "large d small n" data (many responses but only few replicates). RCOX models are graphical Gaussian models in which parsimony is obtained by restricting specific parameters to being equal. There are two types of RCOX models: RCON models, in which specific concentration parameters are restricted to being identical, and RCOR models, in which specific partial correlation / partial variance parameters are restricted to being identical. These restrictions can be represented by colouring vertices and edges in the interaction graph in accordance with the restrictions on the parameters. We discuss the models, some of their properties and estimation methods for the models. If time permits we will also demonstrate the gRc package (for R) for inference in these models. (This is joint work with Steffen Lauritzen)

Sung-Ho Kim (Korea Advanced Institute of Science and Technology)Saturday 5th July 16:00
Markovian combination of decomposable model structures: MCMoSt
(Joint work with Sangjin Lee)
Suppose that we are interested in modeling a random vector X and that we are given a set of graphical decomposable models, G1,..., Gm, for subvectors of X, each of which shares some variables with at least one of the other models. Under the assumption that the model of X is graphical and decomposable, we propose an approach to searching for model structures of X based on the given decomposable graphical models. A main idea in this approach is that we combine G1,..., Gm using graphs of prime separators. When the true graphical model for the whole data is decomposable, prime separators in a marginal model are also prime separators in a maximal combined model of the marginal models. This property plays a key role in model combination. The proposed approach is applied to a simulated data set of 40 binary variables and to a model of 100 variables for illustration.

Some keywords: combined model structure; decomposable graph; edge-subgraph; graph-separateness; interaction graph; Markovian subgraph; prime separator

Steffen Lauritzen (University of Oxford)Tuesday 8th July 17:20
Issues of existence of maximum likelihood estimators in Gaussian graphical models
Gaussian undirected models correspond to linear exponential families and as such have unique maximum likelihood estimates when these exist. Generally, the estimates exist when the number of observations is sufficiently high compared to the number of variables under study. When using graphical models to describe dependence structures for high-dimensional data, the number of observations generally suffices only when sparse models are entertained. The lecture discusses existing results and open problems concerning this issue; in particular we investigate how symmetry assumptions affect these results.

Gérard Letac (Toulouse)Saturday 5th July 16:40
The tiling of a junction tree by the minimal separators and its application to Wishart laws on non homogeneous decomposable graphs
(Joint work with Hélène Massam)
An undirected graph models centered Gaussian variates such that certain pairs of them are uncorrelated when conditioned on the remainder of these variates: the edges of the graph are the other pairs, the possibly correlated ones. The graph is decomposable if it does not contain any induced cycle of size greater than or equal to four. The graph is homogeneous if it is decomposable and does not contain any induced subgraph which is a chain with 4 vertices and 3 edges. The cliques of an undirected graph are the maximal complete subsets. A junction tree for the graph has the set of cliques as its set of vertices and is such that the clique C is on the unique path from A to B only if the intersection of the cliques A and B is contained in C. Actually a graph has a junction tree if and only if it is decomposable.

A minimal separator of a decomposable graph is a set of vertices S such that there exists a pair (a,b) of unrelated vertices such that any path from a to b contains an element of S, and S is minimal with respect to this property.

Given a junction tree T of the graph, we associate a subtree TS with each minimal separator S such that the sets of edges of the TS (as S runs over all minimal separators of the graph) form a partition of the edges of T. More specifically, if AB is an edge of T, then AB is an edge of TS if and only if S is the intersection of the cliques A and B. This tiling of the junction tree by the minimal separators enables us to show that the topological definition of the multiplicity of a minimal separator coincides with its statistical definition (made with the notion of a perfect ordering of the cliques). More importantly for random matrices, it shows that the Wishart distributions associated with decomposable non-homogeneous graphs as described in a recent paper (Annals of Statistics, 2007) by the authors are parameterized by the family of minimal separators of the graph.
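The tiling itself is straightforward to compute from a junction tree: label each tree edge (A, B) with the separator A ∩ B and group the edges by label. A minimal sketch (data structures are an assumption, not from the paper):

```python
def separator_tiling(junction_tree_edges):
    """Label each junction-tree edge (A, B) with its separator A ∩ B
    and group the edges by label: the edges labelled S form the
    subtree T_S, and their count is the multiplicity of S."""
    tiling = {}
    for A, B in junction_tree_edges:
        S = frozenset(A) & frozenset(B)
        tiling.setdefault(S, []).append((A, B))
    return tiling
```

For the star-shaped graph with cliques {1,2}, {1,3}, {1,4} every junction-tree edge carries the separator {1}, which therefore has multiplicity 2.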

Luigi Malagò (Politecnico di Milano)Friday 4th July 11:15
Probabilistic Graphical Models in Evolutionary Computation: from Estimation of Distribution Algorithms to the Combination of Population-Based and Model-Based Search
Estimation of Distribution Algorithms (EDAs) are a recently proposed technique in Evolutionary Computation for combinatorial and continuous optimization problems. This meta-heuristic is characterized by the use of a parametric probability distribution estimated at each iteration from prominent previously sampled instances. Many EDAs have been proposed in the literature. Most of them make use of specific families of Probabilistic Graphical Models, such as Bayesian Networks and Markov Random Fields, in order to represent the linkage among variables. Our research is focused on a new class of mutation and crossover operators to be applied to parametric probability distributions. Unlike EDAs, which use a single model, or Genetic Algorithms, which employ populations of solutions to an optimization problem, we are working on the proposal of a new family of algorithms, called Genetic Estimation-of-Distribution Algorithms, in which a population of graphical models is evolved from one iteration to the next.

Dhafer Malouche (LEGI-ESSAI-EPT, Réseau National Universitaire)Monday 7th July 16:40
Gaussian covariance decomposition for PC-algorithm
The PC algorithm is a procedure used for the estimation of graphical models or Bayesian networks. The basic idea of this algorithm is to test marginal independence first and then conditional independence given 1, 2, ... variables, and so on. The procedure stops when the number of conditioning variables becomes greater than the degree of the last estimated graph (i.e., the maximum number of neighbours of the vertices in the graph). Hence a sequence of nested graphs is obtained, G = Gk ⊆ Gk-1 ⊆ ... ⊆ G0, where G and G0 are respectively the estimated concentration and covariance graph.

The aim of this paper is to prove, using covariance decomposition, that the conditioning subset of variables in the conditional independence test between a pair of adjacent vertices (α,β) in Gk-1 must contain a path in G0 of length ≥ 2 connecting α and β; otherwise, the edge (α,β) remains in Gk. Hence, this result reduces the number of conditional independence tests in the PC-algorithm. Some other properties concerning the PC-algorithm are also proved.
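For reference, the stepwise skeleton search that the result above prunes can be sketched generically as follows (an oracle-based version with a pluggable conditional independence test, not the authors' modified algorithm; the ci_test interface is an assumption):

```python
from itertools import combinations

def pc_skeleton(vertices, ci_test):
    """Generic PC skeleton search: start from the complete graph; at
    level k, remove edge (a, b) if some subset S of adj(a) \ {b} with
    |S| = k makes a and b conditionally independent.  Stops once no
    vertex has enough neighbours to supply a conditioning set of size k."""
    adj = {v: set(vertices) - {v} for v in vertices}
    k = 0
    while any(len(adj[v]) - 1 >= k for v in vertices):
        for a in vertices:
            for b in list(adj[a]):
                for S in combinations(adj[a] - {b}, k):
                    if ci_test(a, b, set(S)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        break
        k += 1
    return {frozenset((a, b)) for a in vertices for b in adj[a]}
```

With an oracle for the chain a – b – c (where a and c are independent given b) the search correctly returns the two-edge skeleton.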

Sofia Massa (University of Padova)Friday 4th July 11:15
Combining statistical models
(Joint work with S. Lauritzen)
This paper develops a general framework for "structural meta-analysis", i.e. the combination of information from independent and related experiments, by considering combinations of families of distributions. A typical example is the combination of multivariate Gaussian families respecting conditional independence constraints, i.e. Gaussian graphical models. We begin by analysing the combination of distributions and this context is successively generalised to the combination of families of distributions. We extend the concept of meta-Markov combination introduced by Dawid and Lauritzen (1993) and present the concept of quasi-Markov combination. The proposed theory is then applied to graphical models.

Hélène Massam (Toronto)Thursday 3rd July 11:15
Flexible Wishart distributions and their applications
When a graphical Gaussian model is Markov with respect to a decomposable graph G, the inverse of its covariance parameter belongs to the cone P(G) of positive definite matrices with fixed zeros corresponding to the missing edges of G. The set of incomplete matrices obtained by inverting matrices of P(G) and discarding the entries corresponding to the zeros of P(G) is the cone Q(G). Actually an incomplete matrix belongs to Q(G) if and only if the submatrices corresponding to the cliques of G are positive definite.

Letac and Massam (2007) defined Wishart distributions on Q(G) and P(G). We will recall the definitions and properties of these distributions: their important characteristics include the fact that they are hyper Markov laws and that they have multidimensional shape parameters, as opposed to the one-dimensional shape parameter of the Wishart on the cone of positive definite matrices.

The inverse of the Wishart on P(G) is a strong directed hyper Markov law and is a generalization of the hyper inverse Wishart defined by Dawid and Lauritzen (1993). It forms a conjugate family with a multidimensional shape parameter, thus allowing for flexibility in the choice of a prior. We will illustrate the application of the inverse Wishart on P(G) as a flexible prior for the estimation of a high-dimensional covariance structure: our estimates are Bayes estimators and allow us to avoid recourse to MCMC, often infeasible in these cases (see Rajaratnam, Massam and Carvalho, 2008).
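Membership in Q(G) as characterised above (positive definiteness of every clique submatrix) is easy to check numerically; a minimal sketch, assuming cliques are given as index sets over the matrix (the function name is hypothetical):

```python
import numpy as np

def in_QG(Sigma, cliques):
    """An incomplete matrix lies in Q(G) iff every submatrix indexed
    by a clique of G is positive definite; Cholesky factorisation
    succeeds exactly on positive definite matrices."""
    for C in cliques:
        idx = sorted(C)
        try:
            np.linalg.cholesky(Sigma[np.ix_(idx, idx)])
        except np.linalg.LinAlgError:
            return False
    return True
```

Entries of Sigma outside the cliques of G are simply never inspected, matching the incomplete-matrix reading of Q(G).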

Frantisek Matus (Institute of Info Th. & Autom., Academy of Sciences of the Czech Republic)Friday 4th July 11:15
The worst data for hierarchical log-linear models
Maximization of the information divergence from a hierarchical log-linear model is studied. A maximizer admits interpretation as the empirical distribution of the worst dataset. A new upper bound on the maximum is presented. For the models given by the independent sets of a matroid, tightness of the upper bound is described in terms of ideal secret sharing schemes or, equivalently, of matroid representations by partitions.

Frantisek Matus (Institute of Info Th. & Autom., Academy of Sciences of the Czech Republic)Tuesday 8th July 16:00
On minimization of entropy functionals under moment constraints
(Joint work with Imre Csiszar)
Minimization problems for entropy-like integrals and Bregman distances subject to a finite number of moment constraints are addressed in a general setting. Analogues of the authors' previous results on information projections to families determined by linear constraints, reverse information projections and maximum likelihood estimation in extensions of exponential families are established. The constraint qualification is avoided.

Anjali Mazumder (University of Oxford)Friday 4th July 11:15
Value of Evidence Analysis Using Probabilistic Expert Systems - applications in planning for forensic DNA identification

Julia Mortera (University Roma Tre)Wednesday 2nd July 17:20
Sensitivity of inference in Bayesian networks to assumptions about founders
Bayesian networks, with inferences computed by probability propagation methods, offer an appealing practical modelling framework for structured systems involving discrete variables in numerous domains, including forensic genetics. However, when standard assumptions are violated - for example, when allele frequencies are unknown, there is identity by descent, or the population is heterogeneous - dependence is generated among founding genes, which makes exact calculation of conditional probabilities by propagation methods less straightforward. Here we illustrate different methodologies for dealing with these problems by assessing sensitivity to assumptions about founders in forensic genetics problems. These methods comprise constrained steepest descent, linear fractional programming and representing dependence by structure. We illustrate these methodologies on several real casework forensic genetics examples comprising criminal identification, simple and complex disputed paternity and DNA mixtures.

Helene Neufeld (University of Oxford)Friday 4th July 11:15
Graphical Gaussian Models with Symmetries
I will describe new types of graphical Gaussian models by placing symmetry restrictions on the model parameters. My main focus will be on models with restrictions generated by permutation symmetry.

Klea Panayidou (University of Oxford)Friday 4th July 11:15
Tree Learning and Variable Selection

Giovanni Pistone (Politecnico di Torino)Friday 4th July 16:40
Information geometry of graphical models
The Information Geometry -- as defined by Amari and Nagaoka -- of graphical models and of models which are limits of graphical models is discussed from the point of view of Differential Geometry and Algebraic Geometry. In particular, models satisfying hard constraints are considered.

Roland Ramsahai (University of Oxford)Friday 4th July 11:15
Significance Test for Instrumental Model

Alberto Roverato (Università di Bologna)Tuesday 8th July 09:30
The non-rejection rate for structural learning of gene transcription networks from E.coli microarray data
Structural learning of gene networks in silico using microarray data is an important and challenging problem in bioinformatics. Apart from the relationship between transcription factors and their binding sites, there is no standard model of the regulatory mechanism for the genes, and a commonly used approach assumes that the available data constitute a random sample from a multivariate normal distribution. However, the dimension of the data (with the number of genes p larger than the number of experiments n) precludes the direct use of standard multivariate techniques for this purpose. Castelo and Roverato (2006) introduced a quantity, called the non-rejection rate, which can be used to identify sparse conditional independence structures between genes. Here we introduce a structural learning procedure for gene networks based on the non-rejection rate. The effectiveness of the procedure is evaluated on the basis of the precision-recall trade-off, both on synthetic data and on real microarray data from E. coli.

Simon Shaw (University of Bath)Wednesday 9th July 17:20
Generalised Bayesian graphical modelling utilising Bayes linear kinematics
(Joint work with Michael Goldstein)
Bayes linear kinematics is a method of generalised belief adjustment for revising a prior Bayes linear specification over a collection of random quantities given a new Bayes linear specification over a subset of these random quantities. The new specification may have been obtained by a formal Bayes linear adjustment given some data or informally by pure reflection, unexpected sensory stimulation or by other means. In particular, it provides a method for embedding full conditional updates within Bayes linear belief adjustments. In this talk, we give a brief overview of the theory of Bayes linear kinematics and show how it can be utilised to enable the construction and analysis of the Bayes linear Bayes (BLB) graphical model, a mixture of fully Bayesian and Bayes linear graphical models. We then discuss advancements of the methodology to cover more complex mixed belief graphical models such as the conditional BLB graphical model, where updating by both Bayes linear kinematics and probability kinematics (the full probabilistic analogue of Bayes linear kinematics, developed by Richard Jeffrey) is explored.

Ricardo Silva (Cambridge University)Tuesday 1st July 16:40
Factorial mixture of Gaussians and the marginal independence model
We introduce mixtures of Gaussians models that obey marginal independence constraints. Besides providing a flexible way of estimating joint distributions under such constraints, marginal independence models can be used to model the error distribution of a mixed graph model with additive errors. This is done by indexing different subsets of a pool of covariance matrices by different mixture indicators. The computational challenges of maximum likelihood estimation can be daunting due to the presence of a very large number of constraints, and non-standard approximation techniques might be necessary. We will describe work in progress to solve the maximum likelihood problem without approximations, and discuss some initial ideas in approximation algorithms and Bayesian inference.

Jim Smith (University of Warwick)Monday 7th July 16:00
Large sample robustness of Bayes nets with incomplete information
Inference using graphical models often proceeds using various assumptions of independence, such as local and global independence. We also often make distributional assumptions: for example, assuming Dirichlet distributions for the conditional probabilities in a discrete Bayesian Network factorisation. When data is incomplete it is far from clear what the impacts of such assumptions are. Here we study this issue with the aid of a new family of separation measures called local De Robertis separations. We show that when the approximating posterior density concentrates its mass on an open ball with radius decreasing in sample size, with the appropriate mutual smoothness conditions on the approximating and genuine prior, inference will be robust with large samples provided the model does not concentrate on the boundary of the parameter space. Furthermore it is possible to obtain explicit bounds on the posterior total variation between models using the approximating and genuine priors, even when the likelihood is misspecified.

Milan Studeny (Institute of Info Th. & Autom., Academy of Sciences of the Czech Republic)Tuesday 1st July 09:30
On Bayesian criteria for learning Bayesian network structure
The topic of the talk is learning Bayesian network structure by the method of maximization of a quality criterion. In the first part, the basic concepts and the idea of the algebraic approach to this maximization problem will be recapitulated.

The talk will deal with the Bayesian approach to the derivation of quality criteria, which leads to the class of so-called Bayesian criteria, defined as the logarithms of the marginal likelihoods. The formulas for so-called data vectors of these criteria (in terms of hyperparameters of Dirichlet priors) will be presented and a discussion of their behaviour initiated.

The second part of the talk will be devoted to the geometric aspects of the maximization task. A simple example will be given showing that the well-known GES algorithm may fail to find the maximal value of a Bayesian criterion when it is based on the classic inclusion neighborhood concept. This justifies the usefulness of the recently introduced geometric neighborhood concept.

Seth Sullivant (Harvard University)Friday 4th July 09:30
Algebraic aspects of Gaussian graphical models
The set of covariance matrices consistent with a Gaussian graphical model is a semialgebraic subset of the cone of positive definite matrices. In particular, the set of covariance matrices that can arise comes from a nice combinatorial parametrization in terms of the underlying graph. Developing an algebraic and combinatorial understanding of this parametrization can be useful for proving results about the semialgebraic constraints on the covariance matrices.

I will describe two results on the constraints sets for hidden variable models, where this algebraic perspective has proven useful. The first result gives a complete characterization of the covariance matrices that can arise from hidden tree models. In particular, the constraints for a hidden tree model are all direct implications of conditional independence statements (a result that fails for general graphs with hidden variables).

Our second result concerns a new generalization of the tetrad representation theorem to higher order subdeterminants of the covariance matrix. In particular, the vanishing of a higher order determinant is characterized by the t-separation condition, a natural generalization of the usual d-separation for DAG models.
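The classical constraint that the tetrad representation theorem characterises, and that t-separation generalises, is the vanishing of a 2x2 subdeterminant of the covariance matrix. A small sketch verifying it on a one-factor model (where every tetrad vanishes):

```python
import numpy as np

def tetrad(Sigma, i, j, k, l):
    """The tetrad difference sigma_ik*sigma_jl - sigma_il*sigma_jk:
    a 2x2 subdeterminant of the covariance matrix whose vanishing is
    the classical constraint behind the tetrad representation theorem."""
    return Sigma[i, k] * Sigma[j, l] - Sigma[i, l] * Sigma[j, k]
```

For a one-factor covariance Sigma = λλ^T + D (D diagonal), the off-diagonal entries are λ_i λ_j, so the tetrad difference is λ_i λ_k λ_j λ_l − λ_i λ_l λ_j λ_k = 0.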

Yee Whye Teh (UCL, London)Tuesday 1st July 17:20
Collapsed variational inference for infinite state Bayesian networks
(Joint work with Max Welling and Ken Kurihara)
Structure learning in Bayesian networks is an important challenge in their application to large and complex data analysis problems. In this talk we consider a simpler problem: that of estimating the cardinality of latent variables given a known graphical structure. We achieve this using Infinite State Bayesian Networks (ISBNs), which are nonparametric extensions of the usual Bayesian networks in which latent variables are discrete but can take on an infinite number of values. We propose an approximate inference algorithm for ISBNs based on a collapsed variational approximation, and compare it against Gibbs sampling in such models.

Caroline Uhler (University of California)Friday 4th July 11:15
Detecting Interacting SNPs in Disease Association Studies Using MCMC on Multidimensional Contingency Tables
Rapid research progress in genotyping techniques has allowed large genome-wide disease association studies. Existing methods often focus on determining associations between single loci and the disease. However, most diseases involve complex relationships between multiple loci and the environment. Here we describe a method for finding interacting loci by combining the traditionally used single-locus search with a search for multiway interactions on contingency tables. Our method is based on an extended Fisher's exact test for multidimensional contingency tables. To perform this test, a Markov chain is constructed on the space of multidimensional contingency tables using Markov bases. Finally, we test our method on simulated data.
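The simplest instance of this construction is the Diaconis-Sturmfels walk on 2×2 tables with fixed margins, where the Markov basis consists of a single move. A minimal sketch (with made-up counts, and without the hypergeometric reweighting that the actual exact test would add):

```python
import random

# Sketch of the Markov-basis walk on 2x2 tables with fixed margins,
# the simplest instance of the MCMC behind an extended Fisher's exact
# test. Counts here are made up for illustration.
table = [[3, 5], [4, 2]]  # row and column sums are preserved by the moves

def step(t, rng):
    # The Markov basis for 2x2 tables with fixed margins is the single
    # move +1 -1 / -1 +1 (and its negative).
    eps = rng.choice([1, -1])
    new = [[t[0][0] + eps, t[0][1] - eps],
           [t[1][0] - eps, t[1][1] + eps]]
    # Reject moves that leave the set of non-negative tables.
    if min(new[0] + new[1]) < 0:
        return t
    return new

rng = random.Random(0)
for _ in range(1000):
    table = step(table, rng)

# Margins are invariant under the walk.
print([sum(r) for r in table], [table[0][j] + table[1][j] for j in range(2)])
```

With this symmetric proposal the chain is uniform over the fiber of tables with the given margins; multidimensional tables need larger Markov bases, which is where the algebraic machinery enters.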

Paola Vicard (University Roma Tre)Wednesday 2nd July 16:00
Model based and model assisted estimators using probabilistic expert systems
Two classes of estimators of a contingency table when a sample is drawn according to a stratified sampling design are introduced. The two classes are composed of model based and model assisted estimators (Särndal et al., 1992) respectively, where the model is defined in terms of probabilistic expert systems (PES; see for instance Cowell et al., 1999). We show that in a model based framework the PES machinery can be easily adapted to complex sampling designs through the definition of an additional node, SD, representing the sampling design. SD is a categorical variable with as many categories as there are strata. The marginal probability attached to SD is defined through the sampling design as the fraction of the overall weight relative to the units of the population in each stratum. In a model assisted framework it is not necessary to include SD in a PES. Both classes include the usual Horvitz-Thompson estimator as a special case when the PES corresponds to the complete (i.e. saturated) model. PES can be particularly useful to propagate information through a set of variables and consequently allow poststratification to be defined and carried out. Monte Carlo simulations have been carried out to evaluate and compare the two classes of estimators.
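For reference, the Horvitz-Thompson estimator that both classes contain as a special case weights each sampled unit by the inverse of its first-order inclusion probability; a minimal sketch with illustrative numbers:

```python
# Minimal sketch of the Horvitz-Thompson estimator: each sampled unit
# is weighted by the inverse of its inclusion probability (all numbers
# here are illustrative).
y = [10.0, 12.0, 7.0, 9.0]   # observed values for the sampled units
pi = [0.2, 0.2, 0.1, 0.1]    # first-order inclusion probabilities

ht_total = sum(yi / pii for yi, pii in zip(y, pi))
print(ht_total)  # estimated population total
```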

Martin Wainwright (UC Berkeley)Tuesday 1st July 17:55
Marginal polytopes of graphical models: Linear programs, max-product, and variational relaxation
Applications of Markov random field (MRF) models require efficient methods for computing (either exactly or approximately) quantities such as marginals, likelihoods, and modes. All of these problems are easily solvable for tree-structured graphs (or more generally, graphs with low treewidth) by sum-product or max-product message-passing within the junction tree framework.

This talk focuses on approximate methods for these same problems on graphs without low treewidth. Our perspective centers on a key geometric object associated with a discrete-state MRF, known as the marginal polytope. We discuss how a broad class of variational methods (sum-product, expectation-propagation, mean field) can be understood as approximating this polytope in different ways. Focusing on mode computation, we then show that the ordinary max-product algorithm can be very misleading when applied to graphs with cycles, by converging rapidly to highly sub-optimal solutions. We then introduce the class of tree-reweighted max-product algorithms, and show that for any problem instance, they either provably return a correct mode, or acknowledge failure. Via a dual reformulation, these algorithms turn out to be linked to linear programming (LP) relaxations, based on outer approximations to the marginal polytope. Time permitting, we describe dual-certificate methods for establishing optimality of LP relaxations and message passing over random ensembles of graphical models.
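As background, on a tree-structured graph max-product message passing recovers a mode exactly; the failures described above arise only once the graph has cycles. A toy sketch of the exact tree case on a three-node binary chain (all potentials made up):

```python
import numpy as np

# Toy max-product (Viterbi) pass on a 3-node binary chain, the
# tree-structured case where message passing is exact; the potentials
# below are made up for illustration.
unary = np.array([[0.6, 0.4],    # psi_i(x_i) for nodes 0, 1, 2
                  [0.5, 0.5],
                  [0.3, 0.7]])
pair = np.array([[0.8, 0.2],     # psi(x_i, x_{i+1}), shared by both edges
                 [0.2, 0.8]])

# Backward max-product messages m[i](x_i): max over the tail of the chain.
m = np.ones_like(unary)
for i in (1, 0):
    m[i] = np.max(pair * (unary[i + 1] * m[i + 1]), axis=1)

# Decode a mode by maximizing beliefs forward along the chain.
x = [int(np.argmax(unary[0] * m[0]))]
for i in (1, 2):
    x.append(int(np.argmax(pair[x[-1]] * unary[i] * m[i])))
print(x)  # a MAP configuration of the chain: [1, 1, 1]
```

Brute-force enumeration of all eight configurations confirms the decoded mode; tree-reweighted variants modify the edge terms with edge appearance probabilities so that a similar guarantee can be certified on graphs with cycles.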

Nanny Wermuth (Chalmers/Gothenburg University)Thursday 3rd July 09:30
Probability distributions with summary graph structure
A given joint density of several variables may satisfy a possibly large set of independence statements, called its independence structure. If this structure is captured by a graph, then consequences of the independence structure can be derived for any type and form of the given distribution. In particular, for a probability distribution associated with a multivariate regression chain graph, which could have been generated over a directed acyclic graph, the effects of removing some variables by integrating or summing, and of taking the levels of other variables as given, can be represented by a type of graph that we call a summary graph.

We use operators for matrix representations of graphs to derive and study properties of summary graphs and of the corresponding densities. Summary graphs turn out to preserve their form after marginalising, orthogonalising and conditioning. For the remaining variables, they capture the independence structure implied by the starting graph, they lead to interpretations of paths, and they separate some of the generating dependencies from indirect confounding.

Joe Whittaker (Lancaster University)Wednesday 2nd July 16:40
Bootstrapping divergence weighted independence graphs for design based survey analysis
The analysis of survey data collected on a set of response variables defined over a finite population benefits from a bird's-eye view of the strengths of their inter-relationships. The weighted graph based on divergence measures of independence strength fulfills this purpose. The bootstrap may give some pointers for establishing how large conditioning sets may become while retaining stable estimates.
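A minimal sketch of the kind of bootstrap computation involved, using the sample mutual information between two binary responses as the divergence-based edge weight (the data below are made up):

```python
import math
import random

# Illustrative bootstrap of a divergence-based edge weight: the sample
# mutual information between two binary survey responses (data made up).
xs = [0] * 30 + [1] * 20 + [0] * 10 + [1] * 40
ys = [0] * 30 + [0] * 20 + [1] * 10 + [1] * 40

def mutual_information(x, y):
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            pa = sum(1 for xi in x if xi == a) / n
            pb = sum(1 for yi in y if yi == b) / n
            if pab > 0:
                mi += pab * math.log(pab / (pa * pb))
    return mi

rng = random.Random(0)
boots = []
for _ in range(200):
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    boots.append(mutual_information([xs[i] for i in idx],
                                    [ys[i] for i in idx]))

boots.sort()
print(boots[4], boots[194])  # rough 95% interval for the edge weight
```

The spread of the bootstrap replicates is what indicates whether an edge weight (and, with larger conditioning sets, a conditional divergence) is stably estimated at the given sample size.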

David Wooff (Durham University)Wednesday 9th July 16:40
Bayes linear revision for plates
We consider Bayes linear graphical models for which there is an unobserved parameter set separating multiple instances of plates, for example for multivariate multiple regression models Y=XB+E. We show that there is a natural linear transformation of the observables which produces a sequence whose mean is sufficient for all the observables and so induces Bayes linear separation in the graph. We show how the separation simplifies Bayes linear computation for such problems, and illustrate information flow on the graph for the parameters and a natural set of residual constructs. We show that there is a simple extension to cases where the parameter set is hierarchical. We show how the next design point may be chosen, for sequential design. We show that the set of observables is consistent with the existence of an infinite second-order exchangeable sequence which may then be used to explain likely variance reductions as the number of design points is increased further.
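For readers unfamiliar with the framework, the basic Bayes linear adjustment of a quantity B by data D is E_D(B) = E(B) + Cov(B, D) Var(D)^{-1} (D − E(D)); a minimal numerical sketch with an illustrative prior specification (not from the talk):

```python
import numpy as np

# Minimal sketch of the Bayes linear adjustment
#   E_D(B) = E(B) + Cov(B, D) Var(D)^{-1} (D - E(D));
# the prior specification below is illustrative only.
EB = np.array([1.0, 2.0])            # prior expectation of B
ED = np.array([0.0])                 # prior expectation of the data D
cov_BD = np.array([[0.5], [0.2]])    # Cov(B, D)
var_D = np.array([[1.0]])            # Var(D)

d = np.array([0.8])                  # observed value of D
adj_EB = EB + cov_BD @ np.linalg.solve(var_D, d - ED)
print(adj_EB)  # adjusted expectation of B given D: [1.4, 2.16]
```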

Henry Wynn (London School of Economics)Friday 4th July 16:00
Junction tubes for Bayes nets and related algebra
A junction tube is a generalisation of a junction tree in which a simplicial complex is associated with a collection of subsets of random variables, with each subset being allocated a vertex. The tube property is that the subsimplicial complex associated with all subsets containing a given index (random variable) is simply connected. This construction was used by D Q Naiman and the author to study certain collections of measurable sets, but here it is applied instead to the index set of the random variables. When the index set has geometry, the collection of sets can be considered as a cover, and if the cover is good, in the sense of J Leray, then the simplicial complex can be taken as the nerve of the cover. Other, more combinatorial constructions are also possible, including those introduced by K Dohmen following the tube theory. The tube property is preserved under projection, that is, under marginalisation over one or more variables, and one can capture all the implied conditional independence structures. The system imposes a set of linear conditions on the log-likelihood which give a valuation on the simplicial complex and on all subsimplicial complexes obtained via projection. The dual of this system gives a full and explicit exponential representation of the distribution. There are also equivalent implicit conditions in terms of commuting conditional expectations.

Or Zuk (MIT and Harvard)Friday 4th July 11:15
The sample complexity of learning Bayesian Networks
The problem of reconstructing the structure of a graphical model from observational data is a well-studied one. A fundamental question in this context is how the accuracy of reconstruction depends on the sample size. Although several nice results have appeared recently, the problem is still not well understood. Here we present both new asymptotic bounds and simulations for moderate sample sizes, for the case of discrete fully observed Bayesian networks.
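A toy version of the sample-size question can be simulated directly: how often does a naive independence test recover the single edge of a two-node binary network as n grows? (The threshold and all parameters below are my own illustrative choices, not from the talk.)

```python
import random

# Toy simulation of the sample-size question for a two-node binary
# network with a single edge X -> Y; parameters are illustrative.
def detect_edge(n, rng):
    x = [rng.random() < 0.5 for _ in range(n)]
    y = [(xi if rng.random() < 0.8 else not xi) for xi in x]  # P(Y=X) = 0.8
    # Declare an edge when the empirical agreement deviates from 1/2.
    agree = sum(xi == yi for xi, yi in zip(x, y)) / n
    return abs(agree - 0.5) > 0.15

rng = random.Random(1)
rates = []
for n in (10, 100, 1000):
    rate = sum(detect_edge(n, rng) for _ in range(200)) / 200
    rates.append(rate)
    print(n, rate)  # detection rate tends to improve with n
```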