Large Scale Machine Learning and Other Animals: GaBP

Showing posts with label GaBP. Show all posts

Saturday, August 25, 2012

On convergence of Gaussian BP

I got the following question from Gregor Gorjanc, our man in Lubliyana:

Dear Danny,

We have spend quite some time trying out several things to use GaBP with the models we work with. Unfortunatelly, we can never make it work without regularization. It seems that GaBP is in particular sensitive to models where graphs have V like DAG structures, i.e., some nodes having two parental nodes. For example we work with the model describing genetic inheritance and in essence we model this process between a trio of two parents and a child. In what follows I will describe the model (just a tiny example) and show you some examples that appear to give some insight with which types of model GaBP is having problems.

Say we have two parents (1 and 2) and their children (3 and 4). For simplicity assume we observe phenotypic value (y) on all four individuals. The aim of the model is to infer genetic value (g) for each individual based on phenotypic values and pedigree links. The generic model (I have skipped some parts) I work with is:

y_i = g_i + e_i

g_i = 1/2 * g_f(i) + 1/2 * g_m(i) + w_i

e_i ~ N(0, 4)

w_i ~ N(0, 2)

where:

y_i = phenotypic value on individual i

g_i = genetic    value of individual i

e_i = phenotypic residual

g_f(i) = genetic    value of father of individual i

g_m(i) = genetic    value of mother of individual i

w_i    = genetic residual

The full model graph for this family (see attached files namedfamily_full*) would be:

g_1 = w_1

g_2 = w_2

g_3 = 1/2 * g_1 + 1/2 * g_2 + w_3

g_4 = 1/2 * g_1 + 1/2 * g_2 + w_4

y_1 = g_1 + e_1

y_2 = g_2 + e_2

y_3 = g_3 + e_3

y_4 = g_4 + e_4

With the LHS of a system of equations equal to:

## [1,]  5  2 -2 -2

## [2,]  2  5 -2 -2

## [3,] -2 -2  5  .

## [4,] -2 -2  .  5

isDiagDom(x=LHS)

## $isDiagDomMatrix

## [1] FALSE

spectralRadius(x=LHS)

## $spectralRadius

## [1] 1.02

## $spectralRadiusSmallerThan1

## [1] FALSE

This very simple model does not converge. So I attempted to build smaller models to understand what is going on. The first attempt was to take one parent and one child. This model (see attached files named family_one-one*) is now:

g_1 = w_1

g_3 = 1/2 * g_1 + w_3

y_1 = g_1 + e_1

y_3 = g_3 + e_3

With the LHS of a system of equations equal to:

## [1,]  3.67 -1.33

## [2,] -1.33 3.67

isDiagDom(x=LHS)

## $isDiagDomMatrix

## [1] TRUE

spectralRadius(x=LHS)

## $spectralRadius

## [1] 0.36

## $spectralRadiusSmallerThan1

## [1] TRUE

Since spectral radius is smaller than 1 this model converges to exact marginal means as well as variances! Now I added another child node (note we have now A like structure of a DAG) so the model (see attached files named family_one-two*) is now

g_1 = w_1

g_3 = 1/2 * g_1 + w_3

g_4 = 1/2 * g_1 + w_4

y_1 = g_1 + e_1

y_3 = g_3 + e_3

y_4 = g_4 + e_4

With the LHS of a system of equations equal to:

## [1,]  4.333333 -1.333333 -1.333333

## [2,] -1.333333  3.666667  .       

## [3,] -1.333333  .         3.666667

isDiagDom(x=LHS)

## $isDiagDomMatrix

## [1] TRUE

spectralRadius(x=LHS)

## $spectralRadius

## [1] 0.47

## $spectralRadiusSmallerThan1

## [1] TRUE

Which again converges to exact marginal means as well as variances. Now I created a model with two parents and one child (note we have now V like structure of a DAG) so the model (see attached files named family_two-one*) is: 

g_1 = w_1

g_2 = w_2

g_3 = 1/2 * g_1 + 1/2 * g_2 + w_3

y_1 = g_1 + e_1

y_2 = g_2 + e_2

y_3 = g_3 + e_3

With the LHS of a system of equations equal to:

## [1,]  4  1 -2

## [2,]  1  4 -2

## [3,] -2 -2  5

isDiagDom(x=LHS)

## $isDiagDomMatrix

## [1] TRUE

spectralRadius(x=LHS)

## $spectralRadius

## [1] 0.77

## $spectralRadiusSmallerThan1

## [1] TRUE

This also converges to exact marginal means but not for variances.

From the above tests it seems that GaBP:

- underestimates variances when V like structures appear in the model (example family_two-one) --> in the Markov network this structure is a small loop between the three involved nodes

- fails to converge when there are overlapping loops in Markov network, i.e., due to two V structures sharing parental nodes (example family_full)

Is seems that this shows why GaBP does not work with our applications as above structures are highly abundant in our models. Do you see any value in these observations?

Attached is a set of files for each example and a small shell script that show how GaBP from GraphLab was being used - only basic options were used deliebrately to understand what was going on.

./GaBP.sh family_one-one 0 0 > family_one-one.log 2>family_one-one.screen

./GaBP.sh family_one-two 0 0 > family_one-two.log 2>family_one-two.screen

./GaBP.sh family_two-one 0 0 > family_two-one.log 2>family_two-one.screen

./GaBP.sh family_full    0 0 > family_full.log    2>family_full.screen

Regards, Gregor

My Answer:

Hi Gregor,

It seems you reverse engineered the known convergence conditions of Gaussian BP (and generally BP).

1) For a graph which is a tree, BP converges to the exact result, namely both the means and the variances are exact.

2) A known result by Yair Weiss (Y. Weiss. Correctness of local probability propagation in graphical models with loops. Neural Computation, 12(1), 2000.) is that for graphs with a single cycle, BP always converges to a unique fixed point. In this case the mean is exact.

3) When the algorithm converges (for a graph with cycles), the mean is exact, but the variance is an underestimation of the true variance. (Weiss, Yair (2000). "Correctness of Local Probability Propagation in Graphical Models with Loops". Neural Computation 12 (1): 1–41.)
See also the great explanation by Jason K. Johnson: Malioutov, Johnson, Willsky. Walk-sums and belief propagation in Gaussian graphical models, Journal of Machine Learning Research, vol. 7, pp. 2031-2064, October 2006.

I believe all your conclusions are correct, maybe except for the two V structures, where the algorithm may converge depends on the matrix values...

Saturday, January 28, 2012

Animal breeding with GraphLab?

Believe it or not, there is unlimited supply of real life interesting problems that can be solved usign GraphLab! Here is an email I got this morning from Asst. Prof. Gregor Gorjanc from the Biotechnical Faculty of University of Ljubljana:

I work in the area of animal breeding and genetics. The task of paramount importance in my field is the inference of so called breeding (genetic) values from phenotypic values and pedigree graph. These breeding values are used for selection of animals that will become parents of better future generations. We use linear mixed models of a type:
y | b, a, \sigma^2_e ~ N(Xb + Za, I\sigma^2_e)
b | B, \sigma^2_b    ~ N(0, B\sigma^2_b)
a | A, \sigma^2_a    ~ N(0, A\sigma^2_b)
where y - is a vector of known phenotypic values (body weight, ...), b - is a vector of unknown factors affecting phenotype (sex, age, ...), and a - is a vector of breeding values, while X and Z are design matrices linking b and a with y. Matrices B and A are known covariance (kernel) matrices - the later describes the genealogy between animals. We usually infer variance components (\sigma^2_i) and use point estimates in the above model as known quantities. Therefore, the inference of b and a basically involves solving a system of linear equations, where LHS is:
| X'X+B⁻1*beta X'Z           |
| Z'Z          Z+Z+A⁻1*alpha |,
with beta=\sigma^2_e / \sigma^2_b and alpha=\sigma^2_e / \sigma^2_a. The X'X+B⁻1 can be quite dense, but is often much much smaller than the Z+Z+A⁻1 part so LHS is in general very sparse. Both B⁻1 and A⁻1 are sparse as well, but do have some structure. The RHS is:

| X'y |
| Z'y |.

We can easily solve the above system and state of the art applications use the PCG (Preconditioned Conjugate Gradiaten) method due to shear size of data sets we deal with in animal breeding - small applications involve few thousands of animals (~number of unknowns), while nation-wide applications can easily involve hundreds of thousands of animals or more for large countries. Iterative methods only deliver as solutions posterior means of b and a, while posterior variances are not knwon as the inversion of LHS is to costly. Posterior covariances between a are not of high importance. I came across GaBP algorithm in the Graphical models book by Koller and Friedman and got very interested when I read that GaBP can deliver both, posterior means and variances for my model. Do you have any suggestions regarding my applications?

And here is my answer:
The first thing that comes to my mind, is that GaBP does not require a square matrix to operate, it can work on non-square matrices using the construction described here: equation (2).
Now, if the matrix is square, it should better be symmetric or else you will get reduced performance in terms of accuracy.
It seems that your matrix is square but not symmetric - I suggest playing a little with the equation and creating a non-square matrix as in the construction above.

My second advice is first try my Matlab GaBP code (available here) on smaller matrices just to verify that the algorithms converges and you are happy with accuracy. When you need a performance boost for larger matrices then GraphLab GaBP should be the right tool to use.

Regarding variance approximation - it is only an approximation, the values may be lower than the actual variance. I suggest again trying with smaller example in Matlab and trying to characterize the approximation error magnitude.

Q) You mention that GaBP delivers underestimated posterior variances for x. What is the rational behind this? I will surely test out with the exact case, but it would be great to get some theoretical insight into this.

A) In a nutshell: The matrix (inv(I-A)), can be expanded using an infinite sum, (inv(I-A)) = A + A^2 + A^3... . (When it is first normalized to have a unit diagonal). This can be viewed as an random walk on a markov chain, when summing routes of distances of size 1,2,3 ... See the JMLR paper of Jason K. Johnson et al.

GaBP is capturing only a part of those routes - the ones which start and ends in the same node. So the variance is an estimation to the total variance.

Saturday, August 6, 2011

The quiet rise of Gaussian Belief Propagation (GaBP)

Gaussian Belief Propagation is an inference method on a Gaussian graphical model which is related to solving a linear system of equations, one of the fundamental problems in computer science and engineering. I have published my PhD thesis on applications of GaBP in 2008.
When I started working on GaBP, it was absolutely useless algorithm with no documented applications.
Recently, I am getting a lot of inquiries from people who applying GaBP on real world problems. Some examples:
* Carnegie Mellon graduate student Kyung-Ah Sohn, working with Eric Xing, is working on regression problem for finding causal genetic variants of gene expressions, considered using GaBP for computing matrix inverses.
* UCSC researcher Daniel Zerbino using suing GaBP for smoothing genomic sequencing measurements with constraints.
* UCSB graduate student Yun Teng is working on implementing GaBP as part of the KDT (knowledge discovery toolbox package).

Furthermore, I was very excited to find out today from Noam Koenigstein, a Tel Aviv university graduate about Microsoft Research Cambridge project called MatchBox, which is using Gaussian BP for collaborative filtering and being actually deployed in MS. Some examples to other conversations I had are:
* Wall Street undisclosed company (that asked to remain private) who is using GaBP for parallel computation of linear regression of online stock market data.
* A gas and oil company was considering to use GaBP for computing the main diagonal
of the inverse of a sparse matrix.

I thought about writing on the common questions I am getting so my answers can be useful for a wider audience. Here are some of the questions:

Q: What is the difference between GaBP and other iterative methods for solving a linear system of equations like the Jacobi method?
A: GaBP typically converges in a fewer number of iterations relative to Jacobi, since it deploys information from the second derivative (the precision/variance of the Gaussian). However, each iteration is more computation intensive since for each non-zero entry of the linear relation matrix a Jacobi like computation is done. Thus, the algorithms works well for sparse matrices.

Q: When should I use GaBP?
A: I believe there is no silver bullet and the answer depends on the data. I recommend to always try several algorithms like Jacobi, conjugate gradients etc. and compare performance to GaBP. On some cases GaBP works very well. For example: matrices with zero, -1, 1 entries. Matrices with same number of non zeros in each row and column. CDMA. 2D Poisson problem.

Q: Can GaBP be applied to dense matrices?
A: Yes, there is an algorithm variants for dense matrices - see section 3.3 of my PhD thesis.

Q: Does GaBP always converge?
A: No, like the Jacobi method, its converges is relates to the data matrix. However, it is always possible to force converges, at the cost of higher computation. See http://arxiv.org/abs/0901.4192

Q: What is the best way to understand GaBP better?
A: I recommend downloading the Matlab code from here, and experimenting with it on some toy problems.

Q: Can GaBP converge in cases where Jacobi does not converge?
A: Yes, in some cases. See example in equation 57 in here.

Q: Can the main diagonal of the sparse inverse matrix be computed using GaBP?
A: Yes, but only an approximation of it. The main inverse if approximated by the
output precision entries.

Q: can GaBP be used on non-square/non-symmetric matrices?
A: Yes, see: section III in here.

Q: Can GaBP be used for computing the full inverse of a sparse matrix?
A: Yes. The way to compute the full inverse is by solving n systems of linear equations where
the observation vector b_i = e_i, namely the all zero vector with 1 in its i-th entry. This can be done efficiently in parallel. Note that while the linear system may be sparse, the inverse solution is typically not sparse.

Q: Is there an efficient multicore implementation of GaBP?
A: Yes! as a part of the GraphLab linear solver library.

Q: What is the relation to linear belief functions?
A: Linear belief functions describe a joint Gaussian distribution, combine them, marginalize, etc. Compared to representing a joint Gaussian by its covariance matrix, linear belief functions allows to have infinite variance in some directions, which corresponds to no belief about the value in that direction. The same can be done in GaBP, by having a zero precision value of that node. However, careful handling of the ordering of computation is needed to avoid division by zero.

Q: I need to solve Ax=b where A = UU' and U is given and not sparse.
The problem is that I would like to avoid computation of UU'.
A: First you need to solve U\b using GaBP for getting the observation
and then you need to solve U'\(U\b).

Q: Recently i found your PHD thesis about solving systems of linear equations using GaBP and then the GraphLab. I have two questions, what's the advantage of GaBP over algorithms like sparseLM in solving linear systems of equations?

A: I am not familiar with sparseLM algorithm - do you mean http://www.ics.forth.gr/~lourakis/sparseLM/ ? It seem to be a non-linear algorithm. When the cost function is linear (like in a linear system) there is no benefit in using a non-linear method.

Q: Is it possible to incrementally solve the "Ax=b" equation as new measurements are added to the matrices "A" and "b"?
A: No. The GaBP is an offline algorithm. The scenario you are describing sounds like a Kalman filter problem - where new measurements are constantly introduced into the model. You can take a look
at my paper: "Distributed Kalman filter via Gaussian Belief Propagaion". You may also want to take a look at rank one update algorithms: where matrix A is updated based on new observations.

If you have any other interesting questions - please email me I am always happy to answer!

Saturday, June 4, 2011

Python wrapper for running GraphLab GaBP linear solver

I got from Daniel Zerbino, a postdoc in UCSC, who is working on smoothing genomic sequencing measurements with constraints, a Python script for converting data to GraphLab GaBP format. Here it is:

#!/usr/bin/env python

#############################################
# Python wrapper for GaBP application in GraphLab
# By Daniel Zerbino, based on Matlab code by Danny Bickson
#############################################

import sys
import os
import tempfile
import struct
import numpy as np
import subprocess

#############################################
# Convenience binary writer/reader functions:
#############################################

def writeDouble(F, x):
     F.write(struct.pack('<d', x))

def writeVector(F, vector):
     for X in vector: 
 writeDouble(F, X)

def writeInt(F, x):
     F.write(struct.pack('<i', x))

def readInt(F):
     return struct.unpack('<i', F.read(struct.calcsize('i')))[0]

def readDouble(F):
     return struct.unpack('<d', F.read(struct.calcsize('d')))[0]

def readVector(F, n):
     return [readDouble(F) for i in range(n)]

#############################################
# Convenience MatLab like functions
#############################################

def enumerate(M):
     for i in range(np.shape(M)[0]):
 for j in range(np.shape(M)[1]):
    # Beware of Matlab numbering!!
    yield i+1, j+1, M[i,j]

def find(M):
     return filter(lambda X: X[2] != 0, enumerate(M)) 

#############################################
# script for exporting system of linear equations of the type Ax = y to
# graphlab format.
# Written by Danny Bickson, CMU
# Input: fn - output file name
# A - A mxn matrix (if A is square than m=n)
# y - mx1 observation vector
# x - nx1 known solution (optional, if not given will write a vector of zeros)
# sigma - a vector m+nx1 of noise levels (optional, for non-square matrices only)
# Conversion to Python by Daniel Zerbino, UCSC
#############################################

def save_c_gl(fn, A, y, x=None, sigma_y=None, sigma_x=None, square=False):
    m, n = np.shape(A)
    if len(y) != m:
       sys.exit('y vector should be of len as the number of A rows (%i and %i resp.)' % (len(y), m))

    if x is not None and len(x) != n:
       sys.exit('x vector length should be as the matrix A columns')

    if sigma_y is None and sigma_x is not None:
       sys.exit("sigma_y should be provided when entering sigma_x")

    if sigma_x is None and sigma_y is not None:
       sys.exit("sigma_x should be provided when entering sigma_y")

    if sigma_x is not None:
       if square:
          sys.exit('sigma noise level input is allowed only for non-square matrices')
       else:
          if len(sigma_x) != n:
             sys.exit('sigma_x length should be number of cols of A')
          if len(sigma_y) != m:
             sys.exit('sigma_y length should be number of rows of A')

    if square:
        # matrix is square, edges are non digonal entries
        vals = find((A - np.diag(np.diag(A))))
 print "Saving a square matrix A"
    else:
        # matrix is not square, edges are non zero values
        vals = find(A)
 print "Saving a non-square matrix A"

    F = open(fn, 'wb')

    #write matrix size
    writeInt(F, m)
    writeInt(F, n)

   # write y (the observation), x (the solution, if known), diag(A) the
   # variance (if known, else default variance of 1)
    writeVector(F, y)
    if x is not None:
 writeVector(F, x)
    else:
 writeVector(F, (0 for x in range(n)))

    if square:
 writeVector(F, diag(A))
    else:
        if sigma_y is not None:
     writeVector(F, sigma_y)
     writeVector(F, sigma_x)
        else:
            writeVector(F, (1 for x in range(m+n)))

    #write number of edges
    assert len(vals) > 0
    writeInt(F, len(vals))
    # pad with zeros for 64 bit offset
    writeInt(F, 0)

    if not square:
        offset = m
    else:
        offset = 0

    #write all edges
    for val in vals:
       writeInt(F, val[0])
       writeInt(F, val[1] + offset)
       writeDouble(F, val[2])

    F.close()

    #verify written file header
    F = open(fn,'rb')
    x = readInt(F)
    assert x == m
    F.close()

    print 'Wrote succesfully into file: %s' % fn

########################################################
#script for reading the output of the GaBP GraphLab program into matlab
# returns x = inv(A)*b as computed by GaBP
# returns diag = diag(inv(A)) - an approximation to the main diagonal of
# the inverse matrix of A.
# Written by Danny Bickson, CMU
# Conversion to Python by Daniel Zerbino, UCSC
########################################################

def load_c_gl(filename, columns):
    F = open(filename, 'rb')

    x = readVector(F, columns)
    diag = readVector(F, columns)

    F.close()
    os.remove(filename)
    return x, diag

########################################################
## Wrapper Utility to be used from outside
########################################################

def runGaBP(convergence, A, y, sigma_y=None, x=None, sigma_x=None, square=False):
    file, input = tempfile.mkstemp(dir='.')

    save_c_gl(input, A, y, x=x, sigma_y=sigma_y, sigma_x=sigma_x, square=square)

    args = ['gabp', '--data', input, '--threshold', str(convergence), '--algorithm', '0', '--scheduler=round_robin', '--square']
    if not square:
        args.append('false')
    else:
 args.append('true')
    print "Running " + " ".join(args)
    ret = subprocess.Popen(args, stdout=sys.stdout, stderr=subprocess.STDOUT).wait()

    if ret != 0:
 sys.exit("GaBP did not complete")

    os.remove(input)
    x2, diag = load_c_gl(input + ".out", len(x))
    return x2, diag

#########################################################
## Unit test
#########################################################
def main():
 A = np.array([[0.2785, 0.9649],[0.5469, 0.1576],[0.9575, 0.9706]])
        y = np.array([1.2434, 0.7045, 1.9281])
 sigma_y= np.array([1e-10, 1e-10, 1e-10]) 
        x = np.array([0, 0])
 sigma_x = np.array([1, 1])
        convergence = 1e-10
        x2, diag = runGaBP(convergence, A, y, sigma_y=sigma_y, x=x, sigma_x=sigma_x)

 print 'A'
 print A
 print 'y'
 print y
 print 'Initial X'
 print x
 print 'Initial Error'
 print A.dot(x) - y
 print 'Final X'
 print x2
 print 'Final Error'
 print A.dot(x2) - y
 print 'diag' 
 print diag

if __name__=='__main__':
        main()

Wednesday, June 1, 2011

On the performance of Graphlab for large scale linear solver

I just got some interesting performance numbers from Yun Teng, a master student in UCSB, who tried GraphLab GaBP implementation of a linear solver. Yun used steady state thermal problem:

This is a linear system of equations, where the matrix A has 1.2M rows x 1.2M columns and

8.4M non-zeros. Here are some performance numbers. For round-robin scheduler, running 70 iterations on a 32 core Opteron (2.4GHz) computer with 128GB of RAM with CentOS release 5.6.

We got the best speedup, around 12, using 24 cores:

For the chromatic scheduler, we got a slightly less good speedup of 8 using 20 cores.

In terms of running time, each iteration runs in about 0.2 seconds on 20 cores.
Which means we handle about 40M non-zero matrix entries a second.

Large Scale Machine Learning and Other Animals

Saturday, August 25, 2012

On convergence of Gaussian BP

Saturday, January 28, 2012

Animal breeding with GraphLab?

Saturday, August 6, 2011

The quiet rise of Gaussian Belief Propagation (GaBP)

Saturday, June 4, 2011

Python wrapper for running GraphLab GaBP linear solver

Wednesday, June 1, 2011

On the performance of Graphlab for large scale linear solver

Labels

GraphLab Users Google Group

pagerank

google analytics

syntax