Showing posts with label Eigen. Show all posts

Monday, October 3, 2011

it++ vs. Eigen - part 2 - performance results

Following the first part of this post, where I compared some properties of it++ and Eigen, two popular linear algebra packages (it++ is an interface to BLAS/LAPACK), I took some time to look more closely at their performance.

I am using GraphLab's collaborative filtering library, where the following algorithms are implemented:
ALS (alternating least squares), Tensor ALS (alternating least squares on a 3D tensor), PMF (probabilistic matrix factorization), BPTF (Bayesian probabilistic tensor factorization), SVD++, NMF (non-negative matrix factorization), the Lanczos algorithm, weighted ALS, and ALS with sparse factor matrices.

Each algorithm was tested once on top of it++ and a second time on Eigen, using a subset of Netflix data described here.

Following is a summary of the results:
* Installation: Eigen requires no installation, since it consists only of header files. it++ is problematic to install on many systems, since installation packages are available for only some platforms, and BLAS and LAPACK may need to be preinstalled.
* Licensing: Eigen has an LGPL3+ license, while it++ has the more restrictive GPL license.
* Compilation: Eigen compilation tends to be slower because of its heavy template usage.
* Performance: Typically Eigen is slightly faster, as shown in the graphs below.
* Accuracy: Eigen is more accurate when using the ldlt() method (relative to the it++ solve() method), while it++ performs better on some other problems when using the backslash() least squares solution.

Here is the timing performance plot. Lower running time is better.


Here is the accuracy plot. Lower training RMSE (root mean square error) is better.
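For readers who want the metric spelled out, here is a minimal sketch of how RMSE can be computed from predicted and observed ratings. It uses plain C++ with no it++ or Eigen dependency, and the function name `rmse` is my own choice for illustration, not something taken from the GraphLab code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root mean square error between predicted and observed ratings.
// Hypothetical helper for illustration; both vectors must be non-empty
// and have the same length.
double rmse(const std::vector<double>& predicted,
            const std::vector<double>& observed) {
    assert(!predicted.empty() && predicted.size() == observed.size());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double err = predicted[i] - observed[i];
        sum_sq += err * err;  // accumulate squared errors
    }
    // mean of the squared errors, then square root
    return std::sqrt(sum_sq / static_cast<double>(predicted.size()));
}
```

Training RMSE applies this to the ratings the model was fitted on, while validation RMSE applies it to held-out ratings.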


Overall, the winner is: Eigen - because of simpler installation, equivalent if not better performance and a more flexible license.

Wednesday, September 21, 2011

On the importance of different configurations of linear algebra packages

I got the following question from Kayhan, a UPenn graduate student:

I am one of the readers of your weblog. I have a question about one of your posts about the comparison of two linear algebra libraries, "it++ vs. Eigen"; I guess you are the expert person who can answer my question.
I have an algorithm that iteratively performs matrix-matrix and matrix-vector multiplication and involves all kinds of dense and sparse matrices and vectors. I have implemented my algorithm using gmm with the atlas flag active, but it seems that it is still slower than MATLAB. More specifically, it seems that gmm uses one thread, compared to MATLAB, which uses multiple threads when it is compiled with MCC.
I was wondering if any of the libraries you introduced in your post (it++, Eigen) are capable of multi-threading, and how they compare with the MATLAB linear algebra engine.


Regards,
Kayhan

It is always nice to get feedback from my readers! Especially the ones who call
me an expert (although without the "I guess" next time, please!! :-)
There is definitely room for improving BLAS/LAPACK performance. You need to dig into the details of the library you are using.

Eigen has some nice benchmarks here:

As you can see, ATLAS has relatively inferior performance vs. Eigen and
Intel MKL (higher MFLOPS is better).

Here is an example setup for Intel MKL I got from Joel Welling (PSC):

I'm guessing that the current configuration produces too many threads,
or puts those threads in the wrong places. See for example the section
'Choosing the number of threads with MKL' on
http://www.psc.edu/general/software/packages/mkl/ . It might also be
worth linking against the non-threaded version of MKL, which I think
would involve doing:

-L${MKL_PATH} -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread
instead of:

-L${MKL_PATH} -lmkl_intel_lp64  -lmkl_intel_thread -lmkl_core \
-L/opt/intel/Compiler/11.1/072/lib/intel64 -liomp5 -fopenmp

In my experience, there is a huge difference in performance between different LAPACK configurations on the same machine. For example, on the BlackLight supercomputer
I got the following timing results for alternating least squares on Netflix data.
Here is a graph comparing different implementations. I used 16 BlackLight cores. Alternating least squares was run for 10 iterations to factorize a matrix with 100,000,000 non-zeros (nnz). The width of the factor matrices was set to D=30.

As you can see, a wrong configuration resulted in a 24x longer running time! (In this graph, lower is better!) Overall, if you are using an Intel platform I highly recommend using MKL.

Why don't you try out GraphLab? It is designed for iterative algorithms on sparse data. If you use it, it is much easier to make efficient use of multiple cores.

Sunday, September 18, 2011

it++ vs. Eigen

it++ and Eigen are both popular and powerful matrix linear algebra packages for C++.

We got a lot of complaints from our users about the relative difficulty of installing it++, as well as about its restrictive GPL license. We have decided to try switching to the Eigen linear algebra library instead. Eigen requires no installation, since the code consists only of header files. It is licensed under the LGPL3+ license.

Today I have created a pluggable interface that allows swapping it++ and Eigen underneath our GraphLab code. I have run some tests to verify speed and accuracy of Eigen vs. it++.

And here are the results:
| Framework and algorithm | Running time (sec) | Training RMSE | Validation RMSE |
| --- | --- | --- | --- |
| it++ ls_solve_chol | 16.8 | 0.7000 | 0.9704 |
| it++ ls_solve | 17.8 | 0.7000 | 0.9704 |
| Eigen ldlt | 18.3 | 0.6745 | 0.9495 |
| Eigen llt | 18.7 | 0.6745 | 0.9495 |
| Eigen JacobiSVD | 63.0 | 0.6745 | 0.9495 |

Experiment details: I used GraphLab's alternating least squares with a subset of Netflix data. The dataset is described here. I let the algorithm run for 10 iterations, in release mode, on our 8-core AMD Opteron machine.

Experiment conclusions: It seems that Eigen is more accurate than it++. It runs slightly slower than it++, but both training and validation RMSE are better.
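To make the ldlt rows in the table above a little more concrete, here is a minimal textbook sketch of solving a symmetric positive definite system A x = b through an L D L^T factorization. This is not the Eigen or it++ implementation (Eigen's ldlt() uses pivoting and is considerably more robust); the function name `ldlt_solve` and the `Matrix`/`Vector` aliases are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Solve A x = b for a symmetric positive definite A via A = L D L^T.
// Unpivoted textbook version, for illustration only.
Vector ldlt_solve(const Matrix& A, const Vector& b) {
    const std::size_t n = A.size();
    assert(b.size() == n);
    Matrix L(n, Vector(n, 0.0));  // unit lower triangular factor
    Vector D(n, 0.0);             // diagonal factor

    for (std::size_t j = 0; j < n; ++j) {
        assert(A[j].size() == n);
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k] * D[k];
        assert(d != 0.0);  // a zero pivot means A is not positive definite
        D[j] = d;
        L[j][j] = 1.0;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k] * D[k];
            L[i][j] = s / d;
        }
    }

    Vector x(b);
    // Forward substitution: L y = b
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < i; ++k) x[i] -= L[i][k] * x[k];
    // Diagonal scaling: D z = y
    for (std::size_t i = 0; i < n; ++i) x[i] /= D[i];
    // Backward substitution: L^T x = z
    for (std::size_t i = n; i-- > 0;)
        for (std::size_t k = i + 1; k < n; ++k) x[i] -= L[k][i] * x[k];
    return x;
}
```

In alternating least squares each update solves a small symmetric system of this kind (here of width D), which is why the choice of solver shows up in both running time and RMSE.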

For those of you who are familiar with it++ and would like to try out Eigen, I made a short
list of equivalent function calls in both systems.

| Operation | it++ | Eigen |
| --- | --- | --- |
| Double matrix | mat | MatrixXd |
| Double vector | vec | VectorXd |
| Value assignment | a.set(i,j,val) | a(i,j)=val |
| Get row | a.get_row(i) | a.row(i) |
| Identity matrix | eye(size) | Identity(size,size) |
| Matrix/vector of ones | ones(size) | Ones(size) |
| Matrix/vector of zeros | zeros(size) | Zero(size) |
| Least squares solution | x=ls_solve(A,b) | x=A.ldlt().solve(b) |
| Transpose | transpose(a) or a.transpose() | a.transpose() |
| Set diagonal | a=diag(v) | a.diagonal()=v |
| Sum values | a.sumsum() | a.sum() |
| L2 norm | a.norm(2) | a.norm() |
| Inverse | inv(a) | a.inverse() |
| Outer product | outer_product(a,b) | a*b.transpose() |
| Eigenvalues of symmetric matrix | eig_sym | VectorXcd eigs = T.eigenvalues() |
| Subvector | v.mid(1,n) | a.segment(1,n) |
| Sum of squares | sum_sqr(v) | v.array().pow(2).sum() |
| Trace | trace(a) | a.trace() |
| Min value | min(a) | a.minCoeff() |
| Max value | max(a) | a.maxCoeff() |
| Random uniform | randu(size) | VectorXi::Random(size) |
| Concat vectors | concat(a,b) | VectorXi ret(a.size()+b.size()); ret << a,b; |
| Sort vector | Sort sorter; sorter.sort(0, a.size()-1, a) | std::sort(a.data(), a.data()+a.size()); |
| Sort index | Sort sorter; sorter.sort_index(0, a.size()-1, a) | N/A |
| Get columns | a.get_cols(cols_vec) | N/A |
| Random normal | randn(size) | N/A |