Large Scale Machine Learning and Other Animals: it++ vs. Eigen

Sunday, September 18, 2011

it++ vs. Eigen

it++ and Eigen are both popular and powerful matrix linear algebra packages for C++.

We got a lot of complaints from our users about the relative difficulty of installing it++, as well as about its restrictive GPL license. We have decided to try switching to the Eigen linear algebra library instead. Eigen requires no installation, since it consists entirely of header files. At the time of writing it is licensed under LGPL3+.

Today I created a pluggable interface that allows swapping it++ and Eigen underneath our GraphLab code. I then ran some tests to compare the speed and accuracy of Eigen and it++.

And here are the results:

Framework and Algorithm	Running time (sec)	Training RMSE	Validation RMSE
it++ ls_solve_chol	16.8	0.7000	0.9704
it++ ls_solve	17.8	0.7000	0.9704
Eigen ldlt	18.3	0.6745	0.9495
Eigen llt	18.7	0.6745	0.9495
Eigen JacobiSVD	63.0	0.6745	0.9495

Experiment details: I used GraphLab's alternating least squares on a subset of the Netflix data. The dataset is described here. I let the algorithm run for 10 iterations, in release mode, on our 8-core AMD Opteron machine.

Experiment conclusions: Eigen appears to be more accurate than it++ in this test. Its fastest solver runs slightly slower (18.3 vs. 16.8 seconds), but both training RMSE (0.6745 vs. 0.7000) and validation RMSE (0.9495 vs. 0.9704) are better.

For those of you who are familiar with it++ and would like to try out Eigen, I made a short list of equivalent function calls in both libraries.

Operation	it++	Eigen
double matrix	mat	MatrixXd
double vector	vec	VectorXd
Value assignment	a.set(i,j,val)	a(i,j)=val
Get row	a.get_row(i)	a.row(i)
Identity matrix	eye(size)	MatrixXd::Identity(size,size)
Matrix/vector of ones	ones(size)	VectorXd::Ones(size)
Matrix/vector of zeros	zeros(size)	VectorXd::Zero(size)
Least squares solution	x=ls_solve(A,b)	x=A.ldlt().solve(b)
transpose	transpose(a) or a.transpose()	a.transpose()
set diagonal	a=diag(v)	a.diagonal()=v
sum values	sumsum(a)	a.sum()
L2 norm	norm(a,2)	a.norm()
inverse	inv(a)	a.inverse()
outer product	outer_product(a,b)	a*b.transpose()
Eigenvalue of symmetric mat	eig_sym	VectorXcd eigs = T.eigenvalues()
Subvector	v.mid(1,n)	v.segment(1,n)
Sum squares	sum_sqr(v)	v.array().pow(2).sum()
trace	trace(a)	a.trace()
min value	min(a)	a.minCoeff()
max value	max(a)	a.maxCoeff()
Random uniform	randu(size)	VectorXd::Random(size) (uniform in [-1,1]; randu is uniform in [0,1])
Concat vectors	concat(a,b)	VectorXi ret(a.size()+b.size()); ret << a,b;
Sort vector	Sort sorter; sorter.sort(0, a.size()-1, a)	std::sort(a.data(), a.data()+a.size());
Sort index	Sort sorter; sorter.sort_index(0, a.size()-1, a)	N/A
Get columns	a.get_cols(cols_vec)	N/A
Random normal	randn(size)	N/A
