PAPER: The shotgun paper appears in this year's ICML (ICML 2011). The paper explains the approach we took to run our algorithm on multicore machines. It analyzes the theory and identifies the cases where parallel execution does not hurt accuracy.
CODE: The shotgun code is available at http://select.cs.cmu.edu/code/
TARGET: The goal of this code is to handle large-scale problems that, on the one hand, fit on a single multicore machine but, on the other hand, are too large for other solvers such as GLMNET, Boyd's L1 interior-point method, and liblinear, which fail to scale to them.
LICENSE: The code is licensed under the Apache License.
INTERFACES: We provide both a C version and a Matlab interface for running the code from within Matlab. At the request of Patrick Harrington (OneRiot.com), we added support for the Matrix Market input format.
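Because mm_lasso reads A and y from Matrix Market files, a small helper for producing such a file from a dense array may be useful. The minimal C sketch below writes the standard coordinate header `%%MatrixMarket matrix coordinate real general`, a `rows cols nnz` size line, and one 1-based `row col value` line per nonzero. The function name write_mm_coordinate is hypothetical (it is not part of the shotgun code), and writing the vector y as an n x 1 matrix is our assumption about what mm_lasso expects.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a dense row-major rows x cols array to f in
 * Matrix Market coordinate format, emitting only the nonzero entries.
 * Returns the number of nonzeros written, or -1 on a write error. */
long write_mm_coordinate(FILE *f, const double *a, int rows, int cols)
{
    assert(f != NULL && a != NULL && rows > 0 && cols > 0);

    /* First pass: count nonzeros, because the size line precedes the data. */
    long nnz = 0;
    for (long k = 0; k < (long)rows * cols; k++)
        if (a[k] != 0.0)
            nnz++;

    /* "%%%%" in the format string prints the literal "%%" header prefix. */
    if (fprintf(f, "%%%%MatrixMarket matrix coordinate real general\n") < 0)
        return -1;
    if (fprintf(f, "%d %d %ld\n", rows, cols, nnz) < 0)
        return -1;

    /* Second pass: one "row col value" line per nonzero, 1-based indices. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++) {
            double v = a[(long)i * cols + j];
            if (v != 0.0 && fprintf(f, "%d %d %.17g\n", i + 1, j + 1, v) < 0)
                return -1;
        }
    return nnz;
}
```

A vector y of length n would be written with write_mm_coordinate(f, y, n, 1).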
An R interface is also available here, thanks to Steve Lianoglou, a Cornell graduate student.
We use the following cost function formulations. For Lasso:
argmin_x [ sum_i (A_i*x - y_i)^2 + lambda * |x|_1 ]
For sparse logistic regression:
argmin_x [ sum_i log(1 + exp(-y_i * A_i*x)) + lambda * |x|_1 ]
where A_i is the i-th row of the matrix A, and |x|_1 is the L1 norm of x (the sum of the absolute values of its entries).
From Matlab:
x = shotgun_logreg(A,y,lambda)
x = shotgun_lasso(A,y,lambda)
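Shotgun is a parallel version of coordinate descent (the paper builds on the sequential Shooting algorithm), and a call such as shotgun_lasso(A,y,lambda) returns the minimizer x of the Lasso cost. For intuition about what one coordinate update does, here is a minimal sequential C sketch of cyclic coordinate descent with soft-thresholding for sum_i (A_i*x - y_i)^2 + lambda * |x|_1. It is a textbook version, not the shotgun implementation: it runs on a single thread and stores A densely, and the names lasso_cd and soft_threshold, as well as the tol/max_iter stopping rule, are our own assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Soft-thresholding operator: S(t, c) = sign(t) * max(|t| - c, 0). */
static double soft_threshold(double t, double c)
{
    if (t > c)
        return t - c;
    if (t < -c)
        return t + c;
    return 0.0;
}

/* Sequential cyclic coordinate descent (Shooting-style) for
 *   min_x sum_i (A_i*x - y_i)^2 + lambda * |x|_1
 * A is dense row-major n x d; x holds the starting point and is updated in
 * place. Stops when no coordinate moves more than tol during a full sweep,
 * or after max_iter sweeps. Returns 0 on success, -1 if allocation fails. */
int lasso_cd(const double *A, const double *y, double *x, int n, int d,
             double lambda, int max_iter, double tol)
{
    assert(n > 0 && d > 0 && lambda >= 0.0);
    double *r = malloc((size_t)n * sizeof *r);   /* residual r = y - A*x */
    if (r == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        r[i] = y[i];
        for (int j = 0; j < d; j++)
            r[i] -= A[(long)i * d + j] * x[j];
    }

    for (int it = 0; it < max_iter; it++) {
        double max_change = 0.0;
        for (int j = 0; j < d; j++) {
            /* a = |A_j|^2 and rho = A_j^T (y - sum_{k != j} A_k x_k),
             * where A_j denotes column j of A. */
            double a = 0.0, rho = 0.0;
            for (int i = 0; i < n; i++) {
                double aij = A[(long)i * d + j];
                a += aij * aij;
                rho += aij * (r[i] + aij * x[j]);
            }
            /* Exact minimizer over x_j; lambda/2 because the squared loss has
             * no 1/2 factor. A zero column leaves only lambda*|x_j|, so x_j = 0. */
            double xj_new = (a > 0.0) ? soft_threshold(rho, lambda / 2.0) / a : 0.0;
            double delta = xj_new - x[j];
            for (int i = 0; i < n; i++)          /* keep r = y - A*x up to date */
                r[i] -= A[(long)i * d + j] * delta;
            x[j] = xj_new;
            if (fabs(delta) > max_change)
                max_change = fabs(delta);
        }
        if (max_change <= tol)
            break;
    }
    free(r);
    return 0;
}
```

With all other coordinates fixed, the cost as a function of z = x_j is a*z^2 - 2*rho*z + lambda*|z| plus a constant, whose minimizer is soft_threshold(rho, lambda/2)/a. Shotgun's contribution is running several such updates in parallel; the paper analyzes when doing so does not hurt accuracy.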
./mm_lasso [A input matrix file] [y input vector file] [x vector output file] [algorithm] [threshold] [K] [max_iter] [num_threads] [lambda]
Program inputs: the matrix and vector files are mandatory.
Usage: ./mm_lasso
  -m  matrix A in sparse Matrix Market format
  -v  vector y in sparse Matrix Market format
  -o  output file name (will contain the solution vector x; default is x.mtx)
  -a  algorithm (1 = lasso, 2 = logistic regression, 3 = find the smallest lambda that yields an all-zero solution)
  -t  convergence threshold (default 1e-5)
  -k  solution path length (for lasso)
  -i  max_iter (default 100)
  -n  num_threads (default 2)
  -l  lambda, a positive weight constant (default 1)
  -V  verbose: 1 = verbose, 0 = quiet (default 0)
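Option -a 3 asks for the smallest lambda that yields an all-zero solution. For the Lasso cost sum_i (A_i*x - y_i)^2 + lambda * |x|_1 there is a standard closed form: the gradient of the squared loss at x = 0 is -2 * A^T y, so x = 0 is optimal exactly when lambda >= 2 * max_j |A_j^T y|, where A_j is the j-th column of A (with a 1/2 in front of the squared loss, the factor 2 disappears). The C sketch below computes this textbook bound; it is our illustration, not necessarily what mm_lasso computes internally, and the name lasso_lambda_max is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Smallest lambda for which x = 0 minimizes
 *   sum_i (A_i*x - y_i)^2 + lambda * |x|_1,
 * i.e. 2 * max_j |A_j^T y| over the columns A_j of the dense row-major
 * n x d matrix A. */
double lasso_lambda_max(const double *A, const double *y, int n, int d)
{
    assert(n > 0 && d > 0);
    double best = 0.0;
    for (int j = 0; j < d; j++) {
        double dot = 0.0;                  /* A_j^T y for column j */
        for (int i = 0; i < n; i++)
            dot += A[(long)i * d + j] * y[i];
        if (fabs(2.0 * dot) > best)
            best = fabs(2.0 * dot);
    }
    return best;
}
```

Every lambda at or above this value gives x = 0, so it is a natural upper end when sweeping lambda downward.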