Showing posts with label PMF. Show all posts
Showing posts with label PMF. Show all posts

Friday, June 17, 2011

KDD Cup progress update - We are in the 5th place!

I am proud to report that yesterday JustinYan, a graduate student from the Institute of Automation, Chinese Academy of Sciences, a member of the LeBuSiShu team, suggested that I will join their team.
The reason is that LeBuSiShu is using GraphLab ALS solution as one of the components in their final result. Currently LeBuSiShu is the 5th group in the leaderboard (out of 100 groups!) with RMSE = 22.2820.

Anyway, of course I was honored to accept as well as quite excited! Although when writing the GraphLab collaborative filtering code of matrix factorization I had no competition in mind, it is always nice to know that someone finds your solution useful.

Besides of JustinYun, another nice team member in the LeBuSiShu group is: Yao Wu, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academic of Sciences.

A quick update 6/23: We are actually now at the 4th place! The Chinese group is an amazing team, they simply work around the clock. I myself am running some fancy MCMC methods for slightly improving prediction, but the Chinese team is responsible of 99.9% of the progress.

Thursday, June 9, 2011

GraphLab PMF on 32 bit Ubuntu

NOTE: This code is deprecated. Please take a look here for GraphChi (multicore implementation): http://bickson.blogspot.co.il/2012/12/collaborative-filtering-with-graphchi.html
Or here for GraphLab (distributed implementation): http://graphlab.org/toolkits/collaborative-filtering/
As GraphLab collaborative filtering library is growing, we are adding new algorithms and getting more and more users (around 110 unique installations originated from this blog in the last couple of months). Yesterday I had an interesting email exchange with Timmy Wilson, our man in Cleveland Ohio. Timmy is working on clustering in social networks.

Timmy tried to install Graphlab on linode, but encountered some problems. Here is what he writes:

Installing Graphlab on Ubuntu 10.04 (32bit)
Or, 'I wish i had installed 64bit Ubuntu'

I followed Danny's instructions here:
http://bickson.blogspot.com/2011/04/yahoo-kdd-cup-using-graphlab.html

At the end of this email, attached are all the commands I used.

I can't say for sure, but i assume everything would have went without
a hitch if i were running the 64bit version of Ubuntu. When trying to compile GraphLab I got the following error:
CMakeFiles/disk_graph_test.cxxtest.dir/disk_graph_test.cxx.o: In
function `graphlab::atomic::inc()':
/home/timmyt/graphlabapi/src/graphlab/parallel/atomic.hpp:39:
undefined reference to `__sync_add_and_fetch_8'

This is the output of my
gcc version:

$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
4.4.3-4ubuntu5'
--with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--enable-shared --enable-multiarch --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4
--program-suffix=-4.4 --enable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-plugin --enable-objc-gc
--enable-targets=all --disable-werror --with-arch-32=i486
--with-tune=generic --enable-checking=release --build=i486-linux-gnu
--host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

Luckily, the graphlab guys take this stuff pretty seriously.  "There
never was a compilation problem we failed to solve the same day...",
Danny told me.

Here Yucheng Low describes my problem:

> The problems you are encountering are somewhat complicated.
> I suspect you have a strange mismatch between what gcc thinks your system is
> and what your system architecture actually is.
>
> According to an email you sent before, gcc seems to think the target
> architecture is 486. I am going to assume that you are not actually running
> this on a 486 machine.  The missing "__sync_add_and_fetch_8" instructions
> are available 586 and beyond. What architecture are you running on?
>
> This ITPP error looks like ITPP was build without SSE2 support possibly
> because it used the default architecture as defined by gcc. However, for
> performance reasons, the GraphLab compile flags force SSE/SSE2 through
> -mfpmath=sse -msse2
>
> Two possible solutions:
> 1: line 310 through line 328 of the root CMakeLists.txt
> To each set of compile flags add -march=pentium, delete -mfpmath=sse -msse2
>
> 2: More fundamentally, try to figure out why gcc thinks you are running on a
> 486.

First the more interesting problem:

> 2: More fundamentally, try to figure out why gcc thinks you are running on a
> 486.

I wrote the guys at http://www.linode.com/ and got a response in 15minutes:

> Ubuntu compiles their software optimized for the 486 architecture in its 32-bit
> iteration. There is not a significant performance increase when optimizing for
> 686 compared to optimizing for 486, so 486 optimization remains the default
> for 32-bit Ubuntu/Debian systems.

Mystery solved.

I opted for the first solution:

> 1: line 310 through line 328 of the root CMakeLists.txt
> To each set of compile flags add -march=pentium, delete -mfpmath=sse -msse2


Then build pmf:
$ cd ~/graphlabapi/release/demoapps/pmf
$ make

Now I got the following error:
PMF compilation error:
[ 97%] Built target graphlab
[100%] Built target itdiff
Linking CXX executable pmf
CMakeFiles/pmf.dir/pmf.o: In function `itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::init_gen_rand(unsigned int)':
/usr/local/include/itpp/base/random_dsfmt.h:161: undefined reference
to `itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::sse2_param_mask'
CMakeFiles/pmf.dir/pmf.o: In function `itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::do_recursion(itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::W128_T*, itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::W128_T*,
itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::W128_T*,
itpp::DSFMT<19937 data-blogger-escaped-10376655713290109737ull="" data-blogger-escaped-1047295u="" data-blogger-escaped-1048063u="" data-blogger-escaped-117="" data-blogger-escaped-19="" data-blogger-escaped-1ull="" data-blogger-escaped-4237361149u="" data-blogger-escaped-4291106551315987578ull="" data-blogger-escaped-4294966079u="" data-blogger-escaped-4432916062321256576ull="" data-blogger-escaped-4498102069230399ull="" data-blogger-escaped-4501400546508797ull="">::W128_T*)':
/usr/local/include/itpp/base/random_dsfmt.h:375: undefined reference

It seems that libitpp.a location was not identified. I linked to the installed itpp libraries by adding the following line below line 210 of CMakeLists.txt:
link_directories(/usr/local/lib/)


Awesome -- it works!

Timmy has also provided step by step instructions:

Instructions


http://bickson.blogspot.com/2011/04/yahoo-kdd-cup-using-graphlab.html



Installing itpp


http://bickson.blogspot.com/search/label/itpp

$ sudo apt-get install --yes --force-yes automake autoconf libtool* gfortran  
$ sudo apt-get install --yes --force-yes liblapack-dev
$ export LDFLAGS="-L/usr/lib -lgfortran"
$ cd ~/
$ wget http://sourceforge.net/projects/itpp/files/itpp/4.2.0/itpp-4.2.tar.gz  
$ tar xvzf itpp-4.2.tar.gz  
$ cd itpp-4.2  
$ ./autogen.sh  
$ ./configure --without-fft --with-blas=/home/ubuntu/lapack-3.3.0/blas_LINUX.a --with-lapack=/home/ubuntu/lapack-3.3.0/lapack_LINUX.a CFLAGS=-fPIC CXXFLAGS=-fPIC CPPFLAGS=-fPIC  
$ make  
$ sudo make install 



Installing graphlabapi

$ hg clone https://graphlabapi.googlecode.com/hg/ graphlabapi
$ cd graphlabapi/
$ ./configure --bootstrap


CMakeLists.txt edits


$ vim CMakeLists.txt

add following line below line 210:
link_directories(/usr/local/lib/)

replace all instances of: (be careful - only for 32 bit Linux!)
-mfpmath=sse -msse2

with
-march=pentium


Build PMF


$ cd ~/graphlabapi/release/demoapps/pmf
$ make 


Test

$ ./pmf smalltest 0 --float=true --scheduler="round_robin(max_iterations=15)"
Thanks so much Timmy - we really appreciate your feedback!

Wednesday, April 27, 2011

Yahoo! KDD Cup using Graphlab - Track 2?

I was delighted to hear from Suhrid Balakrishnan from AT&T Labs, that he is using GraphLab pmf for factorizing a linear model for Yahoo! KDD cup - track 2. Initially I focused only on track1, but it seems that Graphlab is potentially useful for track2 as well.

Overall, this month I am aware of 20 installations of Graphlab pmf code on various research groups all over the world. Specifically I got feedback from University of Austin, University of the Aegean, University of Macedonia, Carnegie Mellon University and AT&T Labs.

I got several valuable inputs regarding his experience with GraphLab I wanted to share and ask if anyone had the same experience.

1) It is recommended to download latest version from mercurial repository. See explanation:
in my previous post http://bickson.blogspot.com/2011/04/yahoo-kdd-cup-using-graphlab.html
I am constantly improving the matrix factorization code and adapting it to the KDD dataset. 

2) It is better to install itpp first, since Graphlab auto detects it and it saves much later trouble.

3) Suhrid got an interesting error when saving the factorized matrices U,V. It seems that on his Ubuntu system, factor ordering was somehow reversed.
The following matlab code solved this issue:
Ud=reshape(U(:),size(U,2),size(U,1));
Ud=Ud';

I have opened a google group for discussions and questions concerning KDD usage of GraphLab. Everyone is welcome to join:
* Group name: GraphLab KDD
* Group home page: http://groups.google.com/group/graphlab-kdd
* Group email address graphlab-kdd@googlegroups.com

Friday, April 8, 2011

GraphLab on BlackLight!!

I am super excited to report that GraphLab is up and running on BlackLight, the largest
shared memory computer in the world! With 32TB shared memory and 4,096 cores.

A tutorial for BlackLight is found here

I will soon post some performance results for matrix factorization algorithms, as we make more progress in testing.

Below you can find some instructions on how to install Graphlab on BlackLight, to those of you who are lucky enough to get an account.. :-)

GraphLab Installation

1) Login using ssh into  tg-login1.blacklight.psc.teragrid.org

2) Follow the instructions on http://graphlab.org/download.html to obtain
GraphLab code/

3)
module load cmake boost kyotocabinet IT++
cd graphlabapi
./configure --bootstrap --itpp_include_dir=${ITPP_INC} --itpp_static_link_dir=${ITPP_LIB} -D MKL_PATH=${MKL_PATH}
cd release
make -j 8

Note: Thanks to Joel Welling from Pittsburgh Supercomputing Center, who significantly helped simplifying installation as well as improving performance.

Example GraphLab PMF job
Create a file named kddcup.job with the following content:
#!/bin/csh
#PBS -l ncpus=16
#ncpus must be a multiple of 16
#PBS -l walltime=4:00:00                  
#PBS -j oe
#PBS -q batch
#PBS -m bea
set echo

ja

#move to my $SCRATCH directory
cd $SCRATCH

#copy executable to $SCRATCH
cp $HOME/graphlabapi/release/demoapps/pmf/pmf .

#run my executable
omplace -nt $PBS_NCPUS ./pmf kddcup 0 --scheduler="round_robin(max_iterations=20)" --float=true --zero=true --lambda=1 --D=150 --ncpus=$PBS_NCPUS --aggregatevalidation=true
cp $SCRATCH/kddcupt.kdd.out $HOME/$PBS_JOBID.kdd.out

ja -chlst

Submit this job using the command
qsub kddcup.job

Check the status of the job using the command
qstat 

Check remaining qouta:
bickson@tg-login1:~> xbanner


PSC Grantnumber: DMS110004P Teragrid Grantnumber: DMS110015
P.I. Name: Carlos Guestrin
 Resource        = BLACKLIGHT
 Charge ID       = ms3bdkp
 Start Date      = 02/10/2011
 Expiration Date = 02/10/2012
 Allocation      = 50000.00
 Remaining       = 48541.23
 Last Job        = 06/16/2011

Last Accounting Update: 06/16/2011