Comments on Large Scale Machine Learning and Other Animals: Collaborative filtering - 3rd generation - part 2

Great find! I just merged your pull request. Much ...

2014-07-20T07:54:39.386-07:00

Great find! I just merged your pull request. Much appreciated!

Hi Danny, I looked into gensgd.cpp to find out the...

2014-07-20T04:32:48.935-07:00

Hi Danny, I looked into gensgd.cpp to find the cause of the RMSE difference. It turned out that step3 gets gensgd_rate multiplied twice per step instead of once. It works now. This seems to date from two commits made on Oct 4 and Oct 10, 2013. I made a pull request. Regards, Xavier
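
To illustrate why a doubled multiplication matters, here is a minimal sketch. It assumes the factor in question is a per-iteration multiplicative decay such as the one set by --gensgd_mult_dec; that reading is an assumption, not something confirmed in the comment above. The function name and the numbers are hypothetical and are not taken from gensgd.cpp.

```python
def decayed_rate(initial_rate, decay, iterations, times_per_iteration=1):
    """Apply a multiplicative decay to a step size.

    times_per_iteration=1 is the intended behaviour in this sketch;
    times_per_iteration=2 mimics a rate that is multiplied twice per step.
    All names here are hypothetical, not GraphChi identifiers.
    """
    rate = initial_rate
    for _ in range(iterations):
        for _ in range(times_per_iteration):
            rate *= decay
    return rate


# Hypothetical settings chosen only to make the effect visible.
once = decayed_rate(1e-5, 0.9, 20, times_per_iteration=1)
twice = decayed_rate(1e-5, 0.9, 20, times_per_iteration=2)

print(f"decayed once per step:  {once:.3e}")
print(f"decayed twice per step: {twice:.3e}")
```

Under this assumption, decaying twice per step is equivalent to squaring the decay factor, so the step size shrinks much faster than intended and that part of the model may stop learning early.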

p.s. I will be happy to set up a phone call to d...

2014-07-19T15:24:08.706-07:00

p.s.
I will be happy to set up a phone call to discuss your problem and give some advice regarding GraphLab Create evaluation.

Our project has open source foundations and you ca...

2014-07-19T15:22:36.768-07:00

Our project has open source foundations, and you can always stick to the open source version if you like. GraphLab Create, while not open source, is still free for the foreseeable future. Fine-tuning the open source code directly is more difficult. I am traveling now and will be happy to take a look at the example in a few days. If you don't mind, please post a question at our user forum, http://forum.graphlab.com, so I can keep track of the issue and not forget.

Hi Danny, Thanks for your feedback. GraphLab Creat...

2014-07-19T15:02:07.847-07:00

Hi Danny,
Thanks for your feedback. GraphLab Create seems great but risky to me: I went through the terms & conditions and read "We grant you a limited, revocable license". I am currently testing different solutions, and given the t&c it is hard to know what the future of such an option is.

Hi Xavier, We have reimplemented this code as p...

2014-07-19T13:44:07.769-07:00

Hi Xavier,
We have reimplemented this code as part of GraphLab Create. You are highly encouraged to try it out: it is free and it gets much better results. Send me an email and I will send you the IPython notebook to reproduce the exact same experiment in GLC.

Hello, Many thanks for this post. I was able to r...

2014-07-19T09:29:32.273-07:00

Hello, Many thanks for this post. I was able to run all of the different samples, but I get an RMSE far higher than expected, even after many iterations.

For the example which should lead to an RMSE of 2 minutes, I get an RMSE of 32 minutes after 19 iterations.

I am running Ubuntu. Could it be a library or setup issue?

Thanks

The userproductmatrix you sent me has only 3 colum...

2013-10-01T00:06:57.332-07:00

The userproductmatrix you sent me has only 3 columns. In that case there is no point in using gensgd - you should use sgd. (Unless you have more columns in your version)

Hi Danny, I am running it using the following com...

2013-09-30T23:46:17.840-07:00

Hi Danny,
I am running it using the following command

$GRAPHCHI_ROOT/toolkits/collaborative_filtering/gensgd --training=userproductmatrix --test=userproducttestfile --from_pos=0 --to_pos=1 --val_pos=3 --rehash=1 --gensgd_rate3=1e-5 --gensgd_mult_dec=0.9999 --max_iter=20 --file_columns=4 --gensgd_rate1=1e-5 --gensgd_rate2=1e-5 --quiet=1 --features=2

Send me an example input file and I will take a lo...

2013-09-30T09:11:36.653-07:00

Send me an example input file and I will take a look. Most likely one of the command-line arguments is wrong.

Both files are in same format, all the fields are ...

2013-09-30T05:13:50.963-07:00

Both files are in same format, all the fields are separated by space. But no predictions are captured in test file. I was able to run the gensgd command successfully.

Test file should be in the same exact format as tr...

2013-09-30T03:26:57.749-07:00

The test file should be in exactly the same format as the training file.
So if you have a CSV file for training, you should have a CSV file with the same format for testing.

Hi Danny, I have given the test file in the follo...

2013-09-30T03:21:44.913-07:00

Hi Danny,
I have given the test file in the following format:
userid productid
The test file contains all the user ids and product ids of the training file, but I have not found any predictions in the test data. Please tell me whether this is the proper way of running it.

Gensgd is not supported by the rating command. The ...

2013-09-30T01:29:08.412-07:00

Gensgd is not supported by the rating command.
The only option you have is to give a file with --test=FILENAME,
and then you will get predictions for each line of features in the test data.
(The test data should have the same format as the training data.)
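
As a rough way to check the "same format" requirement before running gensgd, the sketch below compares the number of columns in the training and test files. It is only an illustration: the helper names and sample rows are hypothetical, and it assumes that "same format" at least means the same delimiter and the same column count on every line.

```python
def column_counts(lines, delimiter=None):
    """Return the set of column counts seen across non-empty lines.

    delimiter=None splits on any whitespace; pass "," for CSV input.
    """
    return {len(line.split(delimiter)) for line in lines if line.strip()}


def same_format(training_lines, test_lines, delimiter=None):
    """True if both inputs use one consistent column count and it matches."""
    train = column_counts(training_lines, delimiter)
    test = column_counts(test_lines, delimiter)
    return len(train) == 1 and train == test


# Hypothetical sample rows: user id, product id, value.
training = ["1 10 5", "2 11 3"]
test_ok = ["1 11 0", "2 10 0"]   # same three columns as training
test_bad = ["1 11", "2 10"]      # only user id and product id

print(same_format(training, test_ok))   # matching layouts
print(same_format(training, test_bad))  # mismatched layouts
```

In practice the lines would come from the files passed to --training and --test, for example via `open(path).read().splitlines()`.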

When I give gensgd, I get the following exc...

2013-09-30T01:26:11.036-07:00

When I give gensgd, I get the following exception:

FATAL: rating.cpp(main:296): --algorithms should be one of: als, sparse_als, sgd, nmf, wals

1) which algorithm are you running - sgd? 2) you n...

2013-09-30T01:20:30.248-07:00

1) Which algorithm are you running: sgd?
2) You need to give the same string as was given to the sgd utility using the --training=XXXX argument.

I am running this algorithm and it produced *_U.mm...

2013-09-30T00:28:50.529-07:00

I am running this algorithm and it produced a *_U.mm file after execution. To get the recommendations, I tried to run the rating command and got the following exception:

$GRAPHCHI_ROOT/toolkits/collaborative_filtering/rating --training=*_U.mm --num_ratings=5 --quiet=1 --algorithm=sgd
WARNING: common.hpp(print_copyright:183): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [*_U.mm]
[num_ratings] => [5]
[quiet] => [1]
[algorithm] => [sgd]
FATAL: io.hpp(read_matrix_market_banner_and_size:61): Sorry, this application does not support complex values and requires a sparse matrix.
terminate called after throwing an instance of 'char const*'
Aborted

Please help me understand what I am doing wrong.

Hi Alex, 1) You are right. The default cutoff is 0...

2013-02-02T00:15:08.748-08:00

Hi Alex,
1) You are right. The default cutoff is 0.
2) --minval and --maxval are optional arguments. They slightly improve performance in some cases, but when the result can be any value in the range, there is no need to truncate.
3) --minval and --maxval are independent of the loss function used; you can use them with any loss function.
4) Please send our user mailing list (graphlab-kdd) the exact command you used and the error you got when using --validation. It should work. (Even better if you have some small dataset that shows the error.)
5) The --test option should work. Send me a scenario where you get an error and I will debug it.
6) The added implicit ratings do not have feature information, so I suggest not using that option here.
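
The truncation and cutoff discussed in points 1 to 3 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and whether a prediction exactly equal to the cutoff counts as the upper class is an assumption, not something taken from the toolkit.

```python
def truncate(prediction, minval, maxval):
    """Clamp a raw prediction into [minval, maxval]."""
    return max(minval, min(maxval, prediction))


def classify(prediction, cutoff=0.0, low=-1, high=1):
    """Map a prediction to one of two labels around the cutoff.

    Assumption: a prediction equal to the cutoff is assigned to `high`.
    """
    return high if prediction >= cutoff else low


# Hypothetical raw predictions for a problem with labels -1 and 1.
raw = [1.7, 0.2, -0.4, -3.0]
clamped = [truncate(p, -1, 1) for p in raw]
labels = [classify(p, cutoff=0.0) for p in clamped]

print(clamped)
print(labels)
```

Because clamping only moves values that fall outside the range, it leaves the sign of a prediction unchanged, which is why it can be combined with any loss function and with a cutoff inside the range.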

Best,

Hello Danny, As I said in another post I'm wo...

2013-02-01T12:40:28.569-08:00

Hello Danny,

As I said in another post, I'm working on a one-class problem. I tried your new software on my database and I have a few questions.

- In your first example you set "--minval=-1 --maxval=1 --calc_error=1" but no cutoff. Is the cutoff value automatically set to 0?

- In the sparse example you don't set --minval and --maxval but you do set --cutoff=0.5. Is there a specific reason you wrote the command this way in this case?

- When you set --minval and --maxval, what kind of loss function is used?

- You use the --validation option in the sparse example, but when I try to use it with gensgd it doesn't work. Is that normal?

- Do you plan to implement the --test option?

- As I'm dealing with a one-class problem, I tried the implicit rating option and it worked, but I'm curious about what is done when the features option is used. What values are given to the features associated with these additional ratings?

Thanks.

Regards.

Thanks Danny. It works perfectly now.

2013-01-04T07:09:22.682-08:00

Thanks Danny.

It works perfectly now.

Hi, Sorry about that. Please retake from mercuria...

2013-01-02T10:37:01.325-08:00

Hi,
Sorry about that. Please update from Mercurial using "hg pull; hg update" and recompile using "make clean; make cf". A contributed Mac OS patch that was supposed to fix the missing getline() function made a mess of the Linux version.
Let me know if it now works.

Hello, I've installed graphlab on a VM with u...

2013-01-02T06:35:44.397-08:00

Hello,

I've installed graphlab on a VM with Ubuntu, ran the demo scripts from this page, and got some errors:

dataset 2008.CSV
- traditional matrix factorization : OK
- temporal matrix factorization :
[Other]
app: sharder
gensgd: malloc.c:2451: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
Aborted (core dumped)
- More features : OK
- TaxiIn : OK

dataset Modeling_1.csv
INFO: gensgd.cpp(convert_matrixmarket_N:559): Starting to read matrix-market input. Matrix dimensions: 11 x 13, non-zeros: 400000
FATAL: gensgd.cpp(read_line:333): Error reading line 0 feature 115 [ N,N,,G,,8,0,1,0,0,0,0,1,0,0,0,0,4,0,0,1,0,0,0,B,U,0,,M,Y,0,0,0,0,0,1,1,1,0,2,0,A,C,0,J,18,Y,66,,A,U,U,U,U,U,34,,U,U,84,M,H,1,1,M,5,I,01,00,67,3,,E06,Y,7,3,0,05,0,37,78.09,30,63,36,13.27,59,,N,N,N,N,N,N,U,UU,U,07,6,J,4,,J,4,U,,Y,U,0,Y,,24,,,,h ]
terminate called after throwing an instance of 'char const*'
Aborted (core dumped)

The first error is strange because the run with more features works, and I couldn't find what is wrong in the second file that causes the reading error (I tried changing --file_columns but it still doesn't work).

Thanks.

Thanks for the update! I have fixed the documentat...

2012-12-16T22:17:57.283-08:00

Thanks for the update! I have fixed the documentation.

I found the problem: the file should be named "...

2012-12-16T15:27:36.641-08:00

I found the problem: the file should be named "2008.csv:info" not "csv.2008:info"

Hi, I've installed GraphChi on my MacBook, a...

2012-12-16T15:09:53.868-08:00

Hi,

I've installed GraphChi on my MacBook and ran a few of the demo scripts without error. However, it appears I cannot load data from a .csv file. When I try to run "traditional matrix factorization" I get the following error: "FATAL: gensgd.cpp(convert_matrixmarket_N:582): Bug: can not add edge from 0 to J 0 since max is: 0x0"

It appears that the conversion from .csv to matrix market is failing. What could be causing this?

Thanks,

Zach