Download version 6.1 from here. Compile using:

make
make install
Note: a newer version of VW is now available on GitHub.
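If you prefer to build from the GitHub sources, something along these lines should work (a sketch; the repository path is inferred from the shell prompt shown later in this post and may have moved since):

git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
make
make install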
Here are some tutorial slides given at the Big Learning workshop.

Now for a quick example of how to run logistic regression:
Prepare an input file named inputfile with the following data in it:
-1 | 1:12 2:3.5 4:1e-2
1 | 3:11 4:12
-1 | 2:4 3:1
Explanation: -1/1 are the labels. 1:12 means that the first feature has value 12, 2:3.5 means that the second feature has value 3.5, and so on. Note that feature names can be strings as well as integers; string feature names are hashed into integers during the run.
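For example, the following is also a valid input line (the feature names here are hypothetical): string names are hashed automatically, and a bare name such as university is treated as a boolean feature with value 1:

1 | height:1.56 color_red university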
Now run vw using:
./vw -d inputfile --loss_function logistic --readable_model outfile

This prints:

using no cache
Reading from inputfile
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.679009   0.679009    3        3.0      -1.0000  -0.1004  3

finished run
number of examples = 3
weighted example sum = 3
weighted label sum = -1
average loss = 0.679
best constant = -1
total feature number = 10
The resulting readable model file is:
bickson@thrust:~/JohnLangford-vowpal_wabbit-9c65131$ cat outfile
Version 6.1
Min label:-100.000000 max label:100.000000
bits:18
ngram:0
skips:0
index:weight pairs:
rank:0
lda:0
1:-0.139726
2:-0.360716
3:-0.011953
4:0.074106
116060:-0.085449
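To see how the readable model maps to predictions: the raw score is the dot product of the weights with the example's features, plus the intercept (the last index, 116060, is presumably VW's hashed constant feature; that is an assumption here). A minimal Python sketch scoring the third training example:

import math

# Weights copied from the readable model above.
weights = {1: -0.139726, 2: -0.360716, 3: -0.011953, 4: 0.074106}
intercept = -0.085449  # index 116060; assumed to be the hashed constant feature

# Third training example: -1 | 2:4 3:1
features = {2: 4.0, 3: 1.0}

raw = intercept + sum(weights[i] * x for i, x in features.items())
prob = 1.0 / (1.0 + math.exp(-raw))  # logistic link: P(label = +1)
print(raw, prob)

The raw score comes out negative (about -1.54, i.e. a probability of about 0.18 for label +1), consistent with the example's -1 label.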
In the above example, we did a single pass over the dataset. Now assume we want to make several passes to fine-tune the solution. We can do:
./vw -d inputfile --loss_function logistic --readable_model outfile --passes 6 -c
Explanation: -c tells VW to create a cache file, which significantly speeds up execution; it is required when running multiple passes.
When running multiple passes we get:
creating cache_file = inputfile.cache
Reading from inputfile
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.895728   0.895728    3        3.0      -1.0000  0.2633   5
0.626871   0.358014    6        6.0      -1.0000  -0.9557  5
0.435506   0.205868    11       11.0     1.0000   1.1889   5

finished run
number of examples = 18
weighted example sum = 18
weighted label sum = -6
average loss = 0.3181
best constant = -0.4118
total feature number = 102
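One practical note: on later runs VW reuses an existing cache file rather than re-reading the data, so if inputfile changes you should discard the stale cache, either by deleting it:

rm inputfile.cache

or by passing the -k flag, which tells VW to overwrite the cache (check that your version supports -k).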
Now assume we want to compute predictions on test data. First we train and save the model, using almost the same command as before:
./vw -d inputfile --loss_function logistic -f outfile
This time we replaced --readable_model with -f, which writes the model to a binary file.
Next we compute predictions on test data using:
./vw -d testinputfile --loss_function logistic -i outfile -t -p out_predictions
Note that the -i flag loads the previously trained model, -t runs in test-only mode (no learning on the test examples), and -p specifies the file to write the predictions to.
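With --loss_function logistic, the values written by -p are raw scores, like the "current predict" column in the training log above, not probabilities. A minimal Python sketch to convert them (assuming one score per line, possibly followed by a tag):

import math

# Convert raw VW logistic scores in out_predictions
# into probabilities P(label = +1).
with open("out_predictions") as f:
    for line in f:
        score = float(line.split()[0])  # first token is the prediction
        print(1.0 / (1.0 + math.exp(-score)))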
Further reading: a more comprehensive VW tutorial by Rob Zinkov.