Download version 6.1 from here. Compile using:
make
make install
Note: a newer version of VW can now be found on GitHub.
Here are some tutorial slides given at the Big Learning workshop.
Now to a quick example on how to run logistic regression:
Prepare an input file named inputfile with the following data in it:
-1 | 1:12 2:3.5 4:1e-2
1 | 3:11 4:12
-1 | 2:4 3:1
Explanation: -1/1 are the labels. 1:12 means that the first feature has value 12, 2:3.5 means that the 2nd feature has value 3.5, and so on. Note that feature names can be strings rather than numbers; string feature names are hashed into integers during the run.
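For example (a hypothetical variant of the file above, using made-up string feature names), the same kind of data could be written as:

-1 | height:12 weight:3.5 whiskers:1e-2
1 | age:11 whiskers:12
-1 | weight:4 age:1

VW hashes these names into integer indices internally, so both files are handled the same way.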
Now run vw using:
./vw -d inputfile --loss_function logistic --readable_model outfile

The program output is:

using no cache
Reading from inputfile
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.679009   0.679009    3        3.0      -1.0000  -0.1004  3

finished run
number of examples = 3
weighted example sum = 3
weighted label sum = -1
average loss = 0.679
best constant = -1
total feature number = 10

The resulting readable model is:

bickson@thrust:~/JohnLangford-vowpal_wabbit-9c65131$ cat outfile
Version 6.1
Min label:-100.000000 max label:100.000000
bits:18
ngram:0
skips:0
index:weight pairs:
rank:0
lda:0
1:-0.139726
2:-0.360716
3:-0.011953
4:0.074106
116060:-0.085449
In the above example, we did a single pass on the dataset. Now assume we want to make several passes to fine-tune the solution. We can do:
./vw -d inputfile --loss_function logistic --readable_model outfile --passes 6 -c
Explanation: -c means creating a cache file, which significantly speeds up execution. It is required when running multiple passes.
When running multiple passes we get:
creating cache_file = inputfile.cache
Reading from inputfile
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.895728   0.895728    3        3.0      -1.0000  0.2633   5
0.626871   0.358014    6        6.0      -1.0000  -0.9557  5
0.435506   0.205868    11       11.0     1.0000   1.1889   5

finished run
number of examples = 18
weighted example sum = 18
weighted label sum = -6
average loss = 0.3181
best constant = -0.4118
total feature number = 102
Now assume we want to compute predictions on test data. First we train and save the model, using almost the same command as before:
./vw -d inputfile --loss_function logistic -f outfile
The difference is that we changed --readable_model to -f, which outputs the model as a binary file.
Next we compute predictions on test data using:
./vw -d testinputfile --loss_function logistic -i outfile -t -p out_predictions
Note that we use the -i flag to indicate the input model, and the -p flag to specify the file to write the predictions to.
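For example, a hypothetical testinputfile could reuse the training layout with placeholder labels, since the -t flag disables learning and the labels are then ignored:

1 | 1:2 2:7
-1 | 3:1 4:0.5

out_predictions will then contain one predicted value per input line.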
Further reading: a more comprehensive VW tutorial by Rob Zinkov.
I'm still pretty new to VW, so I thought I'd mention a couple of things that caused me to waste a fair bit of time when I first started using it.
First, the input file format is very complex and very particular. For example, it is extremely sensitive to white space - e.g.
1 |test two three
means something completely different than
1 | test two three
(The former has the features "two" and "three" in the "test" namespace, while the latter has three features and no namespace, resulting in a very different model.)
Also, the input file format is the same for generating predictions as for generating a model, even though certain fields (like the labels) are completely irrelevant and ignored when generating predictions.
Thanks Kevin! I also noticed that any change in format, for example a missing space, may result in loss of data. I think that currently they have no input sanity check and completely ignore formats which are not 100% compatible with what they had in mind.
Thank you for this simple end-to-end intro with commands you can actually run and understand the output from. The main VW site does not give you that.
Thanks Rado! I am glad my note was useful!
Thanks for this note. Super useful.
The input format is indeed very particular.
Does VW deal with multiclass classification? If yes, can the first column be any string, not just 0/1?
As far as I know only binary classification. See here: http://tech.groups.yahoo.com/group/vowpal_wabbit/message/600
I would like to use VW for regression problems. The example provided on John's website is not clear to me.
Also, the spaces in the input format sometimes give a "malformed example" error. Even though I changed the spaces in the input file, I am still getting these errors.
Hi!
Did you try to run my example? :-)
VW is very sensitive to missing spaces.
Your example, and also the other available examples (0001.dat) in the VW directory, work fine. My input data is in CSV format and I have written code in Python to convert it into VW input format.
Not all the observations in my input data file give the "malformed example" error, only a few (e.g. 200 out of 140000 observations). If my conversion from CSV to VW input format were wrong, it should have given the error for all the observations in the input file.
Also, I am curious to know whether VW works for regression problems. If so, can you help? Thanks Danny!
VW is excellent for regression problems. You should debug the problematic examples to understand why they fail.
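One rough way to hunt for the bad lines (a sketch that assumes the plain "label | features" layout used above, with no namespaces or tags) is:

grep -nv '^[-0-9.]* | ' inputfile | head

This prints the line numbers of lines that do not match the expected "label space pipe space" shape, which is usually where the missing-space problems hide.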
So far I've installed and compiled GraphCHI and VowpalWabbit (and boost) and both seem to be working with their test data.
I wrote a Java utility to convert flight CSV to VW input format. Head of several million records looks like this:
head /Volumes/brad/Dropbox-Overflow/ASADataExpo2009/2008.csv.txt
-14 1 1|DepDelay:8 FlightNum:335 Distance:810 DepTime:2003 ActualElapsedTime:128 ArrTime:2211 AirTime:116 DayofMonth:3 Month:1 DayOfWeek:4
2 1 2|DepDelay:19 FlightNum:3231 Distance:810 DepTime:754 ActualElapsedTime:128 ArrTime:1002 AirTime:113 DayofMonth:3 Month:1 DayOfWeek:4
Am I doing this right?
And running vw looks like this:
imac:vowpalWabbit Brad$ vw /Volumes/brad/Dropbox-Overflow/ASADataExpo2009/2008.csv.txt --cache --audit >audit.txt -p pred.txt
using cache_file = /Volumes/brad/Dropbox-Overflow/ASADataExpo2009/2008.csv.txt.cache
ignoring text input in favor of cache input
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
predictions = pred.txt
average since example example current current current
loss last counter weight label predict features
253.365987 253.365987 3 3.0 14.0000 -3.5527 10
222.075467 190.784946 6 6.0 11.0000 34.0000 10
183.182085 136.510027 11 11.0 1.0000 -0.3415 10
547.109904 911.037723 22 22.0 19.0000 -18.0000 10
993.467205 1439.824505 44 44.0 -26.0000 -26.0000 10
3621.129285 6309.899785 87 87.0 78.0000 120.9613 10
4906.932688 6192.736091 174 174.0 22.0000 114.8907 10
4332.321049 3757.709410 348 348.0 39.0000 39.3193 10
3316.887131 2301.453214 696 696.0 95.0000 78.1861 10
2940.052849 2563.218567 1392 1392.0 23.0000 79.7706 10
2308.099629 1676.146409 2784 2784.0 13.0000 -10.1179 10
2003.443074 1698.786520 5568 5568.0 2.0000 -5.9222 10
1964.735590 1926.021153 11135 11135.0 21.0000 0.9365 10
1384.648145 804.508600 22269 22269.0 28.0000 -0.0364 10
922.324877 459.980847 44537 44537.0 0.0000 -1.6663 10
888.889076 855.452525 89073 89073.0 -6.0000 -9.0099 10
1138.161107 1387.433138 178146 178146.0 17.0000 12.1046 10
1090.651227 1043.141080 356291 356291.0 -5.0000 -0.1046 10
1182.735683 1274.820139 712582 712582.0 45.0000 93.6976 10
1232.995228 1283.254843 1425163 1425163.0 -5.0000 -12.3362 10
1134.289299 1035.583370 2850326 2850326.0 43.0000 54.9404 10
1216.480057 1298.670845 5700651 5700651.0 42.0000 59.5428 10
finished run
number of examples = 6855029
weighted example sum = 6.855e+06
weighted label sum = 5.599e+07
average loss = 1376
best constant = 8.168
total feature number = 68550290
And the prediction file (pred.txt) looks like this:
imac:vowpalWabbit Brad$ head pred.txt
0.000000 1
-14.000000 2
-3.552719 3
-12.584439 4
34.000000 5
34.000000 6
57.000000 7
-1.245023 8
14.000894 9
-0.000006 10
imac:vowpalWabbit Brad$ tail pred.txt
27.738859 7009719
-0.110489 7009720
-3.580440 7009721
0.160012 7009722
-0.080507 7009723
3.700165 7009724
0.018907 7009725
-2.126096 7009726
15.932721 7009727
17.999048 7009728
The problem is, I don't have a clue as to what any of this means. I watched the author's presentation,
but he took so much for granted it didn't help at all.
I think what I'm missing is the insider lingo. I assume "label" means one datum
from the learning set, and id is row number in my case. If so,
what is a "prediction", particularly insofar as there's a separate one for each input record?
And so forth for other terms in this output, such as:
average since example example current current current
loss last counter weight label predict features
So, how to read these sheep entrails? ;) Hope you can help.
Hi Brad,
As far as I recall, each line in VW should have the target / label as the first item and then " | ",
namely space + pipe + space, and then the rest of the field names. When spaces are missing, VW may silently fail without an error. What are the additional two numbers before the "|" sign you have?
p.s.
In this terminology, the label/target is the outcome field; for example, in your flight data it is the number of minutes the arrival was delayed.
That's what I thought. First number is delay time (label), second is weight (1), third is id (row number), then space pipe, then the row elements, each named.
Sorry, I meant just pipe, not space pipe.
Please remove the weight and id - they do not have the meaning you think... and also add spaces.
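For example, your first record would then look like this (same feature values, just without the two extra numbers and with spaces around the bar):

-14 | DepDelay:8 FlightNum:335 Distance:810 DepTime:2003 ActualElapsedTime:128 ArrTime:2211 AirTime:116 DayofMonth:3 Month:1 DayOfWeek:4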
I assume "features" below means variable data that might affect the label value?
Features is a sequence of whitespace separated strings, each of which is optionally followed by a float (e.g., NumberOfLegs:4.0 HasStripes). Each string is a feature and the value is the feature value for that example. Omitting a feature means that its value is zero. Including a feature but omitting its value means that its value is 1.
Ok, building now. What I miscalled "id" is what he calls "tag" below:
Tag is a string that serves as an identifier for the example. It is reported back when predictions are made. It doesn't have to be unique. The default value if it is not provided is the empty string. If you provide a tag, you must also provide an importance. If you don't provide a tag, put a space before the vertical bar.
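Putting that together, all of the following should be valid ways of writing the same example (the values here are just illustrative):

-14 | DepDelay:8 Distance:810 (label only; space before the bar since there is no tag)
-14 1 | DepDelay:8 Distance:810 (label and importance)
-14 1 row3| DepDelay:8 Distance:810 (label, importance, and a tag touching the bar)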
Rebuilding the data set without the weight and tag gave exactly the same results, except that pred.txt now omits the tag values (row numbers).
Danny confirmed via chat that prediction means predicted arrival delay based on the provided feature values, which was my biggest question. There's one prediction per row because each row provides different feature values.
I am new to VW and have a question.
Can we enter multiple input files to VW and expect to get all predictions in the same output file?
Not as far as I know.
How can you do online learning with this? I have a model that I've trained and would like to give new input to update the model. Is this supported by VW?
Not in this sense - the algorithm is online in the sense that a single pass on the data is enough to build the model. But better to ask on the VW mailing list.
I see. Thanks for the reply :)
I noticed that the biglearn.org link to the slides is broken. The old link was http://biglearn.org/files/slides/invited/langford-01.pdf, the correct link is http://biglearn.org/2011/files/slides/invited/langford-01.pdf
ReplyDeletei followed the above steps n getting the following error in ubuntu:
only testing
bad model format!
terminate called after throwing an instance of 'std::exception'
what(): std::exception
Aborted (core dumped)
Am I missing something?
Thanks
Hi Danny,
I'm very new to VW (as of today, in fact) and I am going to use it to have a play with the Kaggle Titanic data. I've formatted the data file for VW and run the training data. A binary model is created. Now I want to run that model against the test data. My input training data looks like this:
-1 survival| class:3 gender:0 age:22 sibsp:1 parch:0 fare:7.25
1 survival| class:1 gender:1 age:38 sibsp:1 parch:0 fare:71.2833
What format should the test data take? I've used the same layout as for the training data, but substituted the -1, 1 labels with 0 for survival. When I run VW it throws a "bad model" exception.
Any help would be greatly appreciated.
Thanks in advance
Paul
Here is a VW data file validator:
http://hunch.net/~vw/validate.html