Monday, January 9, 2012

Vowpal Wabbit Tutorial

Vowpal Wabbit (VW) is a popular online machine learning implementation for solving linear models such as LASSO, sparse logistic regression, etc. The library was initiated and written by John Langford at Yahoo! Research.

Download version 6.1 from here. Compile using:
make
make install

Note: a newer version of VW is now available on GitHub.

Here are some tutorial slides presented at the Big Learning workshop.

Now for a quick example of how to run logistic regression:
Prepare an input file named inputfile with the following data in it:
-1 | 1:12 2:3.5 4:1e-2
1 | 3:11 4:12
-1 | 2:4 3:1

Explanation: -1/1 are the labels. 1:12 means that the first feature has the value 12, 2:3.5 means that the 2nd feature has the value 3.5, and so on. Feature names can also be strings, but feature values must be numbers. String feature names are hashed into integer indices during the run.
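VW is strict about whitespace, so it can help to generate input lines programmatically instead of writing them by hand. Below is a minimal Python sketch. The helper name `to_vw_line` is hypothetical and not part of VW. It only covers the simple `label | name:value ...` form used above, with no namespaces, importance weights or tags.

```python
def to_vw_line(label, features):
    """Format one example as 'label | name:value ...' (hypothetical helper).

    The spaces around '|' matter: if a token is glued to the pipe,
    VW reads it as a namespace name rather than as a feature.
    """
    parts = []
    for name, value in features.items():
        name = str(name)
        # Feature names must not contain whitespace, ':' or '|'.
        if any(ch in name for ch in " \t:|"):
            raise ValueError(f"illegal character in feature name {name!r}")
        parts.append(f"{name}:{value}")
    return f"{label} | " + " ".join(parts)


# The three examples from inputfile above, plus one with string feature names.
examples = [
    (-1, {1: 12, 2: 3.5, 4: 1e-2}),
    (1, {3: 11, 4: 12}),
    (-1, {2: 4, 3: 1}),
    (1, {"age": 22, "fare": 7.25}),
]
for label, feats in examples:
    print(to_vw_line(label, feats))
```

Python writes 1e-2 as 0.01, which VW reads as the same value.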

Now run vw using:
./vw -d inputfile --loss_function logistic --readable_model outfile
using no cache
Reading from inputfile
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
average    since       example  example    current  current  current
loss       last        counter   weight      label  predict features
0.679009   0.679009          3      3.0    -1.0000  -0.1004        3

finished run
number of examples = 3
weighted example sum = 3
weighted label sum = -1
average loss = 0.679
best constant = -1
total feature number = 10

Explanation: -d specifies the input file. --loss_function sets the type of loss function (one of: squared, logistic, hinge, quantile, classic). --readable_model specifies the output file name for the model in human-readable format.
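For reference, here is a small Python sketch of the per-example loss computed by three of these options. It uses a label y and a raw prediction p, the standard textbook definitions, and labels -1/1 for logistic and hinge. This is only an illustration and is not taken from VW's source. It ignores VW internals such as importance weights, and it leaves out the quantile and classic variants.

```python
import math

def squared_loss(y, p):
    return (p - y) ** 2

def logistic_loss(y, p):
    # y in {-1, 1}; p is the raw linear score.
    # Computes log(1 + exp(-y*p)) without overflowing for large |y*p|.
    z = -y * p
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def hinge_loss(y, p):
    # y in {-1, 1}
    return max(0.0, 1.0 - y * p)

# Loss of the last example in the run above (label -1, prediction -0.1004)
for name, loss in [("squared", squared_loss), ("logistic", logistic_loss), ("hinge", hinge_loss)]:
    print(name, round(loss(-1, -0.1004), 4))
```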

The resulting readable model file (outfile) looks like this:
bickson@thrust:~/JohnLangford-vowpal_wabbit-9c65131$ cat outfile 
Version 6.1
Min label:-100.000000 max label:100.000000
bits:18
ngram:0 skips:0
index:weight pairs:
rank:0
lda:0
1:-0.139726
2:-0.360716
3:-0.011953
4:0.074106
116060:-0.085449
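Indices 1 to 4 in this model are our four integer-named features. The extra entry 116060 is, as far as I can tell, VW's constant (bias) feature. VW reserves a fixed hash value for it, and I am assuming that value is 11650396, the `constant` definition in the VW source I know of. Masking that hash to the 18 weight bits reported above ("bits:18") gives exactly 116060, as this one-line check shows:

```python
# Assumption: 11650396 is the hash VW reserves for its constant (bias) feature.
CONSTANT_HASH = 11650396
BITS = 18  # matches "Num weight bits = 18" and "bits:18" above

# Keep only the low BITS bits of the hash, which gives the weight index.
index = CONSTANT_HASH & ((1 << BITS) - 1)
print(index)  # 116060
```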


In the above example, we made a single pass over the dataset. Now assume we want to make several passes to fine-tune the solution. We can do:
./vw -d inputfile --loss_function logistic --readable_model outfile --passes 6 -c

Explanation: -c creates a cache file, which significantly speeds up execution. It is required when running multiple passes.

When running multiple passes we get:
creating cache_file = inputfile.cache
Reading from inputfile
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
average    since       example  example    current  current  current
loss       last        counter   weight      label  predict features
0.895728   0.895728          3      3.0    -1.0000   0.2633        5
0.626871   0.358014          6      6.0    -1.0000  -0.9557        5
0.435506   0.205868         11     11.0     1.0000   1.1889        5

finished run
number of examples = 18
weighted example sum = 18
weighted label sum = -6
average loss = 0.3181
best constant = -0.4118
total feature number = 102
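How to read the progress table: "average loss" is the mean loss over all examples seen so far. "since last" is the mean loss over only the examples since the previous printed line, and VW prints a new line roughly each time the example counter doubles. The numbers above fit this reading. The sketch below assumes every importance weight is 1, as in this run, and rebuilds the "average loss" column from the "example counter" and "since last" columns:

```python
# (example counter, "since last") pairs copied from the multi-pass run above
rows = [(3, 0.895728), (6, 0.358014), (11, 0.205868)]

total_loss = 0.0
prev_count = 0
averages = []
for count, since_last in rows:
    # total loss of the examples seen since the previous printed line
    total_loss += since_last * (count - prev_count)
    prev_count = count
    averages.append(total_loss / count)

print([round(a, 6) for a in averages])  # [0.895728, 0.626871, 0.435506]
```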

Now assume we want to compute predictions on test data. First we train using the same command as before:
./vw -d inputfile --loss_function logistic -f outfile
except that --readable_model is replaced by -f, which writes the model as a binary file. This binary file is what -i expects when loading a model.
Next we compute predictions on test data using:
./vw -d testinputfile --loss_function logistic -i outfile -t -p out_predictions
Note that the -i flag specifies the input model, -t runs in test-only mode (labels are ignored and the model is not updated), and -p specifies the file to write the predictions to.
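As far as I know, with --loss_function logistic the values that -p writes in this version are raw linear scores, not probabilities. Compare the "predict" column above, e.g. -0.1004. If you need probabilities, you can post-process the predictions file with the logistic function. Here is a minimal Python sketch. It assumes each line holds a score optionally followed by a tag, as in the pred.txt output shown in the comments. The function names are illustrative only.

```python
import math

def sigmoid(score):
    # Logistic function mapping a raw score to P(label = 1),
    # written to avoid overflow for large negative scores.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

def read_probabilities(lines):
    """Yield (tag, probability) for lines of the form 'score [tag]'."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        tag = fields[1] if len(fields) > 1 else ""
        yield tag, sigmoid(float(fields[0]))

# Stand-in for the lines of out_predictions; scores from the "predict" column above.
sample = ["-0.1004\n", "1.1889\n"]
for tag, prob in read_probabilities(sample):
    print(repr(tag), round(prob, 4))
```

With a real file, pass the open file object instead, e.g. `with open("out_predictions") as f: for tag, prob in read_probabilities(f): ...`.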

Further reading: a more comprehensive VW tutorial by Rob Zinkov.

28 comments:

  1. I'm still pretty new to VW, so thought I'd mention a couple of things that caused me to waste a fair bit of time when I first started using it.

    First, the input file format is very complex and very particular. For example, it is extremely sensitive to white space - e.g.

    1 |test two three

    means something completely different than

    1 | test two three

    (The former has the features "two" and "three" in the "test" namespace, while the latter has three features and no namespace, resulting in a very different model.)

    Also, the input file format is the same for generating predictions as for generating a model, even though certain fields (like the labels) are completely irrelevant and ignored when generating predictions.

    1. Thanks Kevin! I also noticed that any change in format, for example a missing space, may result in lost data. I think VW currently has no input sanity check and silently ignores input that is not 100% compatible with what its authors had in mind.

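Kevin's whitespace example can be made concrete with a simplified parser. The Python sketch below handles only 'label | features' and 'label |namespace features' lines, with no importance weight, tag or multiple namespaces. It also applies the rule, quoted later in this thread, that a feature given without a value counts as 1. It illustrates the rule only and is not VW's actual parser. The name `parse_simple` is hypothetical.

```python
def parse_simple(line):
    """Return (label, namespace, {feature: value}) for a simplified VW line."""
    label_part, _, rest = line.partition("|")
    label = float(label_part)
    namespace = ""
    tokens = rest.split()
    # A token glued directly to the '|' is a namespace name, not a feature.
    if rest and not rest[0].isspace():
        namespace, tokens = tokens[0], tokens[1:]
    features = {}
    for tok in tokens:
        name, _, value = tok.partition(":")
        # A feature written without ':value' gets the value 1.
        features[name] = float(value) if value else 1.0
    return label, namespace, features

print(parse_simple("1 |test two three"))   # namespace "test", two features
print(parse_simple("1 | test two three"))  # no namespace, three features
```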
  2. Thank you for this simple end to end intro with commands you can actually run and understand the output from. The main vw site does not give you that.

    1. Thanks Rado! I am glad my note was useful!

  3. Thanks for this note. Super useful.
    The input format is indeed very particular.
    Does VW deal with multiclass classification? If yes, the first column can be any string other than 0/1. Is that correct?

    1. As far as I know, only binary classification is supported. See here: http://tech.groups.yahoo.com/group/vowpal_wabbit/message/600

  4. I would like to use VW for regression problems. The example provided in John's website is not clear to me.

    Also, the spacing in the input format sometimes gives "malformed example" errors. Even after changing the spaces in the input file, I am still getting these errors.

    1. Hi!
      Did you try to run my example? :-)
      VW is very sensitive to missing spaces.

    2. Your example and the other examples available in the VW directory (0001.dat) work fine. My input data is in CSV format, and I have written Python code to convert it into VW input format.

      Not all observations in my input file give the "malformed example" error, only a few (e.g. 200 out of 140000). If my conversion from CSV to VW input format were wrong, it should have given the error for every observation in the input file.


      Also, I am more curious to know whether VW works for regression problems. If so, can you help? Thanks Danny!

    3. VW is excellent for regression problems. You should debug the problematic examples to understand why they fail.

  5. So far I've installed and compiled GraphChi and Vowpal Wabbit (and Boost), and both seem to be working with their test data.
    I wrote a Java utility to convert the flight CSV data to VW input format. The head of several million records looks like this:

    head /Volumes/brad/Dropbox-Overflow/ASADataExpo2009/2008.csv.txt
    -14 1 1|DepDelay:8 FlightNum:335 Distance:810 DepTime:2003 ActualElapsedTime:128 ArrTime:2211 AirTime:116 DayofMonth:3 Month:1 DayOfWeek:4
    2 1 2|DepDelay:19 FlightNum:3231 Distance:810 DepTime:754 ActualElapsedTime:128 ArrTime:1002 AirTime:113 DayofMonth:3 Month:1 DayOfWeek:4

    Am I doing this right?

    And running vw looks like this:

    imac:vowpalWabbit Brad$ vw /Volumes/brad/Dropbox-Overflow/ASADataExpo2009/2008.csv.txt --cache --audit >audit.txt -p pred.txt
    using cache_file = /Volumes/brad/Dropbox-Overflow/ASADataExpo2009/2008.csv.txt.cache
    ignoring text input in favor of cache input
    num sources = 1
    Num weight bits = 18
    learning rate = 10
    initial_t = 1
    power_t = 0.5
    predictions = pred.txt
    average since example example current current current
    loss last counter weight label predict features
    253.365987 253.365987 3 3.0 14.0000 -3.5527 10
    222.075467 190.784946 6 6.0 11.0000 34.0000 10
    183.182085 136.510027 11 11.0 1.0000 -0.3415 10
    547.109904 911.037723 22 22.0 19.0000 -18.0000 10
    993.467205 1439.824505 44 44.0 -26.0000 -26.0000 10
    3621.129285 6309.899785 87 87.0 78.0000 120.9613 10
    4906.932688 6192.736091 174 174.0 22.0000 114.8907 10
    4332.321049 3757.709410 348 348.0 39.0000 39.3193 10
    3316.887131 2301.453214 696 696.0 95.0000 78.1861 10
    2940.052849 2563.218567 1392 1392.0 23.0000 79.7706 10
    2308.099629 1676.146409 2784 2784.0 13.0000 -10.1179 10
    2003.443074 1698.786520 5568 5568.0 2.0000 -5.9222 10
    1964.735590 1926.021153 11135 11135.0 21.0000 0.9365 10
    1384.648145 804.508600 22269 22269.0 28.0000 -0.0364 10
    922.324877 459.980847 44537 44537.0 0.0000 -1.6663 10
    888.889076 855.452525 89073 89073.0 -6.0000 -9.0099 10
    1138.161107 1387.433138 178146 178146.0 17.0000 12.1046 10
    1090.651227 1043.141080 356291 356291.0 -5.0000 -0.1046 10
    1182.735683 1274.820139 712582 712582.0 45.0000 93.6976 10
    1232.995228 1283.254843 1425163 1425163.0 -5.0000 -12.3362 10
    1134.289299 1035.583370 2850326 2850326.0 43.0000 54.9404 10
    1216.480057 1298.670845 5700651 5700651.0 42.0000 59.5428 10

    finished run
    number of examples = 6855029
    weighted example sum = 6.855e+06
    weighted label sum = 5.599e+07
    average loss = 1376
    best constant = 8.168
    total feature number = 68550290

    And the prediction file (pred.txt) looks like this:

    imac:vowpalWabbit Brad$ head pred.txt
    0.000000 1
    -14.000000 2
    -3.552719 3
    -12.584439 4
    34.000000 5
    34.000000 6
    57.000000 7
    -1.245023 8
    14.000894 9
    -0.000006 10

    imac:vowpalWabbit Brad$ tail pred.txt
    27.738859 7009719
    -0.110489 7009720
    -3.580440 7009721
    0.160012 7009722
    -0.080507 7009723
    3.700165 7009724
    0.018907 7009725
    -2.126096 7009726
    15.932721 7009727
    17.999048 7009728

    The problem is, I don't have a clue as to what any of this means. I watched the author's presentation,
    but he took so much for granted it didn't help at all.

    I think what I'm missing is the insider lingo. I assume "label" means one datum
    from the learning set, and id is the row number in my case. If so,
    what is a "prediction", particularly insofar as there's a separate one for each input record?

    And so forth for other terms in this output, such as:
    average since example example current current current
    loss last counter weight label predict features

    So, how to read these sheep entrails? ;) Hope you can help.

    1. Hi Brad,
      As far as I recall, each line in VW should have the target/label as the first item, then " | "
      (namely space + pipe + space), and then the feature names. When spaces are missing, VW may silently fail without reporting an error. What are the two additional numbers you have before the "|" sign?

    2. p.s.
      In this terminology, the label/target is the outcome field, for example in your flight data it is the number of minutes late arrival.

    3. That's what I thought. First number is delay time (label), second is weight (1), third is id (row number), then space pipe, then the row elements, each named.

    4. Sorry, I meant just pipe, not space pipe.

    5. Please remove the weight and id; they do not have the meaning you think. Also add spaces.

    6. I assume "features" below means variable data that might affect the label value?

      Features is a sequence of whitespace separated strings, each of which is optionally followed by a float (e.g., NumberOfLegs:4.0 HasStripes). Each string is a feature and the value is the feature value for that example. Omitting a feature means that its value is zero. Including a feature but omitting its value means that its value is 1.

    7. Ok, building now. What I miscalled "id" is what he calls "tag" below:

      Tag is a string that serves as an identifier for the example. It is reported back when predictions are made. It doesn't have to be unique. The default value if it is not provided is the empty string. If you provide a tag, you must also provide an importance. If you don't provide a tag, put a space before the vertical bar.

    8. Rebuilding the data set without the weight and tag gave exactly the same results, except that pred.txt now omits the tag values (row numbers).

      Danny confirmed via chat that prediction means predicted arrival delay based on the provided feature values, which was my biggest question. There's one prediction per row because each row provides different feature values.

    9. I am new to VW and have a question.
      Can we enter multiple input files to VW and expect to get all predictions in the same output file?

  6. How can you do online learning with this? I have a model that I've trained and would like to give new input to update the model. Is this supported by VW?

    1. Not in this sense: the algorithm is online in the sense that a single pass over the data is enough to build the model. But it is better to ask on the VW mailing list.

    2. I see. Thanks for the reply :)

  7. I noticed that the biglearn.org link to the slides is broken. The old link was http://biglearn.org/files/slides/invited/langford-01.pdf, the correct link is http://biglearn.org/2011/files/slides/invited/langford-01.pdf

  8. I followed the above steps and am getting the following error on Ubuntu:

    only testing
    bad model format!
    terminate called after throwing an instance of 'std::exception'
    what(): std::exception
    Aborted (core dumped)

    Am I missing something?

    Thanks

  9. Hi Danny,

    I'm very new to VW (today in fact), and I am going to use it to play with the Kaggle Titanic data. I've formatted the data file for VW and trained on the training data, which created a binary model. Now I want to run that model against the test data. My input training data looks like this:

    -1 survival| class:3 gender:0 age:22 sibsp:1 parch:0 fare:7.25
    1 survival| class:1 gender:1 age:38 sibsp:1 parch:0 fare:71.2833

    What format should the test data take? I've used the same layout as for the training data, but replaced the -1/1 survival label with 0. When I run VW it throws a 'bad model' exception.

    Any help would be greatly appreciated.

    Thanks in advance
    Paul

  10. Here is a VW data file validator:

    http://hunch.net/~vw/validate.html
