For GraphLab matrix libraries: linear solvers, matrix factorization and clustering we recommend using this format for the input file. Once this format is used for the input, the same format will be used for the output files.
Sparse matrices
Matrix market has a very simple format: 2 header lines. Here are some examples. Here is
example 3x4 matrix:
A = 0.8147 0 0.2785 0 0.9058 0.6324 0.5469 0.1576 0.1270 0.0975 0.9575 0
And here is the matching matrix market file:
%%MatrixMarket matrix coordinate real general % Generated 27-Feb-2012 3 4 9 1 1 0.8147236863931789 2 1 0.9057919370756192 3 1 0.1269868162935061 2 2 0.6323592462254095 3 2 0.09754040499940952 1 3 0.2784982188670484 2 3 0.5468815192049838 3 3 0.9575068354342976 2 4 0.1576130816775483
The first line, include the header. coordinate means this is a sparse matrix.
The third row includes the matrix size: 3 4 9 means 3 rows, 4 columns and 9 non zero entries.
The rest of the rows include the edges. For example the 4th row is:
1 1 0.8147236863931789, namely means that it is the first row, first column and its value.
TIP: Sparse matrices should NOT include zero entries!
For example, the row 1 1 0 is not allowed!
TIP: First two numbers in each non-zero entry line should be integers and not double notation. For example 1e+2 is not a valid row/col number. It should be 100 instead.
TIP: Row/column number always start from one (and not from zero as in C!)
TIP: It is not legal to include the same non zero entry twice. In GraphLab it will result in a duplicate edge error. Note that the number of edges in GraphLab starts in zero, so you have to add one to source and target values to detect the edge in the matrix market file.
Dense matrices:
Here is an example on how to save the same matrix as dense matrix:
A = 0.8147 0 0.2785 0 0.9058 0.6324 0.5469 0.1576 0.1270 0.0975 0.9575 0
%%MatrixMarket matrix array real general 3 4
0.8147
0.9058 0.1270 0 0.6324 0.0975 0.2785 0.5469 0.9575 0
0.1576
0
Symmetric sparse matrices:
Here is an example for sparse symmetric matrix:
B = 1.5652 0 1.4488 0 2.0551 2.1969 1.4488 2.1969 2.7814
And here is the matching matrix market file:
%%MatrixMarket matrix coordinate real symmetric % Generated 27-Feb-2012 3 3 5 1 1 1.5652 3 1 1.4488 2 2 2.0551 3 2 2.1969 3 3 2.7813
Note that each non-diagonal edges is written only once.
Sparse vectors:
Here is an example for sparse vector:
v = 1 0 0 1
%%MatrixMarket matrix coordinate real general % Generated 27-Feb-2012 1 4 2 1 1 1 1 4 1
Working with matlab:
download the files http://graphlab.org/mmwrite.m and http://graphlab.org/mmread.m
In Matlab you can save a dense matrix using:
>> mmwrite('filename', full(matrixname));And save a sparse matrix using:
>> mmwrite('filename', sparse(matrixname));For reading a sparse or dense matrix you can:
>> A = mmread('filename');
Writing a conversion function in Java
This section explains how to convert Mahout 0.4 sequence vectors into matrix market format.
Create a file named Vec2mm.java with the following content:
import java.io.BufferedWriter; import java.io.FileWriter; import java.util.Iterator; import org.apache.mahout.math.SequentialAccessSparseVector; import org.apache.mahout.math.Vector; import org.apache.mahout.math.VectorWritable; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.SequenceFile; /** * Code for converting Hadoop seq vectors to matrix market format * @author Danny Bickson, CMU * */ public class Vec2mm{ public static int Cardinality; /** * * @param args[0] - input svd file * @param args[1] - output matrix market file * @param args[2] - number of rows * @param args[3] - number of columns * @param args[4] - number oi non zeros * @param args[5] - transpose */ public static void main(String[] args){ try { if (args.length != 6){ System.err.println(Usage: java Vec2mm <input seq vec file> < output matrix market file> <number of rows> <number of cols> <number of non zeros> <transpose output>Compile this code using the command:); System.exit(1); } final Configuration conf = new Configuration(); final FileSystem fs = FileSystem.get(conf); final SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[0]), conf); BufferedWriter br = new BufferedWriter(new FileWriter(args[1])); int rows = Integer.parseInt(args[2]); int cols = Integer.parseInt(args[3]); int nnz = Integer.parseInt(args[4]); boolean transpose = Boolean.parseBoolean(args[5]); IntWritable key = new IntWritable(); VectorWritable vec = new VectorWritable(); br.write("%%MatrixMarket matrix coordinate real general\n"); if (!transpose) br.write(rows + " " +cols + " " + nnz + "\n"); else br.write(cols + " " + rows + " " + nnz + "\n"); while (reader.next(key, vec)) { //System.out.println("key " + key); SequentialAccessSparseVector vect = (SequentialAccessSparseVector)vec.get(); //System.out.println("key " + key + " value: " + vect); Iterator iter = vect.iterateNonZero(); while(iter.hasNext()){ Vector.Element element = iter.next(); //note: matrix market offsets starts from one and not from zero if (!transpose) br.write((element.index()+1) + " " + (key.get()+1)+ " " + vect.getQuick(element.index())+"\n"); else br.write((key.get()+1) + " " + (element.index()+1) + " " + vect.getQuick(element.index())+"\n"); } } reader.close(); br.close(); } catch(Exception ex){ ex.printStackTrace(); } } }
javac -cp /mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/core-3.1.1.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-core-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-math-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-cli-1.2.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/hadoop-0.20.2-core.jar *.java
Now run it using the command:
java -cp .:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/core-3.1.1.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-core-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-math-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-cli-1.2.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/hadoop-0.20.2-core.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/google-collections-1.0-rc2.jar Vec2mm A.seq A.mm 25 100 2500 false
Where A.seq is the input file in SequentialAccessSparseVector key/value store. A.mm
will be the resulting output file in matrix market format. 25 is the number of rows, 100 columns, and 2500 non zeros. false - not to transpose the resulting matrix.
Depends on the saved vector type, you may want to change my code from SequentialAccessSparseVector to the specific type you need to convert.
In the post,
ReplyDelete"TIP: Sparse matrices should include zero entries!"
maybe have typo. should --> should not
Right?
Nice catch! fixed.
ReplyDeleteFor the dense matrix matrices, I think you miss the first two entries: 0.8147 and 0.9058.
ReplyDeleteNice catch! Example fixed. Thanks
DeleteI didn't get about dense matrix:
ReplyDeleteI suppose (correct me if i'm wrong) that all elements (including zero elements) must follow in order of rows and columns then.
So I don't understand why
1) there are strange indents
2) several elements have difference for 0.0001
Hi Alex,
Delete1) Indents were a blog formatting issue - now fixed.
2) I used matlab print code which did some rounding of the full numbers, I fixed it now to match.
Thanks for detecting the typos!