An interesting paper that explores the Twitter social network is:
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue B. Moon. What is twitter, a social network or a news media? In WWW, pages 591–600, 2010.
The Twitter graph is available for download from here. The format is very simple:
user follower\n
The graph has 41M nodes and 1.4 billion edges. What is nice about it is that you can view the profile of each node id using the Twitter web API. For example, for user 12 you can do:
http://api.twitter.com/1/users/show.xml?user_id=12
Some statistics about the graph are found here.
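To check the node and edge counts yourself, a minimal sketch along the following lines could be used. It assumes each line holds two whitespace-separated integer ids, as in the format above; the function name edge_list_stats is made up for this illustration. Keeping tens of millions of ids in a Python set needs several GB of RAM, so treat it as a sketch rather than a tuned tool.

```python
import io


def edge_list_stats(lines):
    """Return (distinct_nodes, edges, max_id) for 'user follower' lines.

    Hypothetical helper for illustration; not part of GraphLab.
    """
    nodes = set()
    edges = 0
    max_id = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, follower = int(parts[0]), int(parts[1])
        nodes.update((user, follower))
        edges += 1
        max_id = max(max_id, user, follower)
    return len(nodes), edges, max_id


# Tiny in-memory sample; for the real data, pass an open file object instead.
sample = io.StringIO("12 13\n12 14\n13 14\n")
print(edge_list_stats(sample))  # (3, 3, 14)
```

The largest id is reported separately because node ids need not be contiguous, so it can be larger than the number of distinct nodes.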
If you would like to use it in GraphLab v2, you need to do the following:
1) assuming the graph file name is user_follower.txt, sort the graph using:
sort -u -n -k 1,1 -k 2,2 -T . user_follower.txt > user_follower.sorted
2) Add the following matrix market format header to the file:
%%MatrixMarket matrix coordinate real general
61578414 61578414 1468365182
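Step 2 can be done without opening the 1.4 billion line file in an editor. The sketch below writes the two header lines to a new file and then streams the sorted edge list after them. The function name add_mm_header and the output file name user_follower.mm are assumptions made for this example; the header text and the three numbers are the ones given above.

```python
import shutil


def add_mm_header(src_path, dst_path, rows, cols, nnz):
    """Write a Matrix Market header to dst_path, then append src_path unchanged."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(b"%%MatrixMarket matrix coordinate real general\n")
        dst.write(f"{rows} {cols} {nnz}\n".encode("ascii"))
        # Copy in large chunks so the whole file never has to fit in memory.
        shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)


# Example call with the values from this post (output name is an assumption):
# add_mm_header("user_follower.sorted", "user_follower.mm",
#               61578414, 61578414, 1468365182)
```

Writing to a separate output file keeps the sorted input intact in case the header values need to be changed later.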
I am using the k-cores algorithm to reveal the structure of this graph. I will add some results soon.
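For readers who have not seen k-cores before, here is a small single-machine sketch of the usual peeling idea: repeatedly remove nodes whose remaining degree is at most k, and record k as their core number. This is not the GraphLab implementation; it assumes the follow edges are treated as undirected and that the graph fits in memory, and the name core_numbers is made up for the example.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of each node of an undirected simple graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Nodes that cannot belong to the (k+1)-core.
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            n = peel.pop()
            if n not in remaining:
                continue  # already removed via another neighbor
            remaining.discard(n)
            core[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        peel.append(m)
    return core


# A triangle (1, 2, 3) with one pendant node 4 attached to node 3.
print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

In this toy graph the triangle nodes get core number 2 and the pendant node gets core number 1.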
And here is a library of webgraphs and other big graphs I got from
Kanat Tangwongsan.
Can you check the download link? It is not working.
Now I checked and it is working. If not, you can contact the authors...
It didn't work for me. It says forbidden. Maybe it is because you are authenticated to access that directory?
Anyway, you can always access the data here:
http://an.kaist.ac.kr/~haewoon/release/twitter_social_graph/twitter_rv.tar.gz
Is this the "twitter-2010" graph used in the "GraphChi" paper? The vertex/edge counts are slightly different from the numbers in the paper (42M nodes, 1.5B edges). Confused...
I may have rounded the number of nodes and edges in my blog description, since only the magnitude matters. I suggest you download the original dataset and check exactly how many nodes and edges it has, if this is important for you.
Is it possible to load the sparse matrix Twitter .mm file in MATLAB?
After step 2, you can use the script http://select.cs.cmu.edu/code/graphlab/mmread.m
to load the dataset into MATLAB or Octave. However, it is likely that MATLAB will run out of memory since the dataset is big.