Comments on "Speeding up SVD computation on a mega matrix using GraphLab" (Large Scale Machine Learning and Other Animals blog)
Agnonchik (2011-12-02 03:28):
Thanks for your answer, Danny!
If you're asking me, I have no idea. The only thing I know is that such dimensionality reduction seems to be a common preprocessing step before dealing with a document-term matrix (or a "noun phrase"-"text fragment" matrix, in your case). It could be some kind of denoising, as Tom explained. You can check the "ASF Mail Archives" example for a similar application scenario here: https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
Danny Bickson (2011-12-02 02:07):
Hi Agnonchik!
This is an excellent question, which actually puzzled me as well. This is the reply I got from Tom Mitchell, head of the CMU Machine Learning Department:
"Our vectors are co-occurrence statistics, where the counts are fairly sparse (even though they are based on 500,000,000 web pages). Therefore, you can think of each row as a high-variance approximation to the unknown, true probability distribution of contexts for that row's noun phrase. One hoped-for benefit is that the lower-dimensional representation will aggregate these high-variance counts in a way that gives us a lower-variance, and therefore more reliable, 1000-dimensional representation of the noun phrase's meaning."
I would love to hear your thoughts about it.

Agnonchik (2011-12-01 03:02):
You start from 1,200,000,000 non-zeros and end up with 17,000,000,000 floating-point numbers. There is no saving in memory consumption, nor do you gain on operation count when performing a matrix-vector multiplication. Do you know why their classification algorithm performs better on the reduced data?
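A quick back-of-the-envelope check of those counts. The matrix dimensions below are hypothetical, chosen only so that (m + n) * k reproduces the 17-billion figure quoted above; the real dimensions belong to the data in the original post:

```python
# Sanity check of the sparse-vs-low-rank storage comparison.
# m and n are ASSUMED dimensions for illustration, picked so that
# m + n = 17,000,000 and k = 1000 match the numbers in the comment.
m = 9_000_000      # rows (noun phrases) -- hypothetical
n = 8_000_000      # columns (contexts)  -- hypothetical
k = 1000           # target rank

nnz_original = 1_200_000_000     # non-zeros in the sparse input matrix
dense_factors = (m + n) * k      # entries in U (m x k) plus V (n x k)

print(f"sparse non-zeros:        {nnz_original:,}")
print(f"rank-{k} factor entries: {dense_factors:,}")
# The dense factors hold 17,000,000,000 numbers, more than the sparse
# input, so the low-rank form saves neither memory nor flops for a
# single matrix-vector product, exactly as the comment observes.
```

The point the arithmetic makes: the gain from truncated SVD here is statistical (denoising, variance reduction), not computational.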
Danny Bickson (2011-11-22 04:27):
Hi,
The idea is that the original data is too big. They first reduce its dimensionality to, let's say, a feature vector of size 1000 for each row and column. Then they can run classification using an SVM or other methods on the low-rank data, which is supposed to represent the high-dimensional data pretty closely.

Best,

DB

Agnonchik (2011-11-22 04:16):
Danny, do you know? Just out of curiosity.
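That pipeline (truncated SVD on a sparse matrix, then dense per-row feature vectors for a downstream classifier) can be sketched with SciPy's `svds`. The tiny random matrix and rank below are placeholders, not the actual data from the post:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Toy stand-in for the huge noun-phrase x context matrix: a small
# sparse random matrix (dimensions here are placeholders).
A = sparse_random(500, 300, density=0.01, random_state=0)

k = 20  # target rank (the post uses k = 1000 on the real matrix)
U, s, Vt = svds(A, k=k)  # k largest singular triplets

# Scale the singular vectors by the singular values to get a dense
# k-dimensional feature vector per row and per column -- the low-rank
# representation a classifier such as an SVM would consume.
row_features = U * s          # shape (500, k)
col_features = Vt.T * s       # shape (300, k)
print(row_features.shape, col_features.shape)
```

On the real data this is the step the post offloads to GraphLab; the scaling of singular vectors by singular values is one common convention for forming the features, not necessarily the exact one used there.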
What are the practical implications of such an analysis? Which problem can we solve using these 1000 singular vectors? Why 1000?