A few days ago I got the following note from Dmitry, a principal staff member at Oracle:
Hi Danny,
I found out about GraphLab just two days ago.
I was working on a MapReduce based QR factorization and whilst searching the web for references, found your blog & GraphLab. No question, I am planning to learn more about the project. Looks very exciting!
In general, our group is focusing on in-database and Hadoop based data mining and statistical algorithms development.
R http://www.r-project.org/ is a big part of it.
Kind regards,
DG
As always, I am absolutely thrilled to get feedback from my readers! I asked Dmitry if he could share some more insight about the R project, and here is what he wrote:
R is huge in data mining and statistical camps.
The number of contributed packages is staggering, it is amongst the most complete and feature rich environments for statistical and data mining computing.
Another very important observation concerns the quality of some of the contributed packages: outstanding work & implementation.
The biggest problem with R has to do with its inherent data storage model: everything must be stored in memory and most algorithms are sequential.
For instance the notion of a matrix in R is captured in the following C one liner:
double* a = (double*) malloc(sizeof(double) * nElements);
It is possible to build R with vendor-supplied matrix packages (BLAS and LAPACK) and thus have multithreaded matrix computations in R (which helps a lot).
However if the input does not fit into memory, then it is somewhat problematic to run even the simplest algorithms.
We enable R folks to carry out computations directly on the database data (no need to move the data out). The in-memory limitation has been lifted for some algorithms (not all of course).
More is here
http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html
Kind regards,
DG
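To make Dmitry's point about the in-memory model a bit more concrete, here is a minimal C sketch of a dense matrix kept in a single heap buffer, in the spirit of his one-liner. The Matrix type and the matrix_* helper names are hypothetical and made up for this illustration; they are not R's actual internals. The only thing borrowed from R is the column-major element order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical dense matrix: one flat buffer, column-major order
   (the element order R uses for its matrices). */
typedef struct {
    size_t nrow;
    size_t ncol;
    double *data;   /* nrow * ncol doubles, all resident in memory */
} Matrix;

/* Bytes needed to hold the whole matrix in RAM. */
size_t matrix_bytes(size_t nrow, size_t ncol) {
    return nrow * ncol * sizeof(double);
}

/* Allocate a zero-filled matrix. Returns 0 on success and -1 when the
   size overflows or the allocation fails, i.e. the "input does not
   fit into memory" case. */
int matrix_init(Matrix *m, size_t nrow, size_t ncol) {
    m->nrow = nrow;
    m->ncol = ncol;
    m->data = NULL;
    if (ncol != 0 && nrow > SIZE_MAX / sizeof(double) / ncol) {
        return -1;  /* nrow * ncol * sizeof(double) would overflow */
    }
    m->data = calloc(nrow * ncol, sizeof(double));
    return m->data != NULL ? 0 : -1;
}

/* Pointer to element (i, j), zero-based: column j starts at j * nrow. */
double *matrix_at(Matrix *m, size_t i, size_t j) {
    assert(i < m->nrow && j < m->ncol);
    return &m->data[j * m->nrow + i];
}

/* Release the buffer and clear the pointer. */
void matrix_free(Matrix *m) {
    free(m->data);
    m->data = NULL;
}
```

With 8-byte doubles, matrix_bytes(100000, 100000) comes to 8 * 10^10 bytes, roughly 80 GB, for a single dense matrix. That is why even simple algorithms become problematic once the input outgrows RAM.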
We definitely agree that R is a very useful statistical package. In fact, one of our users, Steve Lianoglou from Weill Cornell Medical College, ported our Shotgun solver package to R. Here is an excerpt from Steve's homepage that summarizes his ambivalent relationship with R:
I have a love/hate relationship with R. I appreciate its power, but at times I feel like I'm "all thumbs" trying to use it to bend the computer to my will (update: I actually don't feel like this anymore. In fact I'm quite comfortable in R now, but try not to get too frustrated if you're not yet ... it only took me about 6 months or so!).
If that feeling sounds familiar to you, these references might be useful.
- The R Inferno [PDF]. "If you are using R and you think you're in hell, this is a map for you." I stumbled on this document after using R for about 8 months or so and I could still sympathize with that statement.
- An R & Bioconductor Manual by Thomas Girke
Anyway, if you are a reader of this blog or a GraphLab user, send me a note!