Apache Mahout, the traditional avenue for building machine learning models in Hadoop, “has reached the end of its road,” Owen said. It’s stuck in a batch-only first-generation MapReduce era, and it requires a lot of work on users’ parts to get a working system in place.
A heated discussion took place a couple of months ago. For example, Sebastian Schelter, one of the main Mahout contributors, did not stay silent:
..., I also cannot understand why Cloudera and you need to start a new open source project that in many ways mirrors what mahout offers. Why not contribute the algorithm implementations (the computation layer) to mahout and built the serving layer as a project on top of that? I don't see what would have prevented this, I would think it would have been warmly welcomed by this community.
It is not that this new project creates competition from which users will benefit, its exactly the opposite. To me it feels like an intentional abandonment of mahout. Instead of giving users a single project where we could have united efforts, users now have to choose between two things that in general do the same things with each of them missing some functionality. In my eyes, users lose here.
Its a very bad day for mahout today.
One of the reasons behind this controversy is that Mahout is backed by MapR, which is in turn backed by EMC. Oryx, on the other hand, is backed by Cloudera. MapR and Cloudera offer competing Hadoop distributions.
The Gigaom article also includes an interesting note about Spark:
Owen is spending a lot of time contributing to the Apache Spark project because he plans to rewrite Oryx to make Spark the primary processing framework instead of MapReduce. “There’s actually a lot of reasons to be interested in Spark from a machine learning point of view,” he said. “… I’d much rather put my energies there.”
He’s not alone. As we have explained, Spark is becoming a popular choice for next-generation big data applications and companies such as Cloudera and Hortonworks are embracing it as a big part of Hadoop’s future.