I was inspired by Revolution's blog and Jeffrey Breen's step-by-step tutorial on setting up a local virtual instance of Hadoop with R. However, that tutorial describes the implementation using VMware's application, and one downside of VMware is that it is not free. Most people, me included, like to hear the words open-source and free, especially when it is a smooth ride. VirtualBox offers an open-source alternative, so I chose it.

Most of the trouble started after a hassle-free installation of VirtualBox and the creation of Cloudera's demo VM. I ran into several hurdles when adding VirtualBox Guest Additions, which are intended to improve the virtual machine with features such as a folder shared with the host OS. Solutions exist, but the resources are scattered and obscure. I did manage to clear these hurdles and went on to install R and RStudio along with the RHadoop packages. Since I have spent some time dealing with the quirks in the process, I thought it would be useful to self-taught enthusiasts like me to lay out the steps comprehensively.

Hadoop
Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware.