March 25, 2017

What is Hadoop?

Apache Hadoop is an open-source software framework, written in Java and originally created by Doug Cutting and Michael J. Cafarella, that supports data-intensive distributed applications. It is licensed under the Apache License 2.0 and supports running applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.

The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named MapReduce, in which an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. It also provides a distributed file system that stores data on the compute nodes themselves, delivering very high aggregate bandwidth across the cluster.
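The map/shuffle/reduce flow described above can be sketched without Hadoop at all. The following is a minimal, illustrative Python version of the classic word-count job; the function names (`map_fn`, `shuffle`, `reduce_fn`) are my own and are not Hadoop's API, and on a real cluster the shuffle/grouping step is performed by the framework across nodes rather than in one process.

```python
from collections import defaultdict

def map_fn(document):
    # Map phase: emit (key, value) pairs -- here (word, 1) for each word.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(mapped_pairs):
    # Shuffle phase: group all values by key. On a cluster, the framework
    # does this, routing each key's values to one reducer node.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # Reduce phase: combine all values for one key into a final result.
    return (key, sum(values))

# Each document stands in for one input "split" that could be mapped
# on a different node in the cluster.
documents = ["the quick brown fox", "the lazy dog"]
mapped = [pair for doc in documents for pair in map_fn(doc)]
counts = dict(reduce_fn(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # -> 2
```

Because each map call depends only on its own input fragment, the fragments can be processed in any order and on any machine, which is what makes the paradigm easy to distribute.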

Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework. This enables applications to work with thousands of computation-independent machines and petabytes of data.
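The key idea behind this automatic failure handling is that map and reduce tasks are deterministic and side-effect free, so a task lost to a crashed node can simply be re-executed elsewhere. The sketch below is plain illustrative Python, not Hadoop's actual scheduler; the retry loop stands in for the framework rescheduling a task on another node.

```python
def run_with_retries(task, max_attempts=4):
    # Re-execute a failed task, as a MapReduce framework would reschedule
    # it on another node. Safe because tasks are deterministic.
    for attempt in range(1, max_attempts + 1):
        try:
            return task()          # run the task on a (simulated) node
        except RuntimeError:
            pass                   # node "failed"; try another node
    raise RuntimeError("task failed on every attempt")

# A task whose first two (simulated) nodes crash before one succeeds.
attempts = {"n": 0}
def flaky_word_count():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated node failure")
    return {"hadoop": 1}

result = run_with_retries(flaky_word_count)  # succeeds on the third attempt
```

The application code never sees the two failures; from its point of view the task simply completed, which is what "transparently provides reliability" means in practice.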

The entire Apache Hadoop platform is commonly considered to consist of the Hadoop kernel, MapReduce, and the Hadoop Distributed File System (HDFS), plus a number of related projects including Apache Hive, Apache HBase, Apache Pig, and Apache ZooKeeper.
