Wednesday 5 November 2014

What is Hadoop - Kapil Sharma

What is Hadoop?

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: storing massive amounts of data and processing that data faster by spreading the work across the cluster.

Currently, the basic download from the Apache Software Foundation includes three core components:


HDFS - a Java-based distributed file system that can store all kinds of data without prior organization.

MapReduce - a software programming model for processing large sets of data in parallel.

YARN - a resource management framework for scheduling and handling resource requests from distributed applications.
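
To make the MapReduce model a little more concrete, here is a minimal sketch of a word count written in plain Python. It does not use Hadoop or any Hadoop API, and it runs in a single process. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only mimics the shape of the model: a map step that emits key/value pairs, a grouping step by key, and a reduce step that combines the values for each key.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, standing in for the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big clusters", "data storage"])
print(counts)
```

On a real cluster, each input line (or block of lines) could be mapped on a different machine, which is where the parallelism in the model comes from. Here everything runs locally, only to show how the pieces fit together.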
