February 28, 2018

What is Spark? Why Apache Spark?


Spark began in 2009 as a research project at UC Berkeley’s AMPLab, started by Matei Zaharia, as an alternative to Hadoop’s MapReduce paradigm.


It was open sourced in 2010 under a BSD license.


It was donated to the Apache Software Foundation in 2013.


Apache Spark became a top-level Apache project in February 2014.



Spark is known mainly for its high-speed cluster computing capabilities and for its support for languages such as Python, R, Java, Scala, and SQL.

Spark can connect to a variety of storage systems, such as S3, Cassandra, HDFS, and HBase, and can read file formats such as Avro and Parquet.
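
To illustrate reading from such sources, here is a minimal PySpark sketch. It assumes PySpark is installed and uses Spark's generic reader, `spark.read.format(...).load(...)`. The names `load_dataset` and `demo`, the app name, and the file name are hypothetical and chosen only for this example.

```python
import tempfile


def load_dataset(spark, fmt, path):
    """Read `path` into a DataFrame using Spark's generic reader.

    `spark` is a SparkSession, `fmt` is a format name such as "parquet",
    and `path` is a URI whose scheme selects the storage system.
    """
    return spark.read.format(fmt).load(path)


def demo():
    """Write a tiny Parquet file locally, read it back, and return the row count."""
    # Imported here so the helper above can be defined without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("sources-demo")  # hypothetical app name
        .getOrCreate()
    )
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = tmp + "/people.parquet"  # hypothetical file name
            df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
            df.write.mode("overwrite").parquet(path)
            return load_dataset(spark, "parquet", path).count()
    finally:
        spark.stop()
```

Calling `demo()` needs a working local Spark setup. The same helper should work with an `hdfs://` or `s3a://` path as long as the cluster has the matching connector configured. Cassandra and HBase are usually reached through separate connector packages, which are not shown here.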



