Introduction to Apache Spark

Apache Spark is a scheduling, monitoring, and distribution engine that performs lightning-fast, fault-tolerant, in-memory parallel processing of data. It came out of the AMPLab project at UC Berkeley. Apache Spark was developed as a unified engine to meet all the needs of big data processing.

Spark Core uses both memory and disk while processing data. It has four traditional language APIs: Scala, Java, Python, and R (still in an experimental phase). In addition, the DataFrame API was introduced in version 1.3. Around Spark Core sit higher-level libraries such as Spark SQL, GraphX, Spark Streaming, and MLlib.
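As a sketch of these two API layers, the snippet below (Scala, written against the Spark 2.x style where `SparkSession` is the entry point, rather than the 1.3-era `SQLContext`; the application and column names are illustrative) runs the same kind of computation first with the core RDD API and then with the DataFrame API:

```scala
import org.apache.spark.sql.SparkSession

object SparkIntro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-intro")
      .master("local[*]")            // run locally, using all available cores
      .getOrCreate()
    import spark.implicits._

    // Core RDD API: a parallelized collection, transformed in memory
    val rdd = spark.sparkContext.parallelize(1 to 5)
    println(rdd.map(_ * 2).reduce(_ + _))   // sums the doubled values

    // DataFrame API: structured data with named columns
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")
    df.groupBy("key").sum("value").show()

    spark.stop()
  }
}
```

The RDD layer works on arbitrary objects, while the DataFrame layer adds a schema, which lets Spark optimize the query before executing it.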

Spark has four modes of execution, depending on the resource manager and coordinator used to run a Spark job:
  1. Local mode
  2. Standalone mode
  3. YARN mode
  4. Mesos mode
Running Spark on YARN is very common in industry.
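The mode a job runs in is selected through the master URL handed to the session builder (or to `spark-submit --master`). A sketch of the standard URL forms, with placeholder host names and default ports, is:

```scala
import org.apache.spark.sql.SparkSession

// The master URL selects the execution mode; hosts and ports below are placeholders.
val local      = SparkSession.builder().master("local[*]")            // Local mode
val standalone = SparkSession.builder().master("spark://host:7077")   // Standalone cluster
val yarn       = SparkSession.builder().master("yarn")                // YARN (cluster config from HADOOP_CONF_DIR)
val mesos      = SparkSession.builder().master("mesos://host:5050")   // Mesos
```

In practice the master is usually left out of the code and supplied at submit time, so the same application jar can run unchanged in any of the four modes.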
To learn more about resource managers and execution modes in Spark, see L2: Resource Managers and different execution modes of Apache Spark.

To learn more about Spark, click here. Share your views or feedback on Facebook or Twitter @BigDataDiscuss.

