Basic Programming Guide to Begin with Apache Spark

The bare minimum programming required to begin with Apache Spark

When you plan to learn Apache Spark, the first question that comes to mind is:

"How much programming should one know to begin learning Apache Spark?"

It is common for database developers to be drawn to big data, and in general they are more comfortable writing SQL or PL/SQL than Python, Scala, or Java. Sometimes people start learning a programming language believing it to be the most critical prerequisite for Spark and big data, and they end up spending a lot of time and enthusiasm on the strenuous intricacies of coding.


Obviously, the more programming you learn, the better a developer you will become. However, this article covers how much programming one should learn to get started with Apache Spark. I will mainly cover Python and Scala, and will discuss the bare minimum programming concepts of these languages that you should know to start with Apache Spark.

These are the topics you should understand before starting hands-on work in Spark:
1. Variables
2. Conditional statements and loops
3. Functions/procedures
4. Exceptions
5. Data structures
6. Lambda functions
7. Creating/importing modules, and jars (in Scala)
8. Classes and objects
9. And finally some built-in and standard library functions (like range(), eval(), exec(), len(), rand(), datetime())
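To give a feel for how small this list really is, here is a minimal Python sketch that touches most of the topics above in a few lines each. It is only an illustration, not a Spark program: the names used (scores, threshold, average, safe_average, Student) are made up for this example, and the comments point back to the topic numbers in the list.

```python
import datetime
import random

# 1. Variables
threshold = 50

# 5. Data structures: list, tuple, dict
scores = [72, 35, 90, 48]
record = ("alice", 72)
lookup = {"alice": 72, "bob": 35}

# 2. Conditional statements and loops
labels = []
for score in scores:
    if score >= threshold:
        labels.append("pass")
    else:
        labels.append("fail")


# 3. Functions
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# 4. Exceptions: an empty list makes average() divide by zero
def safe_average(values):
    try:
        return average(values)
    except ZeroDivisionError:
        return None


# 6. Lambda functions, passed to map() and filter()
doubled = list(map(lambda x: x * 2, scores))
passed = list(filter(lambda x: x >= threshold, scores))


# 8. Classes and objects
class Student:
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def has_passed(self, cutoff):
        return self.score >= cutoff


alice = Student(*record)

# 9. Built-in and standard library functions
indices = list(range(len(scores)))  # range() and len() are built-ins
today = datetime.date.today()       # from the datetime module
sample = random.random()            # float in [0.0, 1.0) from the random module
```

The lambda part is worth a second look: passing a small function to map() or filter() is the same style of code you will write when you later pass functions to Spark transformations.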

Apart from these, a little understanding of data frames will be required, but that can be picked up while working with Spark. To begin with Apache Spark you just need a basic understanding of the topics listed above, no matter which language you prefer. Once you have covered them, you are good to start.

If you are new to programming, I would suggest going with Python or Scala. Python will be the easier of the two, as it has a relatively gentle learning curve. I will prepare separate tutorials for Python and Scala to cover all the topics mentioned above.

So guys, all the best, and get ready to explore the super-fast data processing power of Spark.


Let us know your views or feedback on Facebook or Twitter @BigdataDiscuss.
