How to set up local Apache Spark environment (5 ways)

Maciej Szymczyk
ITNEXT

Apache Spark is one of the most popular platforms for distributed data processing and analysis. Although it is usually associated with server farms, Hadoop and cloud technologies, you can successfully run it on your own machine. In this post you will learn several ways to configure a local Apache Spark development environment.

Assumptions

The base system in this case is Ubuntu Desktop 20.04 LTS.

spark-shell

The first way is to run Spark in the terminal. Let’s start by downloading Apache Spark from the official downloads page. After downloading, unpack the archive with tar.

wget ftp://ftp.task.gda.pl/pub/www/apache/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
tar zxvf spark-3.0.0-bin-hadoop3.2.tgz

Apache Spark is written in Scala and runs on the Java Virtual Machine (JVM), so we need a Java runtime. Spark 3.0 works with Java 11, which is what Ubuntu 20.04 installs as the default JRE.

sudo apt install default-jre
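
To make sure the runtime is in place, you can check the installed Java version; on Ubuntu 20.04 the default-jre package should provide OpenJDK 11:

# verify the JVM installation; expect an OpenJDK 11 version string on Ubuntu 20.04
java -version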

Now all you have to do is go into the bin directory of the unpacked distribution and run spark-shell.
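
For example, assuming the archive was unpacked in the current directory, the commands below launch the interactive shell and run a quick sanity check. The spark session object is created automatically by spark-shell, so the Scala one-liner at the prompt should simply count the rows of a small generated dataset:

cd spark-3.0.0-bin-hadoop3.2/bin
./spark-shell

scala> spark.range(1000).count()
res0: Long = 1000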
