How to set up local Apache Spark environment (5 ways)

Maciej Szymczyk
ITNEXT

Apache Spark is one of the most popular platforms for distributed data processing and analysis. Although it is usually associated with server farms, Hadoop and cloud technologies, you can successfully run it on your own machine. In this post you will learn several ways to configure a local Apache Spark development environment.

Assumptions

The base system in this case is Ubuntu Desktop 20.04 LTS.

spark-shell

The first way is to run Spark in the terminal. Let’s start by downloading Apache Spark from the official downloads page. After downloading, unpack the archive with tar.

wget ftp://ftp.task.gda.pl/pub/www/apache/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
tar zxvf spark-3.0.0-bin-hadoop3.2.tgz

Apache Spark is written in Scala and runs on the Java Virtual Machine (JVM), so we need a Java runtime. Spark 3.0 works with Java 11, which is what Ubuntu 20.04 installs as the default JRE.

sudo apt install default-jre
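
To make sure the runtime is in place, you can check the installed Java version; on Ubuntu 20.04 the default-jre package should provide OpenJDK 11:

# verify the JVM installation; expect an OpenJDK 11 version string on Ubuntu 20.04
java -version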

Now all you have to do is go into the bin directory of the unpacked distribution and run spark-shell.
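
For example, assuming the archive was unpacked in the current directory, the commands below launch the interactive shell and run a quick sanity check. The spark session object is created automatically by spark-shell, so the Scala one-liner at the prompt should simply count the rows of a small generated dataset:

cd spark-3.0.0-bin-hadoop3.2/bin
./spark-shell

scala> spark.range(1000).count()
res0: Long = 1000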
