This is a short how-to for getting Apache Spark up and running on Windows 10.
- install the Java JDK (not the JRE) from here: https://www.oracle.com/java/technologies/javase-jdk14-downloads.html. For the installation path, choose a folder that **has no spaces** in it, or you will get errors later. Not something like "C:/Program Files (x86)/Java" but something like "C:/Java"
- install Anaconda: https://www.anaconda.com/products/individual
Anaconda will install Python along with other packages; we only need Python.
Check that Java is there: open a command prompt (Win+R, then type "cmd") and run "java -version". This should print something like the following (the exact version will differ depending on which JDK you installed):
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Check that Python is there: in the same command prompt, type "python --version". This should give you:
Python 3.7.6
Now it's time to go to the Spark downloads page: https://spark.apache.org/downloads.html. There, choose a Spark release and the matching "Pre-built for Apache Hadoop" package type, then download the .tgz archive.
There is no installer, so you need to create a folder where Spark will live on your computer. Create a folder (e.g. on the Desktop), name it "spark", and unpack the downloaded archive there.
The full path to Apache Spark on your machine will look something like:
C:\Users\admin\Desktop\Spark\spark-2.4.0-bin-hadoop2.7
Now here is the most important part: setting the JAVA_HOME environment variable:
- go to Windows Settings
- click on "System"
- click on "About"
On the "About" page you will see "Related settings", which takes you to "Advanced system settings".
Click on "Environment Variables".
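As a shortcut, you can also open that same dialog straight from a command prompt (a standard Windows trick, not specific to Spark):

```
rem Opens the "Environment Variables" dialog directly.
rundll32 sysdm.cpl,EditEnvironmentVariables
```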
Here create a JAVA_HOME environment variable and set it to the folder where Java is installed.
Don't give it the path to the bin subfolder! The path should go only up to the JDK folder itself.
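If you prefer the command line, `setx` can create the variable for you. A minimal sketch; the JDK path below is only an example, use your actual install location:

```
rem Point JAVA_HOME at the JDK folder itself, NOT its bin subfolder.
rem "C:\Java\jdk-14" is an example path; adjust to your machine.
setx JAVA_HOME "C:\Java\jdk-14"

rem setx only affects NEW command prompts; open a fresh one and check:
echo %JAVA_HOME%
```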
In the same way, create a SPARK_HOME variable pointing to where Spark is stored on your computer.
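Again, `setx` works too (a sketch using the example path from above; adjust it to wherever you unpacked Spark):

```
rem SPARK_HOME points at the unpacked Spark folder.
setx SPARK_HOME "C:\Users\admin\Desktop\Spark\spark-2.4.0-bin-hadoop2.7"
```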
In addition you need winutils.exe, which you can download from a winutils git repo (e.g. https://github.com/steveloughran/winutils).
Choose the version according to the Hadoop version in your Spark directory name:
if it ends with \spark-2.4.6-bin-hadoop2.6, grab winutils.exe from the repo's 'hadoop-2.6.0\bin' folder.
Create a hadoop\bin folder inside your Spark folder and put the winutils.exe file in there.
Create a HADOOP_HOME environment variable the same way, pointing to that hadoop folder (not to hadoop\bin).
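From the command line, this step could look like the following sketch. It assumes the hadoop folder sits inside your Spark directory as described above, and that SPARK_HOME is already set and visible (open a new prompt after the earlier setx):

```
rem Create the folder that will hold winutils.exe.
mkdir "%SPARK_HOME%\hadoop\bin"

rem ... copy the downloaded winutils.exe into %SPARK_HOME%\hadoop\bin ...

rem HADOOP_HOME points at the hadoop folder, NOT at hadoop\bin.
setx HADOOP_HOME "%SPARK_HOME%\hadoop"
```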
Now you can try running Apache Spark and see if it works:
- open an Anaconda command prompt
- navigate to the Spark folder
- type "bin\pyspark"
- this should print the "Apache Spark" ASCII logo and drop you into a Python shell, as in the sketch below
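Put together, a quick smoke test looks like this (a sketch; open a fresh prompt first so the environment variables are picked up, and note the `spark.range` line is just one possible sanity check, typed at the Python prompt that pyspark opens):

```
rem Start the PySpark shell from the Spark folder.
cd %SPARK_HOME%
bin\pyspark

rem At the >>> prompt, a one-line sanity check:
rem   >>> spark.range(10).count()    (should print 10)
rem Type exit() to leave the shell.
```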
Now you are good to go!
The most important thing is to install Java in a path without spaces or quotes.
Don’t include “bin” in the JAVA_HOME env var.