


This page summarizes the steps to install Spark 2.2.1 in your Windows environment.

Tools and Environment

  • GIT Bash
  • Command Prompt
  • Windows 10

Download Binary Package

Download the Spark 2.2.1 binary package (pre-built for Apache Hadoop 2.7) from the official download page:

https://spark.apache.org/downloads.html

In my case, I saved the file to the folder F:\DataAnalytics.
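If you prefer to download from the command line, the same package can be pulled with curl in Git Bash. The Apache archive URL below is an assumption based on the standard mirror layout; adjust it if the file has moved:

$ curl -L -o spark-2.2.1-bin-hadoop2.7.tgz https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz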

Unzip Binary Package

Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then extract it:

$ cd /f/DataAnalytics

fahao@Raymond-Alienware MINGW64 /f/DataAnalytics
$ tar -xvzf spark-2.2.1-bin-hadoop2.7.tgz

In my case, Spark is extracted to F:\DataAnalytics\spark-2.2.1-bin-hadoop2.7.
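As an optional sanity check, list the extracted folder in Git Bash; it should contain at least the bin, conf, examples and jars sub-directories (the exact listing may vary slightly):

$ ls spark-2.2.1-bin-hadoop2.7
bin/  conf/  data/  examples/  jars/  licenses/  python/  sbin/  ...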

Setup environment variables

JAVA_HOME

Follow the section ‘JAVA_HOME environment variable’ on the following page to set up JAVA_HOME:

https://kontext.tech/docs/DataAndBusinessIntelligence/p/install-zeppelin-073-in-windows
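If you prefer the command line over the Environment Variables dialog, setx can create the variable for the current user. The JDK path below is only an example, so replace it with your own installation directory, and note that setx only takes effect in newly opened Command Prompt windows:

setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_161"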

SPARK_HOME

Set up the SPARK_HOME environment variable with the value of your Spark installation directory.

[Screenshot: SPARK_HOME environment variable configuration]
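The same can be done with setx from Command Prompt; the path below matches the extraction folder used earlier, so change it if yours differs:

setx SPARK_HOME "F:\DataAnalytics\spark-2.2.1-bin-hadoop2.7"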

PATH

Add ‘%SPARK_HOME%\bin’ to your PATH environment variable.
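The Environment Variables dialog is the safest way to do this. If you want a command-line sketch instead, the following appends the Spark bin folder to the user-level PATH; run it from a new Command Prompt window (so that SPARK_HOME is already defined) and be aware that setx stores the expanded value and truncates anything beyond 1024 characters:

setx PATH "%PATH%;%SPARK_HOME%\bin"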

Verify the installation

Verify command

Run the following command in Command Prompt to verify the installation.

%SPARK_HOME%\bin\spark-shell

The screen should be similar to the following screenshot:

[Screenshot: spark-shell startup output]
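If you only want to confirm the installed version without starting the interactive shell, spark-submit can print it as a quick optional check:

%SPARK_HOME%\bin\spark-submit --version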

Run examples

Execute the following command in Command Prompt to run one of the examples shipped with the Spark installation (class SparkPi with argument 10). More examples are documented at https://spark.apache.org/docs/latest/.

%SPARK_HOME%\bin\run-example.cmd SparkPi 10

The output looks like the following:
[Screenshot: SparkPi example output]
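The same example can also be launched through spark-submit directly. The examples jar path below assumes the default layout of the Spark 2.2.1 binary package; check the examples\jars folder if the file name differs:

%SPARK_HOME%\bin\spark-submit --class org.apache.spark.examples.SparkPi --master local[*] %SPARK_HOME%\examples\jars\spark-examples_2.11-2.2.1.jar 10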

Spark context UI

As printed in the console output, the Spark context Web UI is available at http://172.24.144.1:4040 (the host and port will vary depending on your environment).

The following is a screenshot of the UI:

[Screenshot: Spark context Web UI]
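Port 4040 is only the default; if it is already in use, Spark falls back to the next free port (4041, 4042, and so on). You can also pin the port explicitly when starting the shell, for example:

%SPARK_HOME%\bin\spark-shell --conf spark.ui.port=4041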

Spark developer tools

Refer to the following page if you are interested in Spark developer tools:

https://spark.apache.org/developer-tools.html
