This page summarizes the steps to install Spark 2.2.1 in your Windows environment.
Download the Spark 2.2.1 binary package (spark-2.2.1-bin-hadoop2.7.tgz in this walkthrough) from the following site:
In my case, I am saving the file to the folder F:\DataAnalytics.
Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then extract it:
$ cd /f/DataAnalytics
fahao@Raymond-Alienware MINGW64 /f/DataAnalytics
$ tar -xvzf spark-2.2.1-bin-hadoop2.7.tgz
In my case, Spark is extracted to: F:\DataAnalytics\spark-2.2.1-bin-hadoop2.7
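To confirm that the extraction worked, a quick check in Git Bash can help. This is only a sketch: has_spark_layout is a hypothetical helper name, the path is the one from my case written in Git Bash notation, and bin/run-example.cmd is used as the marker file simply because it is the script run later on this page.

```shell
# Hypothetical helper: succeeds when the folder contains bin/run-example.cmd,
# the script used later on this page.
has_spark_layout() {
  [ -f "$1/bin/run-example.cmd" ]
}

# Path from my case, in Git Bash notation; adjust it to your own folder.
SPARK_DIR="/f/DataAnalytics/spark-2.2.1-bin-hadoop2.7"

if has_spark_layout "$SPARK_DIR"; then
  echo "Spark package found in $SPARK_DIR"
else
  echo "bin/run-example.cmd not found under $SPARK_DIR"
fi
```

If the second message appears, check that the archive was extracted into the folder you expected.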
Follow the section ‘JAVA_HOME environment variable’ in the following page to set up JAVA_HOME.
Set up the SPARK_HOME environment variable with the path of your Spark installation directory as its value.
Add ‘%SPARK_HOME%\bin’ to your PATH environment variable.
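For a quick trial inside Git Bash, the same variables can also be set for the current session only. This is a sketch under assumptions: the path is the one from my case in Git Bash notation, check_var is a hypothetical helper, and nothing here survives closing the shell, so the Windows environment variables are still needed for Command Prompt.

```shell
# Session-only settings for Git Bash; the path is the example location from my case.
export SPARK_HOME="/f/DataAnalytics/spark-2.2.1-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"

# Hypothetical helper: print an exported variable, or report that it is missing.
check_var() {
  # printenv prints nothing when the variable is not exported
  value="$(printenv "$1" || true)"
  if [ -n "$value" ]; then
    echo "$1=$value"
  else
    echo "$1 is not set"
    return 1
  fi
}

check_var SPARK_HOME
check_var JAVA_HOME || echo "Set JAVA_HOME before starting Spark"
```

Running check_var for both names makes it easy to spot a missing JAVA_HOME before launching anything from the Spark bin folder.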
Run the following command in Command Prompt to verify the installation.
The screen should be similar to the following screenshot:
Execute the following command in Command Prompt to run one of the examples provided as part of the Spark installation (class SparkPi with parameter 10).
%SPARK_HOME%\bin\run-example.cmd SparkPi 10
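SparkPi estimates pi by sampling random points in a square and counting how many land inside the inscribed circle; the parameter 10 is the number of slices the work is split into. The sketch below illustrates the same idea in plain shell and awk, without Spark. It is not the code of the bundled example: estimate_pi is a hypothetical name, the fixed seed is my own choice, and the figure of 100000 samples per slice is an assumption about how the example sizes its workload.

```shell
# Plain-shell illustration of the Monte Carlo idea behind SparkPi; no Spark involved.
# The argument mirrors the "10" passed to run-example.cmd (number of slices).
estimate_pi() {
  slices="$1"
  # Assumption: 100000 samples per slice.
  awk -v n="$((slices * 100000))" 'BEGIN {
    srand(42)                      # fixed seed so repeated runs agree
    inside = 0
    for (i = 0; i < n; i++) {
      x = rand() * 2 - 1           # random point in the square [-1, 1] x [-1, 1]
      y = rand() * 2 - 1
      if (x * x + y * y <= 1) inside++
    }
    # Area ratio circle/square is pi/4, so scale the hit ratio by 4.
    printf "Pi is roughly %.4f\n", 4 * inside / n
  }'
}

estimate_pi 10
```

Spark distributes the same counting across the slices and sums the partial results, which is why a larger parameter means more samples and usually a closer estimate.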
As printed in the output, the Spark context Web UI is available at http://172.24.144.1:4040 (the IP address shown depends on your machine).
The following is a screenshot of the UI:
Refer to the following page if you are interested in any Spark developer tools.