3 years ago

Debug PySpark Code in Visual Studio Code


This page summarizes the steps required to run and debug PySpark (Spark for Python) in Visual Studio Code.

Install Python and pip

Install Python from the official website:

https://www.python.org/downloads/

I am using Python 3.6.4 (32-bit), which ships with pip.
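To confirm which interpreter you will be running PySpark with, a small standard-library check like the one below can help. It is a sketch, not part of the original steps: it reports the Python version, whether the build is 32-bit or 64-bit (derived from the pointer size), and whether pip can be imported.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return (version, bits) for the interpreter running this script."""
    version = platform.python_version()  # e.g. "3.6.4"
    bits = struct.calcsize("P") * 8      # pointer size in bits: 32 or 64
    return version, bits


def pip_available():
    """Return True if the pip package can be imported by this interpreter."""
    try:
        import pip  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    version, bits = describe_interpreter()
    print(f"Python {version} ({bits}-bit) at {sys.executable}")
    print(f"pip available: {pip_available()}")
```

Run it from the VS Code terminal with the same interpreter you plan to use for PySpark, so the output reflects that environment.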

Install Spark standalone edition

Download Spark 2.3.3 from the following page:

https://www.apache.org/dyn/closer.lua/spark/spark-2.3.3/spark-2.3.3-bin-hadoop2.7.tgz

If you are not sure how to install it, follow the guide on this page:

Install Spark 2.2.1 in Windows

*Remember to use the 2.3.3 package instead of the 2.2.1 package used in that guide.

I am using 2.3.3 because the latest release at the time of writing, Spark 2.4.0, has a bug.
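Before installing the matching pyspark package, it can be worth confirming which Spark version is actually installed. The sketch below rests on two assumptions not stated in this article: that the SPARK_HOME environment variable points at the extracted Spark folder, and that the binary distribution contains a RELEASE file whose first line reads like `Spark 2.3.3 built for Hadoop 2.7.3`. If either assumption does not hold, the hypothetical helper `spark_home_version` returns None.

```python
import os
import re


def spark_home_version(spark_home=None):
    """Read the Spark version from SPARK_HOME/RELEASE, or return None.

    Assumes the first line of RELEASE looks like
    "Spark 2.3.3 built for Hadoop 2.7.3".
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        return None  # SPARK_HOME not set and no path passed in
    release_file = os.path.join(spark_home, "RELEASE")
    try:
        with open(release_file, encoding="utf-8") as f:
            first_line = f.readline()
    except OSError:
        return None  # folder or RELEASE file missing
    match = re.match(r"Spark (\S+)", first_line.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print("Spark version under SPARK_HOME:", spark_home_version())
```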

Install pyspark package

Because the Spark version is 2.3.3, install the same version of the pyspark package with the following command:

pip install pyspark==2.3.3

The two versions must match; otherwise you may encounter errors related to the py4j package.
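One way to catch a mismatch before it surfaces as a py4j error is to compare the installed package version with the Spark version. The sketch below assumes the expected version is 2.3.3 and reads `pyspark.__version__`; importing the package does not start a Spark JVM, so the check is cheap. The helper names are hypothetical.

```python
from importlib import import_module

EXPECTED_VERSION = "2.3.3"  # the Spark version downloaded earlier


def installed_pyspark_version():
    """Return pyspark.__version__, or None if pyspark is not installed."""
    try:
        pyspark = import_module("pyspark")  # importing does not start a JVM
    except ImportError:
        return None
    return getattr(pyspark, "__version__", None)


def compare_versions(installed, expected=EXPECTED_VERSION):
    """Describe whether the installed pyspark matches the Spark version."""
    if installed is None:
        return f"pyspark not found; run: pip install pyspark=={expected}"
    if installed != expected:
        return f"Mismatch: pyspark {installed} vs Spark {expected}"
    return f"OK: pyspark {installed} matches Spark {expected}"


if __name__ == "__main__":
    print(compare_versions(installed_pyspark_version()))
```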

Run PySpark code in Visual Studio Code

You can run PySpark code through the context menu item Run Python File in Terminal.

[Screenshot: running a PySpark file via Run Python File in Terminal]
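A minimal script to try this with might look like the following word count. It is a sketch: the app name and sample data are hypothetical, and Spark runs in local mode. Setting PYSPARK_PYTHON to the current interpreter is an optional assumption that can help when several Python installations are on the PATH, so that Spark's Python workers use the same interpreter as the driver. The pyspark import is deferred, so the script prints a hint instead of failing when the package is missing.

```python
import os
import sys
from importlib.util import find_spec
from operator import add

SAMPLE_LINES = [  # hypothetical input data
    "hello spark",
    "hello visual studio code",
]


def tokenize(line):
    """Split a line into lower-case words; used inside flatMap."""
    return line.lower().split()


def word_count(lines):
    """Count words with a local SparkSession and return sorted (word, n) pairs."""
    # Optional: make Spark's Python workers use this same interpreter.
    os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
    from pyspark.sql import SparkSession  # deferred import

    spark = (SparkSession.builder
             .master("local[*]")
             .appName("vscode-wordcount")
             .getOrCreate())
    try:
        counts = (spark.sparkContext.parallelize(lines)
                  .flatMap(tokenize)
                  .map(lambda word: (word, 1))
                  .reduceByKey(add)
                  .collect())
    finally:
        spark.stop()  # release the local Spark JVM
    return sorted(counts)


if __name__ == "__main__":
    if find_spec("pyspark") is None:
        print("pyspark is not installed; run: pip install pyspark==2.3.3")
    else:
        for word, n in word_count(SAMPLE_LINES):
            print(word, n)
```

Save it as a .py file, open it in the editor, and choose Run Python File in Terminal from the context menu; the counts are printed after Spark's startup log output.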

Alternatively, you can debug your application in VS Code, as shown in the following screenshot:

[Screenshot: debugging a PySpark application in VS Code]
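When debugging, keep in mind that breakpoints in driver code are hit, but functions passed to RDD transformations such as flatMap typically run in separate Python worker processes launched by Spark, which the VS Code debugger is not attached to, so breakpoints inside them usually do not trigger. One workaround, sketched below, is to call the same function directly on a small sample in the driver, where you can step through it. The helper `debug_sample` and the sample data are hypothetical; on real data, a small sample could come from `rdd.take(10)`.

```python
def tokenize(line):
    """Split a line into lower-case words (same logic the Spark job uses)."""
    words = line.lower().split()
    return words  # a breakpoint here is hit when called from debug_sample()


def debug_sample(func, sample):
    """Apply func to each element in the driver process, as flatMap would.

    Running the logic outside Spark lets VS Code stop at breakpoints inside
    func; results are flattened the same way flatMap flattens them.
    """
    results = []
    for item in sample:
        results.extend(func(item))
    return results


if __name__ == "__main__":
    sample = ["Hello Spark", "hello VS Code"]  # hypothetical sample input
    print(debug_sample(tokenize, sample))  # ['hello', 'spark', 'hello', 'vs', 'code']
```

Once the function behaves as expected, the same `tokenize` can be passed to flatMap unchanged.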

Run Azure HDInsight PySpark code

You can install the Azure HDInsight Tools extension to submit Spark jobs from VS Code to your HDInsight cluster.

For more details, refer to the extension page:

https://marketplace.visualstudio.com/items?itemName=mshdinsight.azure-hdinsight

Last modified by Raymond 2 years ago.