Set Spark Python Versions via PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON

2021-09-05

Spark configurations
Environment variables
Fix issue about inconsistent driver and executor Python versions

PySpark uses Python worker processes to perform transformations, so the driver and the executors must run compatible Python versions.

Spark configurations

Since Spark 2.1.0, two configuration properties specify which Python executable PySpark uses:

spark.pyspark.driver.python: Python binary executable to use for PySpark in the driver. The default is spark.pyspark.python.
spark.pyspark.python: Python binary executable to use for PySpark in both the driver and the executors.

In most cases, your Spark cluster administrators will already have set up these properties correctly, and you don't need to change them. For example, the following is the configuration (spark-defaults.conf) of my local Spark cluster on Windows 10, which uses Python 2.7 for both the driver and the executors:

spark.pyspark.python "D:\\Python2.7\\python.exe"
spark.pyspark.driver.python "D:\\Python2.7\\python.exe"

Environment variables

If the above properties are not specified in the configuration files, you can use environment variables instead:

PYSPARK_PYTHON: Python binary executable to use for PySpark in both the driver and the workers. The default is python3 if available, otherwise python. The property spark.pyspark.python takes precedence if it is set.
PYSPARK_DRIVER_PYTHON: Python binary executable to use for PySpark in the driver only. The default is PYSPARK_PYTHON. The property spark.pyspark.driver.python takes precedence if it is set.

On a Windows standalone local cluster, you can set these variables directly as system environment variables. On Linux machines, you can set them in ~/.bashrc.

The following is one example:

export PYSPARK_PYTHON=/path/to/your/python/executable
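For a script that is launched directly with the Python interpreter, the same variable can also be set from Python itself before the Spark session is created. The following is a minimal sketch rather than the only way to do it: the helper name pin_worker_python is hypothetical, and it assumes that the path of the driver's interpreter (sys.executable) is also valid on every worker, which is typically true only in local mode or on identically provisioned nodes.

```python
import os
import sys


def pin_worker_python(executable: str = sys.executable) -> str:
    """Point PYSPARK_PYTHON at the given interpreter (hypothetical helper).

    Call this before the SparkSession/SparkContext is created, because
    the variable is read when the context starts.
    """
    os.environ["PYSPARK_PYTHON"] = executable
    return executable


# Use the interpreter that runs this script (the driver) for the workers too.
pin_worker_python()
print(os.environ["PYSPARK_PYTHON"])
```

After this call, create the session as usual, for example with SparkSession.builder.getOrCreate(). According to the precedence described in this section, the variable is ignored if the spark.pyspark.python property is already set.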

Warning: If the PySpark driver and executor properties (spark.pyspark.driver.python and spark.pyspark.python) are already set, the environment variables won't take effect.
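The lookup order described in this article can be summarised in a few lines of plain Python. This is only an illustrative sketch: resolve_python is a hypothetical helper that does not call Spark, the paths are made up, the fallback is simplified to python3, and the relative order of spark.pyspark.python and PYSPARK_DRIVER_PYTHON for the driver is an assumption based on the statement that properties override environment variables.

```python
from typing import Mapping

CONF_DRIVER = "spark.pyspark.driver.python"
CONF_BOTH = "spark.pyspark.python"


def resolve_python(conf: Mapping[str, str], env: Mapping[str, str],
                   role: str, default: str = "python3") -> str:
    """Return the Python executable for role 'driver' or 'executor'.

    Hypothetical helper that mirrors the documented precedence only.
    """
    if role == "driver":
        candidates = [conf.get(CONF_DRIVER), conf.get(CONF_BOTH),
                      env.get("PYSPARK_DRIVER_PYTHON"), env.get("PYSPARK_PYTHON")]
    elif role == "executor":
        candidates = [conf.get(CONF_BOTH), env.get("PYSPARK_PYTHON")]
    else:
        raise ValueError(f"unknown role: {role!r}")
    # The first non-empty candidate wins; otherwise fall back to the default.
    return next((c for c in candidates if c), default)


conf = {"spark.pyspark.python": "/opt/py38/bin/python"}  # made-up path
env = {"PYSPARK_PYTHON": "/usr/bin/python2.7"}           # made-up path
print(resolve_python(conf, env, "executor"))  # /opt/py38/bin/python
print(resolve_python({}, env, "driver"))      # /usr/bin/python2.7
```

In the first call the property wins over the environment variable; in the second, no property is set, so the driver falls back to PYSPARK_PYTHON.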

Fix issue about inconsistent driver and executor Python versions

If the driver and executor have different Python versions, you may encounter errors like the following:

Exception: Python in worker has different version 2.7 than that in driver 3.8, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

Refer to the following page to find out more: Resolve: Python in worker has different version 2.7 than that in driver 3.8...
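One way to catch such a mismatch before submitting a job is to compare the interpreters up front. The sketch below is an assumption-laden illustration, not part of PySpark: minor_version and check_versions are hypothetical helpers, and the check only inspects executables on the local machine, so on a real cluster the interpreter on the worker nodes may still differ.

```python
import os
import subprocess
import sys


def minor_version(executable: str) -> str:
    """Ask an interpreter for its 'major.minor' version string."""
    result = subprocess.run(
        [executable, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def check_versions(driver_exe: str, worker_exe: str) -> None:
    """Raise early if the two interpreters differ in major.minor version."""
    driver, worker = minor_version(driver_exe), minor_version(worker_exe)
    if driver != worker:
        raise RuntimeError(
            f"worker Python {worker} ({worker_exe}) differs from "
            f"driver Python {driver} ({driver_exe})"
        )


# Compare the current interpreter with whatever PYSPARK_PYTHON points to.
worker_exe = os.environ.get("PYSPARK_PYTHON", sys.executable)
check_versions(sys.executable, worker_exe)
```

If the check raises, adjust PYSPARK_PYTHON (or the corresponding Spark property) so that both sides use the same minor version.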
