Fix - TypeError: an integer is required (got type bytes)

visibility 70 access_time 9 days ago languageEnglish timeline Stats
timeline Stats
Page index 6.80

Issue context

When running PySpark 2.4.8 script in Python 3.8 environment with Anaconda, the following issue occurs: TypeError: an integer is required (got type bytes).

The environment is created using the following code:

conda create --name pyspark2.4.8 python=3.8.0
pip install pyspark==2.4.8

The PySpark script has the following content:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

appName = "PySpark Example - Spark 2.x Date Example"
master = "local"
# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \

data = [{'id': 1, 'dt': '1200-01-01'}, {'id': 2, 'dt': '2022-06-19'}]

df = spark.createDataFrame(data)
df = df.withColumn('dt', to_date(df['dt']))
# Write to HDFS

Fix this issue

The above issue occurred because Spark 2.4.x doesn't work with Python 3.8+ environment at this time. Thus, to fix the issue we just need to downgrade Python version to 3.7.

For example, if you use Anaconda, the following command can be used to create the Python environment:

conda create --name pyspark2.4.8 python=3.7.0
pip install pyspark==2.4.8

For different versions of PySpark, you can find the supported Python version on PyPi index. For instance, version 2.4.8 supported languages are list on the following page: pyspark ยท PyPI. The supported programming language is listed on the left side bar of the page.

For PySpark 3.3.0, the supported programming language list is updated to the following list:

copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

More from Kontext
Connect to PostgreSQL in Spark (PySpark)
visibility 8,217
thumb_up 2
access_time 2 years ago
Connect to PostgreSQL in Spark (PySpark)