Fix PySpark TypeError: field **: **Type can not accept object ** in type <class '*'>

When creating a Spark data frame with an explicit schema, you may encounter errors like “field **: **Type can not accept object ** in type <class '*'>”.

The exact message varies with the field and the data. Some examples:

  • field xxx: BooleanType can not accept object 100 in type <class 'int'>
  • field xxx: DecimalType can not accept object 100 in type <class 'int'>
  • field xxx: DecimalType can not accept object 100 in type <class 'str'>
  • field xxx: IntegerType can not accept object ‘String value’ in type <class 'str'>

The error is self-describing: a field in your DataFrame’s schema is declared with a type that does not match the Python type of the actual data.
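
To locate the offending field before handing data to Spark, a plain Python check along the following lines may help. This is only a sketch: `ACCEPTED` and `find_type_mismatches` are hypothetical names, and the mapping is a simplified assumption covering the four types mentioned above, not PySpark's internal table.

```python
from decimal import Decimal

# Assumed, simplified mapping from Spark type names to the Python types
# they accept. Illustrative only; it is not PySpark's internal table.
ACCEPTED = {
    "StringType": (str,),
    "IntegerType": (int,),
    "BooleanType": (bool,),
    "DecimalType": (Decimal,),
}

def find_type_mismatches(rows, fields):
    """Return (row_index, field_name, spark_type, python_type) per bad value."""
    problems = []
    for i, row in enumerate(rows):
        for value, (name, spark_type) in zip(row, fields):
            # None is skipped, as it would be for a nullable field
            if value is not None and type(value) not in ACCEPTED[spark_type]:
                problems.append((i, name, spark_type, type(value).__name__))
    return problems

# Hypothetical two-column input: the second value is an int,
# but the field is declared as DecimalType
rows = [("abc", 100), ("def", Decimal(120))]
fields = [("name", "StringType"), ("xxx", "DecimalType")]
print(find_type_mismatches(rows, fields))
```

Only the first row is reported, because its second value is an `int` while the field expects a `Decimal`. The check compares exact types rather than using `isinstance`, so a `bool` is not treated as an `int`.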

One specific example

Let’s look at the following example:

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructField, StructType, StringType, IntegerType, DecimalType

appName = "PySpark Example - Python Array/List to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# List
data = [('Category A', 100, "This is category A"),
        ('Category B', 120, "This is category B"),
        ('Category C', 150, "This is category C")]

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', DecimalType(), True),
    StructField('Description', StringType(), True)
])

# Convert list to RDD
rdd = spark.sparkContext.parallelize(data)

# Create data frame
df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()

The second field, Count, is declared as DecimalType in the schema, but the values in the data list are Python integers.

As a result, the following error is raised:

TypeError: field Count: DecimalType(10,0) can not accept object 100 in type <class 'int'>

To fix it, we have at least two options.

Option 1 - change the definition of the schema

Since the data contains integers, we can change the schema definition to use IntegerType:

schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', IntegerType(), True),
    StructField('Description', StringType(), True)
])

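Option 2 - change the data type of the values in the RDD

If the column should remain a decimal, keep the original schema and create the values as Python Decimal objects instead: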

from decimal import Decimal

data = [('Category A', Decimal(100), "This is category A"),
        ('Category B', Decimal(120), "This is category B"),
        ('Category C', Decimal(150), "This is category C")]
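
When the list already exists with integer counts, the values can be converted in place of retyping them. The following is a minimal sketch of that idea; `decimal_data` is an illustrative name, and the conversion assumes every tuple has exactly three elements with the count in the second position.

```python
from decimal import Decimal

data = [('Category A', 100, "This is category A"),
        ('Category B', 120, "This is category B"),
        ('Category C', 150, "This is category C")]

# Wrap the second element (Count) of every tuple in Decimal,
# leaving the two string fields unchanged
decimal_data = [(category, Decimal(count), description)
                for category, count, description in data]

print(decimal_data[0])
```

If the source values were floats rather than integers, building each value from a string, for example `Decimal(str(value))`, avoids carrying binary floating point error into the decimal.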

Summary

The error you encounter may name a different field or type, but the fix usually follows one of the two options above: change the schema to match the data, or change the data to match the schema. Leave a comment if you have a question or an issue you cannot fix.

Last modified by Raymond 9 months ago.