Fix PySpark TypeError: field **: **Type can not accept object ** in type <class '*'>


When creating a Spark DataFrame with an explicit schema, you may encounter errors like "field **: **Type can not accept object ** in type <class '*'>".

The actual error message varies; for instance, here are some examples:

  • field xxx: BooleanType can not accept object 100 in type <class 'int'>
  • field xxx: DecimalType can not accept object 100 in type <class 'int'>
  • field xxx: DecimalType can not accept object 100 in type <class 'str'>
  • field xxx: IntegerType can not accept object 'String value' in type <class 'str'>

The error is self-descriptive: a field in your DataFrame's schema is declared with a type that does not match the type of the actual data.
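A quick way to confirm the mismatch is to compare the Python types in a sample record against the field types declared in your schema. A minimal sketch (the sample record here is a placeholder for your own data):

# Inspect the Python type of each value in a sample record
# and compare it with the corresponding StructField type.
row = ('Category A', 100, "This is category A")
print([type(v).__name__ for v in row])  # prints ['str', 'int', 'str']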

One specific example

Let’s look at the following example:

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructField, StructType, StringType, IntegerType, DecimalType

appName = "PySpark Example - Python Array/List to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# List
data = [('Category A', 100, "This is category A"),
        ('Category B', 120, "This is category B"),
        ('Category C', 150, "This is category C")]

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', DecimalType(), True),
    StructField('Description', StringType(), True)
])

# Convert list to RDD
rdd = spark.sparkContext.parallelize(data)

# Create data frame
df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()

The second field (Count) is defined as a decimal in the schema, while the values in the data list are actually Python integers.

Thus the following error will be thrown:

TypeError: field Count: DecimalType(10,0) can not accept object 100 in type <class 'int'>

Note that DecimalType() defaults to precision 10 and scale 0, which is why the message shows DecimalType(10,0). To fix the error, we have at least two options.

Option 1 - change the definition of the schema

Since the data contains plain Python integers, we can change the field type in the schema to match:

schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', IntegerType(), True),
    StructField('Description', StringType(), True)
])
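With the schema now matching the data, the same createDataFrame call succeeds. Assuming the rest of the example is unchanged, df.show() should print something like:

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+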

Option 2 - change the data type of the RDD

Alternatively, keep the DecimalType schema and build the list with decimal.Decimal values so the data matches the schema:

from decimal import Decimal

data = [('Category A', Decimal(100), "This is category A"),
        ('Category B', Decimal(120), "This is category B"),
        ('Category C', Decimal(150), "This is category C")]
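With Decimal objects in the list, the original DecimalType() schema is accepted. A minimal sketch to verify, assuming the spark session and the schema from the example above are still in scope:

# Rebuild the RDD and DataFrame with the converted data
rdd = spark.sparkContext.parallelize(data)
df = spark.createDataFrame(rdd, schema)
print(df.schema)  # Count is now reported as DecimalType(10,0)
df.show()

Because the scale defaults to 0, the Count column should display the same values (100, 120, 150) as in Option 1.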

Summary

The exact error you encounter may differ, but the approach to fix it is similar: either change the schema to match the data, or convert the data to match the schema. Leave a comment if you have a question or run into an issue you cannot fix.
