PySpark DataFrame Fill Null Values with fillna or na.fill Functions

2022-08-18
Kontext Kontext Code Snippets & Tips

Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise. 

Code description

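In PySpark, DataFrame.fillna, DataFrame.na.fill and DataFrameNaFunctions.fill are aliases of each other. We can use them to replace null values with a constant. The type of the constant decides which columns are affected: a numeric value such as 0 fills nulls in numeric columns only, a boolean value fills boolean columns only, and a string value fills string columns only. To fill columns of different types in one call, pass a dictionary that maps column names to replacement values.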

Output of the three show() calls in the code snippet below, in order:

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|    null|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|   null|   false|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|   false|
+--------------+-------+--------+

Code snippet

from pyspark.sql import SparkSession

app_name = "PySpark fillna"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello Kontext!', 100, True], ['Hello Context!', None, None]],
    ['str_col', 'int_col', 'bool_col'])

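# A numeric value fills nulls in numeric columns only (int_col here)
df.fillna(0).show()

# A boolean value fills nulls in boolean columns only (bool_col here)
df.fillna(False).show()

# A dict maps column names to replacement values, so both columns are filled in one call
df.fillna({'int_col': 0, 'bool_col': False}).show()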