Kontext

PySpark DataFrame Fill Null Values with fillna or na.fill Functions

2022-08-18

Code description

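In PySpark, DataFrame.fillna, DataFrame.na.fill and DataFrameNaFunctions.fill are aliases of each other. They fill null values with a constant value, and a scalar value is applied only to columns whose type matches it. For example, passing 0 replaces nulls in the integer column and leaves the boolean column untouched, while passing False does the opposite. To fill columns of different types in one call, pass a dictionary that maps column names to replacement values.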

Output:

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|    null|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|   null|   false|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|   false|
+--------------+-------+--------+

Code snippet

from pyspark.sql import SparkSession

app_name = "PySpark fillna"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello Kontext!', 100, True], ['Hello Context!', None, None]],
    ['str_col', 'int_col', 'bool_col'])

# Fill nulls in numeric columns only; bool_col stays null
df.fillna(0).show()

# Fill nulls in boolean columns only; int_col stays null
df.fillna(False).show()

# Fill both at the same time with a column-to-value dictionary
df.fillna({'int_col': 0, 'bool_col': False}).show()