PySpark DataFrame Fill Null Values with fillna or na.fill Functions

Code description

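In PySpark, DataFrame.fillna and DataFrameNaFunctions.fill are aliases of each other. We can use them to replace null values with a constant value, for example replacing nulls in all integer columns with 0. A scalar value is applied only to columns whose data type matches it, while a dict maps column names to per-column replacement values.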


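Output:

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|    null|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|   null|   false|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|   false|
+--------------+-------+--------+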

Code snippet

from pyspark.sql import SparkSession

app_name = "PySpark fillna"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello Kontext!', 100, True], ['Hello Context!', None, None]],
    ['str_col', 'int_col', 'bool_col'])

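# Only fill integer columns (a numeric value is applied to numeric columns only)
df.fillna(0).show()

# Only fill boolean columns (a boolean value is applied to boolean columns only)
df.fillna(False).show()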

# Fill both at the same time
df.fillna({'int_col': 0, 'bool_col': False}).show()
