Code description
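In PySpark, DataFrame.fillna, DataFrame.na.fill and DataFrameNaFunctions.fill are aliases of each other. They replace null values with a constant. A scalar value is applied only to columns whose data type matches it: for example, 0 fills nulls in numeric columns and False fills nulls in boolean columns, while columns of other types are left unchanged. To fill columns of different types in a single call, pass a dictionary that maps column names to replacement values.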
Output:
+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|    null|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|   null|   false|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|   false|
+--------------+-------+--------+
Code snippet
from pyspark.sql import SparkSession
app_name = "PySpark fillna"
master = "local"
spark = SparkSession.builder.appName(app_name).master(master).getOrCreate()
spark.sparkContext.setLogLevel("WARN")
# Create a DataFrame
df = spark.createDataFrame(
    [['Hello Kontext!', 100, True], ['Hello Context!', None, None]],
    ['str_col', 'int_col', 'bool_col'])
# Only fill numeric columns (bool_col keeps its null)
df.fillna(0).show()
# Only fill boolean columns
df.fillna(False).show()
# Fill both at the same time
df.fillna({'int_col': 0, 'bool_col': False}).show()