PySpark DataFrame Fill Null Values with fillna or na.fill Functions
Kontext
Code Snippets & Tips
Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise.
Code description
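In PySpark, DataFrame.fillna and DataFrame.na.fill (that is, DataFrameNaFunctions.fill) are aliases of each other. We can use them to replace null values with a constant. For example, we can fill nulls in all integer columns with 0. When a single value is passed, only columns whose type matches that value are filled; pass a dict to set a different value per column.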
Output:
+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|    null|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|   null|   false|
+--------------+-------+--------+

+--------------+-------+--------+
|       str_col|int_col|bool_col|
+--------------+-------+--------+
|Hello Kontext!|    100|    true|
|Hello Context!|      0|   false|
+--------------+-------+--------+
Code snippet
from pyspark.sql import SparkSession

app_name = "PySpark fillna"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello Kontext!', 100, True], ['Hello Context!', None, None]],
    ['str_col', 'int_col', 'bool_col'])

# Only fill integer columns
df.fillna(0).show()

# Only fill boolean columns
df.fillna(False).show()

# Fill both at the same time
df.fillna({'int_col': 0, 'bool_col': False}).show()