Replace Values via regexp_replace Function in PySpark DataFrame
Code description
PySpark SQL provides the built-in function regexp_replace to replace substrings that match a specified regular expression.
It takes three parameters: the input column (or column name) of the DataFrame, the regular expression pattern, and the replacement string for matches.
pyspark.sql.functions.regexp_replace(str, pattern, replacement)
Output
The following is the output from this code snippet:
+--------------+-------+----------------+
|       str_col|int_col|str_col_replaced|
+--------------+-------+----------------+
|Hello Kontext!|    100|  Hello kontext!|
|Hello Context!|    100|  Hello kontext!|
+--------------+-------+----------------+
All uppercase 'K' and 'C' characters are replaced with lowercase 'k'.
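One pitfall worth noting: inside a character class such as `[C|K]`, the pipe is a literal character, not an alternation operator, so that pattern also matches '|'. The equivalent pattern here is simply `[CK]`. Python's re module follows the same character-class semantics as the Java regex engine Spark uses, so it can illustrate the difference without a Spark session:

```python
import re

# Inside [...] the pipe is a literal character, not alternation,
# so [C|K] matches 'C', '|' and 'K' -- three matches here:
print(re.sub(r"[C|K]", "k", "C|K"))  # -> "kkk"

# [CK] matches only 'C' and 'K', leaving '|' untouched:
print(re.sub(r"[CK]", "k", "C|K"))   # -> "k|k"
```

For the sample data in this snippet the two patterns happen to produce the same result, since the input contains no '|' characters, but `[CK]` expresses the intent precisely.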
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace

app_name = "PySpark regex sql functions"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()
spark.sparkContext.setLogLevel("WARN")

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello Kontext!', 100], ['Hello Context!', 100]],
    ['str_col', 'int_col'])

# Replace 'C' or 'K' in str_col with 'k'.
# Note: [CK] is used rather than [C|K]; inside a character
# class the pipe is a literal character, not alternation.
df = df.withColumn('str_col_replaced',
                   regexp_replace('str_col', r'[CK]', 'k'))

df.show()
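The replacement argument can also reference capture groups. Spark delegates to Java's regex engine, where a group reference in the replacement is written `$1`, `$2`, and so on. As a sketch of the idea (shown with Python's re module, whose equivalent syntax is `\1`), the following swaps two words:

```python
import re

# Capture two words, then emit them in reverse order.
# In Spark's regexp_replace the replacement would be "$2 $1"
# instead of r"\2 \1" (Java regex replacement syntax).
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello Kontext")
print(swapped)  # -> "Kontext Hello"
```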