Use when() and otherwise() with PySpark DataFrame

2022-08-25
Kontext Code Snippets & Tips

Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise. 

Code description

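In Spark SQL, the CASE WHEN clause evaluates a list of conditions in order and, for each row, returns the result of the first condition that matches. The same logic can be implemented in PySpark with the pyspark.sql.functions.when function together with the pyspark.sql.Column.when and pyspark.sql.Column.otherwise methods. If otherwise is not used together with when, null (None in Python) is returned for rows that match none of the conditions. In the example on this page, 6 satisfies both conditions, but only the first match applies, so it becomes 600.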

Output:

+---+------+
| id|id_new|
+---+------+
|  1|     1|
|  2|   200|
|  3|  3000|
|  4|   400|
|  5|     5|
|  6|   600|
|  7|     7|
|  8|   800|
|  9|  9000|
+---+------+

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.functions import when

appName = "PySpark when and otherwise Example"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

df = spark.range(1, 10)
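# Conditions are checked in order and the first match wins;
# otherwise() supplies the value for rows that match no condition.
df = df.withColumn(
    'id_new',
    when(df.id % 2 == 0, df.id * 100)
    .when(df.id % 3 == 0, df.id * 1000)
    .otherwise(df.id)
)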
df.show()