from pyspark.sql import SparkSession
from pyspark.sql.functions import when
appName = "PySpark when and otherwise Example"
master = "local"
# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()
spark.sparkContext.setLogLevel("WARN")
df = spark.range(1, 10)
# Multiply even ids by 100, remaining multiples of 3 by 1000, and keep the rest as-is
df = df.withColumn(
    'id_new',
    when(df.id % 2 == 0, df.id * 100)
    .when(df.id % 3 == 0, df.id * 1000)
    .otherwise(df.id)
)
df.show()
Use when() and otherwise() with PySpark DataFrame
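In Spark SQL, the CASE WHEN clause evaluates a list of conditions in order and, for each row, returns the result of the first condition that matches. The same logic can be implemented in the DataFrame API using the pyspark.sql.functions.when and pyspark.sql.Column.otherwise functions. Because conditions are checked in order, an id such as 6, which satisfies both conditions in the snippet above, takes the first branch. If otherwise is not used together with when, None (null) is returned for rows that match no condition.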
Output:
+---+------+
| id|id_new|
+---+------+
|  1|     1|
|  2|   200|
|  3|  3000|
|  4|   400|
|  5|     5|
|  6|   600|
|  7|     7|
|  8|   800|
|  9|  9000|
+---+------+