Use when() and otherwise() with PySpark DataFrame
Code Snippets & Tips
Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise.
Code description
In Spark SQL, the CASE WHEN clause evaluates a list of conditions in order and returns the result attached to the first condition that matches. The same logic can be implemented directly with the pyspark.sql.functions.when and pyspark.sql.Column.otherwise functions. If otherwise is not used together with when, None (null) is returned for rows that match none of the conditions.
Output:
+---+------+
| id|id_new|
+---+------+
|  1|     1|
|  2|   200|
|  3|  3000|
|  4|   400|
|  5|     5|
|  6|   600|
|  7|     7|
|  8|   800|
|  9|  9000|
+---+------+
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.functions import when

appName = "PySpark when and otherwise Example"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

df = spark.range(1, 10)
df = df.withColumn('id_new',
                   when(df.id % 2 == 0, df.id * 100)
                   .when(df.id % 3 == 0, df.id*1000)
                   .otherwise(df.id))
df.show()