Use expr() Function in PySpark DataFrame
Code description
The Spark SQL function expr() evaluates a SQL expression string and returns the result as a column (pyspark.sql.column.Column). Any operators or functions that can be used in Spark SQL can therefore also be used in DataFrame operations through expr().
This code snippet provides an example of using the expr() function directly with a DataFrame. It also shows how to derive the same column without using this function.
* The code snippet assumes a SparkSession object already exists as 'spark'.
Output:
+---+-----+-----+
| id|id_v1|id_v2|
+---+-----+-----+
|  1|   11|   11|
|  2|   12|   12|
|  3|   13|   13|
|  4|   14|   14|
|  5|   15|   15|
|  6|   16|   16|
|  7|   17|   17|
|  8|   18|   18|
|  9|   19|   19|
+---+-----+-----+
Code snippet
from pyspark.sql.functions import *

df = spark.range(1, 10)
df = df.withColumn('id_v1', expr("id+10"))
df = df.withColumn('id_v2', df.id + 10)
df.show()
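Because expr() accepts any Spark SQL expression, the same approach works for SQL functions and conditional logic, not just arithmetic. The sketch below (assuming the same 'spark' session and the df built above; the column name 'id_label' is introduced here only for illustration) derives a label column with a CASE WHEN expression:

# Evaluate a conditional SQL expression and add the result as a new column
df = df.withColumn('id_label', expr("CASE WHEN id > 5 THEN 'high' ELSE 'low' END"))
df.select('id', 'id_label').show()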