Kontext | Code Snippets & Tips

Iterate through PySpark DataFrame Rows via foreach

2022-08-19

Code description

DataFrame.foreach can be used to iterate/loop through each row (pyspark.sql.types.Row) in a Spark DataFrame object and apply a function to all the rows. This method is a shorthand for DataFrame.rdd.foreach.

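Note: Use this method with caution, especially on large DataFrames. The function is called once for every row on the executors, it returns nothing to the driver, and on a cluster anything it prints goes to the executor logs rather than the driver console.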


Output:

+-----+--------+
| col1|    col2|
+-----+--------+
|Hello| Kontext|
|Hello|Big Data|
+-----+--------+

col1=Hello, col2=Kontext
col1=Hello, col2=Big Data

Code snippet

from pyspark.sql import SparkSession

app_name = "PySpark foreach Example"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello', 'Kontext'], ['Hello', 'Big Data']], ['col1', 'col2'])

# Display the DataFrame contents
df.show()


def print_row(row):
    print(f'col1={row.col1}, col2={row.col2}')

# Apply print_row to each row
df.foreach(print_row)