Code description
DataFrame.foreach can be used to iterate/loop through each row (pyspark.sql.types.Row) in a Spark DataFrame object and apply a function to every row. This method is a shorthand for DataFrame.rdd.foreach.
Note: Please be cautious when using this method, especially if your DataFrame is big.
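Because it is only a shorthand, calling foreach on the DataFrame or on its underlying RDD behaves the same way. The minimal sketch below illustrates that equivalence; the app name, DataFrame and function names here are illustrative and separate from the snippet further down.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreach-equivalence").master("local").getOrCreate()
sample_df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'label'])

def show_row(row):
    # Executed once for every Row; runs on the executors, not the driver
    print(f'id={row.id}, label={row.label}')

# The two calls below do the same thing: DataFrame.foreach simply
# delegates to the underlying RDD's foreach.
sample_df.foreach(show_row)
sample_df.rdd.foreach(show_row)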
Output:
+-----+--------+
| col1| col2|
+-----+--------+
|Hello| Kontext|
|Hello|Big Data|
+-----+--------+
col1=Hello, col2=Kontext
col1=Hello, col2=Big Data
Code snippet
from pyspark.sql import SparkSession
app_name = "PySpark foreach Example"
master = "local"
spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()
spark.sparkContext.setLogLevel("WARN")
# Create a DataFrame
df = spark.createDataFrame(
    [['Hello', 'Kontext'], ['Hello', 'Big Data']], ['col1', 'col2'])
df.show()
def print_row(row):
    # Runs on the executors; in local mode the output appears in the console
    print(f'col1={row.col1}, col2={row.col2}')
# Apply print_row to each row
df.foreach(print_row)
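The printed lines in the sample output come from this function running on the executors; because master is local, they show up in the same console. On a real cluster they would appear in the executor logs rather than in the driver's output. When the per-row work involves expensive setup (for example opening a database connection), a common alternative is DataFrame.foreachPartition, which invokes the function once per partition instead of once per row. A minimal sketch, reusing df and the column names from the snippet above (print_partition is just an illustrative name):
def print_partition(rows):
    # 'rows' is an iterator of Row objects for one partition; any expensive
    # per-partition setup would go here, before the loop
    for row in rows:
        print(f'col1={row.col1}, col2={row.col2}')

# Same side effect as df.foreach(print_row), but called once per partition
df.foreachPartition(print_partition)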