Kontext Copilot - An AI assistant for data analytics. Learn more
Expression of Interest
PySpark DataFrame - inner, left and right Joins
insights Stats
warning Please login first to view stats information.
Kontext
Code Snippets & Tips
Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise.
Code description
This code snippet shows you how to perform inner, left and right joins with DataFrame.join
API.
def join(self, other, on=None, how=None)
Supported join types
The default join type is inner. The supported values for parameter how
are: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.
To learn about the these different join types, refer to article Spark SQL Joins with Examples.
Join via multiple columns
If there are more than one column to join, we can specify on
parameter as a list of column name:
df1.join(df2, on=['id','other_column'], how='left')
Output from the code snippet:
+---+----+ | id|attr| +---+----+ | 1| A| | 2| B| +---+----+ +---+--------+ | id|attr_int| +---+--------+ | 1| 100| | 2| 200| | 3| 300| +---+--------+ +---+----+--------+ | id|attr|attr_int| +---+----+--------+ | 1| A| 100| | 2| B| 200| +---+----+--------+ +---+----+--------+ | id|attr|attr_int| +---+----+--------+ | 1| A| 100| | 2| B| 200| +---+----+--------+ +---+----+--------+ | id|attr|attr_int| +---+----+--------+ | 1| A| 100| | 2| B| 200| | 3|null| 300| +---+----+--------+
Code snippet
from pyspark.sql import SparkSession from pyspark import SparkConf app_name = "PySpark - Joins Example" master = "local[8]" conf = SparkConf().setAppName(app_name)\ .setMaster(master) spark = SparkSession.builder.config(conf=conf) \ .getOrCreate() spark.sparkContext.setLogLevel("WARN") df1 = spark.createDataFrame([[1, 'A'], [2, 'B']], ['id', 'attr']) df1.show() df2 = spark.createDataFrame([[1, 100], [2, 200], [3, 300]], ['id', 'attr_int']) df2.show() # Joins df = df1.join(df2, on='id', how='inner') df.show() df = df1.join(df2, on='id', how='left') df.show() df = df1.join(df2, on='id', how='right') df.show()
copyright
This page is subject to Site terms.
comment Comments
No comments yet.