Spark SQL Joins with Examples

event 2022-05-31 visibility 694 comment 0 insights
more_vert
insights Stats
Spark SQL Joins with Examples
Kontext Kontext Spark & PySpark

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

*Spark logo is a registered trademark of Apache Spark.


Spark SQL supports 7 types of joins: [ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI. This article provides examples about these joins. 

Inner join

As the following diagram shows, inner join returns rows that have matching values in both tables. 

Spark SQL Joins - Inner Join

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
INNER JOIN customer_type AS B
ON A.type_code = B.type_code

Left join

As the following diagram shows, left join returns all records from left table and matched values from right table (or NULLs if not matching).

Spark SQL Joins - Left Outer Join

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
LEFT JOIN customer_type AS B
ON A.type_code = B.type_code

Right join

As the following diagram shows, right joins return all records from right table and matched values from left table (or NULLs if not matching).

Spark SQL Joins - Right Outer Join

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
RIGHT JOIN customer_type AS B
ON A.type_code = B.type_code

Full join

A full join returns all values from both relations, appending NULL values on the side that does not have a match. It is also called full outer join.

Spark SQL Joins - Full Outer Join

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
FULL JOIN customer_type AS B
ON A.type_code = B.type_code

Left semi join

A semi join returns values from the left side of the relation that has a match with the right. It is also called left semi join.

Spark SQL Joins - Left Semi Join

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
LEFT SEMI JOIN customer_type AS B
ON A.type_code = B.type_code

Left anti join

An anti join returns returns values from the left relation that has no match with the right. It is also called left anti join.

Spark SQL Joins - Left Anti Join

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
LEFT ANTI JOIN customer_type AS B
ON A.type_code = B.type_code

Cross join

A cross join returns the Cartesian product of two tables (relations).

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
CROSS JOIN customer_type AS B
infoA cross join statement has no ON clause.

References

JOIN - Spark 3.2.1 Documentation (apache.org)

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts