Spark SQL Joins with Examples
Spark SQL supports 7 types of joins: [ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI. This article provides examples about these joins.
Inner join
As the following diagram shows, inner join returns rows that have matching values in both tables.
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A INNER JOIN customer_type AS B ON A.type_code = B.type_code
Left join
As the following diagram shows, left join returns all records from left table and matched values from right table (or NULLs if not matching).
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A LEFT JOIN customer_type AS B ON A.type_code = B.type_code
Right join
As the following diagram shows, right joins return all records from right table and matched values from left table (or NULLs if not matching).
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A RIGHT JOIN customer_type AS B ON A.type_code = B.type_code
Full join
A full join returns all values from both relations, appending NULL values on the side that does not have a match. It is also called full outer join.
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A FULL JOIN customer_type AS B ON A.type_code = B.type_code
Left semi join
A semi join returns values from the left side of the relation that has a match with the right. It is also called left semi join.
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A LEFT SEMI JOIN customer_type AS B ON A.type_code = B.type_code
Left anti join
An anti join returns returns values from the left relation that has no match with the right. It is also called left anti join.
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A LEFT ANTI JOIN customer_type AS B ON A.type_code = B.type_code
Cross join
A cross join returns the Cartesian product of two tables (relations).
Code snippet
SELECT A.customer_id, A.type_code, A.register_date, B.type_name FROM customer AS A CROSS JOIN customer_type AS B
References
JOIN - Spark 3.2.1 Documentation (apache.org)