Spark SQL Joins with Examples

visibility 21 access_time 29 days ago languageEnglish timeline Stats
timeline Stats
Page index 0.70
more_horiz
Spark SQL Joins with Examples

Spark SQL supports 7 types of joins: [ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI. This article provides examples about these joins. 

Inner join

As the following diagram shows, inner join returns rows that have matching values in both tables. 


Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
INNER JOIN customer_type AS B
ON A.type_code = B.type_code

Left join

As the following diagram shows, left join returns all records from left table and matched values from right table (or NULLs if not matching).


Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
LEFT JOIN customer_type AS B
ON A.type_code = B.type_code

Right join

As the following diagram shows, right joins return all records from right table and matched values from left table (or NULLs if not matching).


Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
RIGHT JOIN customer_type AS B
ON A.type_code = B.type_code

Full join

A full join returns all values from both relations, appending NULL values on the side that does not have a match. It is also called full outer join.


Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
FULL JOIN customer_type AS B
ON A.type_code = B.type_code

Left semi join

A semi join returns values from the left side of the relation that has a match with the right. It is also called left semi join.


Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
LEFT SEMI JOIN customer_type AS B
ON A.type_code = B.type_code

Left anti join

An anti join returns returns values from the left relation that has no match with the right. It is also called left anti join.


Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
LEFT ANTI JOIN customer_type AS B
ON A.type_code = B.type_code

Cross join

A cross join returns the Cartesian product of two tables (relations).

Code snippet

SELECT A.customer_id, A.type_code, A.register_date, B.type_name 
FROM customer AS A
CROSS JOIN customer_type AS B
infoA cross join statement has no ON clause.

References

JOIN - Spark 3.2.1 Documentation (apache.org)

copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

More from Kontext
Spark Scala: Load Data from MySQL
visibility 1,623
thumb_up 0
access_time 2 years ago
Spark Scala: Load Data from MySQL
Convert string to date in Python / Spark
visibility 2,867
thumb_up 0
access_time 4 years ago