Spark SQL - Map Functions


In Spark SQL, MapType stores key-value pairs, much like the dictionary type found in many other programming languages. This article summarizes the commonly used map functions in Spark SQL.

map

Function map creates a map from an even number of arguments, read as alternating keys and values (key1, value1, key2, value2, ...).

Example:

spark-sql> select map(1,'a',2,'b',3,'c');
map(1, a, 2, b, 3, c)
{1:"a",2:"b",3:"c"}

map_concat

Function map_concat returns the union of two or more maps.

Example:

spark-sql> select map_concat(map(1,'a',2,'b',3,'c'),map(4,'d'));
map_concat(map(1, a, 2, b, 3, c), map(4, d))
{1:"a",2:"b",3:"c",4:"d"}
Warning: if the same key appears in more than one of the input maps, Spark throws an error by default: java.lang.RuntimeException: Duplicate map key 1 was found, please check the input data. To allow duplicate keys, set spark.sql.mapKeyDedupPolicy to LAST_WIN so that the value inserted last takes precedence.
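
As a rough illustration of the policy described in the warning, the following sketch sets the option for the current session and then concatenates two maps that share key 1. The sample maps are made up for illustration, and the sketch assumes a Spark version that supports spark.sql.mapKeyDedupPolicy (Spark 3.0 or later).

```sql
-- Assumption: Spark 3.0+; the sample maps are illustrative only.
-- Switch from the default policy (EXCEPTION) to LAST_WIN for this session.
SET spark.sql.mapKeyDedupPolicy=LAST_WIN;

-- Key 1 appears in both maps; under LAST_WIN the value from the later map ('z') should be kept.
SELECT map_concat(map(1, 'a', 2, 'b'), map(1, 'z', 3, 'c'));

-- Restore the default so that duplicate keys raise an error again.
SET spark.sql.mapKeyDedupPolicy=EXCEPTION;
```

Because the setting applies to the whole session, it is probably safer to restore the default afterwards so that unexpected duplicates elsewhere still surface as errors.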

map_entries

This function returns an array of all the items in the map in an unordered manner.

Example:

spark-sql> select map_entries(map(1,'a',2,'b',3,'c',4,'d'));
map_entries(map(1, a, 2, b, 3, c, 4, d))
[{"key":1,"value":"a"},{"key":2,"value":"b"},{"key":3,"value":"c"},{"key":4,"value":"d"}]

map_keys

Function map_keys returns all the keys of a map in an unordered array.

Example:

spark-sql> select map_keys(map(1,'a',2,'b',3,'c',4,'d'));
map_keys(map(1, a, 2, b, 3, c, 4, d))
[1,2,3,4]

map_values

Function map_values returns all the values of a map in an unordered array.

Example:

spark-sql> select map_values(map(1,'a',2,'b',3,'c',4,'d'));
map_values(map(1, a, 2, b, 3, c, 4, d))
["a","b","c","d"]

map_from_entries

This function constructs a map from an array of entries, where each entry is a struct holding a key and a value.

Example:

spark-sql> SELECT map_from_entries(array(struct('A', 1), struct('B', 2), struct('C', 3)));
map_from_entries(array(struct(A, 1), struct(B, 2), struct(C, 3)))
{"A":1,"B":2,"C":3}
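
Since map_entries produces exactly the kind of array that map_from_entries consumes, the two functions can be chained. The sketch below uses a made-up sample map and should return a map with the same key-value pairs as its input.

```sql
-- Illustrative sample map; any map value should behave the same way.
-- map_entries turns the map into an array of key/value structs,
-- and map_from_entries turns that array back into a map.
SELECT map_from_entries(map_entries(map(1, 'a', 2, 'b', 3, 'c')));
```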
