Spark SQL - Map Functions

Raymond Tang, 1/9/2021

In Spark SQL, MapType is designed for key-value pairs, similar to the dictionary type in many other programming languages. This article summarizes the commonly used map functions in Spark SQL.

map

Function map is used to create a map from alternating key and value arguments.

Example:

spark-sql> select map(1,'a',2,'b',3,'c');
map(1, a, 2, b, 3, c)
{1:"a",2:"b",3:"c"}

map_concat

Function map_concat is used to merge two or more maps into one.

Example:

spark-sql> select map_concat(map(1,'a',2,'b',3,'c'),map(4,'d'));
map_concat(map(1, a, 2, b, 3, c), map(4, d))
{1:"a",2:"b",3:"c",4:"d"}

Warning: if the same key appears more than once across the input maps, Spark throws an error: java.lang.RuntimeException: Duplicate map key 1 was found, please check the input data. If you want to remove the duplicated keys, you can set spark.sql.mapKeyDedupPolicy to LAST_WIN so that the key inserted last takes precedence.
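
The policy setting mentioned in the warning can be tried out in the spark-sql shell. The following is a minimal sketch, assuming Spark 3.0 or later (where spark.sql.mapKeyDedupPolicy is available); the sample maps are made up for illustration and share key 1.

```sql
-- Assumption: Spark 3.0+, where the default policy is EXCEPTION (which raises the error above).
SET spark.sql.mapKeyDedupPolicy=LAST_WIN;

-- Key 1 appears in both maps; with LAST_WIN the value from the later map ('x') should be kept.
SELECT map_concat(map(1,'a',2,'b'), map(1,'x'));
```

The setting applies to the current session, so it can be switched back with SET spark.sql.mapKeyDedupPolicy=EXCEPTION; if strict duplicate checking is preferred elsewhere.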

map_entries

This function returns an unordered array of all the entries in the map, each as a struct with key and value fields.

Example:

spark-sql> select map_entries(map(1,'a',2,'b',3,'c',4,'d'));
map_entries(map(1, a, 2, b, 3, c, 4, d))
[{"key":1,"value":"a"},{"key":2,"value":"b"},{"key":3,"value":"c"},{"key":4,"value":"d"}]

map_keys

Function map_keys returns all the keys of a map in an unordered array.

Example:

spark-sql> select map_keys(map(1,'a',2,'b',3,'c',4,'d'));
map_keys(map(1, a, 2, b, 3, c, 4, d))
[1,2,3,4]

map_values

Function map_values returns all the values of a map in an unordered array.

Example:

spark-sql> select map_values(map(1,'a',2,'b',3,'c',4,'d'));
map_values(map(1, a, 2, b, 3, c, 4, d))
["a","b","c","d"]

map_from_entries

This function constructs a map from an array of entries.

Example:

spark-sql> SELECT map_from_entries(array(struct('A', 1), struct('B', 2), struct('C', 3)));
map_from_entries(array(struct(A, 1), struct(B, 2), struct(C, 3)))
{"A":1,"B":2,"C":3}
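
Because map_entries produces the same array-of-structs shape that map_from_entries consumes, the two functions can be chained. A small sketch, reusing a literal map like the ones in the earlier examples, that should give back a map with the same entries:

```sql
-- Round trip: map -> array of (key, value) structs -> map
SELECT map_from_entries(map_entries(map(1,'a',2,'b',3,'c')));
```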