Spark SQL - Extract Value from JSON String
JSON string values can be extracted using built-in Spark functions like get_json_object or json_tuple.
Function get_json_object
Values can be extracted using get_json_object function. The function has two parameters: json_txt and path. The first is the JSON text itself, for example a string column in your Spark DataFrame or Hive table; the second is the JSON path.
Example:
SELECT get_json_object(r.JSON, '$.Attr_INT') AS Attr_INT, get_json_object(r.JSON, '$.ATTR_DOUBLE') AS ATTR_DOUBLE, get_json_object(r.JSON, '$.ATTR_DATE') AS ATTR_DATE from (SELECT '{"Attr_INT":1, "ATTR_DOUBLE":10.201, "ATTR_DATE": "2021-01-01"}' as JSON) r;
Output:
Attr_INT ATTR_DOUBLE ATTR_DATE 1 10.201 2021-01-01
Function json_tuple
This function json_tuple(json_txt, p1, p2, ..., pn) returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.
Example:
SELECT json_tuple(r.JSON, 'Attr_INT', 'ATTR_DOUBLE', 'ATTR_DATE')
from (SELECT '{"Attr_INT":1, "ATTR_DOUBLE":10.201, "ATTR_DATE": "2021-01-01"}' as JSON) r;
c0 c1 c2 1 10.201 2021-01-01
Function from_json
Values can also be extracted directly using function from_json where JSON string are converted to object first and then are directly referenced in SELECT statement. Refer to article Spark SQL - Convert JSON String to Map for example.