Spark SQL
Spark SQL - UNIX timestamp functions
Function unix_timestamp() returns the UNIX timestamp of current time. You can also specify a input timestamp value. Example: spark-sql> select unix_timestamp(); unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss) 1610174099 spark-sql> select unix_timestamp(current_timestamp ...
Spark SQL - Date and Timestamp Function
Function current_date() or current_date can be used to return the current date at the start of query evaluation. Example: spark-sql> select current_date(); current_date() 2021-01-09 spark-sql> select current_date; current_date() 2021-01-09 *Brackets are optional for this ...
Spark SQL - LEAD Window Function
Spark LEAD function provides access to a row at a given offset that follows the current row in a window. This analytic function can be used in a SELECT statement to compare values in the current row with values in a following row. This function is like Spark SQL - LAG Window Function .
Spark SQL - LAG Window Function
Spark LAG function provides access to a row at a given offset that comes before the current row in the windows. This function can be used in a SELECT statement to compare values in the current row with values in a previous row. lag(input[, offset[, default]]) OVER ([PARYITION BY ..] ORDER BY ...) ...
Spark SQL - NTILE Window Function
Spark NTILE function divides the rows in each window to 'n' buckets ranging from 1 to at most 'n' (n is the specified parameter). The following sample SQL uses NTILE function to divide records in each window to two buckets. SELECT TXN.*, NTILE(2) OVER (PARTITION BY ...
Spark SQL - DENSE_RANK Window Function
DENSE_RANK is similar as Spark SQL - RANK Window Function . It calculates the rank of a value in a group of values. It returns one plus the number of rows proceeding or equals to the current row in the ordering of a partition. The returned values are sequential in each window thus no ...
Spark SQL - RANK Window Function
RANK in Spark calculates the rank of a value in a group of values. It returns one plus the number of rows proceeding or equals to the current row in the ordering of a partition. The returned values are not sequential. The following sample SQL uses RANK function without PARTITION BY ...
Spark SQL - ROW_NUMBER Window Functions
ROW_NUMBER in Spark assigns a unique sequential number (starting from 1) to each record based on the ordering of rows in each window partition. It is commonly used to deduplicate data. The following sample SQL uses ROW_NUMBER function without PARTITION BY clause: SELECT TXN.*, ROW_NUMBER() OVER ...
In Spark-SQL CLI tool, the result print will omit headings (column names) by default. To display columns, we need to update Spark setting spark.hadoop.hive.cli.print.header. To make the changes for all spark-sql sessions, edit file $SPARK_HOME/conf/spark-defaults.conf . Add the following ...
In Spark SQL, virtual table can be constructed directly from constant values (literals) using SELECT statement. spark-sql> SELECT StrColumn FROM VALUES ('abc'),('def'),('ghi') table1(StrColumn); StrColumn abc def ghi spark-sql> SELECT ID,StrColumn FROM VALUES ...