hive

50 items tagged with "hive"

49 Articles
1 Diagram

Articles

PySpark - Save DataFrame into Hive Table using insertInto

This code snippet provides an example of inserting data into a Hive table using the PySpark `DataFrameWriter.insertInto` API: `DataFrameWriter.insertInto(tableName: str, overwrite: Optional[bool] = None)`. It takes two parameters: `tableName`, the table to insert data into, and `overwrite`, whether to overwrite existing data. By default, it does not overwrite existing data. This function resolves columns by position rather than by column name.
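
Because `insertInto` resolves columns by position, a DataFrame whose columns are ordered differently from the table can write values into the wrong columns. The following is a minimal sketch of one way to guard against that, not the article's own code: it reorders the DataFrame columns to the table's order before inserting. The helper names `align_to_table_order` and `insert_by_position` are hypothetical, and the sketch assumes a SparkSession with Hive support and an existing target table.

```python
from typing import List, Sequence


def align_to_table_order(df_columns: Sequence[str],
                         table_columns: Sequence[str]) -> List[str]:
    """Return the DataFrame column names reordered to the table's column order.

    Names are compared case-insensitively. Raises ValueError when the two
    sets of column names do not match.
    """
    by_lower = {c.lower(): c for c in df_columns}
    if sorted(by_lower) != sorted(c.lower() for c in table_columns):
        raise ValueError(
            f"Column mismatch: DataFrame has {list(df_columns)}, "
            f"table has {list(table_columns)}"
        )
    return [by_lower[c.lower()] for c in table_columns]


def insert_by_position(spark, df, table_name: str, overwrite: bool = False) -> None:
    """Insert df into an existing table after aligning its column order.

    `spark` is assumed to be a SparkSession created with Hive support and
    `df` a PySpark DataFrame; both are supplied by the caller.
    """
    table_columns = spark.table(table_name).columns
    ordered = align_to_table_order(df.columns, table_columns)
    # insertInto ignores column names, so the select() fixes the order first.
    df.select(*ordered).write.insertInto(table_name, overwrite=overwrite)
```

The pure-Python helper can be checked without a cluster, while `insert_by_position` only runs against a real Spark session.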

2022-08-24
Code Snippets & Tips

Introduction to Hive Bucketed Table

2022-08-24
Hadoop, Hive & HBase

Find Number of Rows of Hive Table via Scala

To find the number of rows/records in a Hive table, we can use the Spark SQL count aggregation function (see Hive SQL - Aggregate Functions Overview with Examples). This code snippet provides an example of Scala code that does the same. spark-shell is used directly for simplicity. The code snippet can also run in Jupyter Notebooks or Zeppelin with a Spark kernel. Alternatively, it can be compiled into a jar file and then submitted as a job via spark-submit.
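
The same count can be sketched in PySpark (shown here in Python rather than the article's Scala). This is an illustrative sketch only: the helper names `build_count_query` and `count_rows` are hypothetical, and `spark` is assumed to be a SparkSession with Hive support.

```python
def build_count_query(database: str, table: str) -> str:
    """Build a COUNT(*) query with backtick-quoted identifiers."""
    for name in (database, table):
        if "`" in name:
            raise ValueError(f"Unsupported identifier: {name!r}")
    return f"SELECT COUNT(*) AS row_count FROM `{database}`.`{table}`"


def count_rows(spark, database: str, table: str) -> int:
    """Run the count query and return the number of rows as an int.

    `spark` is assumed to be a SparkSession created with Hive support.
    """
    row = spark.sql(build_count_query(database, table)).first()
    return int(row["row_count"])
```

An equivalent without SQL text would be `spark.table(name).count()`, which also returns the row count.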

2022-08-23
Code Snippets & Tips

Start Hive Beeline CLI

This code snippet provides an example of starting the Hive Beeline CLI on Linux. Beeline is the successor of Hive CLI. In the shell script, the environment variable $HIVE_HOME is the home folder of the Hive installation on the system. In a cluster environment, it usually refers to the Hive client installation on an edge server. Output:

```
$HIVE_HOME/bin/beeline -u jdbc:hive2://
Connecting to jdbc:hive2://
Hive Session ID = 65a40cd9-02ce-4965-93b6-cff9db461b70
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://>
```
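
When Beeline needs to be launched from a script, the same command line can be assembled from $HIVE_HOME. The sketch below is a hedged illustration, not part of the original snippet: the function names `beeline_command` and `start_beeline` are hypothetical, and the default URL is the embedded-mode `jdbc:hive2://` shown above.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def beeline_command(env: Optional[Mapping[str, str]] = None,
                    url: str = "jdbc:hive2://") -> List[str]:
    """Build the argument list for starting Beeline from $HIVE_HOME."""
    env = os.environ if env is None else env
    hive_home = env.get("HIVE_HOME")
    if not hive_home:
        raise RuntimeError("HIVE_HOME is not set")
    return [os.path.join(hive_home, "bin", "beeline"), "-u", url]


def start_beeline(url: str = "jdbc:hive2://") -> int:
    """Start an interactive Beeline session and return its exit code."""
    return subprocess.run(beeline_command(url=url)).returncode
```

Passing the arguments as a list avoids shell quoting issues when the JDBC URL contains characters such as `;`.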

2022-08-20
Code Snippets & Tips

Hive - Retrieve Current User

This code snippet provides an example of retrieving the current user via the current_user() function in HQL (Hive QL) code. Output:

```
0: jdbc:hive2://> select current_user();
OK
+----------+
|   _c0    |
+----------+
| kontext  |
+----------+
```

2022-08-20
Code Snippets & Tips

Configure HiveServer2 to Enable Transactions (ACID Support)

2022-08-20
Hadoop, Hive & HBase

Hive SQL - Merge Statement on ACID Tables

Hive supports the standard ANSI SQL MERGE statement from version 2.2. However, it can only be applied to tables that support ACID transactions. To learn more about ACID support in Hive, refer to the article Hive ACID Inserts, Updates and Deletes with ORC.

Sample table

This code snippet merges into a sample table named testdb.crudtable. It has two records before the merge. The staging table was created using the following statements:

```
create table crudtablestg (id int, value string, op string);
insert into crudtablestg values (1,'AA','U'),(2,'B','D'),(3,'C', 'I');
```

It has one additional column named op to indicate the delta changes: U - updates; D - deletes; I - inserts (i.e. new records).

Syntax

```
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES <value list>
```

Output

After the merge, record 1 is updated, record 2 is deleted, and record 3 is inserted into the table.
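
To make the effect of the three WHEN clauses concrete, the following pure-Python sketch simulates the merge on in-memory data. It is an illustration of the semantics only, not Hive code. The starting values `'A'` and `'B'` for records 1 and 2 are assumptions (the source only states that the table holds two records), and the function name `apply_delta` is hypothetical. It also assumes each WHEN clause is filtered on the op column.

```python
from typing import Dict, Iterable, Tuple


def apply_delta(target: Dict[int, str],
                staging: Iterable[Tuple[int, str, str]]) -> Dict[int, str]:
    """Return a copy of target after applying staging rows of (id, value, op)."""
    merged = dict(target)
    for row_id, value, op in staging:
        matched = row_id in merged
        if matched and op == "U":
            merged[row_id] = value   # WHEN MATCHED AND op = 'U' THEN UPDATE
        elif matched and op == "D":
            del merged[row_id]       # WHEN MATCHED AND op = 'D' THEN DELETE
        elif not matched and op == "I":
            merged[row_id] = value   # WHEN NOT MATCHED AND op = 'I' THEN INSERT
    return merged


# Assumed starting rows; only the ids 1 and 2 are implied by the source.
target = {1: "A", 2: "B"}
# Staging rows copied from the insert statement above.
staging = [(1, "AA", "U"), (2, "B", "D"), (3, "C", "I")]
result = apply_delta(target, staging)
```

With these inputs, `result` holds record 1 with the new value `'AA'` and the new record 3, while record 2 is gone, which matches the outcome described above.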

2022-08-19
Code Snippets & Tips

Spark Insert Data into Hive Tables

2022-08-17
Spark & PySpark

Hive ACID Inserts, Updates and Deletes with ORC

2022-08-17
Hadoop, Hive & HBase

Hive SQL - Union data with UNION ALL and UNION DISTINCT

2022-07-23
Code Snippets & Tips

Hive SQL - Analytics with GROUP BY and GROUPING SETS, Cubes, Rollups

2022-07-23
Hadoop, Hive & HBase

Hive SQL - Data Sampling using TABLESAMPLE

2022-07-23
Code Snippets & Tips

Extract Values from XML Column in Hive Tables

2022-07-23
Hadoop, Hive & HBase

Hive SQL - Virtual Columns

2022-07-23
Hadoop, Hive & HBase

Hive SQL - Cluster By and Distribute By

2022-07-10
Hadoop, Hive & HBase

Hive SQL - Differences between Order By and Sort By

2022-07-10
Hadoop, Hive & HBase

Hive SQL - Aggregate Functions Overview with Examples

2022-07-10
Hadoop, Hive & HBase

List Tables in Hive Database

2022-07-08
Code Snippets & Tips

PySpark - Read from Hive Tables

2022-07-08
Spark & PySpark

Hive - Create External Table for Multiline CSV Files

2022-06-01
Hadoop, Hive & HBase

Create Partitioned Hive Table

2021-12-23
Code Snippets & Tips

Spark - Save DataFrame as a Hive Table

2021-10-13
Spark (Chinese)

Hive - Rename Table

2021-09-10
Code Snippets & Tips

Python: Load Data from Hive

2021-01-06
Hadoop, Hive & HBase

Apache Hive 3.1.2 Installation on Linux Guide

2020-12-27
Hadoop, Hive & HBase

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

2020-12-27
Spark & PySpark

Create Temporary Table - Hive SQL

2020-08-25
Hadoop, Hive & HBase

Create Table as SELECT - Hive SQL

2020-08-25
Hadoop, Hive & HBase

Create Bucketed Sorted Table - Hive SQL

2020-08-25
Hadoop, Hive & HBase

Create Partitioned Table - Hive SQL

2020-08-25
Hadoop, Hive & HBase

Create Table Stored as CSV, TSV, JSON Format - Hive SQL

2020-08-25
Hadoop, Hive & HBase

Create Table with Parquet, Orc, Avro - Hive SQL

2020-08-25
Hadoop, Hive & HBase

Create, Drop, and Truncate Table - Hive SQL

2020-08-24
Hadoop, Hive & HBase

Create, Drop, Alter and Use Database - Hive SQL

2020-08-24
Hadoop, Hive & HBase

Apache Hive 3.1.2 Installation on Windows 10

2020-08-10
Hadoop, Hive & HBase

Hive: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

2020-04-20
Hadoop, Hive & HBase

Differences between Hive External and Internal (Managed) Tables

2020-02-22
Hadoop, Hive & HBase

Schema Merging (Evolution) with Parquet in Spark and Hive

2020-02-02
Spark & PySpark

Select from dual in SQL / Hive

In Oracle Database, you can select from the dual table when you only want to return a one-row result set. In many other databases, the query engine supports selecting constant values directly without specifying a table name.

2019-11-18
Code Snippets & Tips

Select top N records in SQL / Hive

The syntax for selecting the top N records differs slightly between databases, and it may also differ from the ISO standard.

2019-11-18
Code Snippets & Tips

Big Data Tools on Windows via Windows Subsystem for Linux (WSL)

2019-05-19
Sqoop

Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux

2019-05-18
Hadoop, Hive & HBase

.NET for Apache Spark Preview with Examples

2019-04-26
Spark & PySpark

HiveServer2 Cannot Connect to Hive Metastore Resolutions/Workarounds

2019-04-15
Hadoop, Hive & HBase

Configure a SQL Server Database as Remote Hive Metastore

2019-04-14
Hadoop, Hive & HBase

Connect to Hive via HiveServer2 JDBC Driver

2019-04-14
Java Programming

Read Data from Hive in Spark 1.x and 2.x

2019-04-04
Spark & PySpark

Spark - Save DataFrame to Hive Table

2019-03-27
Spark & PySpark

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

2019-03-25
Hadoop, Hive & HBase

Diagrams