Raymond / Spark & PySpark

PySpark - Read from Hive Tables

2022-07-08

Spark provides flexible APIs to read data from various data sources, including Hive databases. The article Spark - Save DataFrame to Hive Table provides guidance about writing a Spark DataFrame to Hive tables; this article provides examples of reading data from Hive using PySpark.



Prerequisites

  1. Spark - If you don't have a Spark environment, you can follow these articles to install Spark on your machine. 
  2. Hive - Similarly, follow the Hive installation articles to install Hive. 

Sample table

Create a sample Hive table using the following HQL:

create table test_db.test_table(id int, attr string);
insert into test_db.test_table(id, attr) values (1,'a'), (2,'b'),(3,'c');

The statements create a table with three records:

select * from test_db.test_table;
1       a
2       b
3       c

Read data from Hive

Now we can create a PySpark script (read-hive.py) to read from the Hive table.

from pyspark.sql import SparkSession

appName = "PySpark Example - Read Hive"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Create DF by reading from Hive
df = spark.sql("select * from test_db.test_table")

# Print the schema and show the content
print(df.schema)
df.show()

enableHiveSupport tells Spark to use the Hive metastore as its data catalog instead of the default in-memory catalog. You will be able to see logs of Spark connecting to the Hive metastore thrift service like the following:

2022-07-08T19:43:23,205 INFO [Thread-5] hive.metastore - Trying to connect to metastore with URI thrift://
2022-07-08T19:43:23,225 INFO [Thread-5] hive.metastore - Opened a connection to metastore, current connections: 1
2022-07-08T19:43:23,253 INFO [Thread-5] hive.metastore - Connected to metastore.
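Spark locates the metastore from Hive configuration, typically a hive-site.xml on the classpath (e.g. in $SPARK_HOME/conf). The metastore URI can also be set directly on the session builder. A configuration sketch, with a hypothetical host and port:

```python
# The hive.metastore.uris value below is a hypothetical example;
# replace it with your environment's metastore host and port.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .enableHiveSupport() \
    .getOrCreate()
```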

Run the script using the following command:

spark-submit read-hive.py


The output shows the schema and the table content:

StructType([StructField('id', IntegerType(), True), StructField('attr', StringType(), True)])
+---+----+
| id|attr|
+---+----+
|  1|   a|
|  2|   b|
|  3|   c|
+---+----+

With Hive support enabled, you can also use Hive built-in functions and UDFs that are not available as native Spark SQL functions. 
