Read Data from Hive in Spark 1.x and 2.x

visibility 2,227 access_time 2 years ago languageEnglish

Spark 2.x

Form Spark 2.0, you can use Spark session builder to enable Hive support directly.

The following example (Python) shows how to implement it.

from pyspark.sql import SparkSession

appName = "PySpark Hive Example"
master = "local"

# Create Spark session with Hive supported.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Read data using SQL
df = spark.sql("show databases")
df.show()

Spark 1.x

In previous versions, you need to use HiveContext to connect to Hive to manipulate data in Hive databases.

To initialize a HiveContext, you need to fist create a SparkContext. 

from pyspark import SparkContext, SparkConf, HiveContext

appName = "JSON Parse Example"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Construct a HiveContext object
sqlContext = HiveContext(sc)

# Read data using SQL
df = sqlContext.sql("show databases")
df.show()
info Last modified by Raymond 2 years ago copyright This page is subject to Site terms.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

timeline Stats
Page index 1.96
More from Kontext
Scala - Add Constant Column to Spark Data Frame
visibility 758
thumb_up 0
access_time 2 years ago
Create Partitioned Hive Table
visibility 55
thumb_up 0
access_time 5 months ago