Read Data from Hive in Spark 1.x and 2.x

Spark 2.x

From Spark 2.0, you can use the Spark session builder to enable Hive support directly.

The following example (Python) shows how to implement it.

from pyspark.sql import SparkSession

appName = "PySpark Hive Example"
master = "local"

# Create a Spark session with Hive support enabled.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Read data using SQL
df = spark.sql("show databases")
df.show()
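
The `show databases` query above only lists the available databases. As a rough sketch of reading actual table data, the helper below wraps `spark.table()`, which looks the table up in the session catalog. The function names `create_hive_session` and `read_hive_table`, as well as the `test_db` and `test_table` names in the usage note, are hypothetical and only serve as illustration; replace them with names that exist in your own metastore.

```python
def create_hive_session(app_name, master="local"):
    """Build a SparkSession with Hive support (Spark 2.0 or later)."""
    # Imported inside the function so read_hive_table() below can also be
    # used with a session that was created elsewhere.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master)
        .enableHiveSupport()
        .getOrCreate()
    )


def read_hive_table(spark, database, table, limit=None):
    """Return a DataFrame for database.table, optionally capped at `limit` rows."""
    # spark.table() resolves the qualified name through the session catalog.
    # The database and table names are supplied by the caller; this sketch
    # does not check that they exist.
    df = spark.table("{}.{}".format(database, table))
    return df if limit is None else df.limit(limit)
```

For example, `read_hive_table(create_hive_session("PySpark Hive Example"), "test_db", "test_table", limit=10).show()` would print up to ten rows, assuming that table exists. The same data could also be read with `spark.sql("SELECT * FROM test_db.test_table")`.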

Spark 1.x

In Spark 1.x, you need to use HiveContext to connect to Hive and work with data in Hive databases.

To initialize a HiveContext, you need to first create a SparkContext.

from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

appName = "PySpark Hive Example"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Construct a HiveContext object
sqlContext = HiveContext(sc)

# Read data using SQL
df = sqlContext.sql("show databases")
df.show()
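
Both entry points shown on this page, the Spark 2.x session and the Spark 1.x HiveContext, expose a `sql()` method that returns a DataFrame. As a small sketch of what that allows, the helper below accepts either object and returns the database names as a plain Python list. The name `list_hive_databases` is hypothetical, and the function reads the first column by position so it does not depend on what that column is called.

```python
def list_hive_databases(sql_entry_point):
    """Return database names from any object with a Spark-style sql() method.

    `sql_entry_point` is expected to be a SparkSession (Spark 2.x) or a
    HiveContext (Spark 1.x); the helper itself is only an illustration.
    """
    rows = sql_entry_point.sql("show databases").collect()
    # Each Row behaves like a tuple, so the name is taken by position.
    return [row[0] for row in rows]
```

With the objects created earlier, `list_hive_databases(spark)` and `list_hive_databases(sqlContext)` would be called the same way.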

Last modified by Raymond 2 years ago.
