Read Data from Hive in Spark 1.x and 2.x


Spark 2.x

From Spark 2.0, you can enable Hive support directly through the SparkSession builder.

The following Python example shows how to do this.

from pyspark.sql import SparkSession

appName = "PySpark Hive Example"
master = "local"

# Create a Spark session with Hive support enabled.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Read data using SQL
df = spark.sql("show databases")
df.show()
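
The example above only lists databases. To actually read data from a Hive table, you can pass a query to spark.sql() or load a table by name with spark.table(). The following is a minimal sketch, not a definitive implementation: the helper names (build_hive_session, list_tables, read_table) are made up for illustration, and it assumes the database and table you pass in already exist in your Hive metastore.

```python
def build_hive_session(app_name="PySpark Hive Example", master="local"):
    """Create (or reuse) a SparkSession with Hive support enabled."""
    # Imported inside the function so the helpers below can be defined
    # without starting Spark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master)
        .enableHiveSupport()
        .getOrCreate()
    )


def list_tables(spark, database="default"):
    """Return a DataFrame listing the tables in the given Hive database."""
    return spark.sql("SHOW TABLES IN {}".format(database))


def read_table(spark, table, database="default", limit=10):
    """Return at most `limit` rows of database.table as a DataFrame."""
    # spark.table() accepts a qualified "database.table" name.
    return spark.table("{}.{}".format(database, table)).limit(limit)
```

For example, read_table(build_hive_session(), "my_table").show() would print up to ten rows, provided a table named my_table (a hypothetical name) exists in the default database.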

Spark 1.x

In Spark 1.x, you need to use HiveContext to connect to Hive and work with data in Hive databases.

To initialize a HiveContext, you first need to create a SparkContext.

from pyspark import SparkContext, SparkConf, HiveContext

appName = "PySpark Hive Example"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Construct a HiveContext object
sqlContext = HiveContext(sc)

# Read data using SQL
df = sqlContext.sql("show databases")
df.show()
Last modified by Raymond.