Spark 2.x
From Spark 2.0 onward, you can enable Hive support directly through the SparkSession builder.
The following Python example shows how.
from pyspark.sql import SparkSession
appName = "PySpark Hive Example"
master = "local"
# Create a Spark session with Hive support enabled.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()
# Read data using SQL
df = spark.sql("show databases")
df.show()
Spark 1.x
In earlier versions, you need to use a HiveContext to connect to Hive and manipulate data in Hive databases.
To initialize a HiveContext, you first need to create a SparkContext.
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext
appName = "PySpark Hive Example"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
# Construct a HiveContext object
sqlContext = HiveContext(sc)
# Read data using SQL
df = sqlContext.sql("show databases")
df.show()