Spark 2.x
From Spark 2.0, you can use the SparkSession builder to enable Hive support directly.
The following Python example shows how to do this.
from pyspark.sql import SparkSession

appName = "PySpark Hive Example"
master = "local"

# Create a Spark session with Hive support enabled.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Read data using SQL.
df = spark.sql("show databases")
df.show()
Spark 1.x
In earlier versions, you need to use a HiveContext to connect to Hive and manipulate data in Hive databases.
To initialize a HiveContext, you first need to create a SparkContext.
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

appName = "PySpark Hive Example"
master = "local"

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Construct a HiveContext object.
sqlContext = HiveContext(sc)

# Read data using SQL.
df = sqlContext.sql("show databases")
df.show()