By using this site, you acknowledge that you have read and understand our Cookie policy, Privacy policy and Terms .

Spark 2.x

Form Spark 2.0, you can use Spark session builder to enable Hive support directly.

The following example (Python) shows how to implement it.

from pyspark.sql import SparkSession

appName = "PySpark Hive Example"
master = "local"

# Create Spark session with Hive supported.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Read data using SQL
df = spark.sql("show databases")
df.show()

Spark 1.x

In previous versions, you need to use HiveContext to connect to Hive to manipulate data in Hive databases.

To initialize a HiveContext, you need to fist create a SparkContext. 

from pyspark import SparkContext, SparkConf, HiveContext

appName = "JSON Parse Example"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Construct a HiveContext object
sqlContext = HiveContext(sc)

# Read data using SQL
df = sqlContext.sql("show databases")
df.show()
info Last modified by Raymond at 13 months ago * This page is subject to Site terms.

More from Kontext

PySpark Read Multiple Lines Records from CSV

local_offer pyspark local_offer spark-2-x local_offer python

visibility 6
thumb_up 0
access_time 1 day ago

CSV is a common format used when extracting and exchanging data between systems and platforms. Once CSV file is ingested into HDFS, you can easily read them as DataFrame in Spark. However there are a few options you need to pay attention to especially if you source file: Has records ac...

open_in_new View

local_offer pyspark local_offer spark-2-x local_offer teradata local_offer SQL Server

visibility 45
thumb_up 0
access_time 12 days ago

In my previous article about  Connect to SQL Server in Spark (PySpark) , I mentioned the ways t...

open_in_new View

local_offer hive local_offer hdfs

visibility 56
thumb_up 0
access_time 2 months ago

In Hive, there are two types of tables can be created - internal and external table. Internal tables are also called managed tables. Different features are available to different types. This article lists some of the common differences.  Internal table By default, Hive creates ...

open_in_new View

Spark Read from SQL Server Source using Windows/Kerberos Authentication

local_offer pyspark local_offer SQL Server local_offer spark-2-x

visibility 134
thumb_up 0
access_time 2 months ago

In this article, I am going to show you how to use JDBC Kerberos authentication to connect to SQL Server sources in Spark (PySpark). I will use  Kerberos connection with principal names and password directly that requires  ...

open_in_new View

info About author

Kontext Column

Kontext Column

Created for everyone to publish data, programming and cloud related articles. Follow three steps to create your columns.

Learn more arrow_forward
info Follow us on Twitter to get the latest article updates. Follow us