Read Data from Hive in Spark 1.x and 2.x


Spark 2.x

From Spark 2.0, you can use the SparkSession builder to enable Hive support directly.

The following Python example shows how to do this.

from pyspark.sql import SparkSession

appName = "PySpark Hive Example"
master = "local"

# Create a Spark session with Hive support enabled.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Read data using SQL - list the databases in the Hive metastore
df = spark.sql("show databases")
df.show()
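Beyond metadata queries like the one above, the same session can read from Hive tables directly. The following is a minimal sketch: the database and table names (test_db.employees) are placeholders, so replace them with objects that exist in your own metastore. Also note that Spark locates the Hive metastore through the hive-site.xml file in its configuration directory; if that file is absent, Spark falls back to a local embedded metastore.

# Query a Hive table with SQL (test_db.employees is a placeholder).
df = spark.sql("SELECT * FROM test_db.employees LIMIT 10")
df.show()

# Alternatively, load the table through the DataFrame API.
df = spark.table("test_db.employees")
df.printSchema()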

Spark 1.x

In Spark 1.x, you need to use HiveContext to connect to Hive and manipulate data in Hive databases.

To initialize a HiveContext, you first need to create a SparkContext.

from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

appName = "JSON Parse Example"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Construct a HiveContext object
sqlContext = HiveContext(sc)

# Read data using SQL - list the databases in the Hive metastore
df = sqlContext.sql("show databases")
df.show()
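HiveContext exposes the same query interface, so reading an actual table works in the same way. As before, this is only a sketch and test_db.employees is a placeholder table name.

# Read rows from a Hive table with SQL (test_db.employees is a placeholder).
df = sqlContext.sql("SELECT * FROM test_db.employees LIMIT 10")
df.show()

# HiveContext also provides a table() shortcut that returns a DataFrame.
df = sqlContext.table("test_db.employees")
df.printSchema()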