from pyspark.sql import SparkSession
appName = "PySpark Hive Bucketing Example"
master = "local"
# Create a Spark session with Hive support enabled.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()
# Prepare sample data to insert into the Hive table.
data = []
countries = ['CN', 'AU']
for i in range(1000):
    data.append([i, 'U' + str(i), countries[i % 2]])
df = spark.createDataFrame(data, ['user_id', 'key', 'country'])
df.show()
# Append df to the Hive table test_db.bucket_table.
# Note: insertInto requires the target table to already exist, and it
# resolves columns by position, not by name, so the DataFrame's column
# order must match the table schema.
df.write.mode('append').insertInto('test_db.bucket_table')
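Because insertInto does not create the target table, test_db.bucket_table must exist before the write. A possible Spark SQL DDL for it is sketched below; the database name, storage format, bucketing column, and bucket count are assumptions chosen to match the sample DataFrame, not taken from the original article.

```sql
-- Hypothetical DDL sketch; format and bucket count are assumptions.
CREATE TABLE test_db.bucket_table (
    user_id INT,
    key     STRING,
    country STRING
)
USING PARQUET
CLUSTERED BY (country) INTO 2 BUCKETS;
```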
PySpark - Save DataFrame into Hive Table using insertInto
This code snippet provides an example of inserting data into a Hive table using the PySpark DataFrameWriter.insertInto API.
DataFrameWriter.insertInto(tableName: str, overwrite: Optional[bool] = None)
It takes two parameters:
- tableName: the table to insert data into.
- overwrite: whether to overwrite existing data; by default, existing data is not overwritten.
Unlike saveAsTable, insertInto resolves columns by position instead of by name, so the order of the DataFrame's columns must match the target table's schema.
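To make position-based resolution concrete, here is a plain-Python sketch (not the Spark API itself; all names are illustrative) showing how a DataFrame whose columns are ordered differently from the table schema would silently write values into the wrong columns under positional matching, while name-based matching would place them correctly.

```python
# Table schema, in the order the table was defined.
table_schema = ['user_id', 'key', 'country']

# A DataFrame row whose columns come in a different order.
df_columns = ['country', 'user_id', 'key']
row = ['CN', 42, 'U42']

# insertInto-style resolution: values map to table columns by position,
# ignoring the DataFrame's own column names.
by_position = dict(zip(table_schema, row))

# saveAsTable-style resolution for comparison: values map by column name.
by_name = {col: row[df_columns.index(col)] for col in table_schema}

print(by_position)  # {'user_id': 'CN', 'key': 42, 'country': 'U42'} -- wrong columns
print(by_name)      # {'user_id': 42, 'key': 'U42', 'country': 'CN'} -- correct
```

The mismatch in by_position is why reordering DataFrame columns (for example with df.select) before calling insertInto matters.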
Code snippet
Last modified by Kontext 8 months ago