PySpark - Save DataFrame into Hive Table using insertInto
Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise.
Code description
This code snippet provides an example of inserting data into a Hive table using the PySpark DataFrameWriter.insertInto API.
DataFrameWriter.insertInto(tableName: str, overwrite: Optional[bool] = None)
It takes two parameters:
- tableName - the name of the table to insert data into.
- overwrite - whether to overwrite existing data. By default, existing data is not overwritten (the data is appended).
Note that this function resolves columns by position, not by column name, so the order of the DataFrame's columns must match the order of the table's columns.
Code snippet
from pyspark.sql import SparkSession

appName = "PySpark Hive Bucketing Example"
master = "local"

# Create Spark session with Hive supported.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Prepare sample data for inserting into Hive table.
data = []
countries = ['CN', 'AU']
for i in range(0, 1000):
    data.append([int(i), 'U' + str(i), countries[i % 2]])

df = spark.createDataFrame(data, ['user_id', 'key', 'country'])
df.show()

# Save df to Hive table test_db.bucket_table.
df.write.mode('append').insertInto('test_db.bucket_table')
info Last modified by Kontext 3 years ago