PySpark - Save DataFrame into Hive Table using insertInto
Code description
This code snippet provides an example of inserting data into a Hive table using the PySpark DataFrameWriter.insertInto API.
DataFrameWriter.insertInto(tableName: str, overwrite: Optional[bool] = None)
It takes two parameters: tableName, the table to insert data into, and overwrite, which controls whether to overwrite existing data. By default it appends and does not overwrite.
Note that insertInto resolves columns by position rather than by name, so the DataFrame's columns must be in the same order as the target table's schema, as illustrated below.
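To make the position-based behavior concrete, here is a minimal sketch; the table demo_db.users and its schema (user_id INT, name STRING) are assumptions made up for illustration, not part of the original example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Assumed table for illustration: demo_db.users (user_id INT, name STRING).
df = spark.createDataFrame([(1, 'Alice')], ['user_id', 'name'])

# Correct: column positions match the table schema.
df.write.insertInto('demo_db.users')

# Risky: column names still match, but insertInto ignores names, so the
# swapped positions would write 'name' values into the user_id column.
df.select('name', 'user_id').write.insertInto('demo_db.users')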
Code snippet
from pyspark.sql import SparkSession

appName = "PySpark Hive Bucketing Example"
master = "local"

# Create Spark session with Hive supported.
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .getOrCreate()

# Prepare sample data for inserting into the Hive table.
data = []
countries = ['CN', 'AU']
for i in range(0, 1000):
    data.append([int(i), 'U' + str(i), countries[i % 2]])

df = spark.createDataFrame(data, ['user_id', 'key', 'country'])
df.show()

# Save df to Hive table test_db.bucket_table
df.write.mode('append').insertInto('test_db.bucket_table')
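Unlike saveAsTable, insertInto requires the target table to already exist, so the snippet above fails if test_db.bucket_table has not been created. Below is a hedged sketch of preparing the database and table first and then overwriting instead of appending; the column types and the PARQUET storage format are assumptions inferred from the sample data, not taken from the original example:

# Sketch: create the target database and table before inserting.
# The column types and storage format here are assumptions.
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS test_db.bucket_table (
        user_id INT,
        key STRING,
        country STRING
    ) STORED AS PARQUET
""")

# Replace existing data instead of appending by passing overwrite=True,
# the second parameter of insertInto.
df.write.insertInto('test_db.bucket_table', overwrite=True)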