Spark & PySpark
Column: Apache Spark installation guides, performance tuning tips, general tutorials, etc.
*Spark logo is a registered trademark of Apache Spark.
Hello, how would this work if I don't have a database created? Instead, I create the DataFrames as follows:
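import findspark
findspark.init()  # locate the local Spark installation and make pyspark importable

from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # namespaced import; "import *" would shadow builtins such as sum and max
from pyspark.sql.functions import broadcast
from pyspark.sql.types import *

spark = SparkSession.builder.getOrCreate()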
# Employee rows: (id, name, dept, salary); some fields are None
emp = [(1, "AAA", "dept1", 1000),
       (2, "BBB", "dept1", 1100),
       (3, "CCC", "dept1", 3000),
       (4, "DDD", "dept1", 1500),
       (5, "EEE", "dept2", 8000),
       (6, "FFF", "dept2", 7200),
       (7, "GGG", "dept3", 7100),
       (None, None, None, 7500),
       (9, "III", None, 4500),
       (10, None, "dept5", 2500)]

# Department rows: (id, name)
dept = [("dept1", "Department - 1"),
        ("dept2", "Department - 2"),
        ("dept3", "Department - 3"),
        ("dept4", "Department - 4")]

# Column types are inferred from the data (Python ints become bigint)
df = spark.createDataFrame(emp, ["id", "name", "dept", "salary"])
deptdf = spark.createDataFrame(dept, ["id", "name"])
From here, how do I save the DataFrame as a Hive table?