luc · 4 months ago
Re: PySpark: some questions for a beginner
I thought that SparkSession was used when I want to work with DataFrames, and that if I only need RDDs, I use SparkContext.
Isn't that true anymore, or was I wrong the whole time?
If I want to create 2 Spark sessions, can I do this:
app_name = "PySpark Delta Lake - SCD2 Full Merge Example"
app_name2 = "PySpark Delta Lake - SCD2 Full Merge Example 2"
master = "local"

# Create Spark session with Delta extension
builder1 = SparkSession.builder.appName(app_name) \
    .master(master)
builder2 = SparkSession.builder.appName(app_name2) \
    .master(master)

spark1 = builder1.getOrCreate()
spark2 = builder2.getOrCreate()
Since Spark 2.0, SparkSession is the recommended entry point, as it encapsulates most of the APIs, including the SparkContext ones. You can still use SparkContext if you prefer:
The example you provided will end up with a single session: `getOrCreate()` returns the already-running session if one exists, so the second call reuses the session created by the first instead of starting a new one.