PySpark: some questions for a beginner
Hello,
I'm trying to understand how Spark works and I'm learning PySpark.
I know Python and the Pandas library.
I understand that if I want to read a big CSV file into a Pandas DataFrame, it may not work (or it will take a long time to read).
As such, PySpark is an alternative.
I read some articles and I understood that the first thing to do is to create a SparkContext.
I understand the SparkContext will manage the cluster, which will read the CSV file and transform the data.
So I had this code in a Jupyter notebook:
# Import SparkContext from the pyspark module
from pyspark import SparkContext
sc = SparkContext('local')
sc
If I execute this code twice, the second time I get an error because I can't have two Spark contexts.
Why can't I have two Spark contexts?
I wanted to try this:
# Import SparkContext from the pyspark module
from pyspark import SparkContext
sc1 = SparkContext('local')
sc2 = SparkContext('local')
thank you
luc, 2 years ago
Thank you.
I thought that SparkSession was used when I wanted to work with DataFrames.
If I only need RDDs, I use SparkContext.
Isn't that true anymore, or was I wrong the whole time?
If I want to create two Spark sessions, can I do this:
app_name = "PySpark Delta Lake - SCD2 Full Merge Example"
master = "local"
app_name2 = "PySpark Delta Lake - SCD2 Full Merge Example 2"
# Create Spark session with Delta extension
builder1 = SparkSession.builder.appName(app_name) \
    .master(master)
builder2 = SparkSession.builder.appName(app_name2) \
    .master(master)
spark1 = builder1.getOrCreate()
spark2 = builder2.getOrCreate()
Raymond, 2 years ago
Hi Luc,
Welcome to Kontext!
First, the API you are using is the old approach to establishing a session. For Spark 2 or 3, you can use the following approach to create a SparkSession:
from pyspark.sql import SparkSession
app_name = "PySpark Delta Lake - SCD2 Full Merge Example"
master = "local"
# Create Spark session with Delta extension
builder = SparkSession.builder.appName(app_name) \
    .master(master)
spark = builder.getOrCreate()
This will ensure there is only one active session.
A Spark session lets you run your code interactively, so the session stays alive between executions. That is why running your code a second time fails: the first SparkContext is still active. You can stop a SparkContext via its stop function:
pyspark.SparkContext.stop — PySpark 3.3.1 documentation (apache.org)
If you want to create two Spark Sessions, you can submit two Spark jobs separately. The following diagram might be helpful to you:
Also, can you explain a bit more about why you need two sessions in one script?
Since Spark 2.0, SparkSession is recommended because it encapsulates most of the APIs, including the SparkContext ones. You can still use SparkContext directly if you prefer.
The example you provided will end up with one session.