I wrote a script that creates a Spark session, uses it, and cleans it up, like this:
from pyspark.sql import SparkSession


class myTestClass:
    """
    My test class.
    """

    def __init__(self):
        """
        This function is called while creating an instance of this object.
        """
        self.s = self._build_session()

    def __del__(self):
        """
        This function is called while cleaning up an instance of this object.
        """
        self.s.stop()

    def _build_session(self):
        """
        Create a Spark session.
        """
        return (
            SparkSession.builder.master("yarn")
            .config("spark.ui.port", "XXXX")
            .config("spark.submit.deployMode", "client")
            .config("spark.pyspark.driver.python", "/path/to/python3")
            .config("spark.pyspark.python", "/path/to/python3")
            .config("spark.dynamicAllocation.enabled", "True")
            .config("spark.dynamicAllocation.maxExecutors", "12")
            .config("spark.yarn.executor.memoryOverhead", "8g")
            .config("spark.executor.memory", "16g")
            .config("spark.executor.cores", "8")
            .config("spark.driver.memory", "16g")
            .config("spark.driver.cores", "8")
            .config("spark.shuffle.service.enabled", "True")
            .config("spark.hadoop.fs.hdfs.impl.disable.cache", "true")
            .getOrCreate()
        )
obj = myTestClass()
# ... computing with obj.s ...
# when obj is deleted (or goes out of scope), __del__ stops the session
- No multiprocessing
This class works well.
- Multiprocessing
When I hand this class to a complex process that computes using multiprocessing, multiple SparkSession instances are created. Unfortunately, I get the following error:
WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor).
This may indicate an error, since only one SparkContext may be running in this JVM. The other SparkContext was created at:
blablabla
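The real pipeline is more involved, but the multiprocessing usage is roughly the following sketch (the pool size and the worker function are placeholders, not my actual code):

from multiprocessing import Pool

def worker(task_id):
    # each worker process builds its own instance of the class,
    # and therefore attempts to create its own SparkSession/SparkContext
    obj = myTestClass()
    # ... computing with obj.s ...
    return task_id

if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(worker, range(4))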
I learned that Zeppelin or Livy provides a "shared SparkSession", which is probably a better option, but I can't re-code all of what I have done.
I tried to include SparkContext.stop() in the __del__ function, but it did not work.
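What I tried looked roughly like this (a sketch; the exact call may have differed, here it goes through the session's sparkContext attribute):

    def __del__(self):
        """
        Also stop the underlying SparkContext, not just the session.
        """
        self.s.sparkContext.stop()
        self.s.stop()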