java.lang.NoSuchMethodError: PoolConfig.setMinEvictableIdleTime
Context
When using Spark Structured Streaming to sink Kafka messages into HDFS, I am hitting this error:
java.lang.NoSuchMethodError: org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.setMinEvictableIdleTime(Ljava/time/Duration;)V
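The descriptor at the end of the error, (Ljava/time/Duration;)V, is the JVM's encoding of the method signature the caller expected: a single java.time.Duration parameter and a void return type. A tiny decoder for this simple case (purely illustrative Python, not part of Spark):

```python
def decode_jvm_descriptor(desc: str):
    """Decode a simple JVM method descriptor such as '(Ljava/time/Duration;)V'.

    Handles only object-type parameters (L...;) and a void return,
    which is enough for the descriptor in the stack trace above.
    """
    params_raw, ret = desc[1:].split(")")
    # Each object parameter is encoded as 'L<slash-separated class name>;'
    params = [p.replace("/", ".")[1:] for p in params_raw.split(";") if p.startswith("L")]
    return params, "void" if ret == "V" else ret

print(decode_jvm_descriptor("(Ljava/time/Duration;)V"))
# (['java.time.Duration'], 'void')
```

So the streaming code was compiled expecting a setMinEvictableIdleTime(Duration) method that the class loaded at runtime does not have.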
The environment I am using:
- Spark: 3.3.0 (Scala 2.12)
- Kafka: kafka_2.13-3.2.0 (Kafka 3.2.0, Scala 2.13)
- org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 (for Spark 3.3.0, Kafka broker 0.10.0+, Scala 2.12)
The error occurred when Spark tried to establish an internal Kafka consumer to read messages from the topic.
Looking into the details
The error says that method setMinEvictableIdleTime does not exist on class PoolConfig. This class is part of the Apache Commons Pool library (commons-pool2). A NoSuchMethodError at runtime typically means the calling code was compiled against a newer version of a class than the one actually loaded. From Maven Central, org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 depends on commons-pool2 version 2.11.1.
The exception is raised from line 186 of the file: InternalKafkaConsumerPool.scala#L186.
Class PoolConfig inherits from BaseObjectPoolConfig. In the base class, the method setMinEvictableIdleTime was added in version 2.10.0; before that version, setMinEvictableIdleTimeMillis was used instead.
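That history can be captured as a small version gate (an illustrative Python sketch, assuming standard three-part version strings; the cutoff 2.10.0 comes from the commons-pool2 changelog described above):

```python
def has_duration_setter(pool2_version: str) -> bool:
    """Return True if BaseObjectPoolConfig.setMinEvictableIdleTime(Duration)
    is expected to exist in the given commons-pool2 version (added in 2.10.0).
    Versions before that only offer setMinEvictableIdleTimeMillis(long)."""
    return tuple(int(p) for p in pool2_version.split(".")) >= (2, 10, 0)

print(has_duration_setter("2.11.1"))  # True  - Duration-based setter available
print(has_duration_setter("2.6.2"))   # False - only the Millis variant exists
```

Any commons-pool2 jar older than 2.10.0 on the runtime classpath would therefore trigger exactly this NoSuchMethodError.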
Thus my suspicion was that an older version of commons-pool2 was being used. However, the Spark job logs show that version 2.11.1 was loaded:
2022-08-26T23:38:09,085 INFO [Thread-6] org.apache.spark.executor.Executor - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar with timestamp 1661521085729
2022-08-26T23:38:09,086 INFO [Thread-6] org.apache.spark.util.Utils - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp
2022-08-26T23:38:09,089 INFO [Thread-6] org.apache.spark.util.Utils - /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp has been previously copied to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar
2022-08-26T23:38:09,094 INFO [Thread-6] org.apache.spark.executor.Executor - Adding file:/tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar to class loader
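To double-check which version was fetched, the jar version can be pulled out of such executor log lines with a simple regex (an illustrative Python sketch; the log line is copied from above):

```python
import re

LOG_LINE = (
    "2022-08-26T23:38:09,085 INFO [Thread-6] org.apache.spark.executor.Executor - "
    "Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar "
    "with timestamp 1661521085729"
)

def fetched_pool2_version(line: str):
    """Extract the commons-pool2 version from a Spark 'Fetching ... jar' log line."""
    m = re.search(r"commons-pool2-(\d+\.\d+\.\d+)\.jar", line)
    return m.group(1) if m else None

print(fetched_pool2_version(LOG_LINE))  # 2.11.1
```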
Then I looked into the Spark 3.3.0 jars folder, and the only Commons Pool artifact there is the old commons-pool (1.x) line, version 1.5.4: commons-pool-1.5.4.jar. In other words, commons-pool2 is not bundled in the distribution's jars folder.
Resolution
I then manually downloaded commons-pool2 version 2.11.1 into the Spark jars folder:
spark-3.3.0/jars$ wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar
spark-3.3.0/jars$ ls | grep commons-pool
commons-pool-1.5.4.jar
commons-pool2-2.11.1.jar
After rerunning my Spark Structured Streaming application, the issue was resolved.
Matthias (3 months ago)
This is a great solution; it solved my problem too. I wonder why the issue is not more common?
When using commercial distributions, people typically configure a shared set of jars across the big data stack, so the versions are generally aligned.