java.lang.NoSuchMethodError: PoolConfig.setMinEvictableIdleTime

Raymond · 2022-08-27

Context

When using structured streaming to sink Kafka messages into HDFS using Spark, I am hitting this error: 

java.lang.NoSuchMethodError: org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.setMinEvictableIdleTime(Ljava/time/Duration;)V

The environment I am using:

  • Spark: 3.3.0 (Scala 2.12)
  • Kafka: kafka_2.13-3.2.0 (Kafka 3.2.0, Scala 2.13)
  • org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 (for Spark 3.3.0, Kafka broker 0.10.0+, Scala 2.12)

The error occurred when Spark tried to establish an internal Kafka consumer to read messages from the topic.

Look into the details

The error occurs on class PoolConfig, whose method setMinEvictableIdleTime doesn't exist at runtime. That method comes from the Apache Commons Pool library (commons-pool2).

From Maven Central, org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 depends on commons-pool2 version 2.11.1.

The exception is raised from line 186 of the file: InternalKafkaConsumerPool.scala#L186.

Class PoolConfig inherits from BaseObjectPoolConfig. In the base class, method setMinEvictableIdleTime was added in version 2.10.0; before that, method setMinEvictableIdleTimeMillis was used.
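To see which variant of the method a given classpath actually provides, a small reflection probe can help. This is a sketch: the commons-pool2 class and method names are the ones from the error above, while the java.time.Duration check is just a stand-in that runs on any JDK.

```java
// Sketch: probe at runtime whether a class on the classpath exposes a method.
import java.util.Arrays;

public class MethodProbe {
    static boolean hasMethod(String className, String methodName) {
        try {
            return Arrays.stream(Class.forName(className).getMethods())
                         .anyMatch(m -> m.getName().equals(methodName));
        } catch (ClassNotFoundException e) {
            return false;  // class not on the classpath at all
        }
    }

    public static void main(String[] args) {
        // Stand-in check that works on any JDK 8+.
        System.out.println(hasMethod("java.time.Duration", "plusDays"));
        // With commons-pool2 >= 2.10.0 on the classpath this prints true;
        // with an older commons-pool2 (or none at all) it prints false.
        System.out.println(hasMethod(
            "org.apache.commons.pool2.impl.BaseObjectPoolConfig",
            "setMinEvictableIdleTime"));
    }
}
```

Running this with the same classpath as the Spark executors would distinguish "method missing from an old jar" from "class not found at all".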

Thus I suspected that an older version of commons-pool2 was being used. However, from the Spark job logs, I can tell that version 2.11.1 was loaded:

2022-08-26T23:38:09,085 INFO [Thread-6] org.apache.spark.executor.Executor - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar with timestamp 1661521085729
2022-08-26T23:38:09,086 INFO [Thread-6] org.apache.spark.util.Utils - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp
2022-08-26T23:38:09,089 INFO [Thread-6] org.apache.spark.util.Utils - /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp has been previously copied to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar
2022-08-26T23:38:09,094 INFO [Thread-6] org.apache.spark.executor.Executor - Adding file:/tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar to class loader

Then I looked into the Spark (3.3.0) jars folder, where I could only find version 1.5.4 of the older commons-pool library (commons-pool-1.5.4.jar); there was no commons-pool2 jar.

Resolution

I then manually downloaded commons-pool2 version 2.11.1 into the Spark jars folder:

spark-3.3.0/jars$ wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar
spark-3.3.0/jars$ ls | grep commons-pool
commons-pool-1.5.4.jar
commons-pool2-2.11.1.jar

After rerunning my Spark Structured Streaming application, the issue was resolved.
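To confirm which jar the classloader actually resolved a class from (and therefore whether the newly added commons-pool2 jar is being picked up), the class's code source can be inspected. The sketch below uses only classes available on any JDK; inside a Spark driver you would pass the commons-pool2 class name from the stack trace above (org.apache.commons.pool2.impl.BaseObjectPoolConfig) instead.

```java
// Sketch: print which jar or directory (code source) a class was loaded from.
import java.security.CodeSource;

public class ClassOrigin {
    static String originOf(String className) throws ClassNotFoundException {
        CodeSource src = Class.forName(className)
                              .getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String)
        // report a null code source.
        return src == null ? "bootstrap (null code source)"
                           : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(originOf("java.lang.String"));
        System.out.println(originOf("ClassOrigin"));  // path to this class/jar
    }
}
```

If the printed location still points at an unexpected jar, the classpath ordering rather than the jars folder contents is the next thing to check.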

Warning: I am not 100% sure whether adding this library will cause other issues for Spark. So far I have not hit any, but please be cautious when adopting this method.
Comments
Matthias Abele · 4 months ago

This is a great solution... I could solve my problem with it. I wonder why the issue is not more common?

Raymond · 4 months ago

When using commercial solutions, people usually configure a shared group of jars across the big data stack, so the versions are generally aligned.
