java.lang.NoSuchMethodError: PoolConfig.setMinEvictableIdleTime

When using Spark Structured Streaming to sink Kafka messages into HDFS, I hit this error:

java.lang.NoSuchMethodError: org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.setMinEvictableIdleTime(Ljava/time/Duration;)V

The environment I am using:

  • Spark: 3.3.0 (Scala 2.12)
  • Kafka: kafka_2.13-3.2.0 (Kafka 3.2.0, Scala 2.13)
  • org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 (for Spark 3.3.0, Kafka broker 0.10.0+, Scala 2.12)

The error occurred when Spark tried to create an internal Kafka consumer to read messages from the topic.

Looking into the details

The error refers to class PoolConfig, on which the method setMinEvictableIdleTime cannot be found. PoolConfig is a nested class of Spark's InternalKafkaConsumerPool, and its base classes come from the Apache Commons Pool library (commons-pool2).

From Maven central, the following versions are used by org.apache.spark:spark-sql-kafka-0-10_2.12:

The exception is raised at line 186 of InternalKafkaConsumerPool.scala (InternalKafkaConsumerPool.scala#L186).

PoolConfig inherits from BaseObjectPoolConfig. In that base class, the method setMinEvictableIdleTime was added in version 2.10.0; before that version, setMinEvictableIdleTimeMillis was used.
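One way to test this reasoning is to ask the JVM directly whether the method exists and which jar the class came from. The following is a minimal sketch using plain reflection. The class name PoolMethodCheck and its helpers hasMethod and loadedFrom are hypothetical names made up for this illustration, and the result is only meaningful when the program runs with the same classpath as the Spark job.

```java
import java.security.CodeSource;
import java.time.Duration;

// Hypothetical diagnostic helper; not part of Spark or commons-pool2.
class PoolMethodCheck {

    /** Returns true if the named class exposes a public method with this name and parameter types. */
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns the jar or directory the class was loaded from, or a placeholder if that is unknown. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(class not on classpath)";
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.commons.pool2.impl.BaseObjectPoolConfig";
        System.out.println("loaded from: " + loadedFrom(cls));
        // Newer Duration-based setter (the one Spark 3.3.0 calls).
        System.out.println("setMinEvictableIdleTime(Duration): "
                + hasMethod(cls, "setMinEvictableIdleTime", Duration.class));
        // Older millisecond-based setter.
        System.out.println("setMinEvictableIdleTimeMillis(long): "
                + hasMethod(cls, "setMinEvictableIdleTimeMillis", long.class));
    }
}
```

If the first check prints false while the second prints true, the class was most likely resolved from a commons-pool2 jar older than 2.10.0, and the "loaded from" line shows which jar that is.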

So my first thought was that an older version of commons-pool2 was being used. However, the Spark job logs show that version 2.11.1 was loaded:

2022-08-26T23:38:09,085 INFO [Thread-6] org.apache.spark.executor.Executor - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar with timestamp 1661521085729
2022-08-26T23:38:09,086 INFO [Thread-6] org.apache.spark.util.Utils - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp
2022-08-26T23:38:09,089 INFO [Thread-6] org.apache.spark.util.Utils - /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp has been previously copied to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar
2022-08-26T23:38:09,094 INFO [Thread-6] org.apache.spark.executor.Executor - Adding file:/tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar to class loader

Then I looked into the Spark (3.3.0) jars folder and found version 1.5.4 of commons-pool (the older 1.x artifact, not commons-pool2): commons-pool-1.5.4.jar
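Looking at file names alone can miss copies of a class that are bundled inside other jars. As a rough sketch, the jars folder can also be scanned for a specific class entry. JarClassFinder and jarsContaining are hypothetical names for this illustration, and the default path spark-3.3.0/jars is an assumption taken from the shell prompt used later in this post.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: finds which jars in a folder contain a given class file.
class JarClassFinder {

    /** Returns the sorted file names of all *.jar files in dir that contain entryName. */
    public static List<String> jarsContaining(Path dir, String entryName) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    // Entry names use '/' separators, e.g. org/apache/.../Foo.class
                    if (jf.getEntry(entryName) != null) {
                        hits.add(jar.getFileName().toString());
                    }
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real jars folder as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "spark-3.3.0/jars");
        String entry = "org/apache/commons/pool2/impl/BaseObjectPoolConfig.class";
        for (String name : jarsContaining(dir, entry)) {
            System.out.println(name);
        }
    }
}
```

An empty result would suggest that no jar in that folder provides the commons-pool2 classes. More than one hit would point to a possible version conflict.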


I then manually downloaded commons-pool2 version 2.11.1 into the Spark jars folder:

spark-3.3.0/jars$ wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar
spark-3.3.0/jars$ ls | grep commons-pool

After rerunning my Spark Structured Streaming application, the issue was resolved.

Warning - I am not 100% sure whether replacing this library will cause issues for Spark. So far I have not hit any, but please be cautious when adopting this method.
