2 years ago | English

Spark 3.0.1: Connect to HBase 2.4.1

Spark doesn't include built-in HBase connectors. We can use the HBase Spark connector or other third-party connectors to connect to HBase from Spark. If you don't have Spark or HBase available, you can follow these articles to configure them. Apache Spark 3.0.1 Installation on Linux or WSL ...
Last modified by Administrator 5 months ago


Comments
5 months ago
Raymond
#1561 Re: Spark 3.0.1: Connect to HBase 2.4.1

Just to follow up on this one as I didn't hear back from you. Have you resolved this problem?



5 months ago
Raymond
#1559 Re: Spark 3.0.1: Connect to HBase 2.4.1

Hi cansın,

What is your version of HBase?

Also, can you specify the full path to your Spark HBase connector jar file? For example, in this article I use a path under my home directory (~/):

spark-shell --jars ~/hbase-connectors/spark/hbase-spark/target/hbase-spark-1.0.1-SNAPSHOT.jar
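
As an illustration of the full-path suggestion, here is a minimal sketch that turns a jar path into an absolute one and fails early if the file is missing. The helper name `resolve_jar` is hypothetical and not part of Spark; it only wraps standard shell tools.

```shell
# Sketch: resolve a jar path to an absolute path, or fail if the file is missing.
# resolve_jar is a hypothetical helper name, not a Spark command.
resolve_jar() {
  local jar="$1"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # Resolve the directory with cd/pwd, then append the file name.
  echo "$(cd "$(dirname "$jar")" && pwd)/$(basename "$jar")"
}

# Example usage (commented out so the sketch has no side effects):
# JAR="$(resolve_jar hbase-connectors/spark/hbase-spark/target/hbase-spark-1.0.1-SNAPSHOT.jar)" \
#   && spark-shell --jars "$JAR"
```

A correct path may not be enough on its own: `org.apache.hadoop.hbase.HBaseConfiguration` ships in the hbase-common jar, so one likely cause of the import error is that this jar is not on the classpath alongside the connector jar.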




5 months ago
cansın
#1558 Re: Spark 3.0.1: Connect to HBase 2.4.1

Hi Raymond, thanks for the article.

I have managed to build my own jar and start the shell with the following command:
spark-shell --jars hbase-connectors/spark/hbase-spark/target/hbase-spark-1.0.1-SNAPSHOT.jar

But when I run my imports, I get the following error:

scala> import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseContext

scala> import org.apache.hadoop.hbase.HBaseConfiguration
<console>:24: error: object HBaseConfiguration is not a member of package org.apache.hadoop.hbase
       import org.apache.hadoop.hbase.HBaseConfiguration

Do you have any idea what might be wrong?


7 months ago
Administrator
#1548 Re: Spark 3.0.1: Connect to HBase 2.4.1

Please contact us via the Contact us link, and we will try to arrange a Teams session for you.


7 months ago
Pavan Kumar
#1541 Re: Spark 3.0.1: Connect to HBase 2.4.1

Yes, they are all in the current directory. Can we connect, if possible?


7 months ago
Raymond
#1540 Re: Spark 3.0.1: Connect to HBase 2.4.1

Are all those jars included in the current directory where you initiated the spark-shell?

You can manually put them into the jars directory of your Spark installation.


7 months ago
Pavan Kumar
#1537 Re: Spark 3.0.1: Connect to HBase 2.4.1

I think there is no issue with the build, but I'm unable to connect to HBase from Spark. I'm using a Docker environment where ZooKeeper, HDFS, Spark, and HBase run in different containers on the same network.

Here are the jars I'm using.

spark-shell --jars hbase-spark-protocol-shaded-1.0.0.7.2.12.0-291.jar,htrace-core4-4.2.0-incubating.jar,hbase-shaded-protobuf-3.5.1.jar,protobuf-java-2.5.0.jar,hbase-protocol-2.4.8.jar,hbase-shaded-miscellaneous-3.5.1.jar,hbase-mapreduce-2.4.8.jar,hbase-server-2.4.8.jar,hbase-client-2.4.8.jar,hbase-common-2.4.8.jar,hbase-spark-1.0.1-SNAPSHOT.jar,hadoop-common-2.8.5.jar --files hbase-site.xml

I have almost all the required jars but am still seeing the error below. I tried my best to debug the issue but didn't find a way to get rid of it. Please advise me how to resolve this, or point me to any detailed documentation about the prerequisites.

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/protobuf/generated/MasterProtos$MasterService$BlockingInterface

  at java.lang.ClassLoader.defineClass1(Native Method)

  at java.lang.ClassLoader.defineClass(ClassLoader.java:757)

  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
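
A long hand-typed `--jars` list like the one above is easy to get wrong. As a minimal sketch, assuming all the jars sit in a single directory, the comma-separated list can be generated instead. The helper name `jars_csv` is hypothetical.

```shell
# Sketch: join every *.jar in a directory into the comma-separated form that --jars expects.
# jars_csv is a hypothetical helper name.
jars_csv() {
  local dir="$1" list="" jar
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue        # skip the unexpanded pattern when no jars exist
    list="${list:+$list,}$jar"       # add a comma only between entries
  done
  echo "$list"
}

# Example usage (commented out so the sketch has no side effects):
# spark-shell --jars "$(jars_csv "$PWD")" --files hbase-site.xml
```

This only assembles the list; it does not check that the jars are mutually compatible, which is the issue discussed in the replies.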

7 months ago
Raymond
#1536 Re: Spark 3.0.1: Connect to HBase 2.4.1
I'm glad to hear that. Because the HBase minor versions differ slightly, the package might cause unexpected problems, though this is unlikely since the major version is the same. Please let me know if you run into any such issue.

7 months ago
Pavan Kumar
#1535 Re: Spark 3.0.1: Connect to HBase 2.4.1

Awesome!
That worked. Thanks again for your help, Raymond.


7 months ago
Raymond
#1534 Re: Spark 3.0.1: Connect to HBase 2.4.1

Hi Pavan,

The issue you encountered is the same one I mentioned in the article: it is caused by incompatible versions of HBase and the connector code.

For the HBase version, I had to use 2.2.4 because the latest hbase-connectors code was based on that version.

So please try the following command:

mvn -Dspark.version=3.1.1 -Dscala.version=2.12.10 -Dscala.binary.version=2.12 -Dhbase.version=2.2.4 -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.1 -DskipTests -Dcheckstyle.skip -U clean package

The built package should still work with HBase 2.4.7.


Regards,

Raymond
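
To make the difference between the two builds easier to see, here is a minimal sketch that assembles the Maven command with the HBase version as its only parameter. The helper name `build_cmd` is hypothetical; every flag is copied from the command above.

```shell
# Sketch: print the connector build command, varying only the HBase version.
# build_cmd is a hypothetical helper name; all flags come from the command in this comment.
build_cmd() {
  local hbase_version="${1:-2.2.4}"   # default to the version the connector code was based on
  echo "mvn -Dspark.version=3.1.1 -Dscala.version=2.12.10 -Dscala.binary.version=2.12" \
       "-Dhbase.version=${hbase_version} -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.1" \
       "-DskipTests -Dcheckstyle.skip -U clean package"
}

# Print the command for review, then run it from the hbase-connectors checkout.
build_cmd
```

Calling `build_cmd 2.4.7` prints the variant that produced the compilation error, so the two commands can be compared side by side.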

In reply to Pavan Kumar (7 months ago):
Re: Spark 3.0.1: Connect to HBase 2.4.1

Thanks for pointing that out, @Raymond. My Hadoop, Spark, Scala, and HBase versions are 3.2.1, 3.1.1, 2.12, and 2.4.7, respectively.

Maven build:

mvn -Dspark.version=3.1.1 -Dscala.version=2.12.10 -Dscala.binary.version=2.12 -Dhbase.version=2.4.7 -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.1 -DskipTests -Dcheckstyle.skip -U clean package

I have upgraded Maven and that issue is resolved, but I am now seeing the compilation error below.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project hbase-kafka-proxy: Compilation failure

[ERROR] /home/ec2-user/git/spark-hbase/hbase-connectors/kafka/hbase-kafka-proxy/src/main/java/org/apache/hadoop/hbase/kafka/KafkaTableForBridge.java:[53,8] org.apache.hadoop.hbase.kafka.KafkaTableForBridge is not abstract and does not override abstract method getRegionLocator() in org.apache.hadoop.hbase.client.Table

I would be so grateful if you could help me with what I need to learn to resolve such issues.

Thank you so much for your help.
