Install HBase in WSL - Pseudo-Distributed Mode


HBase is short for Hadoop database. It is a distributed NoSQL database, similar to Google Bigtable, that can utilize a distributed file system such as HDFS. HBase can run in two modes - standalone and distributed. In this tutorial, I will show you how to install a pseudo-distributed HBase on WSL (Windows Subsystem for Linux) using HDFS.

A pseudo-distributed cluster runs all daemon services on a single host.


Prerequisites

WSL

Please ensure you have WSL enabled on your Windows 10 system. Follow Install Windows Subsystem for Linux on a Non-System Drive to install WSL on a non-C drive. This tutorial uses the Ubuntu distro. You can also replicate these steps on an Ubuntu Linux system directly.

Hadoop

Hadoop is required as HDFS will be used in the following configuration to store HBase data. Refer to the following post to install one if you don't have a Hadoop environment to work with:

Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)

Download HBase

1) Download HBase from a mirror site: Apache Download Mirrors. Make sure the version is compatible with the Hadoop version: Apache HBase ™ Reference Guide

warning Based on my testing, Hadoop 3.3.0 is not compatible with HBase 2.4.1, so this tutorial uses Hadoop 3.2.0.

For example, I am using the following command to download the released binary into my WSL user home folder:

wget https://apache.mirror.digitalpacific.com.au/hbase/2.4.1/hbase-2.4.1-bin.tar.gz

The version downloaded is 2.4.1.

2) Extract the downloaded file using the following command:

tar xzvf hbase-2.4.1-bin.tar.gz -C ~/hadoop

3) Change directory to the extracted folder:

cd ~/hadoop/hbase-2.4.1

The folder includes these files/subfolders:

~/hadoop/hbase-2.4.1$ ls
CHANGES.md  LEGAL  LICENSE.txt  NOTICE.txt  README.txt  RELEASENOTES.md  bin  conf  docs  hbase-webapps  lib

Configure HBase

1) Open configuration file conf/hbase-site.xml.

2) Change hbase.cluster.distributed property to true:

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

3) Configure hbase.rootdir:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:19000/hbase</value>
</property>

Remember to update the HDFS address and port number to match your own HDFS configuration.

4) Add the following configuration into the same file:

  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>10231</value>
  </property>

This uses port 10231 instead of the default HBase client port 2181 to avoid port conflicts with Hyper-V or other services running on the host Windows 10 system.
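For reference, after the changes above, conf/hbase-site.xml ends up looking roughly like this (the HDFS port 19000 and ZooKeeper client port 10231 are the values used in this tutorial; substitute your own):

```xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:19000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>10231</value>
  </property>
</configuration>
```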

5) Edit config file conf/hbase-env.sh.

Add the following content:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

Otherwise the following error may occur when starting HBase daemons:

127.0.0.1: +======================================================================+
127.0.0.1: |                    Error: JAVA_HOME is not set                       |
127.0.0.1: +----------------------------------------------------------------------+
127.0.0.1: | Please download the latest Sun JDK from the Sun Java web site        |
127.0.0.1: |     > http://www.oracle.com/technetwork/java/javase/downloads        |
127.0.0.1: |                                                                      |
127.0.0.1: | HBase requires Java 1.8 or later.                                    |
127.0.0.1: +======================================================================+
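If you are not sure where your JDK lives, one way to find it (a sketch; it assumes java is on PATH and, as is typical on Ubuntu, is a symlink chain into the JVM directory) is:

```shell
# Resolve the real path of the java binary, then strip the trailing /bin/java
# to get a value suitable for JAVA_HOME in conf/hbase-env.sh.
java_bin=$(readlink -f "$(command -v java)" 2>/dev/null)
if [ -n "$java_bin" ]; then
  echo "export JAVA_HOME=${java_bin%/bin/java}"
else
  echo "java not found on PATH" >&2
fi
```

On my system this prints the same path used above; yours may differ depending on the installed JDK.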

Start Hadoop daemons

Ensure Hadoop daemons like HDFS and YARN are running.

For example, the following commands start the HDFS services (NameNode, DataNode and SecondaryNameNode) and the YARN ResourceManager and NodeManager.

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

Use the jps command to ensure the services are running:

jps
9377 NodeManager
1395 SecondaryNameNode
9795 Jps
916 NameNode
9191 ResourceManager
1103 DataNode
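Before moving on, a small helper like the following can scan the jps output for the daemons HBase depends on (a sketch; the daemon names are the ones expected from start-dfs.sh and start-yarn.sh):

```shell
# Report whether each expected Hadoop daemon appears in jps output.
check_daemons() {
  # $1: output of the jps command
  for svc in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$1" | grep -qw "$svc"; then
      echo "$svc: running"
    else
      echo "$svc: NOT running"
    fi
  done
}

check_daemons "$(jps 2>/dev/null)"
```

If any daemon reports NOT running, restart it before starting HBase.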

Start HBase daemon

1) Start the HBase daemon services using the following command:

~/hadoop/hbase-2.4.1/bin/start-hbase.sh

Type yes when asked:

Are you sure you want to continue connecting (yes/no)?

The output looks like the following:

The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JfmFAU2mPqPVVenHNhbxyJ1zoEKRJQkgNwgpAWSGqyw.
Are you sure you want to continue connecting (yes/no)? yes
127.0.0.1: Warning: Permanently added '127.0.0.1' (ECDSA) to the list of known hosts.
......
running master, logging to /home/tangr/hadoop/hbase-2.4.1/bin/../logs/hbase-tangr-master-raymond-pc.out
: running regionserver, logging to /home/tangr/hadoop/hbase-2.4.1/bin/../logs/hbase-tangr-regionserver-raymond-pc.out

When asked, allow the WSL java process to communicate on the network (otherwise the ZooKeeper service won't be able to start). You can also configure the firewall rule directly if the prompt window doesn't show up. If the socket connection is refused, try temporarily disabling your firewall to see if it works.

2) Verify using jps command:

9377 NodeManager
1395 SecondaryNameNode
9795 Jps
916 NameNode
7205 HQuorumPeer
9191 ResourceManager
7550 HRegionServer
1103 DataNode
7343 HMaster

As shown, several more services are now running: HRegionServer, HMaster and HQuorumPeer.

3) Verify HBase HDFS folder.

The following folders will be initialized in HDFS when the service starts successfully:

$ hadoop fs -ls /hbase
Found 12 items
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/.hbck
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/.tmp
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/MasterData
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/WALs
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/archive
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/corrupt
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/data
-rw-r--r--   1 tangr supergroup         42 2021-02-04 23:08 /hbase/hbase.id
-rw-r--r--   1 tangr supergroup          7 2021-02-04 23:08 /hbase/hbase.version
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/mobdir
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/oldWALs
drwx--x--x   - tangr supergroup          0 2021-02-04 23:08 /hbase/staging

4) Check the Web UI: http://localhost:16010.


Practice HBase commands

1) Connect to HBase using Shell:

~/hadoop/hbase-2.4.1/bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/tangr/hadoop/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/tangr/hadoop/hbase-2.4.1/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.4.1, rb4d9639f66fccdb45fea0244202ffbd755341260, Fri Jan 15 10:58:57 PST 2021
Took 0.0019 seconds
hbase:001:0>

2) Practice the following commands in the HBase shell:

  • create 'test_table', 'cf'
  • list 'test_table'
  • describe 'test_table'
  • put 'test_table', 'row1', 'cf:a', 'value1'
  • put 'test_table', 'row2', 'cf:b', 'value B'
  • put 'test_table', 'row3', 'cf:c', 'value 3'
  • scan 'test_table'
  • get 'test_table', 'row1'
  • disable 'test_table'
  • drop 'test_table'

Note that a table must be disabled before it can be dropped; running drop on an enabled table returns an error.
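The same commands can also be run non-interactively by saving them to a script file and passing it to hbase shell with the -n flag, which makes the shell exit with a non-zero code if any command fails (a sketch; the install path matches this tutorial, and /tmp/hbase_smoke.rb is just an illustrative file name):

```shell
# Write a few practice commands to a script file (HBase shell scripts are Ruby).
cat > /tmp/hbase_smoke.rb <<'EOF'
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:a', 'value1'
scan 'test_table'
disable 'test_table'
drop 'test_table'
EOF

# Run it non-interactively if the HBase install is present.
if [ -x "$HOME/hadoop/hbase-2.4.1/bin/hbase" ]; then
  "$HOME/hadoop/hbase-2.4.1/bin/hbase" shell -n /tmp/hbase_smoke.rb
fi
```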


3) Quit the shell by running this command:

quit
# or
exit

Stop HBase

Use the following command to stop the HBase daemon services when you no longer need HBase in WSL:

~/hadoop/hbase-2.4.1$ bin/stop-hbase.sh
stopping hbase............
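When shutting everything down, stop the services in the reverse order of startup: HBase first, then YARN and HDFS. A sketch, assuming the install paths used in this tutorial and a $HADOOP_HOME environment variable:

```shell
# Stop HBase daemons first so region servers can flush cleanly to HDFS.
HBASE_HOME="$HOME/hadoop/hbase-2.4.1"
if [ -x "$HBASE_HOME/bin/stop-hbase.sh" ]; then
  "$HBASE_HOME/bin/stop-hbase.sh"
fi

# Then stop YARN and HDFS.
if [ -n "$HADOOP_HOME" ]; then
  "$HADOOP_HOME/sbin/stop-yarn.sh"
  "$HADOOP_HOME/sbin/stop-dfs.sh"
fi
```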

Other notes

You may see warnings like the following:

../hadoop-3.2.0/libexec/hadoop-functions.sh: line 2364: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
../hadoop-3.2.0/libexec/hadoop-functions.sh: line 2459: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution

These warnings can be safely ignored.

References

Apache HBase ™ Reference Guide

Enjoy Hadoop 3.2.0 with HBase 2.4.1.
