Install HBase in WSL - Pseudo-Distributed Mode
HBase is short for Hadoop database. It is a distributed NoSQL database like Google Bigtable that can use a distributed file system such as HDFS for storage. HBase can run in two modes: standalone and distributed. In this tutorial, I will show you how to install a pseudo-distributed HBase on WSL (Windows Subsystem for Linux) using HDFS.
A pseudo-distributed cluster runs all daemon services on a single host.
Prerequisites
WSL
Please ensure you have WSL enabled on your Windows 10 system. Follow Install Windows Subsystem for Linux on a Non-System Drive to install WSL on a non-C drive. This tutorial uses the Ubuntu distro. You can also follow these steps directly on an Ubuntu Linux system.
Hadoop
Hadoop is required because HDFS is used in the following configuration to store HBase data. If you don't have a Hadoop environment to work with, refer to the following post to install one:
Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)
Download HBase
1) Download HBase from a mirror site: Apache Download Mirrors. Make sure the HBase version is compatible with your Hadoop version; the supported combinations are listed in the Apache HBase ™ Reference Guide.
For example, I am using the following command to download the released binary into my WSL user home folder:
wget https://apache.mirror.digitalpacific.com.au/hbase/2.4.1/hbase-2.4.1-bin.tar.gz
The version downloaded is 2.4.1.
2) Extract the downloaded file using the following command:
tar xzvf hbase-2.4.1-bin.tar.gz -C ~/hadoop
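Note that tar's -C option expects the target directory to exist already, so the extraction can fail on a fresh WSL home where ~/hadoop has not been created yet. A minimal sketch that creates the directory first; the helper name extract_into is hypothetical, and the archive path assumes the download location used above:

```shell
# Hypothetical helper: extract an archive into a target directory,
# creating the directory first because `tar -C` requires it to exist.
extract_into() {
    archive="$1"
    target="$2"
    mkdir -p "$target" || return 1
    tar xzf "$archive" -C "$target"
}

# Usage with the file downloaded above (skipped if the archive is absent):
if [ -f "$HOME/hbase-2.4.1-bin.tar.gz" ]; then
    extract_into "$HOME/hbase-2.4.1-bin.tar.gz" "$HOME/hadoop"
fi
```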
3) Change directory to the extracted folder:
cd ~/hadoop/hbase-2.4.1
The folder includes these files/subfolders:
~/hadoop/hbase-2.4.1$ ls
CHANGES.md  LEGAL  LICENSE.txt  NOTICE.txt  README.txt  RELEASENOTES.md  bin  conf  docs  hbase-webapps  lib
Configure HBase
1) Open configuration file conf/hbase-site.xml.
2) Change hbase.cluster.distributed property to true:
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
3) Configure hbase.rootdir:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:19000/hbase</value>
</property>
Replace the address and port with the values from your own HDFS configuration (the fs.defaultFS property in Hadoop's core-site.xml).
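If you are unsure which address HDFS is using, the hdfs getconf command can print the configured fs.defaultFS value. The following is a small sketch of deriving the hbase.rootdir value from it; the helper name hbase_rootdir_for is hypothetical, and the snippet assumes the hdfs command is on your PATH:

```shell
# Hypothetical helper: build the hbase.rootdir value from an HDFS URI,
# dropping any trailing slash before appending /hbase.
hbase_rootdir_for() {
    fs_uri="${1%/}"
    printf '%s/hbase\n' "$fs_uri"
}

# `hdfs getconf -confKey fs.defaultFS` prints the address HDFS is configured with.
# Guarded so the snippet does nothing when the hdfs command is not available.
if command -v hdfs >/dev/null 2>&1; then
    hbase_rootdir_for "$(hdfs getconf -confKey fs.defaultFS)"
fi
```

For the address used in this tutorial, hbase_rootdir_for hdfs://localhost:19000 prints hdfs://localhost:19000/hbase.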
4) Add the following configuration into the same file:
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>10231</value>
</property>
This makes the ZooKeeper instance managed by HBase listen on port 10231 instead of the default ZooKeeper client port 2181, which avoids port conflicts with Hyper-V or other services running on the host Windows 10 system.
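All three properties belong inside the single configuration root element of conf/hbase-site.xml. As a sketch of how the combined file might look, the following writes them to a separate review copy; the output path is hypothetical, and the values are the ones used in this tutorial, so adjust the HDFS address to your own setup:

```shell
# Write the three properties from the steps above into one well-formed file.
# OUT is a hypothetical review copy; compare it with conf/hbase-site.xml
# before replacing anything, since the shipped file may hold other properties.
OUT="${OUT:-/tmp/hbase-site.sketch.xml}"

cat > "$OUT" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:19000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>10231</value>
  </property>
</configuration>
EOF

echo "Wrote $OUT"
```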
5) Edit config file conf/hbase-env.sh.
Add the following content:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
Otherwise the following error may occur when starting HBase daemons:
127.0.0.1: +======================================================================+
127.0.0.1: | Error: JAVA_HOME is not set                                          |
127.0.0.1: +----------------------------------------------------------------------+
127.0.0.1: | Please download the latest Sun JDK from the Sun Java web site        |
127.0.0.1: | > http://www.oracle.com/technetwork/java/javase/downloads            |
127.0.0.1: |                                                                      |
127.0.0.1: | HBase requires Java 1.8 or later.                                    |
127.0.0.1: +======================================================================+
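The JAVA_HOME value above is the OpenJDK 8 location on Ubuntu amd64; other distributions or Java versions will use a different path. As a rough sketch, a candidate path can be derived from wherever the java binary resolves to. The helper name java_home_from is hypothetical, and the approach assumes the usual JDK layout with java under bin/ (or jre/bin/ for JDK 8):

```shell
# Hypothetical helper: turn the resolved path of the java binary into a
# JAVA_HOME candidate by stripping the trailing bin/java (and jre/ for JDK 8).
java_home_from() {
    dir="${1%/bin/java}"
    dir="${dir%/jre}"
    printf '%s\n' "$dir"
}

# Resolve the java symlink chain; readlink -f is from GNU coreutils on Ubuntu.
if command -v java >/dev/null 2>&1; then
    java_home_from "$(readlink -f "$(command -v java)")"
fi
```

The printed directory may be named differently from the one in conf/hbase-env.sh while still pointing to the same JDK, so check it with ls before using it.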
Start Hadoop daemons
Ensure Hadoop daemons like HDFS and YARN are running.
For example, the following commands start the DFS services (NameNode and DataNode) and the YARN services (NodeManager and ResourceManager):
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
Use the jps command to ensure the services are running:
jps
9377 NodeManager
1395 SecondaryNameNode
9795 Jps
916 NameNode
9191 ResourceManager
1103 DataNode
Start HBase daemon
1) Start the HBase daemon services using the following command:
~/hadoop/hbase-2.4.1/bin/start-hbase.sh
Type yes when asked:
Are you sure you want to continue connecting (yes/no)?
The output looks like the following:
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JfmFAU2mPqPVVenHNhbxyJ1zoEKRJQkgNwgpAWSGqyw.
Are you sure you want to continue connecting (yes/no)? yes
127.0.0.1: Warning: Permanently added '127.0.0.1' (ECDSA) to the list of known hosts.
......
running master, logging to /home/tangr/hadoop/hbase-2.4.1/bin/../logs/hbase-tangr-master-raymond-pc.out
: running regionserver, logging to /home/tangr/hadoop/hbase-2.4.1/bin/../logs/hbase-tangr-regionserver-raymond-pc.out
When Windows asks, allow the WSL Java process to communicate on your networks (otherwise the ZooKeeper service won't be able to start).
You can also configure the firewall rule directly if the prompt window doesn't show up.
If the socket connection is still refused, try temporarily disabling your firewall to see whether that resolves it.
2) Verify using jps command:
9377 NodeManager
1395 SecondaryNameNode
9795 Jps
916 NameNode
7205 HQuorumPeer
9191 ResourceManager
7550 HRegionServer
1103 DataNode
7343 HMaster
As shown, several more services are now running: HRegionServer, HMaster and HQuorumPeer.
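Scanning the jps list by eye is easy to get wrong, so the check can be scripted. A minimal sketch, assuming jps is on your PATH; the helper name missing_daemons is hypothetical, and the daemon names are the ones from the output above:

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name that does not appear as a process name (second column).
missing_daemons() {
    running="$(awk '{print $2}')"
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qx "$name" || printf '%s\n' "$name"
    done
}

# Usage against a live system (skipped if jps is not on PATH):
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons NameNode DataNode HQuorumPeer HMaster HRegionServer
fi
```

An empty result means every listed daemon was found; any name printed is a service to look for in the HBase or Hadoop logs.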
3) Verify HBase HDFS folder.
The following folders will be initialized in HDFS when the service starts successfully:
$ hadoop fs -ls /hbase
Found 12 items
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/.hbck
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/.tmp
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/MasterData
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/WALs
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/archive
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/corrupt
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/data
-rw-r--r--   1 tangr supergroup         42 2021-02-04 23:08 /hbase/hbase.id
-rw-r--r--   1 tangr supergroup          7 2021-02-04 23:08 /hbase/hbase.version
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/mobdir
drwxr-xr-x   - tangr supergroup          0 2021-02-04 23:08 /hbase/oldWALs
drwx--x--x   - tangr supergroup          0 2021-02-04 23:08 /hbase/staging
4) Check the Web UI: http://localhost:16010.
The UI looks like the following:
Practice HBase commands
1) Connect to HBase using Shell:
~/hadoop/hbase-2.4.1/bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/tangr/hadoop/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/tangr/hadoop/hbase-2.4.1/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.4.1, rb4d9639f66fccdb45fea0244202ffbd755341260, Fri Jan 15 10:58:57 PST 2021
Took 0.0019 seconds
hbase:001:0>
2) Practice the following commands in the HBase shell. The first drop is expected to fail, because a table must be disabled before it can be dropped:
- create 'test_table', 'cf'
- list 'test_table'
- describe 'test_table'
- put 'test_table', 'row1', 'cf:a', 'value1'
- put 'test_table', 'row2', 'cf:b', 'value B'
- put 'test_table', 'row3', 'cf:c', 'value 3'
- scan 'test_table'
- get 'test_table', 'row1'
- drop 'test_table'
- disable 'test_table'
- drop 'test_table'
The output looks like the following screenshot:
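The same commands can also be run without typing them interactively. The sketch below writes them to a script file and feeds it to the shell in non-interactive mode (the -n option); the script file name and the HBASE_DIR variable are assumptions for illustration, with HBASE_DIR matching the install path used in this tutorial:

```shell
# Write the practice commands to a script file. The drop that is expected
# to fail is left out, since a failing command is reported through the exit
# status in non-interactive mode and may end the run early.
SCRIPT="${SCRIPT:-/tmp/hbase-practice.txt}"   # hypothetical file name

cat > "$SCRIPT" <<'EOF'
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:a', 'value1'
scan 'test_table'
get 'test_table', 'row1'
disable 'test_table'
drop 'test_table'
EOF

# Feed the script to the shell; -n selects non-interactive mode.
# HBASE_DIR is an assumption matching the install path used in this tutorial.
HBASE_DIR="${HBASE_DIR:-$HOME/hadoop/hbase-2.4.1}"
if [ -x "$HBASE_DIR/bin/hbase" ]; then
    "$HBASE_DIR/bin/hbase" shell -n < "$SCRIPT"
fi
```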
3) Quit the shell by running this command:
quit # or exit
Stop HBase
Use the following command to stop the HBase daemon services when you no longer need HBase running in WSL:
~/hadoop/hbase-2.4.1$ bin/stop-hbase.sh
stopping hbase............
Other notes
You may see warnings like the following:
../hadoop-3.2.0/libexec/hadoop-functions.sh: line 2364: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
../hadoop-3.2.0/libexec/hadoop-functions.sh: line 2459: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
These can be ignored.
References
Apache HBase ™ Reference Guide
Enjoy Hadoop 3.2.0 with HBase 2.4.1.