Install Apache Sqoop in Windows
This page summarizes the steps required to install Apache Sqoop (v1.4.7) on Windows 10.
What is Sqoop
Sqoop is an ETL tool for Hadoop, designed to efficiently transfer data between structured (RDBMS), semi-structured (Cassandra, HBase, etc.) and unstructured (HDFS) data sources.
Project site
Prerequisites
Hadoop
In this tutorial, I am going to install Sqoop on the same server where I configured Hadoop. Follow the link below to set up Hadoop if you have not done so:
* This is only required if you want to run Sqoop scripts to test the installation. The Hadoop-related environment variables are also set up as part of the above guide.
Installation guide
The documentation for Sqoop 1.4.7 is available at the following link:
Download binary package
Download from the following link:
I am downloading the file sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz.
Unzip binary package
Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then unzip it:
$ cd /f/DataAnalytics
fahao@Raymond-Alienware MINGW64 /f/DataAnalytics
$ tar -xvzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
On my machine, the content is extracted to “F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0”.
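Before moving on, it can help to confirm that the extracted folder contains the two scripts used later in this guide (bin/configure-sqoop and bin/sqoop.cmd). The following is a minimal sketch, not part of Sqoop itself: check_sqoop_layout is a hypothetical helper name, and the path in the example call is the one from my machine.

```shell
# Sketch: confirm the extracted folder contains the scripts used later in this guide.
# check_sqoop_layout is a hypothetical helper name, not something shipped with Sqoop.
check_sqoop_layout() {
  local sqoop_dir="$1"
  local missing=0
  local script
  for script in bin/configure-sqoop bin/sqoop.cmd; do
    if [ ! -f "$sqoop_dir/$script" ]; then
      # Report each missing script on stderr and remember the failure.
      echo "Missing: $sqoop_dir/$script" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call with the path from this guide (adjust to your own location):
# check_sqoop_layout /f/DataAnalytics/sqoop-1.4.7.bin__hadoop-2.6.0 && echo "Layout looks fine"
```

The function returns 0 only when both scripts are present, so it can be chained with && as in the commented example.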
Set up environment variables
Make sure the following environment variable is set up:
- SQOOP_HOME: points to the Sqoop folder extracted in the previous step.
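For a quick test in Git Bash, the variable can be set for the current session only. This is a sketch that assumes Sqoop was extracted to the folder shown above; replace the path with your own. Note that an export in Git Bash does not persist: the Command Prompt step later in this guide reads %SQOOP_HOME%, so the variable also needs to be defined in the Windows environment variable settings.

```shell
# Assumption: Sqoop was extracted to the folder used in this guide; adjust to your path.
# Single quotes keep the backslashes of the Windows path literal.
export SQOOP_HOME='F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0'

# Stop with a clear message if the variable is empty or unset.
: "${SQOOP_HOME:?SQOOP_HOME is not set}"

# printf is used instead of echo so the backslashes are printed unchanged.
printf 'SQOOP_HOME is set to: %s\n' "$SQOOP_HOME"
```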
Configure
Run the following command in Git Bash to configure Sqoop.
cd "$SQOOP_HOME/bin"
./configure-sqoop
You may get the following warnings, depending on whether you have installed the related frameworks on your machine.
Warning: F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
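To see in advance which of these warnings to expect, you can check which of the optional variables are defined in the current Git Bash session. This is a rough sketch: report_optional_homes is a hypothetical helper name, and the variable names are simply the ones mentioned in the warnings above.

```shell
# Sketch: report which optional integration variables are set in this session.
# report_optional_homes is a hypothetical helper name, not part of Sqoop.
report_optional_homes() {
  local name
  local value
  for name in HBASE_HOME HCAT_HOME ACCUMULO_HOME ZOOKEEPER_HOME; do
    # Look up the variable whose name is stored in $name (names come from the fixed list above).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s is set to %s\n' "$name" "$value"
    else
      printf '%s is not set\n' "$name"
    fi
  done
}

report_optional_homes
```

A variable reported as not set corresponds to one of the warnings above. These warnings only matter if you plan to use the related framework.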
Verify installation
Run the following command in Command Prompt to verify your installation:
%SQOOP_HOME%\bin\sqoop.cmd version
It should generate output similar to the following:
F:\DataAnalytics\sqoop-1.4.7.bin__hadoop-2.6.0\bin>%SQOOP_HOME%\bin\sqoop.cmd version
Warning: HBASE_HOME and HBASE_VERSION not set.
Warning: HCAT_HOME not set
Warning: HCATALOG_HOME does not exist HCatalog imports will fail.
Please set HCATALOG_HOME to the root of your HCatalog installation.
Warning: ACCUMULO_HOME not set.
Warning: ZOOKEEPER_HOME not set.
Warning: HBASE_HOME does not exist HBase imports will fail.
Please set HBASE_HOME to the root of your HBase installation.
Warning: ACCUMULO_HOME does not exist Accumulo imports will fail.
Please set ACCUMULO_HOME to the root of your Accumulo installation.
Warning: ZOOKEEPER_HOME does not exist Accumulo imports will fail.
Please set ZOOKEEPER_HOME to the root of your Zookeeper installation.
2018-04-22 23:55:56,197 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
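If you want to check the installed version from a script rather than by reading the output, the version line can be extracted with sed. This is only a sketch: extract_sqoop_version is a hypothetical helper name, and it assumes the output contains a line of the form "Sqoop <version>" as shown above. The example feeds it a few lines copied from the output in this guide.

```shell
# Sketch: pull the version number out of the `sqoop version` output.
# extract_sqoop_version is a hypothetical helper name; it reads the output on stdin
# and prints the version found on the line of the form "Sqoop <version>".
extract_sqoop_version() {
  # [[:space:]]* also tolerates a trailing carriage return from Windows line endings.
  sed -n 's/^Sqoop \([0-9][0-9.]*\)[[:space:]]*$/\1/p'
}

# Example using lines copied from the output above; this prints 1.4.7.
extract_sqoop_version <<'EOF'
2018-04-22 23:55:56,197 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
EOF
```

To use it on real output, one option is to redirect the Command Prompt output to a text file and pass that file to the function on stdin.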
Sqoop is now installed on the same Windows machine as Hadoop.
In the next step, I will show you how to use Sqoop to import data from an RDBMS into HDFS.