Configure Sqoop in an Edge Node of a Hadoop Cluster
- References
- Apache Sqoop
- SSH and firewall configuration
- Step-by-step guide
- Configure alias
- Configure firewall
- Create a new system user
- Configure ssh
- Configure environment variables
- Install Hadoop client in edge node
- Install Sqoop in the edge node
- Verify Sqoop in edge node
- Verify through command
- Use Sqoop to import data from RDBMS to HDFS
- Error debugging
- Sqoop requires a JDK that can compile Java code.
- Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
- Container is running beyond memory limits
- Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host…
- Summary
This page continues the documentation about configuring a Hadoop multi-node cluster by adding a new edge node that hosts administration or client tools.
In this page, I’m going to show you how to add an edge node to the previous cluster and then install Sqoop on that node. The resulting cluster consists of the existing hdp-master and hdp-slave-1 servers plus the new edge node hdp-edge-1.
* On the edge node, you can install other client or administration tools besides Sqoop. This page focuses on Sqoop; please stay tuned for guidance about configuring other tools.
References
Apache Sqoop
The steps for configuring Sqoop on an edge node are very similar to those for configuring it on a single-node server.
Please refer to the above page for more detailed steps.
SSH and firewall configuration
My previous page, Configure Hadoop 3.1.0 in a Multi Node Cluster, already documents in detail how to set up firewall rules and SSH.
This document only provides high-level guidance.
Step-by-step guide
Configure alias
Edit /etc/hosts on all three servers to make sure the following entries exist:
192.168.192.132 hdp-master
192.168.192.133 hdp-slave-1
192.168.192.134 hdp-edge-1
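To confirm that the new alias resolves correctly, a quick check (assuming ICMP is not blocked by your firewall) is to ping the edge node from each of the other servers:
ping -c 2 hdp-edge-1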
Configure firewall
Configure the network interface of server hdp-edge-1 as ‘Internal Zone’.
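As a minimal sketch, assuming the server uses firewalld (the default on recent openSUSE releases) and the cluster-facing interface is eth0 (adjust the interface name for your environment), the zone can be changed with:
sudo firewall-cmd --permanent --zone=internal --change-interface=eth0   # interface name is an assumption
sudo firewall-cmd --reload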
Create a new system user
Use the following commands to create a new user on the edge node server:
tangf@192-168-192-134:~> sudo useradd -m hadoop
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for root:
tangf@192-168-192-134:~> sudo passwd hadoop
New password:
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password:
passwd: password updated successfully
Configure ssh
Enable ssh
Follow the same steps in the linked section to enable SSH on the edge node server.
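On a systemd-based distribution (an assumption about the edge node’s OS), enabling and starting the SSH daemon typically looks like this:
sudo systemctl enable sshd
sudo systemctl start sshd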
Generate ssh key
On the hdp-master and hdp-slave-1 servers, use the following command to copy the public key generated previously to the hdp-edge-1 server:
ssh-copy-id -i ~/.ssh/hdp-key.pub hadoop@hdp-edge-1
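If the edge node itself also needs password-less SSH to the other servers, you can generate a key pair on hdp-edge-1 and copy its public key to hdp-master and hdp-slave-1 as well; the key file name below simply mirrors the existing setup and is an assumption:
ssh-keygen -t rsa -f ~/.ssh/hdp-key    # key name mirrors the existing setup
ssh-copy-id -i ~/.ssh/hdp-key.pub hadoop@hdp-master
ssh-copy-id -i ~/.ssh/hdp-key.pub hadoop@hdp-slave-1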
Verify
You can verify SSH configuration by running the following command in hdp-master:
hadoop@hdp-master:/home/tangf> ssh hadoop@hdp-edge-1 -i ~/.ssh/hdp-key
Have a lot of fun...
hadoop@hdp-edge-1:~>
Now all three servers can SSH to each other without providing a password.
Configure environment variables
Follow the same steps in the linked section to configure environment variables on hdp-edge-1 (the edge node server):
The commands you need to run are:
tangf@192-168-192-134:~> su hadoop
Password:
hadoop@hdp-edge-1:/home/tangf> whoami
hadoop
hadoop@hdp-edge-1:/home/tangf> vi ~/.profile
hadoop@hdp-edge-1:/home/tangf> vi ~/.bashrc
hadoop@hdp-edge-1:/home/tangf> source ~/.bashrc
hadoop@hdp-edge-1:/home/tangf> source ~/.profile
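As a sketch, the lines added to ~/.profile and ~/.bashrc would typically look like the following (the Hadoop path matches the /hadoop/hadoop-3.1.0 location used elsewhere on this page; adjust it to your environment):
export HADOOP_HOME=/hadoop/hadoop-3.1.0
export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin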
Install Hadoop client in edge node
Create Hadoop folder in the edge node
Run the following commands to create the Hadoop folder on the edge node server (hdp-edge-1):
tangf@192-168-192-134:/> su root
Password:
hdp-edge-1:/ # whoami
root
hdp-edge-1:/ # mkdir /hadoop
hdp-edge-1:/ # cd /hadoop
hdp-edge-1:/hadoop # chown -R hadoop /hadoop
hdp-edge-1:/hadoop #
Copy folder
Similar to the previous step, run the following commands on the hdp-master server to copy the Hadoop files to the edge node server:
su hadoop
cd /
scp -rv -i "~/.ssh/hdp-key" /hadoop hadoop@hdp-edge-1:/
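Once the copy finishes, a quick sanity check on the edge node confirms that the Hadoop client works and can reach the cluster (assuming the environment variables configured earlier are in effect):
hadoop version
hdfs dfs -ls /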
Install Sqoop in the edge node
Now, it is time to install Sqoop on the edge node server. The installation approach is similar to the one in the linked page, though I’m going to use the command line to download the binary package directly.
Download and unzip binary package
Run the following commands on the edge node server (hdp-edge-1) to download and extract the Sqoop binary package:
su hadoop
cd /hadoop
wget http://mirror.ventraip.net.au/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
tar -xvzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Configure environment variables
Edit ~/.profile and ~/.bashrc on the edge node server to add the following lines:
export SQOOP_HOME=/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=${PATH}:${SQOOP_HOME}/bin
The commands are:
hadoop@hdp-edge-1:/hadoop> vi ~/.profile
hadoop@hdp-edge-1:/hadoop> vi ~/.bashrc
hadoop@hdp-edge-1:/hadoop> source ~/.profile
hadoop@hdp-edge-1:/hadoop> source ~/.bashrc
Run Sqoop config command
Run the following command in the edge node server:
hadoop@hdp-edge-1:/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0> $SQOOP_HOME/bin/configure-sqoop
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
There are warning messages because I have not installed HBase, HCatalog, Accumulo or ZooKeeper on this edge node.
Alternative approach: Build Sqoop
If you don’t want to use the pre-built version, you can build Sqoop yourself. To do that, you need to install Apache Ant on the server.
Download the source code using the following command:
wget http://mirror.ventraip.net.au/apache/sqoop/1.4.7/sqoop-1.4.7.tar.gz
Ant can be installed using the following commands:
hadoop@hdp-edge-1:/hadoop> wget https://www.apache.org/dist/ant/binaries/apache-ant-1.10.3-bin.tar.gz
hadoop@hdp-edge-1:/hadoop> tar -xzvf apache-ant-1.10.3-bin.tar.gz
hadoop@hdp-edge-1:/hadoop> export ANT_OPTS="-Xmx256M"
hadoop@hdp-edge-1:/hadoop> export ANT_HOME=/hadoop/apache-ant-1.10.3/
hadoop@hdp-edge-1:/hadoop> export PATH=${ANT_HOME}/bin:${JAVA_HOME}/bin:${PATH}
Once Ant is installed, extract the downloaded source package (tar -xvzf sqoop-1.4.7.tar.gz), change into the sqoop-1.4.7 directory, and run the following command to build Sqoop:
ant package
Wait until the build completes.
Once the build is completed, change the environment variables accordingly:
export SQOOP_HOME=/hadoop/sqoop-1.4.7
export PATH=${PATH}:${SQOOP_HOME}/bin
Verify Sqoop in edge node
Verify through command
Run the following command on the edge node server to verify Sqoop:
hadoop@hdp-edge-1:/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0> sqoop version
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2018-05-13 01:17:27,372 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
Use Sqoop to import data from RDBMS to HDFS
Create the script file
To verify Sqoop, I am going to run an import command extracted from Load Data into HDFS from SQL Server via Sqoop.
Create a shell script file named run_sqoop_example.sh with the following content:
sqoop import --connect "jdbc:sqlserver://192.168.192.1;databaseName=Mercury_Db" \
--username zeppelin \
--password zeppelin \
--query "SELECT [BlogId], [SubTitle], [Title], [UniqueName] FROM [context].[Blogs] WHERE \$CONDITIONS" \
--split-by BlogId \
--columns "BlogId,SubTitle,Title,UniqueName" \
--num-mappers 2 \
--map-column-java "BlogId=Integer,SubTitle=String,Title=String,UniqueName=String" \
--target-dir "sqoop_test/blogs" \
--delete-target-dir \
--as-parquetfile
The commands to create this file:
hadoop@hdp-edge-1:/hadoop> touch run_sqoop_example.sh
hadoop@hdp-edge-1:/hadoop> vi run_sqoop_example.sh
hadoop@hdp-edge-1:/hadoop> chmod +x run_sqoop_example.sh
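Note that the import connects to SQL Server through JDBC, so the SQL Server JDBC driver jar has to be available to Sqoop. If it is not already on the classpath, copy it into Sqoop’s lib folder; the jar file name below is only an example and depends on the driver version you downloaded:
cp mssql-jdbc-6.4.0.jre8.jar $SQOOP_HOME/lib/   # jar name is an example; use the driver jar you actually downloaded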
Run the script file
Run the script file to load data from SQL Server to HDFS.
hadoop@hdp-edge-1:/hadoop> ./run_sqoop_example.sh
While it is executing, you can monitor the job through the YARN web UI at http://hdp-master:8088.
The standard output from the above command line:
hadoop@hdp-edge-1:/hadoop> ./run_sqoop_example.sh
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2018-05-13 15:31:15,055 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2018-05-13 15:31:15,087 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2018-05-13 15:31:15,184 INFO manager.SqlManager: Using default fetchSize of 1000
2018-05-13 15:31:15,184 INFO tool.CodeGenTool: Beginning code generation
2018-05-13 15:31:15,565 INFO manager.SqlManager: Executing SQL statement: SELECT [BlogId], [SubTitle], [Title], [UniqueName] FROM [context].[Blogs] WHERE (1 = 0)
2018-05-13 15:31:15,665 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop/hadoop-3.1.0
Note: /tmp/sqoop-hadoop/compile/b954d25164a27a937bfddc85ae230d67/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2018-05-13 15:31:20,950 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/b954d25164a27a937bfddc85ae230d67/QueryResult.jar
2018-05-13 15:31:22,396 INFO tool.ImportTool: Destination directory sqoop_test/blogs deleted.
2018-05-13 15:31:22,398 INFO mapreduce.ImportJobBase: Beginning query import.
2018-05-13 15:31:22,399 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2018-05-13 15:31:22,430 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2018-05-13 15:31:22,527 INFO manager.SqlManager: Executing SQL statement: SELECT [BlogId], [SubTitle], [Title], [UniqueName] FROM [context].[Blogs] WHERE (1 = 0)
2018-05-13 15:31:22,529 INFO manager.SqlManager: Executing SQL statement: SELECT [BlogId], [SubTitle], [Title], [UniqueName] FROM [context].[Blogs] WHERE (1 = 0)
2018-05-13 15:31:22,748 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
2018-05-13 15:31:23,237 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2018-05-13 15:31:23,464 INFO client.RMProxy: Connecting to ResourceManager at hdp-master/192.168.192.132:8032
2018-05-13 15:31:24,165 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1526188414553_0003
2018-05-13 15:31:27,445 INFO db.DBInputFormat: Using read commited transaction isolation
2018-05-13 15:31:27,445 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(BlogId), MAX(BlogId) FROM (SELECT [BlogId], [SubTitle], [Title], [UniqueName] FROM [context].[Blogs] WHERE (1 = 1) ) AS t1
2018-05-13 15:31:27,455 INFO db.IntegerSplitter: Split size: 2; Num splits: 2 from: 100 to: 104
2018-05-13 15:31:27,572 INFO mapreduce.JobSubmitter: number of splits:2
2018-05-13 15:31:27,795 INFO Configuration.deprecation: mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
2018-05-13 15:31:27,796 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-13 15:31:28,218 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526188414553_0003
2018-05-13 15:31:28,220 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-05-13 15:31:28,675 INFO conf.Configuration: resource-types.xml not found
2018-05-13 15:31:28,675 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-05-13 15:31:28,811 INFO impl.YarnClientImpl: Submitted application application_1526188414553_0003
2018-05-13 15:31:28,852 INFO mapreduce.Job: The url to track the job: http://hdp-master:8088/proxy/application_1526188414553_0003/
2018-05-13 15:31:28,853 INFO mapreduce.Job: Running job: job_1526188414553_0003
2018-05-13 15:31:34,973 INFO mapreduce.Job: Job job_1526188414553_0003 running in uber mode : false
2018-05-13 15:31:34,993 INFO mapreduce.Job: map 0% reduce 0%
2018-05-13 15:31:43,087 INFO mapreduce.Job: map 50% reduce 0%
2018-05-13 15:31:51,153 INFO mapreduce.Job: map 100% reduce 0%
2018-05-13 15:31:52,176 INFO mapreduce.Job: Job job_1526188414553_0003 completed successfully
2018-05-13 15:31:52,325 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=448572
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=14859
HDFS: Number of bytes written=7344
HDFS: Number of read operations=100
HDFS: Number of large read operations=0
HDFS: Number of write operations=20
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=22902
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11451
Total vcore-milliseconds taken by all map tasks=11451
Total megabyte-milliseconds taken by all map tasks=12046452
Map-Reduce Framework
Map input records=5
Map output records=5
Input split bytes=213
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=254
CPU time spent (ms)=2490
Physical memory (bytes) snapshot=313700352
Virtual memory (bytes) snapshot=5118722048
Total committed heap usage (bytes)=65142784
Peak Map Physical memory (bytes)=165302272
Peak Map Virtual memory (bytes)=2559361024
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
2018-05-13 15:31:52,331 INFO mapreduce.ImportJobBase: Transferred 7.1719 KB in 29.0861 seconds (252.4916 bytes/sec)
2018-05-13 15:31:52,335 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Verify in HDFS
Run the following commands on any of the three servers to verify:
hadoop@hdp-edge-1:~> hdfs dfs -ls /user/hadoop/sqoop_test/blogs
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2018-05-13 15:31 /user/hadoop/sqoop_test/blogs/.metadata
drwxr-xr-x - hadoop supergroup 0 2018-05-13 15:31 /user/hadoop/sqoop_test/blogs/.signals
-rw-r--r-- 2 hadoop supergroup 1750 2018-05-13 15:31 /user/hadoop/sqoop_test/blogs/7249d8a1-4464-4303-bfdd-9258a992be0a.parquet
-rw-r--r-- 2 hadoop supergroup 2182 2018-05-13 15:31 /user/hadoop/sqoop_test/blogs/a6d83f56-5f46-45a7-9d74-5fced4d6a0a5.parquet
As we can see, two parquet files were created, since we used two map tasks (--num-mappers 2) in the Sqoop command.
Error debugging
Sqoop requires a JDK that can compile Java code.
If you get the following error when running the above sample script, you need to install a JDK on your server.
2018-05-13 11:25:47,110 ERROR orm.CompilationManager: It seems as though you are running sqoop with a JRE.
2018-05-13 11:25:47,110 ERROR orm.CompilationManager: Sqoop requires a JDK that can compile Java code.
2018-05-13 11:25:47,110 ERROR orm.CompilationManager: Please install a JDK and set $JAVA_HOME to use it.
2018-05-13 11:25:47,111 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not start Java compiler.
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:196)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
To install the JDK, download it from the following page:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Then upload the file to the /hadoop folder of the edge node server.
Run the following commands to install:
hadoop@hdp-edge-1:/hadoop> tar -xvzf jdk-8u171-linux-x64.tar.gz
hadoop@hdp-edge-1:/hadoop/jdk1.8.0_171> which java
/usr/bin/java
hadoop@hdp-edge-1:/hadoop/jdk1.8.0_171> which javaws
/usr/bin/javaws
hadoop@hdp-edge-1:/hadoop/jdk1.8.0_171> sudo chmod -x /usr/bin/java*
[sudo] password for root:
hadoop@hdp-edge-1:/hadoop/jdk1.8.0_171> cd ..
hadoop@hdp-edge-1:/hadoop> sudo mv jdk1.8.0_171 /opt/
hadoop@hdp-edge-1:/opt> sudo ln -s jdk1.8.0_171 java
hadoop@hdp-edge-1:/opt> export PATH=$PATH:/opt/java/bin
hadoop@hdp-edge-1:/opt> which java
/opt/java/bin/java
hadoop@hdp-edge-1:/opt> java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
hadoop@hdp-edge-1:/hadoop> export JAVA_HOME=/opt/java
To make the JAVA_HOME environment variable permanent, edit ~/.profile and ~/.bashrc on the edge node server to add the following lines:
export JAVA_HOME=/opt/java
export PATH=${PATH}:${JAVA_HOME}/bin
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Make sure the following settings exist in mapred-site.xml:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/hadoop/hadoop-3.1.0</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/hadoop/hadoop-3.1.0</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/hadoop/hadoop-3.1.0</value>
</property>
The path value is the Hadoop distribution path.
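Since the map and reduce containers run on the cluster nodes, the updated mapred-site.xml also needs to be present on those nodes. A minimal sketch, run on hdp-master and using the paths and host names from this page, could be:
scp -i ~/.ssh/hdp-key /hadoop/hadoop-3.1.0/etc/hadoop/mapred-site.xml hadoop@hdp-slave-1:/hadoop/hadoop-3.1.0/etc/hadoop/
/hadoop/hadoop-3.1.0/sbin/stop-yarn.sh
/hadoop/hadoop-3.1.0/sbin/start-yarn.sh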
Container is running beyond memory limits
To resolve this issue, you need to configure the memory-allocation-related parameters in YARN (yarn-site.xml).
For example, the following YARN configuration is based on the environment I am using:
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>526</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4086</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4.1</value>
</property>
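After updating yarn-site.xml, restart YARN so the NodeManagers pick up the new limits. One way to check what resources each node now reports (assuming Hadoop 3.x, where the -showDetails flag is available) is:
yarn node -list -showDetails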
If you want to understand more about these resource configurations, please refer to Configure YARN and MapReduce Resources in Hadoop Cluster.
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host…
You may encounter the following error if your node servers have no access to the specified RDBMS server:
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 192.168.192.1, port 1433 has failed. Error: "Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
at org.apache.sqoop.mapreduce.db.DBInputFormat.setDbConf(DBInputFormat.java:170)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:161)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Because all the map and reduce tasks run on the node servers, you need to ensure the firewall allows those servers to reach the RDBMS host on the corresponding IP and port.
In my case, I just need to configure one inbound firewall rule on my SQL Server machine to allow connections on TCP port 1433.
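As a sketch, on a Windows host running SQL Server the rule can be created from an elevated command prompt like this (the rule name is arbitrary, and port 1433 is the default SQL Server port shown in the error above):
netsh advfirewall firewall add rule name="Allow SQL Server 1433" dir=in action=allow protocol=TCP localport=1433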
Summary
Sqoop stands for ‘SQL to Hadoop’. Once it is installed on the edge node server, we can use it to import data from an RDBMS into HDFS, Hive, etc., and vice versa.
Sqoop uses MapReduce to submit these jobs to YARN, and YARN then allocates resources to execute the application. This means that Sqoop jobs utilize the resources of your cluster, not just the edge node.
Have a lot of fun with Sqoop.