Secure Password Protection for Sqoop Jobs
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Passwords and other secrets need to be protected properly when running Sqoop jobs. Sqoop offers several ways to pass the password used to connect to an RDBMS, and each provides a different level of security.
Options
Option 1 - clear-text password through the --password argument
sqoop [subcommand] --username user --password pwd
This is the weakest approach because the password is exposed directly on the command line, where it can end up in the shell history and in process listings.
Option 2 - interactive password through -P argument
sqoop [subcommand] --username user -P
The password must be typed in interactively at the prompt, so this approach cannot be used for scheduled jobs.
Option 3 - storing password in file through --password-file argument
sqoop [subcommand] --username user --password-file mypasswordfile.path
The password is still stored in clear text in a file, which is weak, though better than option 1.
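If option 3 is used anyway, the file should at least be readable only by its owner and should contain nothing but the password, since Sqoop reads the whole file content, trailing newline included. A minimal sketch, assuming a local file; the path and the password value below are placeholders for illustration and do not come from a real setup:

```shell
# Hypothetical location and placeholder password, for illustration only.
PASSWORD_FILE="${TMPDIR:-/tmp}/sqoop-demo.pwd"

# Remove any earlier copy so the write below cannot fail on a read-only file.
rm -f "$PASSWORD_FILE"

# Create the file inside a subshell with a restrictive umask, so it is
# never readable by other users. printf '%s' writes no trailing newline.
(
  umask 077
  printf '%s' 'example-password' > "$PASSWORD_FILE"
)

# Make the file read-only for the owner.
chmod 400 "$PASSWORD_FILE"
```

The resulting path can then be passed to --password-file in place of mypasswordfile.path.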
Recommended option - Hadoop credential
Since Hadoop 2.6.0, the hadoop credential command can create a password alias, which can then be used in Sqoop or other tools.
Store the password in a Java keystore
The Java keystore (JCEKS) is one of the supported providers. The create command prompts for the password to store under the alias.
# Store the password in HDFS
hadoop credential create mydatabase.password -provider jceks://hdfs/user/hue/mypwd.jceks
# Store the password locally
hadoop credential create mydatabase.password -provider jceks://file/home/user/mypwd.jceks
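The provider URIs above follow the pattern jceks://, then the filesystem scheme (hdfs or file), then the absolute path to the keystore. The sketch below builds such a URI and, when the hadoop command is available, lists the aliases in the keystore as a quick check that the alias was created. The jceks_uri helper is a hypothetical name introduced here for illustration; it is not part of Hadoop.

```shell
# Hypothetical helper: build a jceks provider URI from a filesystem
# scheme (hdfs or file) and an absolute keystore path.
jceks_uri() {
  scheme="$1"
  path="$2"
  printf 'jceks://%s%s\n' "$scheme" "$path"
}

PROVIDER="$(jceks_uri hdfs /user/hue/mypwd.jceks)"
echo "Provider: $PROVIDER"

# List the aliases stored in the keystore, only if hadoop is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
  hadoop credential list -provider "$PROVIDER" \
    || echo "Could not read keystore at $PROVIDER" >&2
fi
```

The list subcommand prints alias names only, so running it does not reveal the stored password.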
Use the password alias
sqoop [subcommand] \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/hue/mypwd.jceks \
--verbose \
--username user \
--password-alias mydatabase.password \….
This way, the clear-text password is not exposed directly on the command line or in a plain-text file.