In one of my previous articles, Password Security Solution for Sqoop, I mentioned creating a credential with the hadoop credential command. The credentials are stored by JavaKeyStoreProvider. Credential providers separate the use of sensitive tokens, secrets and passwords from the details of their storage and management.

The following commands create a credential named mydatabase.password, first in a JCEKS file on HDFS and then in a local JCEKS file. Each command prompts for the password value.

# Store the password in HDFS

hadoop credential create mydatabase.password -provider jceks://hdfs/user/hue/mypwd.jceks

# Store the password locally

hadoop credential create mydatabase.password -provider jceks://file/home/user/mypwd.jceks

When running jobs on a cluster manager such as YARN, create the credential in HDFS so that every worker node in the cluster can access it.
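
The provider paths in the commands above share one shape: the jceks scheme, then the underlying file system (hdfs or file), then the path of the keystore file. As a minimal sketch, the snippet below splits such a path with Python's standard urllib.parse module. The function name describe_provider_path is hypothetical and is not part of Hadoop.

```python
from urllib.parse import urlparse


def describe_provider_path(provider_path):
    """Split a jceks:// provider path into file system and keystore path.

    Hypothetical helper for illustration only; it does not contact Hadoop.
    """
    parts = urlparse(provider_path)
    if parts.scheme != 'jceks':
        raise ValueError('Not a JCEKS provider path: ' + provider_path)
    return {
        'filesystem': parts.netloc,   # 'hdfs' or 'file' in this article
        'keystore_path': parts.path,  # location of the .jceks file
    }


for path in ('jceks://hdfs/user/hue/mypwd.jceks',
             'jceks://file/home/user/mypwd.jceks'):
    print(describe_provider_path(path))
```

A check like this can catch a mistyped provider path before a job is submitted to the cluster.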

Once the credential is created, you can use it in Sqoop by passing the credential name as a parameter. But what if you want to access the credential in Spark? In Scala, you can call the Hadoop Java libraries for credentials directly. In Python, it is less straightforward, because the Hadoop Configuration object has to be reached through the JVM gateway that PySpark exposes.

Sample code to retrieve Hadoop credential in PySpark

from pyspark.sql import SparkSession

appName = "PySpark Hadoop Credential Example"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Replace the credential provider path accordingly
credential_provider_path = 'jceks://hdfs/user/hue/.jceks' 
credential_name = 'mydatabase.password'

# Retrieve the credential/password from the Hadoop credential provider
conf = spark.sparkContext._jsc.hadoopConfiguration()
conf.set('hadoop.security.credential.provider.path', credential_provider_path)
# getPassword returns a Java char[] (None if the credential cannot be found)
credential_raw = conf.getPassword(credential_name)
if credential_raw is None:
    raise ValueError('Credential not found: ' + credential_name)
# Convert the Java char[] into a Python string
credential_str = ''.join(str(credential_raw[i]) for i in range(len(credential_raw)))

# Now you can use credential_str, for example as the database password in a JDBC read that loads data into a Spark DataFrame.
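
If the credential is needed in more than one place, the lookup can be wrapped in a small function. The following is only a sketch: get_hadoop_credential and FakeHadoopConf are hypothetical names introduced here, and FakeHadoopConf merely stands in for the real Hadoop Configuration object so the snippet runs without a cluster. The JDBC URL, table and user in the trailing comments are placeholders as well.

```python
def get_hadoop_credential(hadoop_conf, provider_path, credential_name):
    """Return the credential as a Python string.

    hadoop_conf is expected to behave like the object returned by
    spark.sparkContext._jsc.hadoopConfiguration().
    """
    hadoop_conf.set('hadoop.security.credential.provider.path', provider_path)
    raw = hadoop_conf.getPassword(credential_name)
    if raw is None:
        raise KeyError('Credential not found: ' + credential_name)
    # The real call returns a Java char[]; join it character by character.
    return ''.join(str(raw[i]) for i in range(len(raw)))


class FakeHadoopConf:
    """Stand-in for the Hadoop Configuration (illustration only)."""

    def __init__(self, secrets):
        self._secrets = secrets
        self.settings = {}

    def set(self, key, value):
        self.settings[key] = value

    def getPassword(self, name):
        value = self._secrets.get(name)
        return None if value is None else list(value)


fake_conf = FakeHadoopConf({'mydatabase.password': 'example-secret'})
password = get_hadoop_credential(
    fake_conf, 'jceks://hdfs/user/hue/mypwd.jceks', 'mydatabase.password')
print(len(password))

# With a real Spark session, the same function could be used like this
# (jdbc_url, 'mytable' and 'myuser' are placeholders):
# password = get_hadoop_credential(
#     spark.sparkContext._jsc.hadoopConfiguration(),
#     credential_provider_path, credential_name)
# df = spark.read.format('jdbc') \
#     .option('url', jdbc_url) \
#     .option('dbtable', 'mytable') \
#     .option('user', 'myuser') \
#     .option('password', password) \
#     .load()
```

Passing the configuration object in as an argument keeps the function easy to test, as the stand-in class shows.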

Access to the credential provider file

Anyone who can read your credential provider file can use the same approach to retrieve the credential value from the provider. Restrict access to the credential file so that only authorized users can read it.
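
For a provider file on the local file system, one way to check this is to inspect the file's permission bits. The sketch below assumes a POSIX system, and the function name is_owner_only is hypothetical. For a provider file stored in HDFS, the equivalent step is to review the file's HDFS permissions, for example with hdfs dfs -ls and hdfs dfs -chmod.

```python
import os
import stat


def is_owner_only(path):
    """Return True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


# Example usage with a local keystore path (adjust to your environment):
# keystore = '/home/user/mypwd.jceks'
# if not is_owner_only(keystore):
#     os.chmod(keystore, 0o600)  # read/write for the owner only
```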

More details about Hadoop credential API

Refer to the official page to learn more about Hadoop credential APIs: CredentialProvider API Guide.

Last modified by Raymond 10 months ago.
