In my article Connect to Teradata database through Python, I demonstrated about how to use Teradata python package or Teradata ODBC driver to connect to Teradata. In this article, I’m going to show you how to connect to Teradata through JDBC drivers so that you can load data directly into PySpark data frames.

PySpark code

Create a PySpark script file named teradata-jdbc.py with the following code:

from pyspark.sql import SparkSession

appName = "PySpark Teradata Example"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

driver = 'com.teradata.jdbc.TeraDriver'

# Define the function to load data from Teradata


def load_data(driver, jdbc_url, sql, user, password):
    return spark.read \
        .format('jdbc') \
        .option('driver', driver) \
        .option('url', jdbc_url) \
        .option('dbtable', '({sql}) as src'.format(sql=sql)) \
        .option('user', user) \
        .option('password', password) \
        .load()

sql = "select * from mydb.mytable"
url = "jdbc:teradata://myserver/Database=mydb,LOGMECH=LDAP"
user = "dbc"
password = "dbc"

df_td = load_data(driver,url,sql,user,password)
df_td.show(10)

Some details about the code snippets

In the above example, JDBC connection string is configured to use LDAP as login mechanism. You can also change it to TD2 so that you can use a Teradata database username and password to connect.

Depends on the version of your Spark, you may be able to directly use query parameter to pass in your SQL query instead of dbtable. query and dbtable parameters cannot be specified at the same time. In lower version of Spark, you can pass in your SQL as a subquery as I did in the above examples.

Run the code

Now you can run the code with the follow command in Spark:

spark2-submit --jars 'your/path/to/teradata/jdbc/drivers/*' teradata-jdbc.py

You need to specify the JARs for Teradata JDBC drivers if you have not done that in your Spark configurations. Two JARs are required:

  • tdgssconfig.jar
  • terajdbc4.jar

You can also use different version of Teradata JDBC drivers.

Run the code in a cluster

If you are going to run the code in a cluster or workflow tools like Oozie, you can copy these JAR files into HDFS and then pass in the library path or jars arguments as HDFS file paths. In this way, all the workloads can load Teradata JDBC drivers successfully.

SQL Server, Oracle, MySQL…

You can also use similar approach to connect to your SQL Server, Oracle, MySQL or any other JDBC supported databases.

Have fun!

info Last modified by Raymond at 12 months ago * This page is subject to Site terms.

More from Kontext

local_offer teradata local_offer python

visibility 198
thumb_up 0
access_time 1 month ago

Pandas is commonly used by Python users to perform data operations. In many scenarios, the results need to be saved to a storage like Teradata. This article shows you how to do that easily using JayDeBeApi or  ...

open_in_new View open_in_new Spark + PySpark

local_offer teradata local_offer SQL

visibility 29
thumb_up 0
access_time 2 months ago

In SQL Server, we can use TRUNCATE statement to clear all the records in a table and it usually performs better compared with DELETE statements as no transaction log for each individual row deletion. The syntax looks like the following: TRUNCATE TABLE { database_name.schema_name.tab...

open_in_new View open_in_new Code snippets

local_offer teradata local_offer python local_offer Java

visibility 127
thumb_up 0
access_time 2 months ago

Python JayDeBeApi module allows you to connect from Python to Teradata databases using Java JDBC drivers. In article Connect to Teradata database through Python , I showed ho...

open_in_new View open_in_new Python Programming

local_offer teradata local_offer SQL

visibility 124
thumb_up 0
access_time 2 months ago

OREPLACE functions in Teradata can be used to replace or remove characters from a string. OREPACE is Teradata's extension to ASNI SQL. The usual REPLACE function is not available. ANSI SQL REPLACE function REPLACE function is commonly implemented in many other SQL databases such as ...

open_in_new View open_in_new Code snippets

info About author

Dark theme mode

Dark theme mode is available on Kontext.

Learn more arrow_forward

Kontext Column

Created for everyone to publish data, programming and cloud related articles. Follow three steps to create your columns.


Learn more arrow_forward