Load CSV into Teradata via TPT

access_time 9 days ago visibility18 comment 0

TPT (Teradata Parallel Transporter) is a client suit that support multiple instances of data extracting, loading, and updating functions in a scalable, high-speed, parallel-processing environment. It utilize other client tools like FastLoad, FastExport, MultiLoad, etc.  It is written with Java thus can run in most of the operating systems incl. Windows. On Windows, it also provides wizard GUI tools to help create TPT jobs easily.

This article provides examples of using TPT to load CSV/TSV files into Teradata. It also provides some resolutions to some common errors.

Teradata environment

Follow the following series to install Teradata and utility tools if you don't have a Teradata environment to work with.

Install Teradata Tools & Utilities

Ensure all utilities are installed. 

Create a sample CSV file

Create a sample CSV file named test_tpt.csv with the following content:

ID, Attr, Value, CreatedDate, Comments
1,A,10,2020-09-13,'This is a comment with ,'
2,B,20,2020-09-13,This is a comment without comma.
3,C,30,2020-09-13,This is a comment without comma.
4,D,40,2020-09-13,This is a comment without comma.

The file contains 5 attributes that need to be loaded into Teradata.

Save the file using ASCII encoding. 

Create the destination table in Teradata

The table needs to exist in Teradata first before it can be loaded. Run the following SQL statement to create the table:

CREATE SET TABLE TestDb.TPT_TEST_CSV
(
ID VARCHAR(10) NOT NULL,
"Attr" VARCHAR(1),
"Value" VARCHAR(10),
CreatedDate VARCHAR(10),
Comments VARCHAR(255)
)
PRIMARY INDEX(ID);

Create TPT job using wizard

Now we can start to create TPT job using wizards.

1) Open tool Teradata Parallel Transporter Wizard.

It will open two windows: Command Prompt window and TPT Java GUI window as the following screenshots shows:

2) Click menu Edit and then New to start Job Wizard.

3) Input Job details.

  • Job Name: LOAD_TPT_TEST_CSV
  • Description: Job to load tpt_test.csv file into Teradata TestDb database.
  • Charset: ASCII (since we are not loading files with international characters).

4) Click Next button.

5) Specify source details accordingly.

  • Source type: File
  • File Specification: Format = DELIMITED;  Delimiter: ,


6) Click Next button.

7) Define columns accordingly. As the source is delimited file, only Varchar type can be implemented here. Ensure the size is big enough to store all the values in the source file.


8) Click Next button.

9) Config job destination details accordingly.

  • Destination Type: Teradata Table
  • Input Teradata connection details:

infoIn enterprise environment, authentication mechanism is usually required to be LDAP for better security. Please change the connection details accordingly. 
  • Select target table as the one created previously.


10) Select operator type. For this case, select Load Operator. For this operator, the destination table must be empty when the job starts otherwise the job will fail.

infoTPT tool is very powerful to support UPSERT and Stream operators too. These options can be used to stream data into Teradata table from a file that is constantly being updated and or can be used to update existing records or insert new records only. 

11) Click Next button.

12) Map columns from source to target:


warning The data types must be matching exactly otherwise you won't be able to complete this step.

13) Click Next button.

14) Finish job design. Untick 'Run Job Now' checkbox. 


15) Click Finish button to complete the job creation wizard. 

Run the job

In TPT main window, you will be able to see the job created. You can further edit the job too. 


1) Click button Run job to run the job.

2) Input an instance job name:

3) Click OK button to submit the job.

4) Wait for the job to be completed. For this case, the job will fail due the following error:

TPT10508: RDBMS error 3707: Syntax error, expected something like a name or a Unicode delimited identifier or an 'UDFCALLNAME' keyword between ',' and the 'Value' keyword.

The root cause is because the generated INSERT statement is not valid as there is keywords in table column names:

'INSERT INTO TestDb.TPT_TEST_CSV (ID,Attr,Value,CreatedDate,Comments) VALUES (:Col_ID,:Col_Attr,:Col_Value,:Col_CreatedDate,:Col_Comments);'

To fix this error, we need to edit the job.

Fix the job

1) Create another table using the following statement:

CREATE SET TABLE TestDb.TPT_TEST_CSV_2
(
ID VARCHAR(10) NOT NULL,
Col_Attr VARCHAR(1),
Col_Value VARCHAR(10),
CreatedDate VARCHAR(10),
Comments VARCHAR(255)
)
PRIMARY INDEX(ID);
2) Click Edit button in TPT window.
3) Navigate to step 'Job Destination' and select the newly created table:
4) Click Next button till the step of column mapping.
5) Map columns accordingly.

6) Click Finish button to save all the changes.

Run job again

Now we can run the job again.
1) Click Run button to submit job and input it a job name:
2) Click OK button to submit the job.
3) Wait for the job to be completed. For this case, the job will fail again due the following error:
W_0_o_LOAD_TPT_TEST_CSV[1]: TPT19134 !ERROR! Fatal data error processing file 'F:\tmp\test_tpt.csv'. Delimited Data Parsing error: Column length overflow(s) in row 1.

The above error occurred as the CSV file has a header line. We need to either remove the header line from the file or let TPT know to skip the header line.

Skip header line in CSV file

1) Click submit button again

2) Input a job name.

3) Before submit the job add an attribute in Producer to skip header line:

INTEGER SkipRows=1

The job fails with a new error:

W_0_o_LOAD_TPT_TEST_CSV[1]: TPT19134 !ERROR! Fatal data error processing file 'F:\tmp\test_tpt.csv'. Delimited Data Parsing error: Too many columns in row 2.

The error occurred because there is a delimiter in the second row of the CSV file.

Skip delimiter in double quotes

To resolve the issue, we just need to pass in additional attributes when submit the jobs.

1) Resubmit the third job run.

2) Ensure the following attributes are added to the Producer operator:

VARCHAR OpenQuoteMark = '"',
VARCHAR CloseQuoteMark = '"',
VARCHAR QuotedData='Optional'

3) Wait until the job is completed.

This time the job will complete successfully as the following screenshot shows:

Verify data in database

Run the following statement to verify the data:

SEL * FROM TestDb.TPT_TEST_CSV_2;

The output looks like the following:

The four records are loaded into database successfully. 

TPT script

The following is the generated TPT script by the wizard with changes we made to resolve errors:

USING CHARACTER SET ASCII
DEFINE JOB LOAD_TPT_TEST_CSV
DESCRIPTION 'Job to load tpt_test.csv file into Teradata TestDb database.'
(
	DEFINE OPERATOR W_1_o_LOAD_TPT_TEST_CSV
	TYPE LOAD
	SCHEMA *
	ATTRIBUTES
	(
		VARCHAR UserName, 
		VARCHAR UserPassword, 
		VARCHAR LogonMech, 
		VARCHAR LogonMechData, 
		VARCHAR LogTable, 
		VARCHAR TargetTable, 
		INTEGER BufferSize, 
		INTEGER ErrorLimit, 
		INTEGER MaxSessions, 
		INTEGER MinSessions, 
		INTEGER TenacityHours, 
		INTEGER TenacitySleep, 
		VARCHAR AccountID, 
		VARCHAR DateForm, 
		VARCHAR ErrorTable1, 
		VARCHAR ErrorTable2, 
		VARCHAR NotifyExit, 
		VARCHAR NotifyExitIsDLL, 
		VARCHAR NotifyLevel, 
		VARCHAR NotifyMethod, 
		VARCHAR NotifyString, 
		VARCHAR PauseAcq, 
		VARCHAR PrivateLogName, 
		VARCHAR TdpId, 
		VARCHAR TraceLevel, 
		VARCHAR WorkingDatabase
	);

	DEFINE SCHEMA W_0_s_LOAD_TPT_TEST_CSV
	(
		Col_ID VARCHAR(10),
		Col_Attr VARCHAR(1),
		Col_Value VARCHAR(10),
		Col_CreatedDate VARCHAR(10),
		Col_Comments VARCHAR(255)
	);

	DEFINE OPERATOR W_0_o_LOAD_TPT_TEST_CSV
	TYPE DATACONNECTOR PRODUCER
	SCHEMA W_0_s_LOAD_TPT_TEST_CSV
	ATTRIBUTES
	(
		VARCHAR FileName, 
		VARCHAR Format, 
		VARCHAR OpenMode, 
		INTEGER BlockSize, 
		INTEGER BufferSize, 
		INTEGER RetentionPeriod, 
		INTEGER RowsPerInstance, 
		INTEGER SecondarySpace, 
		INTEGER UnitCount, 
		INTEGER VigilElapsedTime, 
		INTEGER VigilWaitTime, 
		INTEGER VolumeCount, 
		VARCHAR AccessModuleName, 
		VARCHAR AccessModuleInitStr, 
		VARCHAR DirectoryPath, 
		VARCHAR ExpirationDate, 
		VARCHAR IndicatorMode, 
		VARCHAR PrimarySpace, 
		VARCHAR PrivateLogName, 
		VARCHAR RecordFormat, 
		VARCHAR RecordLength, 
		VARCHAR SpaceUnit, 
		VARCHAR TextDelimiter, 
		VARCHAR VigilNoticeFileName, 
		VARCHAR VigilStartTime, 
		VARCHAR VigilStopTime, 
		VARCHAR VolSerNumber, 
		VARCHAR UnitType
,
		INTEGER SkipRows=1,
		 VARCHAR OpenQuoteMark = '"',
		 VARCHAR CloseQuoteMark = '"',
		 VARCHAR QuotedData='Optional'
	);

	APPLY
		(
			'INSERT INTO TestDb.TPT_TEST_CSV_2 (ID,Col_Attr,Col_Value,CreatedDate,Comments) VALUES (:Col_ID,:Col_Attr,:Col_Value,:Col_CreatedDate,:Col_Comments);'
		)
	TO OPERATOR
	(
		W_1_o_LOAD_TPT_TEST_CSV[1]

		ATTRIBUTES
		(
			UserName = 'dbc', 
			UserPassword = 'dbc', 
			LogonMech = 'TD2', 
			LogTable = 'TestDb.TPT_TEST_CSV_2_log', 
			TargetTable = 'TestDb.TPT_TEST_CSV_2', 
			TdpId = '192.168.119.128'
		)
	)
	SELECT * FROM OPERATOR
	(
		W_0_o_LOAD_TPT_TEST_CSV[1]

		ATTRIBUTES
		(
			FileName = 'test_tpt.csv', 
			Format = 'DELIMITED', 
			OpenMode = 'Read', 
			DirectoryPath = 'F:\tmp', 
			IndicatorMode = 'N', 
			TextDelimiter = ','
		)
	);
);
With TPT scripts, we can directly run the job using tbuild command line.
1) Save the script as file load_tpt_csv.tpt.
2) Run the following command to run the job:
tbuild -f load_tpt_csv.tpt

The command line can be scheduled to run regularly.

info Last modified by Administrator at 3 days ago copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Kontext Column

Created for everyone to publish data, programming and cloud related articles.
Follow three steps to create your columns.


Learn more arrow_forward

More from Kontext

local_offer teradata local_offer SQL

visibility 9
thumb_up 0
access_time 2 days ago

This article demonstrates how to create volatile table in a Teradata procedure, perform DML actions (INSERT, DELETE, UPDATE) against it and then return the result set dynamically from the temporary table in the procedure.

local_offer teradata local_offer python local_offer Java local_offer python-database

visibility 919
thumb_up 0
access_time 6 months ago

Python JayDeBeApi module allows you to connect from Python to Teradata databases using Java JDBC drivers. In article Connect to Teradata database through Python , I showed ho...

local_offer teradata local_offer C# local_offer .NET

visibility 7747
thumb_up 0
access_time 6 years ago

In this post, I will demonstrate how to connect to Teradata database via .NET Data Provider for Teradata using C#. Prerequisites Install the .NET Data Provider for Teradata from the following link: ...

About column

Teradata

Tutorials and information about Teradata.

rss_feed Subscribe RSS