Load CSV into Teradata via TPT

Raymond Raymond event 2020-09-13 visibility 4,929
more_vert

TPT (Teradata Parallel Transporter) is a client suit that support multiple instances of data extracting, loading, and updating functions in a scalable, high-speed, parallel-processing environment. It utilize other client tools like FastLoad, FastExport, MultiLoad, etc.  It is written with Java thus can run in most of the operating systems incl. Windows. On Windows, it also provides wizard GUI tools to help create TPT jobs easily.

This article provides examples of using TPT to load CSV/TSV files into Teradata. It also provides some resolutions to some common errors.

Teradata environment

Follow the following series to install Teradata and utility tools if you don't have a Teradata environment to work with.

Install Teradata Tools & Utilities

Ensure all utilities are installed. 

Create a sample CSV file

Create a sample CSV file named test_tpt.csv with the following content:

ID, Attr, Value, CreatedDate, Comments
1,A,10,2020-09-13,'This is a comment with ,'
2,B,20,2020-09-13,This is a comment without comma.
3,C,30,2020-09-13,This is a comment without comma.
4,D,40,2020-09-13,This is a comment without comma.

The file contains 5 attributes that need to be loaded into Teradata.

Save the file using ASCII encoding. 

Create the destination table in Teradata

The table needs to exist in Teradata first before it can be loaded. Run the following SQL statement to create the table:

CREATE SET TABLE TestDb.TPT_TEST_CSV
(
ID VARCHAR(10) NOT NULL,
"Attr" VARCHAR(1),
"Value" VARCHAR(10),
CreatedDate VARCHAR(10),
Comments VARCHAR(255)
)
PRIMARY INDEX(ID);

Create TPT job using wizard

Now we can start to create TPT job using wizards.

1) Open tool Teradata Parallel Transporter Wizard.

It will open two windows: Command Prompt window and TPT Java GUI window as the following screenshots shows:

2020091311643-image.png

2020091311658-image.png

2) Click menu Edit and then New to start Job Wizard.

2020091311748-image.png

3) Input Job details.

  • Job Name: LOAD_TPT_TEST_CSV
  • Description: Job to load tpt_test.csv file into Teradata TestDb database.
  • Charset: ASCII (since we are not loading files with international characters).

2020091312008-image.png

4) Click Next button.

5) Specify source details accordingly.

  • Source type: File
  • File Specification: Format = DELIMITED;  Delimiter: ,

2020091312103-image.png

6) Click Next button.

7) Define columns accordingly. As the source is delimited file, only Varchar type can be implemented here. Ensure the size is big enough to store all the values in the source file.

2020091320345-image.png

8) Click Next button.

9) Config job destination details accordingly.

  • Destination Type: Teradata Table
  • Input Teradata connection details:

2020091312553-image.png

infoIn enterprise environment, authentication mechanism is usually required to be LDAP for better security. Please change the connection details accordingly. 
  • Select target table as the one created previously.

2020091314221-image.png

10) Select operator type. For this case, select Load Operator. For this operator, the destination table must be empty when the job starts otherwise the job will fail.

2020091312903-image.png

infoTPT tool is very powerful to support UPSERT and Stream operators too. These options can be used to stream data into Teradata table from a file that is constantly being updated and or can be used to update existing records or insert new records only. 

11) Click Next button.

12) Map columns from source to target:

2020091320511-image.png

warning The data types must be matching exactly otherwise you won't be able to complete this step.

13) Click Next button.

14) Finish job design. Untick 'Run Job Now' checkbox. 

2020091320634-image.png

15) Click Finish button to complete the job creation wizard. 

Run the job

In TPT main window, you will be able to see the job created. You can further edit the job too. 

2020091320803-image.png

1) Click button Run job to run the job.

2) Input an instance job name:

2020091314800-image.png

3) Click OK button to submit the job.

4) Wait for the job to be completed. For this case, the job will fail due the following error:

TPT10508: RDBMS error 3707: Syntax error, expected something like a name or a Unicode delimited identifier or an 'UDFCALLNAME' keyword between ',' and the 'Value' keyword.

The root cause is because the generated INSERT statement is not valid as there is keywords in table column names:

'INSERT INTO TestDb.TPT_TEST_CSV (ID,Attr,Value,CreatedDate,Comments) VALUES (:Col_ID,:Col_Attr,:Col_Value,:Col_CreatedDate,:Col_Comments);'

To fix this error, we need to edit the job.

Fix the job

1) Create another table using the following statement:

CREATE SET TABLE TestDb.TPT_TEST_CSV_2
(
ID VARCHAR(10) NOT NULL,
Col_Attr VARCHAR(1),
Col_Value VARCHAR(10),
CreatedDate VARCHAR(10),
Comments VARCHAR(255)
)
PRIMARY INDEX(ID);
2) Click Edit button in TPT window.
3) Navigate to step 'Job Destination' and select the newly created table:
2020091315745-image.png
4) Click Next button till the step of column mapping.
5) Map columns accordingly.
2020091321316-image.png
6) Click Finish button to save all the changes.

Run job again

Now we can run the job again.
1) Click Run button to submit job and input it a job name:
2020091320030-image.png
2) Click OK button to submit the job.
3) Wait for the job to be completed. For this case, the job will fail again due the following error:
W_0_o_LOAD_TPT_TEST_CSV[1]: TPT19134 !ERROR! Fatal data error processing file 'F:\tmp\test_tpt.csv'. Delimited Data Parsing error: Column length overflow(s) in row 1.

The above error occurred as the CSV file has a header line. We need to either remove the header line from the file or let TPT know to skip the header line.

Skip header line in CSV file

1) Click submit button again

2) Input a job name.

3) Before submit the job add an attribute in Producer to skip header line:

INTEGER SkipRows=1

2020091323601-image.png

The job fails with a new error:

W_0_o_LOAD_TPT_TEST_CSV[1]: TPT19134 !ERROR! Fatal data error processing file 'F:\tmp\test_tpt.csv'. Delimited Data Parsing error: Too many columns in row 2.

The error occurred because there is a delimiter in the second row of the CSV file.

Skip delimiter in double quotes

To resolve the issue, we just need to pass in additional attributes when submit the jobs.

1) Resubmit the third job run.

2) Ensure the following attributes are added to the Producer operator:

VARCHAR OpenQuoteMark = '"',
VARCHAR CloseQuoteMark = '"',
VARCHAR QuotedData='Optional'

2020091325004-image.png

3) Wait until the job is completed.

This time the job will complete successfully as the following screenshot shows:

2020091325047-image.png

Verify data in database

Run the following statement to verify the data:

SEL * FROM TestDb.TPT_TEST_CSV_2;

The output looks like the following:

2020091325255-image.png

The four records are loaded into database successfully. 

TPT script

The following is the generated TPT script by the wizard with changes we made to resolve errors:

USING CHARACTER SET ASCII
DEFINE JOB LOAD_TPT_TEST_CSV
DESCRIPTION 'Job to load tpt_test.csv file into Teradata TestDb database.'
(
	DEFINE OPERATOR W_1_o_LOAD_TPT_TEST_CSV
	TYPE LOAD
	SCHEMA *
	ATTRIBUTES
	(
		VARCHAR UserName, 
		VARCHAR UserPassword, 
		VARCHAR LogonMech, 
		VARCHAR LogonMechData, 
		VARCHAR LogTable, 
		VARCHAR TargetTable, 
		INTEGER BufferSize, 
		INTEGER ErrorLimit, 
		INTEGER MaxSessions, 
		INTEGER MinSessions, 
		INTEGER TenacityHours, 
		INTEGER TenacitySleep, 
		VARCHAR AccountID, 
		VARCHAR DateForm, 
		VARCHAR ErrorTable1, 
		VARCHAR ErrorTable2, 
		VARCHAR NotifyExit, 
		VARCHAR NotifyExitIsDLL, 
		VARCHAR NotifyLevel, 
		VARCHAR NotifyMethod, 
		VARCHAR NotifyString, 
		VARCHAR PauseAcq, 
		VARCHAR PrivateLogName, 
		VARCHAR TdpId, 
		VARCHAR TraceLevel, 
		VARCHAR WorkingDatabase
	);

	DEFINE SCHEMA W_0_s_LOAD_TPT_TEST_CSV
	(
		Col_ID VARCHAR(10),
		Col_Attr VARCHAR(1),
		Col_Value VARCHAR(10),
		Col_CreatedDate VARCHAR(10),
		Col_Comments VARCHAR(255)
	);

	DEFINE OPERATOR W_0_o_LOAD_TPT_TEST_CSV
	TYPE DATACONNECTOR PRODUCER
	SCHEMA W_0_s_LOAD_TPT_TEST_CSV
	ATTRIBUTES
	(
		VARCHAR FileName, 
		VARCHAR Format, 
		VARCHAR OpenMode, 
		INTEGER BlockSize, 
		INTEGER BufferSize, 
		INTEGER RetentionPeriod, 
		INTEGER RowsPerInstance, 
		INTEGER SecondarySpace, 
		INTEGER UnitCount, 
		INTEGER VigilElapsedTime, 
		INTEGER VigilWaitTime, 
		INTEGER VolumeCount, 
		VARCHAR AccessModuleName, 
		VARCHAR AccessModuleInitStr, 
		VARCHAR DirectoryPath, 
		VARCHAR ExpirationDate, 
		VARCHAR IndicatorMode, 
		VARCHAR PrimarySpace, 
		VARCHAR PrivateLogName, 
		VARCHAR RecordFormat, 
		VARCHAR RecordLength, 
		VARCHAR SpaceUnit, 
		VARCHAR TextDelimiter, 
		VARCHAR VigilNoticeFileName, 
		VARCHAR VigilStartTime, 
		VARCHAR VigilStopTime, 
		VARCHAR VolSerNumber, 
		VARCHAR UnitType
,
		INTEGER SkipRows=1,
		 VARCHAR OpenQuoteMark = '"',
		 VARCHAR CloseQuoteMark = '"',
		 VARCHAR QuotedData='Optional'
	);

	APPLY
		(
			'INSERT INTO TestDb.TPT_TEST_CSV_2 (ID,Col_Attr,Col_Value,CreatedDate,Comments) VALUES (:Col_ID,:Col_Attr,:Col_Value,:Col_CreatedDate,:Col_Comments);'
		)
	TO OPERATOR
	(
		W_1_o_LOAD_TPT_TEST_CSV[1]

		ATTRIBUTES
		(
			UserName = 'dbc', 
			UserPassword = 'dbc', 
			LogonMech = 'TD2', 
			LogTable = 'TestDb.TPT_TEST_CSV_2_log', 
			TargetTable = 'TestDb.TPT_TEST_CSV_2', 
			TdpId = '192.168.119.128'
		)
	)
	SELECT * FROM OPERATOR
	(
		W_0_o_LOAD_TPT_TEST_CSV[1]

		ATTRIBUTES
		(
			FileName = 'test_tpt.csv', 
			Format = 'DELIMITED', 
			OpenMode = 'Read', 
			DirectoryPath = 'F:\tmp', 
			IndicatorMode = 'N', 
			TextDelimiter = ','
		)
	);
);
With TPT scripts, we can directly run the job using tbuild command line.
1) Save the script as file load_tpt_csv.tpt.
2) Run the following command to run the job:
tbuild -f load_tpt_csv.tpt

The command line can be scheduled to run regularly.

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts