Teradata FastLoad with Optional Quotes
In article Teradata FastLoad - Load CSV File, it shows how to load CSV into Teradata. The input file is very basic CSV file. This article expands on that to provide examples about loading CSV files with optional quoted fields to skip delimiters. It also shows how to load files with multi-character quotes.
Example CSV file
Create a sample CSV file named test_fastload3.csv with the following content:
ID, Attr, Value, CreatedDate, Comments 1,A,10,2020-09-13,"This is a comment with comma," 2,B,20,2020-09-13,This is a comment without comma. 3,C,30,2020-09-13,This is a comment without comma. 4,D,40,2020-09-13,This is a comment without comma.
Column 'Comments' in the first record is double quoted as there is a comma in the content.
Optional double quoted fields
FastLoad command SET RECORD has options available to specify quote characters. It also provides options about mandatory and optional quotes.
Create a FastLoad script file named load-csv3.fastload. The content looks like the following:
LOGON 192.168.119.128/DBC,DBC; DATABASE TestDb; DROP TABLE FASTLOAD_CSV3; DROP TABLE FASTLOAD_CSV3_ERR_1; DROP TABLE FASTLOAD_CSV3_ERR_2; CREATE SET TABLE FASTLOAD_CSV3 ( ID VARCHAR(10) NOT NULL, Col_Attr VARCHAR(1), Col_Value VARCHAR(10), CreatedDate VARCHAR(10), Comments VARCHAR(255) ) UNIQUE PRIMARY INDEX(ID); BEGIN LOADING FASTLOAD_CSV3 ERRORFILES FASTLOAD_CSV3_ERR_1, FASTLOAD_CSV3_ERR_2; SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '"'; RECORD 2; DEFINE ID (VARCHAR(10)), Col_Attr (VARCHAR(1)), Col_Value(VARCHAR(10)), CreatedDate (VARCHAR(10)), Comments (VARCHAR(255)) FILE = test_fastload3.csv; INSERT INTO FASTLOAD_CSV3 (ID, Col_Attr, Col_Value, CreatedDate, Comments) VALUES (:ID, :Col_Attr, :Col_Value, :CreatedDate, :Comments) ; END LOADING; LOGOFF ;
SET RECORD command is set as this:
SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '"';
Quote character is specified as double quotes (") and it is optional. If all fields are all quoted, the value can be specified as YES:
SET RECORD VARTEXT DELIMITER ',' QUOTE YES '"';
Run FastLoad script
Command fastload can be used to run the script file:
fastload < load-csv3.fastload
Output:
... 0008 SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '"'; **** 23:30:14 Now set to read 'Variable-Length Text' records **** 23:30:14 Delimiter character(s) is set to ',' **** 23:30:14 Field values are optionally quoted with '"' **** 23:30:14 Command completed successfully **** 23:06:55 Starting record number set to : 2 ... **** 23:30:14 END LOADING COMPLETE Total Records Read = 5 - skipped by RECORD command = 1 - sent to the RDBMS = 4 Total Error Table 1 = 0 ---- Table has been dropped Total Error Table 2 = 0 ---- Table has been dropped Total Inserts Applied = 4 Total Duplicate Rows = 0 Start: Sun Sep 13 23:30:14 2020 End : Sun Sep 13 23:30:14 2020
Verify the data
Run the following SQL statement to verify the result:
SELECT * FROM TestDb.FASTLOAD_CSV3;
The result set looks like the following screenshot:
Multi-characters quotes
FastLoad can also handle fields are quoted with multiple character. For example, column 'Comments' in the first record is quoted with string '"|"'.
ID, Attr, Value, CreatedDate, Comments 1,A,10,2020-09-13,"|"This is a comment with comma,"|" 2,B,20,2020-09-13,This is a comment without comma. 3,C,30,2020-09-13,This is a comment without comma. 4,D,40,2020-09-13,This is a comment without comma.
To load this file, SET RECORD command can be changed to the following:
SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '"|"';
Run the FastLoad script and it will parse the command differently:
0008 SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '"|"'; **** 23:35:19 Now set to read 'Variable-Length Text' records **** 23:35:19 Delimiter character(s) is set to ',' **** 23:35:19 Field values are optionally quoted with '"|"' **** 23:35:19 Command completed successfully
Different characters for opening and closing quotes
FastLoad can also load delimited files with different opening and closing quote characters.
For example the following file is optionally quoted by '[...]'.
ID, Attr, Value, CreatedDate, Comments 1,A,10,2020-09-13,[This is a comment with comma,] 2,B,20,2020-09-13,This is a comment without comma. 3,C,30,2020-09-13,This is a comment without comma. 4,D,40,2020-09-13,This is a comment without comma.
Change SET RECORD command to the following:
SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '[' ']';
The output for SET RECORD command looks like the following lines:
0008 SET RECORD VARTEXT DELIMITER ',' QUOTE OPTIONAL '[' ']'; **** 23:40:23 Now set to read 'Variable-Length Text' records **** 23:40:23 Delimiter character(s) is set to ',' **** 23:40:23 Field values are optionally quoted with opening quote '[' and closing quote ']' **** 23:40:23 Command completed successfully