Raymond Raymond

Teradata Hash Functions

event 2020-12-26 visibility 2,955 comment 0 insights toc
more_vert
insights Stats

Teradata hash functions are used to decide data distributions for each row in Teradata tables. When data is inserted into a table, row hash of each record will be calculated first and then the bucket will be calculated, at last the AMP number and fallback AMP number (if fallback is enabled) will be calculated to decide which AMP the data will be stored on. 

HASHROW

Function HASHROW can be used to determine row distributions by the proposed primary index column(s).

SELECT id, HASHROW(id) FROM TestDb.test_table;

Example output:

	id	HASHROW(id)
1 5 60-9D-17-15
2 5 60-9D-17-15
3 3 6D-27-DA-A6
4 1 79-B6-9E-37
5 6 DA-53-B5-4C
6 6 DA-53-B5-4C
7 4 E6-E2-78-DD
8 4 E6-E2-78-DD
9 2 F3-71-3C-6E
For the same value(s), the hash result will be the same. 

HASHBUCKET

Function HASHBUCKET is used to calculate the hash bucket of a hash row value.

SELECT id, HASHROW(id) AS ROW_ID, HASHBUCKET(ROW_ID) AS BUCKET FROM TestDb.test_table;

Example output:

	id	ROW_ID	        BUCKET
1	5	60-9D-17-15	395,729
2	5	60-9D-17-15	395,729
3	3	6D-27-DA-A6	447,101
4	1	79-B6-9E-37	498,537
5	6	DA-53-B5-4C	894,267
6	6	DA-53-B5-4C	894,267
7	4	E6-E2-78-DD	945,703
8	4	E6-E2-78-DD	945,703
9	2	F3-71-3C-6E	997,139
For same hash row value, the bucket will be the same.

HASHAMP

Function HASHAMP is used to decide the AMP number (vproc) of each hash bucket.

SELECT id, HASHROW(id) AS ROW_ID, HASHBUCKET(ROW_ID) AS BUCKET, HASHAMP(BUCKET) AS "AMP#" FROM TestDb.test_table;

Sample output:

	id	ROW_ID	        BUCKET	AMP#
1	1	79-B6-9E-37	498,537	0
2	2	F3-71-3C-6E	997,139	0
3	3	6D-27-DA-A6	447,101	0
4	4	E6-E2-78-DD	945,703	0
5	4	E6-E2-78-DD	945,703	0
6	5	60-9D-17-15	395,729	0
7	5	60-9D-17-15	395,729	0
8	6	DA-53-B5-4C	894,267	0
9	6	DA-53-B5-4C	894,267	0
*Note - there are only two AMPs in the Teradata instance the sample code runs on.

HASHBAKAMP

Function HASHBAKAMP is used to calculate the fallback AMP number of each bucket. When the primary bucket AMP is down, the query can fall back to retrieve data from the fallback AMP. 

SELECT id, HASHROW(id) AS ROW_ID, HASHBUCKET(ROW_ID) AS BUCKET, HASHAMP(BUCKET) AS "AMP#",
HASHBAKAMP(BUCKET) AS "Fallback AMP#"
FROM TestDb.test_table;

Sample output:
	id	ROW_ID	BUCKET	AMP#	Fallback AMP#
1	5	60-9D-17-15	395,729	0	1
2	5	60-9D-17-15	395,729	0	1
3	3	6D-27-DA-A6	447,101	0	1
4	1	79-B6-9E-37	498,537	0	1
5	6	DA-53-B5-4C	894,267	0	1
6	6	DA-53-B5-4C	894,267	0	1
7	4	E6-E2-78-DD	945,703	0	1
8	4	E6-E2-78-DD	945,703	0	1
9	2	F3-71-3C-6E	997,139	0	1

Multiple columns

If there are multiple columns, the sequence of columns are not important. For example, the hash values will be the same for the following two expressions. 

SELECT HASHROW('A','B'), HASHROW('B','A')
More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts