Teradata SUBSTRING / SUBSTR and REGEXP_SUBSTR Functions

Raymond Raymond event 2020-03-15 visibility 6,574
more_vert

Extract sub string from a string is a common operation in data analytics. In Teradata, function SUBSTRING (SUBSTR) and REGEXP_SUBSTR are provided to achieve that. SUBSTR is used to extract string from a specified location while REGEXP_SUBSTR is used to extract string using regular expressions.

Code snippets

SUBSTR/SUBSTRING

-- returns World!
select	substring('Hello World!' from 7);
-- returns World
select	substring('Hello World!' from 7 for 5);

-- returns World!
select	substr('Hello World!', 7);
-- returns World
select	substr('Hello World!',7,5);

REGEXP_SUBSTR

-- returns 1234
select	REGEXP_SUBSTR('Hello 1234 World 5678 ! Kontext.tech 1010 !!!','[\d]+');
-- returns 5678
select	REGEXP_SUBSTR('Hello 1234 World 5678 ! Kontext.tech 1010 !!!','[\d]+',1,2);
-- return null
select	REGEXP_SUBSTR('Hello 1234 World 5678 ! Kontext.tech 1010 !!!','[\d]+',1,4);

-- returns Hello
select	REGEXP_SUBSTR('Hello World! HeLLO World!!!','Hello',1,1,'i');
-- returns HeLLO
select	REGEXP_SUBSTR('Hello World! HeLLO World!!!','Hello',1,2,'i');
-- returns null
select	REGEXP_SUBSTR('Hello World! HeLLO World!!!','Hello',1,2,'c');

In the above examples, occurrence argument can be used to return different values if there are multiple matches. The last argument can be used to specify match arguments. It can be the following values:

  • 'i' = case-insensitive matching
  • 'c' = case sensitive matching
  • 'n' = the period character (match any character) can match the newline character.
  • 'm' = multiple line match
  • 'l' = returns NULL if source string size is more than maximum allowed size (currently 16 MB). This can be used to avoid errors when match with big string.
  • 'x' = ignore whitespace.

The argument can contain more than one character. For example:

select	REGEXP_SUBSTR('Hello World! HeLLO World!!!','Hello',1,2,'im');

By default, the match is case sensitive and a period does not match the newline character and source string is treated as a single line.

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts