C# Regex - Remove Heading Tags

By Raymond, 2020-09-22

C# regular expressions can be used to match and replace text patterns in a string.

Remove heading tags

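The following regular expression removes heading elements, together with their text content, from an HTML string. It matches h1 to h6 (and also the non-standard h7 to h9, since the digit class is [1-9]), but only when the heading contains no nested tags, because [^<]* stops at the first "<".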

<[hH][1-9][^>]*>[^<]*</[hH][1-9]\s*>

Code snippet

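using System.Text.RegularExpressions;

var html = "Your HTML string...";
// Compiled costs more up front but speeds up repeated use of the same Regex instance
var regex = new Regex(@"<[hH][1-9][^>]*>[^<]*</[hH][1-9]\s*>", RegexOptions.Compiled);
var replacedHtml = regex.Replace(html, "");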
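The pattern above also accepts mismatched pairs such as <h2>...</h3>, because the opening and closing digits are matched independently. A minimal sketch of a stricter variant, assuming you only want the six HTML heading levels and require the closing tag to match the opening one, captures the level and reuses it with the backreference \1. The helper name RemoveHeadings is hypothetical, and like the original pattern it still skips headings that contain nested tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes <h1>..<h6> elements whose opening and closing
// levels match and whose content contains no nested tags.
static string RemoveHeadings(string html)
{
    // ([1-6]) captures the heading level; \1 requires the closing tag to use the same level.
    // \b rejects tags such as <h1x> or <h10>.
    var pattern = @"<h([1-6])\b[^>]*>[^<]*</h\1\s*>";
    // IgnoreCase lets the same pattern match <H2>...</H2>.
    return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
}

// The mismatched pair is left alone; only the well-formed heading is removed.
Console.WriteLine(RemoveHeadings("<h2>Kept?</h3><h2>Removed</h2>after"));
// prints: <h2>Kept?</h3>after
```

The backreference is the main design choice here: it trades the looser [hH][1-9] ... [hH][1-9] pairing for a guarantee that only complete, same-level heading elements are removed.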

Example

Assuming the following is the input string:

Headings:
<h3>Heading h3</h3>    
<h4>LINQ to SQL - Select N Random Records</h4>

After replacement, the heading elements are gone but the surrounding line breaks and whitespace remain:

Headings:
    
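The example can be reproduced with the following sketch, which applies the same pattern to the input above. The input string is assumed to use \n line breaks and to include the four spaces after </h3>, which is why the output keeps a whitespace-only line: the pattern's \s* only covers whitespace inside the closing tag, not after it.

```csharp
using System;
using System.Text.RegularExpressions;

// Input from the example; note the four spaces after </h3>.
var html = "Headings:\n<h3>Heading h3</h3>    \n<h4>LINQ to SQL - Select N Random Records</h4>";

var regex = new Regex(@"<[hH][1-9][^>]*>[^<]*</[hH][1-9]\s*>");
var replacedHtml = regex.Replace(html, "");

// Only the two heading elements are removed; line breaks and the trailing spaces stay.
Console.WriteLine(replacedHtml);
```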

Remove tags only

To keep all the text content but remove all HTML tags, use the following regular expression:

<[^>]*>
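A minimal C# sketch of this pattern, using the same example input; StripTags is a hypothetical helper name. Because [^>]* stops at the first ">", this is a quick clean-up rather than an HTML parser: an attribute value containing ">", or the contents of script and style elements, will not be handled the way a real parser would.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: deletes anything that looks like a tag and keeps the text between tags.
static string StripTags(string html) => Regex.Replace(html, "<[^>]*>", "");

var html = "Headings:\n<h3>Heading h3</h3>    \n<h4>LINQ to SQL - Select N Random Records</h4>";

// Tags are removed; text, line breaks and spaces are kept.
Console.WriteLine(StripTags(html));
```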

Example

For the above input HTML, the output looks like the following:

Headings:
Heading h3    
LINQ to SQL - Select N Random Records
