C# Regular Expression (Regex) Examples in .NET
More Advanced Regular Expression Syntax
This article continues from Learn Regular Expression (Regex) syntax with C# and
.NET and covers character escapes,
match grouping, some C# code examples, matching boundaries and RegexOptions.
Matching special characters with character escapes
Special characters such as Tab and carriage return are matched using character escapes.
The syntax is similar to C and C#. The common character escapes are listed below.
Special Character    Description
\t                   Matches a tab
\r                   Matches a carriage return
\n                   Matches a new line
\u0020               Matches a Unicode character using hexadecimal representation.
                     Exactly four digits must be specified.
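A minimal sketch of the escapes in the table, assuming a .NET 6 or later console project (top-level statements); the sample strings are made up for illustration. Because the patterns are C# verbatim strings (@"..."), sequences such as \t and \u0020 reach the regex engine unchanged and are interpreted by it rather than by the C# compiler.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative tab-separated record: "Name", a tab, then "Anna Jones".
string record = "Name\tAnna Jones";

// Verbatim string: the regex engine, not the C# compiler, interprets \t here.
Match tab = Regex.Match(record, @"\w+\t");
string beforeTab = tab.Value.TrimEnd('\t');
Console.WriteLine(beforeTab); // Name

// \u0020 is the space character (U+0020), written with exactly four hex digits.
Match space = Regex.Match("Anna Jones", @"\w+\u0020\w+");
Console.WriteLine(space.Value); // Anna Jones

// \u0009 is another way to write the tab character.
bool sameAsTab = Regex.IsMatch(record, @"Name\u0009Anna");
Console.WriteLine(sameAsTab); // True
```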
In this example, the Regular Expression pattern matches one or more word characters
followed by a carriage return then a new line.
Text: an anaconda ate
      Anna Jones
Regex: \w+\r\n
Match: ate (together with the carriage return and new line that follow it)
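The example above can be reproduced in C# roughly as follows (a sketch assuming .NET 6 or later; the LF-only string is an added illustration). It also shows that \w+\r\n does not match text that uses a bare \n, and that making the carriage return optional with \r? accepts both conventions.

```csharp
using System;
using System.Text.RegularExpressions;

// Two lines separated by a Windows-style CRLF line break.
string text = "an anaconda ate\r\nAnna Jones";

// One or more word characters immediately followed by CR then LF.
Match crlf = Regex.Match(text, @"\w+\r\n");
string word = crlf.Value.TrimEnd('\r', '\n');
Console.WriteLine(word);              // ate
Console.WriteLine(crlf.Value.Length); // 5: "ate" plus \r and \n

// The same pattern fails on LF-only text...
string unixText = "an anaconda ate\nAnna Jones";
Console.WriteLine(Regex.IsMatch(unixText, @"\w+\r\n")); // False

// ...while making the \r optional accepts either line-break convention.
Match either = Regex.Match(unixText, @"\w+\r?\n");
Console.WriteLine(either.Value.TrimEnd('\r', '\n')); // ate
```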
Depending on your operating system you might have to combine the \r and \n character escapes to create the correct new line sequence for your platform. For Microsoft Windows systems you should generally use \r\n, which is a carriage return then line feed (CRLF); Unix-like systems use \n on its own. To match the end of the string use the dollar sign ($); with RegexOptions.Multiline (see Regular Expression Options below) it also matches at the end of each line. Note that $ matches just before a \n, so in CRLF text the \r still sits in front of that position.

Match Grouping
Groups perform a few different functions. They allow the quantifiers (such as plus and star) to be applied to sections of the match instead of just individual characters, and they capture the text they match so it can be retrieved later.
A group is specified by the round brackets ( and ). If you want to match the round bracket characters you must use the escape character before the bracket, e.g. \( or \).

This regex matches 'http://' optionally followed by 'www.' (the first group), then starts a second group that matches one or more of any character that is not a full stop/period (.), closes that group, then matches '.com'.

Text: http://www.yahoo.com/index.html and http://yahoo.com
Regex: http://(www\.)?([^\.]+)\.com
Matches: http://www.yahoo.com
         http://yahoo.com

The question mark after the group (www\.) applies to the whole group, making it optional.
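A short sketch of the grouping rules described above, assuming .NET 6 or later. The "hahaha" and phone-number inputs are invented for illustration, and the URL pattern is anchored with ^ and $ here so that the whole string has to match.

```csharp
using System;
using System.Text.RegularExpressions;

// Without a group, + repeats only the character before it ('a').
string noGroup = Regex.Match("hahaha", "ha+").Value;
// With a group, + repeats the whole "ha" sequence.
string withGroup = Regex.Match("hahaha", "(ha)+").Value;
Console.WriteLine(noGroup);   // ha
Console.WriteLine(withGroup); // hahaha

// ? after the group (www\.) makes the whole group optional.
const string site = @"^http://(www\.)?yahoo\.com$";
Console.WriteLine(Regex.IsMatch("http://www.yahoo.com", site)); // True
Console.WriteLine(Regex.IsMatch("http://yahoo.com", site));     // True

// \( and \) match literal brackets; the unescaped pair inside them is a group.
Match phone = Regex.Match("Call (555) 0123", @"\((\d+)\)");
Console.WriteLine(phone.Value);           // (555)
Console.WriteLine(phone.Groups[1].Value); // 555
```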
An example in C#
The regular expression classes are in the System.Text.RegularExpressions namespace (Console, used below, is in System).

    using System;
    using System.Text.RegularExpressions;
The Regex class represents a regular expression. A regular expression pattern must be specified when creating a Regex object, and the pattern cannot be changed afterwards.
    Regex exp = new Regex(@"http://(www\.)?([^\.]+)\.com", RegexOptions.IgnoreCase);
    string InputText = "http://www.yahoo.com/";
The MatchCollection class stores a list of successful matches found by applying the regular expression pattern to an input string.
    MatchCollection MatchList = exp.Matches(InputText);
    Match FirstMatch = MatchList[0];
    Console.WriteLine(FirstMatch.Value);
The Group class represents a group within the regex pattern. Each Match object has a Groups collection. Groups[0] always holds the entire match, so the loop starts at index 1.

    Group GroupCurrent;
    for (int i = 1; i < FirstMatch.Groups.Count; i++)
    {
        GroupCurrent = FirstMatch.Groups[i];
The Success property on the group can be used to check whether the Group matched or not.

        if (GroupCurrent.Success)
        {
            Console.WriteLine("\tMatched:" + GroupCurrent.Value);
        }
        else
        {
            Console.WriteLine("\tGroup didn't match");
        }
    }
Groups within a Match can be referenced by number or by name (see below).
    if (MatchList.Count > 0)
    {
        // Groups[1] is the first group in the pattern; MatchList[1] would be the second match.
        if (MatchList[0].Groups[1].Success)
        {
            Console.WriteLine("Group 1 matched");
        }
    }
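The fragments above can be assembled into one self-contained program along these lines (a sketch assuming .NET 6 or later; the second URL in the input string is added so that the optional group visibly fails to match once). Local variables use camelCase here, the usual C# convention.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern is fixed once the Regex object has been constructed.
Regex exp = new Regex(@"http://(www\.)?([^\.]+)\.com", RegexOptions.IgnoreCase);
string inputText = "http://www.yahoo.com/ and http://yahoo.com";

MatchCollection matchList = exp.Matches(inputText);
Console.WriteLine("Matches found: " + matchList.Count);

foreach (Match match in matchList)
{
    Console.WriteLine(match.Value);

    // Groups[0] is the whole match, so the pattern's own groups start at 1.
    for (int i = 1; i < match.Groups.Count; i++)
    {
        Group current = match.Groups[i];
        if (current.Success)
        {
            Console.WriteLine($"\tGroup {i} matched: {current.Value}");
        }
        else
        {
            Console.WriteLine($"\tGroup {i} didn't match");
        }
    }
}
```

With this input it should report two matches: for http://www.yahoo.com both groups match (www. and yahoo), while for http://yahoo.com group 1 reports that it didn't match and group 2 captures yahoo.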
Groups also allow sections of the match to be used in replacement expressions when using Regex.Replace(); for example, $1 in the replacement string inserts the text captured by group 1.

Named Groups
Groups can be named to allow easier identification with the following syntax.
(?<NameOfGroup>expression)
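A sketch combining named groups with Regex.Replace(), assuming .NET 6 or later. The group names www and name and the sample URLs are illustrative choices, not part of the original example. A named group is read back with Groups["name"], and ${name} refers to it inside a replacement string.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative names: "www" for the optional prefix, "name" for the site name.
Regex site = new Regex(@"http://(?<www>www\.)?(?<name>[^\.]+)\.com", RegexOptions.IgnoreCase);

Match m = site.Match("Visit http://www.yahoo.com/index.html today");
string name = m.Groups["name"].Value;  // yahoo
bool hasWww = m.Groups["www"].Success; // True
Console.WriteLine(name + " " + hasWww);

// ${name} inserts the text captured by the named group; $1 works the same way for numbered groups.
string replaced = site.Replace("http://yahoo.com and http://www.example.com", "[${name}]");
Console.WriteLine(replaced); // [yahoo] and [example]
```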
Matching boundaries between words
To match a boundary between a word character (\w) and a non-word character (\W) use \b. \b is zero-width: it matches the position before the first or after the last character of a word, where words are separated by any non-alphanumeric characters. For example, the following Regular Expression matches one or more word characters followed by a word boundary followed by a hyphen (-) followed by another word boundary followed by one or more word characters.
Text: Anna Jones and John William-Scott went to lunch- with an anaconda
Regex: \w+\b-\b\w+
Options: IgnoreCase
Matches: William-Scott

"lunch-" is not matched because there is no word boundary between the hyphen and the space that follows it.
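The word-boundary example can be checked in C# as follows (a sketch assuming .NET 6 or later). The second part, using the sample text "an anaconda", is an added illustration contrasting \b with its negation \B.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Anna Jones and John William-Scott went to lunch- with an anaconda";

// \b is zero-width: it asserts a \w/\W transition without consuming a character.
MatchCollection hyphenated = Regex.Matches(text, @"\w+\b-\b\w+", RegexOptions.IgnoreCase);
foreach (Match match in hyphenated)
{
    Console.WriteLine(match.Value); // William-Scott (the only match)
}

// \B is the opposite: it matches only where \b would not.
// In "an anaconda", 'a' starts a word twice and appears inside a word twice.
int wordStartA = Regex.Matches("an anaconda", @"\ba").Count;
int insideA = Regex.Matches("an anaconda", @"\Ba").Count;
Console.WriteLine(wordStartA + " " + insideA); // 2 2
```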
Use \B to specify that a match must not occur on a \b boundary.

Regular Expression Options
Regular Expression Options can be passed to the constructor of the Regex class. They are flags, so several can be combined with the bitwise OR operator (|).

RegexOptions.None - Specifies that no options are set.
RegexOptions.IgnoreCase - Specifies case-insensitive matching.
RegexOptions.Multiline - Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
RegexOptions.Singleline - Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
RegexOptions.ExplicitCapture - Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...); unnamed groups do not capture.
RegexOptions.IgnorePatternWhitespace - Eliminates unescaped white space from the pattern and enables comments marked with the hash sign (#).
RegexOptions.Compiled - Specifies that the regular expression is compiled to MSIL code instead of being interpreted. The regular expression will be faster to match, but it takes more time to compile initially. This option (although tempting) should only be used when the expression will be used many times, e.g. in a foreach loop.
RegexOptions.ECMAScript - Enables ECMAScript-compliant behavior for the expression. This flag can be used only in conjunction with the IgnoreCase, Multiline, and Compiled flags. The use of this flag with any other flags results in an exception.
RegexOptions.RightToLeft - Specifies that the search will be from right to left instead of from left to right.
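A minimal sketch of several of these options, assuming .NET 6 or later; all input strings are invented for illustration. It contrasts the default and Multiline behavior of $, the default and Singleline behavior of the dot, combines flags with |, shows that RightToLeft finds the last word first, and checks that ECMAScript combined with an unsupported flag is rejected by the constructor. The exception is caught as ArgumentException because the documented ArgumentOutOfRangeException derives from it.

```csharp
using System;
using System.Text.RegularExpressions;

string lines = "first line\nsecond line";

// Default: $ matches only at the end of the string (or before a final \n).
int endDefault = Regex.Matches(lines, @"line$").Count;                             // 1
// Multiline: $ also matches just before each \n.
int endMultiline = Regex.Matches(lines, @"line$", RegexOptions.Multiline).Count;   // 2

// $ matches before \n, not before \r, so CRLF text needs \r? in front of $.
bool crlfPlain = Regex.IsMatch("a\r\nb", @"a$", RegexOptions.Multiline);           // False
bool crlfAware = Regex.IsMatch("a\r\nb", @"a\r?$", RegexOptions.Multiline);        // True

// Default: . stops at \n. Singleline: . matches \n as well.
string dotDefault = Regex.Match(lines, @"first.*").Value;                          // first line
string dotSingle = Regex.Match(lines, @"first.*", RegexOptions.Singleline).Value;  // whole string

// Flags combine with the bitwise OR operator.
bool combined = Regex.IsMatch("FIRST LINE\nx", @"^first line$",
                              RegexOptions.IgnoreCase | RegexOptions.Multiline);   // True

// RightToLeft scans from the end of the input, so the last word is found first.
string lastWord = Regex.Match("one two three", @"\w+", RegexOptions.RightToLeft).Value; // three

// ECMAScript may only be combined with IgnoreCase, Multiline and Compiled.
bool threw = false;
try
{
    _ = new Regex("a", RegexOptions.ECMAScript | RegexOptions.Singleline);
}
catch (ArgumentException)
{
    threw = true;
}

Console.WriteLine($"{endDefault} {endMultiline} {crlfPlain} {crlfAware} {combined} {lastWord} {threw}");
```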