Regular Expression Reference—Group capture conditional control



Regular Expression Pocket Reference

© 2003 O'Reilly Media

Grouping, capturing, conditionals, and control

This section covers the syntax for grouping subpatterns, capturing submatches, conditional submatches, and quantifying the number of times a subpattern matches. (See MRE 135-140.)

Capturing and grouping parentheses: (...) and \1, \2, ...
Parentheses perform two functions: grouping and capturing. Text matched by the subpattern within parentheses is captured for later use. Capturing parentheses are numbered by counting their opening parentheses from the left. If backreferences are available, the submatch can be referred to later in the same match with \1, \2, etc. The captured text is made available after a match by implementation-specific methods. For example, \b(\w+)\b\s+\1\b matches duplicate words, such as the the.
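
The duplicate-word pattern above can be tried out with a minimal sketch, assuming Python's standard re module. The helper name find_duplicate_words and the sample sentence are illustrative only; other implementations expose captured text through different methods.

```python
import re

# Pattern from the text: a word, whitespace, then the same word again via \1.
DUPLICATE_WORD = re.compile(r"\b(\w+)\b\s+\1\b")

def find_duplicate_words(text):
    """Return each word that is immediately repeated in text (hypothetical helper)."""
    return [m.group(1) for m in DUPLICATE_WORD.finditer(text)]

print(find_duplicate_words("Paris in the the spring"))  # ['the']
```

Here group(1) is Python's way of reading the first capture after the match, while \1 refers to the same capture from inside the pattern.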

Grouping-only parentheses: (?:...)
Groups a subexpression, possibly for alternation or quantifiers, but does not capture the submatch. Skipping the capture avoids the cost of saving the submatch and leaves the numbering of the capturing groups unchanged. For example, (?:foobar) matches foobar, but does not save the match to a capture group.
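
As a rough illustration of the effect on group numbering, the following sketch (assuming Python's re module, with an invented sample string) runs the same pattern with grouping-only and with capturing parentheses.

```python
import re

# (?:...) groups the alternation for the + quantifier without creating a group.
grouping_only = re.search(r"(?:foo|bar)+(\d+)", "foobar42")
# With plain parentheses the alternation becomes group 1 and the digits group 2.
capturing = re.search(r"(foo|bar)+(\d+)", "foobar42")

print(grouping_only.groups())  # ('42',)
print(capturing.groups())      # ('bar', '42')
```

In the capturing version the repeated group keeps only the text of its last iteration, which is why group 1 holds bar rather than foobar.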

Named capture: (?<name>...)
Performs capturing and grouping, with captured text later referenced by name. For example, Subject:(?<subject>.*) captures the text following Subject: to a capture group that can be referenced by the name subject.
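
A small sketch of the Subject: example, assuming Python's re module. Note that Python spells the construct (?P<name>...) rather than (?<name>...), so the pattern below differs from the one in the text in that detail only; the header string is made up for illustration.

```python
import re

header = "Subject: Weekly status report"

# Python's spelling of a named group is (?P<name>...).
match = re.search(r"Subject:(?P<subject>.*)", header)

# The captured text is available by name; strip the space after the colon.
subject = match.group("subject").strip()
print(subject)  # Weekly status report
```

A named group is still numbered, so match.group(1) returns the same text as match.group("subject").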

Atomic grouping: (?>...)
Text matched within the group is never backtracked into, even if this leads to a match failure. For example, (?>[ab]*)\w\w matches aabbcc but not aabbaa.
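
The difference from an ordinary group can be seen in the sketch below, which assumes Python's re module. Python accepts (?>...) only from version 3.11, so on older versions the sketch swaps in a lookahead plus backreference, a common emulation that should behave the same way for this pattern. The helper name matched_text is hypothetical.

```python
import re
import sys

# (?>...) is accepted by Python's re module from 3.11 onward. On older
# versions, substitute the lookahead-plus-backreference emulation.
if sys.version_info >= (3, 11):
    ATOMIC = re.compile(r"(?>[ab]*)\w\w")
else:
    ATOMIC = re.compile(r"(?=([ab]*))\1\w\w")

# Same pattern without the atomic group, so [ab]* may give characters back.
BACKTRACKING = re.compile(r"[ab]*\w\w")

def matched_text(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(matched_text(ATOMIC, "aabbcc"))        # aabbcc
print(matched_text(ATOMIC, "aabbaa"))        # None
print(matched_text(BACKTRACKING, "aabbaa"))  # aabbaa
```

In aabbaa the atomic group consumes every character and refuses to release any, so nothing is left for \w\w; the ordinary version backtracks two characters and succeeds.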

Alternation: ...|...
Allows several subexpressions to be tested. Alternation's low precedence sometimes causes subexpressions to be longer than intended, so use parentheses to specifically group what you want alternated. For example, \b(foo|bar)\b matches either of the words foo or bar.
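
The precedence pitfall can be sketched as follows, assuming Python's re module; the test strings are invented. Without parentheses the pattern reads as "\bfoo" or "bar\b", so each word boundary applies to only one alternative.

```python
import re

UNGROUPED = re.compile(r"\bfoo|bar\b")    # means (\bfoo) or (bar\b)
GROUPED = re.compile(r"\b(foo|bar)\b")    # both boundaries apply to each word

# "foobaz" starts with foo at a word boundary, so the ungrouped form matches.
print(UNGROUPED.search("foobaz").group(0))  # foo
# The grouped form also requires a boundary after foo, so it does not match.
print(GROUPED.search("foobaz"))             # None
# A standalone word satisfies both boundaries.
print(GROUPED.search("a bar here").group(1))  # bar
```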

Conditional: (?(if)then|else)
The if is implementation dependent, but generally is a reference to a captured subexpression or a lookaround. The then and else parts are both regular expression patterns. If the if part is true, the then is applied. Otherwise, else is applied. For example, (<)?foo(?(1)>|bar) matches <foo> and foobar.
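
A minimal sketch of the example above, assuming Python's re module, where the if part may be a group number or name. The helper name is_balanced is hypothetical, and fullmatch is used so that partial matches do not hide the effect of the conditional.

```python
import re

# If group 1 (the optional "<") took part in the match, require ">";
# otherwise require "bar".
CONDITIONAL = re.compile(r"(<)?foo(?(1)>|bar)")

def is_balanced(text):
    """Return True when the whole string matches the conditional pattern."""
    return CONDITIONAL.fullmatch(text) is not None

print(is_balanced("<foo>"))    # True
print(is_balanced("foobar"))   # True
print(is_balanced("<foobar"))  # False
print(is_balanced("foo>"))     # False
```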

Greedy quantifiers: *, +, ?, {num,num}
The greedy quantifiers determine how many times a construct may be applied. They attempt to match as many times as possible, but will backtrack and give up matches if necessary for the success of the overall match. For example, (ab)+ matches all of ababababab.
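
Both halves of that description, matching as much as possible and giving matches back when needed, can be observed in a short sketch assuming Python's re module. The second pattern, (ab)+ab, is an illustrative addition not taken from the text.

```python
import re

text = "ababababab"

# Greedy: (ab)+ takes all five repetitions.
greedy = re.search(r"(ab)+", text)
print(greedy.group(0))  # ababababab

# Backtracking: (ab)+ first takes all five, then gives one back so that the
# trailing literal "ab" can match.
gave_back = re.fullmatch(r"(ab)+ab", text)
print(gave_back.span(1))  # (6, 8)
```

The span of group 1 shows where the last repetition kept by (ab)+ ended up after it released the final ab.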

Lazy quantifiers: *?, +?, ??, {num,num}?
Lazy quantifiers control how many times a construct may be applied. However, unlike greedy quantifiers, they attempt to match as few times as possible. For example, (an)+? matches only an of banana.
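
The banana example can be compared against its greedy counterpart in a short sketch, assuming Python's re module. The a{2,4} patterns are an illustrative addition showing the same contrast for the interval form.

```python
import re

# Lazy +? stops after the first repetition; greedy + keeps going.
lazy = re.search(r"(an)+?", "banana").group(0)
greedy = re.search(r"(an)+", "banana").group(0)
print(lazy, greedy)  # an anan

# The interval quantifier behaves the same way: minimum versus maximum count.
lazy_interval = re.search(r"a{2,4}?", "aaaaa").group(0)
greedy_interval = re.search(r"a{2,4}", "aaaaa").group(0)
print(lazy_interval, greedy_interval)  # aa aaaa
```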

Possessive quantifiers: *+, ++, ?+, {num,num}+
Possessive quantifiers are like greedy quantifiers, except that they "lock in" their match, disallowing later backtracking to break up the sub-match. For example, (ab)++ab will not match ababababab, because (ab)++ consumes every ab and will not give one back for the trailing ab.
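
The following sketch, assuming Python's re module, contrasts the possessive pattern with its greedy form. Python accepts possessive quantifiers only from version 3.11, so on older versions the sketch substitutes a lookahead-plus-backreference emulation, which should fail in the same way on this input.

```python
import re
import sys

text = "ababababab"

# Possessive quantifiers are accepted by Python's re module from 3.11 onward.
# On older versions, substitute the lookahead-plus-backreference emulation.
if sys.version_info >= (3, 11):
    POSSESSIVE = re.compile(r"(ab)++ab")
else:
    POSSESSIVE = re.compile(r"(?=((?:ab)+))\1ab")

GREEDY = re.compile(r"(ab)+ab")

# The possessive form takes every "ab" and never releases one.
print(POSSESSIVE.search(text))       # None
# The greedy form backtracks one repetition and matches the whole string.
print(GREEDY.search(text).group(0))  # ababababab
```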

