Regular expressions

Objective

Regular expressions

Regular expressions provide a way of searching for patterns in text. They are a staple of modern programming, but early versions of Java shipped without them. Several third-party packages were created to fill the gap before the standard java.util.regex package arrived in Java 1.4. While you may still see some of those packages in use, this course covers the standard implementation. It tends to be faster than the alternatives, and once you understand how regular expressions work, that knowledge transfers quickly to other implementations.

References

Summary of pattern constructs

Construct       Matches
c               any regular character will match itself
\t              tab
\n              newline (line feed)
\r              carriage return
\a              alarm (bell)
\e              escape
\\              backslash
\               used to escape next character (\{ -> {)
[abc]           matches a, b, or c
[^abc]          any character except a, b, or c
[a-zA-Z]        any letter in upper or lower case
[a-d[q-t]]      same as [a-dq-t] (union)
[a-p&&[n-z]]    n, o, or p (intersection)
[a-z&&[^mn]]    a through z, except for m and n (subtraction)
.               any single character (except line terminator usually)
\d              [0-9]
\D              [^0-9]
\s              any whitespace character
\S              any non-whitespace character
\w              any word character [a-zA-Z0-9_]
\W              any non-word character
^               beginning of sequence (if at start of pattern)
$               end of sequence (if at end of pattern)
\b              a word boundary
\B              a non-word boundary
\p{Lower}       any lowercase character
\p{Upper}       any uppercase character
\p{Alpha}       any alphabetic character
\p{Digit}       any digit
\p{Alnum}       any digit or alphabetic character
\p{Punct}       any punctuation character
\p{Blank}       a space or tab
\p{Cntrl}       any control character
\p{XDigit}      any hexadecimal digit
\p{Space}       any whitespace character
a|b             matches either a or b (alternation)
()              used to group patterns
\1              backreference to first matched group
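
A few of the constructs above can be tried out with a short program. This is only a sketch: the class name ConstructDemo and the helper fullMatch are made up for illustration, and the sample inputs are arbitrary. Note that inside a Java string literal every regex backslash has to be written as two backslashes.

```java
import java.util.regex.Pattern;

class ConstructDemo {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d\\d", "42"));                  // true: two digits
        System.out.println(fullMatch("[a-p&&[n-z]]", "o"));             // true: o is in both ranges
        System.out.println(fullMatch("[a-p&&[n-z]]", "q"));             // false: q is outside a-p
        System.out.println(fullMatch("[a-z&&[^mn]]", "m"));             // false: m is subtracted
        System.out.println(fullMatch("(\\w)\\1", "aa"));                // true: \1 repeats group 1
        System.out.println(fullMatch("(\\w)\\1", "ab"));                // false: b is not a
        System.out.println(fullMatch("\\p{Upper}\\p{Lower}+", "Java")); // true
    }
}
```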

Summary of repetition constructs

Construct   Effect on previous item
?           match 0 or 1 time, greedy
*           0 or more times, greedy
+           1 or more times, greedy
{n}         exactly n times, greedy
{n,}        at least n times, greedy
{n,m}       at least n but not more than m times, greedy
??          match 0 or 1 time, non-greedy
*?          0 or more times, non-greedy
+?          1 or more times, non-greedy
{n}?        exactly n times, non-greedy
{n,}?       at least n times, non-greedy
{n,m}?      at least n but not more than m times, non-greedy
?+          match 0 or 1 time, super-greedy
*+          0 or more times, super-greedy
++          1 or more times, super-greedy
{n}+        exactly n times, super-greedy
{n,}+       at least n times, super-greedy
{n,m}+      at least n but not more than m times, super-greedy

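You probably won't have to worry anytime soon about whether a quantifier is greedy, super-greedy (possessive), or reluctant (non-greedy), but here is what those terms mean:

Greedy: the quantifier first takes as many characters as it can, then gives characters back one at a time if the rest of the pattern fails to match.
Reluctant: the quantifier first takes as few characters as it can, then takes more one at a time if the rest of the pattern fails to match.
Possessive: the quantifier takes as many characters as it can and never gives any back, even if that makes the overall match fail.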
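
The difference between the three kinds of quantifier is easiest to see by running the same input through each one. The sketch below is illustrative only: the class QuantifierDemo, the helper firstMatch, and the sample string are assumptions chosen for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: the first substring matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off just enough for "foo".
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible before "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never backs off, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```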

Using regular expressions

You can use regular expressions a few different ways:
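
One way to picture the options, sketched under the assumption that they include a one-shot static call, a compiled Pattern reused through Matcher objects, and the convenience methods on String. The class RegexUsageDemo, the DATE pattern, and the helper yearOf are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUsageDemo {
    // Compiled once and reused; a hypothetical pattern for a simple yyyy-mm-dd date.
    static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns the year group of a date string, or null if the input is not a date.
    static String yearOf(String input) {
        Matcher m = DATE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // 1. One-shot static call: the pattern is compiled on every call.
        System.out.println(Pattern.matches("\\d{4}-\\d{2}-\\d{2}", "2024-01-15")); // true
        // 2. Compile once, then create a Matcher for each input.
        System.out.println(yearOf("2024-01-15")); // 2024
        System.out.println(yearOf("not a date")); // null
        // 3. Convenience method on String.
        System.out.println("2024-01-15".matches("\\d{4}-\\d{2}-\\d{2}")); // true
    }
}
```

Compiling the pattern once is usually the better choice when the same expression is applied to many inputs, since the one-shot forms recompile it each time.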

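The Matcher class has three methods used to search for patterns:

matches()    succeeds only if the entire input matches the pattern
lookingAt()  succeeds if the beginning of the input matches the pattern; the rest of the input may be anything
find()       finds the next substring that matches the pattern, anywhere in the input; call it repeatedly to step through all matches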
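
The three search methods of Matcher (matches, lookingAt, and find) behave differently on the same pattern, which the following sketch tries to show. The class MatcherMethodsDemo and the helper countMatches are made-up names, and the inputs are arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical helper: counts non-overlapping matches by calling find() repeatedly.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        // matches(): the entire input must match.
        System.out.println(digits.matcher("123").matches());      // true
        System.out.println(digits.matcher("123abc").matches());   // false
        // lookingAt(): only the beginning of the input must match.
        System.out.println(digits.matcher("123abc").lookingAt()); // true
        System.out.println(digits.matcher("abc123").lookingAt()); // false
        // find(): scans forward for the next match anywhere in the input.
        System.out.println(countMatches("\\d+", "a1bb22ccc333")); // 3
    }
}
```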

Demonstration programs

String methods

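Several String methods use regular expressions. They are:

matches(regex)                    true if the entire string matches the pattern
replaceAll(regex, replacement)    replaces every match with the replacement
replaceFirst(regex, replacement)  replaces only the first match
split(regex)                      breaks the string into an array at each match
split(regex, limit)               same, but limit controls how many times the pattern is applied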
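
As a rough illustration of the regex-aware String methods (matches, replaceAll, replaceFirst, and split), here is a small sketch. The class StringRegexDemo and the helpers maskDigits and splitList are hypothetical, as are the sample strings.

```java
import java.util.Arrays;

class StringRegexDemo {
    // Hypothetical helper: replaces every run of digits with a single '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d+", "#");
    }

    // Hypothetical helper: splits on commas, ignoring whitespace around them.
    static String[] splitList(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        // matches(): the whole string must match the pattern.
        System.out.println("2024-01-15".matches("\\d{4}-\\d{2}-\\d{2}"));   // true
        // replaceAll() and replaceFirst().
        System.out.println(maskDigits("a1b22c333"));                        // a#b#c#
        System.out.println("a1b22c333".replaceFirst("\\d+", "#"));          // a#b22c333
        // split(): the pattern describes the separator, not the pieces.
        System.out.println(Arrays.toString(splitList("one, two,three")));   // [one, two, three]
    }
}
```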