Regular expressions

Objective

  • Use regular expressions in Java

Regular expressions overview

Regular expressions provide a way of searching for patterns in text. They are widely used in modern programming, but early versions of Java had no built-in support for them. Several third-party packages were created to fill the gap before there was an official Java implementation. While you may still see some of those in use, this course covers the standard implementation, the java.util.regex package. It tends to be faster than the rest, and once you understand how regular expressions work, you can transfer that knowledge quickly to alternative implementations.

References

Summary of pattern constructs

Construct       Matches
c               any regular character will match itself
\t              tab
\n              newline (line feed)
\r              carriage return
\a              alarm (bell)
\e              escape
\\              backslash
\               used to escape next character (\{ -> {)
[abc]           matches a, b, or c
[^abc]          any character except a, b, or c
[a-zA-Z]        any letter in upper or lower case
[a-d[q-t]]      same as [a-dq-t] (union)
[a-p&&[n-z]]    n, o, or p (intersection)
[a-z&&[^mn]]    a through z, except for m and n (subtraction)
.               any single character (except line terminator usually)
\d              [0-9]
\D              [^0-9]
\s              any whitespace character
\S              any non-whitespace character
\w              any word character [a-zA-Z0-9_]
\W              any non-word character
^               beginning of sequence (if at start of pattern)
$               end of sequence (if at end of pattern)
\b              a word boundary
\B              a non-word boundary
\p{Lower}       any lowercase character
\p{Upper}       any uppercase character
\p{Alpha}       any alphabetic character
\p{Digit}       any digit
\p{Alnum}       any digit or alphabetic character
\p{Punct}       any punctuation character
\p{Blank}       a space or tab
\p{Cntrl}       any control character
\p{XDigit}      any hexadecimal digit
\p{Space}       any whitespace character
a|b             matches either a or b (alternation)
()              used to group patterns
\1              backreference to first matched group
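
In Java source code a pattern is written as a string literal, so every backslash in the table has to be doubled: \d is typed as "\\d". The following is a minimal sketch that tries a few of the constructs above; the class name ConstructDemo and the helper check are made up for this illustration.

```java
import java.util.regex.Pattern;

public class ConstructDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \d from the table is written "\\d" in a Java string literal
        System.out.println(check("\\d\\d", "42"));                  // true
        // a range followed by a negated range
        System.out.println(check("[a-c][^a-c]", "bz"));             // true
        // intersection: only n, o, and p are in both ranges
        System.out.println(check("[a-p&&[n-z]]", "o"));             // true
        System.out.println(check("[a-p&&[n-z]]", "a"));             // false
        // named character classes
        System.out.println(check("\\p{Upper}\\p{Lower}+", "Java")); // true
        // backreference: \1 must repeat whatever group 1 matched
        System.out.println(check("(\\w)\\1", "oo"));                // true
        System.out.println(check("(\\w)\\1", "ox"));                // false
    }
}
```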

Summary of repetition constructs

Construct       Effect on previous item
?               match 0 or 1 time, greedy
*               0 or more times, greedy
+               1 or more times, greedy
{n}             exactly n times, greedy
{n,}            at least n times, greedy
{n,m}           at least n but not more than m times, greedy
??              match 0 or 1 time, non-greedy
*?              0 or more times, non-greedy
+?              1 or more times, non-greedy
{n}?            exactly n times, non-greedy
{n,}?           at least n times, non-greedy
{n,m}?          at least n but not more than m times, non-greedy
?+              match 0 or 1 time, super-greedy
*+              0 or more times, super-greedy
++              1 or more times, super-greedy
{n}+            exactly n times, super-greedy
{n,}+           at least n times, super-greedy
{n,m}+          at least n but not more than m times, super-greedy

You probably won't have to worry anytime soon about whether a quantifier is greedy, super-greedy (possessive), or reluctant, but here is what those terms mean:

  • greedy: Greedy is the default. A greedy quantifier will match as much as it can, and then, if no match can be made, it backs off, slowly giving up characters until a match can be made, or until attempts at matching fail.
  • reluctant: A reluctant quantifier will match as little as it can, and then, if no match can be made, it starts grabbing additional characters that it can match until an overall match can be made, or until attempts at matching fail.
  • possessive (super-greedy): A super-greedy quantifier will match as much as it can, and then, if the rest of the pattern cannot match, it refuses to back off, so the attempt at a match fails.
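
The difference between the three kinds of quantifier is easiest to see by running the same pattern three ways against one input. This is a small sketch, not part of the original course material; the class name QuantifierDemo and the helper firstMatch are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: the first substring matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // greedy: .+ grabs everything, then backs off until the final > fits
        System.out.println(firstMatch("<.+>", input));   // <b>bold</b>
        // reluctant: .+? grabs one character, then grows until > fits
        System.out.println(firstMatch("<.+?>", input));  // <b>
        // possessive: .++ grabs everything and never gives the > back
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```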

Using regular expressions

You can use regular expressions in a few different ways:

  • formal use of Pattern and Matcher classes:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // look for hat or heat anywhere in line
    Pattern p = Pattern.compile(".*he?at.*");
    // Now give it a String to search
    Matcher m = p.matcher("Have you seen my hat anywhere?");
    // check for a match (should be true in this case)
    boolean b = m.matches();
  • use Pattern's convenience method "matches":
    boolean b = Pattern.matches(".*he?at.*", "Have you seen my hat anywhere?");
  • use String's convenience method "matches":
    String s = "Have you seen my hat anywhere?";
    boolean b = s.matches(".*he?at.*");
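
All three forms give the same answer. The two convenience methods compile the pattern again on every call, so when one pattern is applied to many strings it is usually worth compiling it once and reusing the Pattern object. A minimal sketch of that idea follows; the class name ReuseDemo, the constant HAT, and the sample lines are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ReuseDemo {
    // Compiled once, then reused for every line that is checked.
    private static final Pattern HAT = Pattern.compile(".*he?at.*");

    // Counts the lines that contain "hat" or "heat".
    static int countMatches(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (HAT.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Have you seen my hat anywhere?",
                "Turn up the heat.",
                "Nothing to see here.");
        System.out.println(countMatches(lines)); // 2
    }
}
```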

The Matcher class has three methods used to search for patterns:

  • matches: attempts to match the entire input sequence against the pattern
  • lookingAt: attempts to match the input sequence, starting at the beginning, against the pattern
  • find: scans the input sequence looking for the next subsequence that matches the pattern
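
The three methods can give different answers for the same pattern and input. The sketch below uses the pattern he?at without the surrounding .* so that the differences show up; the class name MatcherDemo and the helper findAll are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Hypothetical helper: collects every substring that find() locates.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("he?at");
        String input = "hat and heat";
        // matches: the pattern must cover the entire input
        System.out.println(p.matcher(input).matches());    // false
        // lookingAt: the pattern only has to match at the beginning
        System.out.println(p.matcher(input).lookingAt());  // true
        // find: each call moves on to the next matching subsequence
        System.out.println(findAll("he?at", input));       // [hat, heat]
    }
}
```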

Demonstration programs

String methods

Several String methods use regular expressions. They are:

  • boolean matches(String regex)
  • String replaceAll(String regex, String replacement)
  • String replaceFirst(String regex, String replacement)
  • String[] split(String regex)
  • String[] split(String regex, int limit)
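
As a rough illustration of these methods, the sketch below cleans up a comma-separated string. The class name StringRegexDemo, the helpers normalize and fields, and the sample input are made up for the example.

```java
import java.util.Arrays;

public class StringRegexDemo {
    // A comma with optional whitespace on either side.
    private static final String SEP = "\\s*,\\s*";

    // Hypothetical helper: replaces every separator with a semicolon.
    static String normalize(String s) {
        return s.replaceAll(SEP, ";");
    }

    // Hypothetical helper: splits the string at every separator.
    static String[] fields(String s) {
        return s.split(SEP);
    }

    public static void main(String[] args) {
        String s = "one, two,three ,  four";
        System.out.println("2024".matches("\\d+"));             // true
        System.out.println(normalize(s));                       // one;two;three;four
        System.out.println(s.replaceFirst(SEP, ";"));           // one;two,three ,  four
        System.out.println(Arrays.toString(fields(s)));         // [one, two, three, four]
        // a limit of 2 stops after the first split
        System.out.println(Arrays.toString(s.split(SEP, 2)));   // [one, two,three ,  four]
    }
}
```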