Objective
- Use regular expressions in Java
Regular expressions provide a way to search for patterns in text. They are a staple of modern programming, but early versions of Java shipped without them, so a number of third-party packages appeared to fill the gap before there was an official implementation. You may still see some of those in use, but this course covers the standard implementation, the java.util.regex package. It tends to be faster than the alternatives, and once you understand how regular expressions work, that knowledge transfers quickly to other implementations.
Construct | Matches |
---|---|
c | any regular character will match itself |
\t | tab |
\n | newline (line feed) |
\r | carriage return |
\a | alarm (bell) |
\e | escape |
\\ | backslash |
\ | escapes the next character (\{ matches a literal {) |
[abc] | matches a, b, or c |
[^abc] | any character except a, b, or c |
[a-zA-Z] | any letter in upper or lower case |
[a-d[q-t]] | same as [a-dq-t] (union) |
[a-p&&[n-z]] | n, o, or p (intersection) |
[a-z&&[^mn]] | a through z, except for m and n (subtraction) |
. | any single character (line terminators excluded unless DOTALL mode is set) |
\d | [0-9] |
\D | [^0-9] |
\s | any whitespace character |
\S | any non-whitespace character |
\w | any word character [a-zA-Z0-9_] |
\W | any non-word character |
^ | beginning of sequence (if at start of pattern) |
$ | end of sequence (if at end of pattern) |
\b | a word boundary |
\B | a non-word boundary |
\p{Lower} | any lowercase character |
\p{Upper} | any uppercase character |
\p{Alpha} | any alphabetic character |
\p{Digit} | any digit |
\p{Alnum} | any digit or alphabetic character |
\p{Punct} | any punctuation character |
\p{Blank} | a space or tab |
\p{Cntrl} | any control character |
\p{XDigit} | any hexadecimal digit |
\p{Space} | any whitespace character |
a\|b | matches either a or b (alternation) |
( ) | groups part of a pattern and captures the text it matched |
\1 | backreference to the text matched by the first group |
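
A few rows of the table are easier to trust once you have seen them run. The sketch below checks the intersection and subtraction character classes and uses a backreference to spot a doubled word. The class and method names (`CharClassDemo`, `matchesClass`, `hasRepeatedWord`) are made up for illustration. Note that each backslash in a pattern must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the whole input is matched by the given character class.
    static boolean matchesClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    // True when the input contains the same word twice in a row, e.g. "the the".
    static boolean hasRepeatedWord(String input) {
        // \b(\w+) captures a word; \s+\1\b requires the same text again as a whole word.
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[a-p&&[n-z]]", "o"));   // true: o is in both ranges
        System.out.println(matchesClass("[a-p&&[n-z]]", "q"));   // false: q is outside a-p
        System.out.println(matchesClass("[a-z&&[^mn]]", "m"));   // false: m is subtracted
        System.out.println(hasRepeatedWord("Paris in the the spring")); // true
    }
}
```
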
Construct | Effect on previous item |
---|---|
? | match 0 or 1 time, greedy |
* | 0 or more times, greedy |
+ | 1 or more times, greedy |
{n} | exactly n times, greedy |
{n,} | at least n times, greedy |
{n,m} | at least n but not more than m times, greedy |
?? | match 0 or 1 time, non-greedy |
*? | 0 or more times, non-greedy |
+? | 1 or more times, non-greedy |
{n}? | exactly n times, non-greedy |
{n,}? | at least n times, non-greedy |
{n,m}? | at least n but not more than m times, non-greedy |
?+ | match 0 or 1 time, super-greedy |
*+ | 0 or more times, super-greedy |
++ | 1 or more times, super-greedy |
{n}+ | exactly n times, super-greedy |
{n,}+ | at least n times, super-greedy |
{n,m}+ | at least n but not more than m times, super-greedy |
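You probably won't have to worry anytime soon about whether a quantifier is greedy, non-greedy (reluctant), or super-greedy (possessive), but here is what those terms mean:
- Greedy: the quantifier takes as much text as it can, then gives characters back one at a time if the rest of the pattern fails to match.
- Non-greedy (reluctant): the quantifier takes as little text as it can, then takes more only if the rest of the pattern fails to match.
- Super-greedy (possessive): the quantifier takes as much text as it can and never gives any back, so the overall match can fail where a greedy quantifier would succeed.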
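
To see the three kinds of quantifier side by side, the sketch below runs a greedy, a reluctant, and a possessive version of `.*` against the same input. The class name `QuantifierDemo` and the helper `firstMatch` are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* grabs everything, then backs off until "foo" fits at the end.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? grows one character at a time and stops at the first "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ grabs everything and never backs off, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```

The possessive form fails here because nothing is left for `foo` once `.*+` has consumed the input. It is mainly useful when you know backtracking cannot help and want to skip the work.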
You can use regular expressions a few different ways:
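
As a rough illustration, here are three common entry points in the standard library: a compiled `Pattern` with a `Matcher`, the static `Pattern.matches` shortcut, and `String.matches`. The ZIP-code pattern and the names `RegexUsageDemo`, `viaCompiledPattern`, `viaPatternMatches`, and `viaStringMatches` are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUsageDemo {
    // Compiled once and reused; worthwhile when the same pattern is applied many times.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    // 1. Compile a Pattern, then ask a Matcher about a specific input.
    static boolean viaCompiledPattern(String s) {
        Matcher m = ZIP.matcher(s);
        return m.matches();
    }

    // 2. One-off convenience call; compiles the pattern on every invocation.
    static boolean viaPatternMatches(String s) {
        return Pattern.matches("\\d{5}(-\\d{4})?", s);
    }

    // 3. The same check through the String class.
    static boolean viaStringMatches(String s) {
        return s.matches("\\d{5}(-\\d{4})?");
    }

    public static void main(String[] args) {
        System.out.println(viaCompiledPattern("12345"));      // true
        System.out.println(viaPatternMatches("12345-6789"));  // true
        System.out.println(viaStringMatches("1234"));         // false
    }
}
```

All three calls here test whether the entire input matches. The compiled form is usually the better choice inside loops, since the other two recompile the pattern each time.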
The Matcher class has three methods used to search for patterns:
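
The three search methods on `Matcher` are `matches()`, `lookingAt()`, and `find()`. The sketch below applies each to the same input so the difference is visible. The wrapper names (`MatcherMethodsDemo`, `wholeMatch`, `prefixMatch`, `countMatches`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // matches(): the pattern must account for the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // lookingAt(): the pattern must match starting at the beginning,
    // but it does not have to reach the end.
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // find(): scans for the next match anywhere; calling it repeatedly
    // walks through every non-overlapping match.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "123 abc 456";
        System.out.println(wholeMatch("\\d+", input));    // false: input is more than digits
        System.out.println(prefixMatch("\\d+", input));   // true: input starts with digits
        System.out.println(countMatches("\\d+", input));  // 2: "123" and "456"
    }
}
```
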
Several String methods use regular expressions. They are:
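
The `String` methods that take a regular expression are `matches`, `replaceAll`, `replaceFirst`, and `split`. A minimal sketch of each follows. The class name `StringRegexDemo`, the helper names, and the sample patterns are assumptions for illustration.

```java
import java.util.Arrays;

class StringRegexDemo {
    // matches: true only when the whole string fits the pattern.
    static boolean isInteger(String s) {
        return s.matches("-?\\d+");
    }

    // replaceAll: every run of whitespace becomes a single space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // replaceFirst: only the first match is replaced.
    static String maskFirstNumber(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    // split: the pattern describes the separator, here a comma with optional spaces.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-42"));                          // true
        System.out.println(collapseWhitespace("  a \t b\n c "));       // a b c
        System.out.println(maskFirstNumber("room 12, floor 3"));       // room #, floor 3
        System.out.println(Arrays.toString(splitOnCommas("red , green,blue"))); // [red, green, blue]
    }
}
```
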