CIS 119 - Regular Expressions
Objectives
- use regular expressions in JavaScript scripts
- use special characters in regular expressions
- use character classes in regular expressions
- specify grouping in regular expressions
- specify repetition in regular expressions
- use alternation in regular expressions
- use flags to modify regular expression parsing
- specify special occurrences in regular expressions
- use the String class regular expression methods
- use regular expression (RegExp) objects
Regular expressions
Regular expressions allow us to search for patterns in text.
This is a powerful tool which has become a standard part of most
modern languages. It is difficult to imagine designing a
general-purpose programming language these days without
supporting regular expressions.
The concepts learned with JavaScript regular expressions will
be beneficial when using everything from UNIX/Linux standard tools
to programming languages such as Java, Perl, and PHP. Taking the
time to learn how to use regular expressions will almost certainly
pay off in the future.
Online regular expression tester
JavaScript regular expressions
Special characters
- \0 = null
- \t = tab
- \n = newline
- \v = vertical tab
- \f = form feed
- \r = carriage return
- \xnn = the character whose code is nn, where nn is a two-digit hex value (ASCII and Latin-1 range)
- \unnnn = a Unicode character, where nnnn is the hex value of the character
- \cX = control character (Control-X in this case)
- \s = any whitespace character
- \S = any non-whitespace character
- \w = any ASCII word character ([a-zA-Z0-9_])
- \W = any ASCII non-word character ([^a-zA-Z0-9_])
- \d = any ASCII digit ([0-9])
- \D = any ASCII non-digit ([^0-9])
- . = any one character except a newline or other line terminator
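A few of these escapes can be seen in action in the short sketch below. The sample string and variable names are made up for illustration, and the + repetition symbol it uses is described in the Repetition and grouping section.

```javascript
// Hypothetical sample text: a tab, digits, a space, a word, and a newline.
const sample = "id\t42 apples\nend";

// \d matches a single ASCII digit; + repeats it one or more times.
const firstNumber = sample.match(/\d+/)[0];          // "42"

// \s matches whitespace, including tab, space, and newline.
const whitespaceCount = sample.match(/\s/g).length;  // 3

// \w matches ASCII word characters, so \w+ picks out each run of them.
const words = sample.match(/\w+/g);                  // ["id", "42", "apples", "end"]

// . does not match the newline, so .+ stops at the end of the first line.
const firstLine = sample.match(/.+/)[0];             // "id\t42 apples"

// \x41 and \u0041 both denote the letter A.
const matchesA = /\x41\u0041/.test("AA");            // true

console.log(firstNumber, whitespaceCount, words, firstLine, matchesA);
```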
Character classes
- [abc] = one character, either a, b, or c
- [a-c] = one character, either a, b, or c
- [^abc] = any one character except a, b, or c
- [a-z] = any one lowercase ASCII letter
- [a-zA-Z] = any one uppercase or lowercase ASCII letter
- [0-9] = any one character that is a digit
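The character classes above can be tried out with a minimal sketch like the following. The input strings are arbitrary examples, and the g flag (described under Flags) is used only to collect every match.

```javascript
// [aeiou] matches one character from the listed set.
const vowels = "regex".match(/[aeiou]/g);      // ["e", "e"]

// [a-c] is shorthand for [abc].
const abcOnly = "cab".match(/[a-c]/g);         // ["c", "a", "b"]

// [^abc] matches any one character that is NOT a, b, or c.
const firstOther = "abcd".match(/[^abc]/)[0];  // "d"

// [a-zA-Z] matches letters; [0-9] matches digits.
const letters = "R2-D2".match(/[a-zA-Z]/g);    // ["R", "D"]
const digits = "R2-D2".match(/[0-9]/g);        // ["2", "2"]

console.log(vowels, abcOnly, firstOther, letters, digits);
```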
Repetition and grouping
- ? = 0 or 1 of previous item
- * = 0 or more of previous item
- + = 1 or more of previous item
- {n} = n of previous item
- {n,} = n or more of previous item
- {n,m} = n to m of previous item
- () = grouping with remembered groups
- (?:) = grouping without remembered groups
- The repetition patterns are greedy by default (they match as
much text as possible); you can make them non-greedy by
following the repetition symbol with a '?'.
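As an illustration of the repetition and grouping symbols above, here is a small sketch. The date and HTML-like strings are invented sample data, not part of the course material.

```javascript
// ? makes the previous item optional, so both spellings match.
const optionalU = /colou?r/;
const bothSpellings = optionalU.test("color") && optionalU.test("colour"); // true

// {n} repeats exactly n times; () remembers what each group matched.
const dateParts = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
// dateParts[1] is "2024", dateParts[2] is "03", dateParts[3] is "15"

// (?:) groups for repetition without remembering, so only (c) is captured.
const repeated = "abababc".match(/(?:ab)+(c)/);
// repeated[0] is "abababc", repeated[1] is "c"

// Greedy .+ runs to the last ">", non-greedy .+? stops at the first one.
const html = "<b>bold</b>";
const greedy = html.match(/<.+>/)[0];   // "<b>bold</b>"
const lazy = html.match(/<.+?>/)[0];    // "<b>"

console.log(bothSpellings, dateParts.slice(1), repeated[1], greedy, lazy);
```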
Alternation and special marker symbols
- ^ = beginning of String (or of each line in multiline mode)
- $ = end of String (or of each line in multiline mode)
- | = alternation
- \b = word boundary
- \B = not word boundary
- \n = backreference, where n is the number (1, 2, ...) of the remembered subexpression being referenced
- (?=p) = look ahead assertion: pattern p must come next, but it is not included in the match
- (?!p) = negative look ahead assertion: pattern p must not come next; nothing is included in the match
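The marker symbols above are easier to follow with concrete input. The following is a minimal sketch using invented sample strings; the variable names are illustrative only.

```javascript
// ^ and $ anchor the pattern to the start and end of the string.
const isFiveDigits = /^\d{5}$/.test("12345");        // true
const hasExtra = /^\d{5}$/.test("12345-6789");       // false

// | tries each alternative in turn.
const pet = "I have a dog".match(/cat|dog|bird/)[0]; // "dog"

// \b requires a word boundary, so "cat" inside another word is rejected.
const wholeWord = /\bcat\b/.test("concatenate");     // false
const partialWord = /cat/.test("concatenate");       // true

// \1 refers back to whatever the first group matched (a doubled word here).
const doubled = "it was the the best".match(/\b(\w+) \1\b/)[0]; // "the the"

// (?= EUR) requires " EUR" to follow but leaves it out of the match.
const prices = "100 USD, 200 EUR";
const euroAmount = prices.match(/\d+(?= EUR)/)[0];   // "200"

// (?! EUR) requires that " EUR" does NOT follow.
const otherAmount = prices.match(/\d+(?! EUR)/)[0];  // "100"

console.log(isFiveDigits, hasExtra, pet, wholeWord, partialWord,
            doubled, euroAmount, otherAmount);
```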
Flags
- i = case insensitive matching
- g = global matching (don't stop at the first match)
- m = multiline mode (^ and $ also match at the beginning and end of each line)
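The effect of each flag can be compared side by side in a sketch like this one. The three-line sample string is an assumption chosen so that each flag changes the result.

```javascript
// Hypothetical three-line input with different capitalizations.
const text = "Cat\ncat\nCAT";

// No flags: case-sensitive, and only the first match is reported.
const plain = text.match(/cat/);            // matches "cat" at index 4

// i: ignore case, so the capitalized first line now matches.
const insensitive = text.match(/cat/i)[0];  // "Cat"

// g: keep going after the first match.
const all = text.match(/cat/gi);            // ["Cat", "cat", "CAT"]

// Without m, ^ matches only at the very start of the string.
const startsWithoutM = text.match(/^cat/gi).length; // 1

// With m, ^ also matches at the start of each line.
const startsWithM = text.match(/^cat/gim).length;   // 3

console.log(plain.index, insensitive, all, startsWithoutM, startsWithM);
```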
String methods which support regular expressions
- search(regexp)
  - returns the index of the first match found, or -1 if no match is found
- replace(regexp, replacement)
  - if regexp is a string, then that is searched for directly, and only the
    first occurrence is replaced
  - the replacement argument may be a string, which can refer to numbered
    subexpressions from the match ($1, ..., $99), or a function which
    generates the replacement
- match(regexp)
  - if the argument is not a RegExp, it is converted into one
  - if the global property of the RegExp is set, then null is returned
    if no match is found; otherwise an array of all matched strings is
    returned, and the length of the array indicates the number of
    matches found
  - if the global property is not set, then only a single match is searched for;
    null is returned if no match is found; if a match is found, an array is
    returned with element 0 being the matched text, and the rest of the elements
    containing the matched subexpressions; the array will also have two
    properties set: "input" will contain the input string, and
    "index" will contain the start position of the match found
- split(regexp)
  - returns an array of the substrings delimited by matches of regexp
  - parenthesize the regexp if you want the matched delimiters returned as well
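The four String methods above can be exercised with a short sketch. The sample strings and variable names are made up for illustration.

```javascript
// Hypothetical input containing two ISO-style dates.
const line = "2024-03-15, 2025-11-02";

// search: index of the first match, or -1.
const firstDash = line.search(/-/);   // 4
const noMatch = line.search(/x/);     // -1

// replace with $1..$3: reorder the remembered groups of every date.
const swapped = line.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$3/$2/$1");
// "15/03/2024, 02/11/2025"

// replace with a function: the function receives each matched string.
const doubledNums = "3 apples, 5 pears".replace(/\d+/g, (m) => String(Number(m) * 2));
// "6 apples, 10 pears"

// replace with a plain string pattern: only the first occurrence changes.
const firstOnly = "a-b-c".replace("-", "+");   // "a+b-c"

// match with g: an array of every matched string.
const years = line.match(/\d{4}/g);            // ["2024", "2025"]

// match without g: one match, plus its groups, index, and input.
const single = line.match(/(\d{4})-(\d{2})/);
// single[0] is "2024-03", single[1] is "2024", single[2] is "03", single.index is 0

// split: a plain delimiter pattern drops the delimiters...
const fields = "a, b,c".split(/\s*,\s*/);      // ["a", "b", "c"]
// ...while a parenthesized one keeps them in the result.
const withDelims = "1+2-3".split(/([+-])/);    // ["1", "+", "2", "-", "3"]

console.log(firstDash, noMatch, swapped, doubledNums, firstOnly,
            years, single.index, fields, withDelims);
```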
RegExp object
- constructor
  - RegExp(pattern [, flags])
  - flags can be any combination of g (global), i (ignore case),
    or m (multiline)
- methods
  - exec(String)
    - applies RegExp pattern to String
    - returns null if no match found
    - can be called repeatedly to find all matches when the g flag is set
    - lastIndex is set to 0 when no more matches are found
    - set lastIndex to 0 yourself if you stop searching a String before
      exec returns null and want to reuse the RegExp
  - test(String)
    - returns true if a match is found
    - returns false if no match is found
- properties
  - source: text of regular expression
  - global: true if searching for multiple matches
  - multiline: true if searching through multiline strings
  - ignoreCase: true if match is case insensitive
  - lastIndex: position in String where next search will begin
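The interplay between exec, the g flag, and lastIndex is sketched below. The input string and variable names are assumptions chosen for illustration.

```javascript
// Building the pattern from a string requires doubling the backslash;
// this is equivalent to the literal /\d+/g.
const re = new RegExp("\\d+", "g");
const input = "10 apples, 20 pears, 30 plums";

// With the g flag, each exec call resumes at lastIndex.
const found = [];
let m;
while ((m = re.exec(input)) !== null) {
  found.push({ text: m[0], index: m.index, next: re.lastIndex });
}
// found holds "10" at index 0, "20" at index 11, and "30" at index 21.

// Once exec has returned null, lastIndex is back at 0.
const lastIndexAfterLoop = re.lastIndex;   // 0

// test also advances lastIndex on a global RegExp...
const hasDigits = re.test(input);          // true
const lastIndexAfterTest = re.lastIndex;   // 2
// ...so reset it before reusing the RegExp on another string.
re.lastIndex = 0;

// The properties reflect the pattern text and the flags given above.
const props = {
  source: re.source,          // the text \d+
  global: re.global,          // true
  ignoreCase: re.ignoreCase,  // false
  multiline: re.multiline,    // false
};

console.log(found, lastIndexAfterLoop, hasDigits, lastIndexAfterTest, props);
```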