Regular expressions

Objectives

  • Use regular expressions in JavaScript scripts
  • Use special characters in regular expressions
  • Use character classes in regular expressions
  • Specify grouping in regular expressions
  • Specify repetition in regular expressions
  • Use alternation in regular expressions
  • Use flags to modify regular expression parsing
  • Specify special occurrences in regular expressions
  • Use the String class regular expression methods
  • Use regular expression (RegExp) objects

Regular expressions

Regular expressions let us search for patterns in text. They are a powerful tool and have become a standard part of most modern languages. It is difficult to imagine designing a general-purpose language these days without regular expression support.

The concepts learned with JavaScript regular expressions will be beneficial when using everything from UNIX/Linux standard tools to programming languages such as Java, Perl, and PHP. Taking the time to learn how to use regular expressions will almost certainly pay off in the future.

Online regular expression tester

JavaScript regular expressions

  • regular expression literals are delimited by the "/" character
  • supported through the RegExp object and through String methods
  • examples:
  • see reg01.html
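
As a minimal sketch of the two forms mentioned above (a "/"-delimited literal and a RegExp object), the following may help. The pattern, the sample text, and the variable names are illustrative assumptions and are not taken from reg01.html.

```javascript
// A regular expression literal: the pattern sits between two "/" characters.
const literalPattern = /colou?r/;

// The same pattern built through the RegExp object; here the pattern is a string.
const constructedPattern = new RegExp("colou?r");

// Hypothetical sample input.
const sampleText = "What color is the sky?";

// test() reports whether the pattern occurs anywhere in the string.
const literalFound = literalPattern.test(sampleText);
const constructedFound = constructedPattern.test(sampleText);

console.log(literalFound, constructedFound);                       // true true
console.log(literalPattern.source === constructedPattern.source);  // true
```

The literal form is usually the more readable choice when the pattern is known in advance; the constructor form is useful when the pattern has to be assembled from a string at run time.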

Special characters

  • \0 = NUL character (character code 0)
  • \t = tab
  • \n = newline
  • \v = vertical tab
  • \f = form feed
  • \r = carriage return
  • \xnn = the character whose code is the two-digit hex value nn (ASCII and Latin-1)
  • \unnnn = the Unicode character whose code is the four-digit hex value nnnn
  • \cX = control character (Control-X in this case)
  • \s = any whitespace character
  • \S = any non-whitespace character
  • \w = any ASCII word character ([a-zA-Z0-9_])
  • \W = any ASCII non-word character ([^a-zA-Z0-9_])
  • \d = any ASCII digit ([0-9])
  • \D = any ASCII non-digit ([^0-9])
  • . = any one character except a line terminator such as newline
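
A few of the escapes above can be tried out as follows. This is only a sketch; the sample record and the variable names are assumptions made for illustration.

```javascript
// Hypothetical sample input containing a tab, a newline and a space.
const record = "id:\t42\nname:\tAda Lovelace";

// \d+ matches a run of ASCII digits.
const digits = record.match(/\d+/)[0];

// \s matches the tab, the newline and the space alike,
// so splitting on \s+ breaks the record into its fields.
const fields = record.split(/\s+/);

// \x41 and \u0041 both denote the character "A".
const hexMatches = /\x41/.test("A");
const unicodeMatches = /\u0041/.test("A");

// "." matches any single character except a line terminator.
const dotMatchesNewline = /./.test("\n");

console.log(digits);             // 42
console.log(fields);             // [ 'id:', '42', 'name:', 'Ada', 'Lovelace' ]
console.log(hexMatches, unicodeMatches, dotMatchesNewline);  // true true false
```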

Character classes

  • [abc] = one character, either a, b, or c
  • [a-c] = one character, either a, b, or c
  • [^abc] = any one character except a, b, or c
  • [a-z] = any one lowercase ASCII letter
  • [a-zA-Z] = any one uppercase or lowercase ASCII letter
  • [0-9] = any one character that is a digit
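
The character classes above can be sketched like this; the sample string and variable names are assumptions chosen for illustration.

```javascript
// Hypothetical sample input.
const sample = "Route 66";

// Each match() call returns the first character that fits the class.
const firstLowercase = sample.match(/[a-z]/)[0];      // first lowercase letter
const firstDigit = sample.match(/[0-9]/)[0];          // first digit
const firstNonLetter = sample.match(/[^a-zA-Z]/)[0];  // first character that is not a letter
const firstOfSet = sample.match(/[tue]/)[0];          // first of t, u or e

// A range and the equivalent explicit list accept the same characters.
const rangeMatches = /[a-c]/.test("b");
const listMatches = /[abc]/.test("b");

console.log(firstLowercase);           // o
console.log(firstDigit);               // 6
console.log(firstNonLetter === " ");   // true
console.log(firstOfSet);               // u
console.log(rangeMatches, listMatches); // true true
```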

Repetition and grouping

  • ? = 0 or 1 of previous item
  • * = 0 or more of previous item
  • + = 1 or more of previous item
  • {n} = n of previous item
  • {n,} = n or more of previous item
  • {n,m} = n to m of previous item
  • () = grouping that remembers the matched text (capturing group)
  • (?:) = grouping that does not remember the matched text (non-capturing group)
  • The repetition patterns are greedy matchers by default; you can make them non-greedy by following the repetition symbol with a '?'.
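
The difference between greedy and non-greedy repetition, and between remembered and non-remembered groups, is easiest to see side by side. The following is a sketch under assumed sample strings; none of the names come from the original notes.

```javascript
// Hypothetical sample input.
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* takes as much as it can, so the match runs to the last ">".
const greedy = html.match(/<.*>/)[0];

// Non-greedy: .*? takes as little as it can, so the match stops at the first ">".
const lazy = html.match(/<.*?>/)[0];

// {n} with remembered groups: each () stores the text it matched.
const date = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);

// (?:...) groups "ab" so that + can repeat it, without remembering anything.
const repeated = "abababc".match(/(?:ab)+/);

console.log(greedy);                     // <b>bold</b> and <i>italic</i>
console.log(lazy);                       // <b>
console.log(date[1], date[2], date[3]);  // 2024 03 15
console.log(repeated[0], repeated.length); // ababab 1
```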

Alternation and special marker symbols

  • ^ = beginning of String (or of each line in multiline mode)
  • $ = end of String (or of each line in multiline mode)
  • | = alternation
  • \b = word boundary
  • \B = not word boundary
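  • \n = backreference, where n is the number (1, 2, ...) of the parenthesized subexpression being referenced; for example, \1 matches the same text the first group matched
  • (?=p) = look ahead assertion: pattern p must match at this position, but its text is not included in the match
  • (?!p) = negative look ahead assertion: pattern p must not match at this position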
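
The markers in this list can be sketched as follows. The sample strings and variable names are assumptions made for illustration only.

```javascript
// ^ and $ anchor the pattern to the beginning and end of the string.
const startsWithCat = /^cat/.test("catalog");
const endsWithCat = /cat$/.test("catalog");

// | tries each alternative; the first one found in the string is returned.
const pet = "I have a dog".match(/cat|dog|bird/)[0];

// \b requires a word boundary, so "cat" inside "concatenate" does not count.
const insideWord = /\bcat\b/.test("concatenate");
const wholeWord = /\bcat\b/.test("the cat sat");

// \1 refers back to whatever the first group matched: a doubled word.
const doubledWord = "this is is a test".match(/\b(\w+) \1\b/)[0];

// (?= EUR) requires " EUR" to follow, but leaves it out of the match.
const prices = "100 USD, 200 EUR";
const euroAmount = prices.match(/\d+(?= EUR)/)[0];

// (?! EUR) requires that " EUR" does not follow.
const otherAmount = prices.match(/\d+(?! EUR)/)[0];

console.log(startsWithCat, endsWithCat);  // true false
console.log(pet);                         // dog
console.log(insideWord, wholeWord);       // false true
console.log(doubledWord);                 // is is
console.log(euroAmount, otherAmount);     // 200 100
```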

Flags

  • i = case insensitive matching
  • g = global matching (don't stop at the first match)
  • m = multiline mode (^ and $ also match at the beginning and end of each line)
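
A short sketch of how the three flags change the result of the same pattern; the three-line sample text and the variable names are assumptions.

```javascript
// Hypothetical sample input: three lines that differ only in case.
const lines = "Cat\ncat\nCAT";

// g alone: every match, but case still matters.
const caseSensitive = lines.match(/cat/g);

// g and i together: every match, regardless of case.
const anyCase = lines.match(/cat/gi);

// Without g, replace() stops after the first match.
const firstReplaced = lines.replace(/cat/i, "dog");

// Without m, ^ matches only at the very beginning of the string.
const atStringStart = lines.match(/^cat/gi);

// With m, ^ matches at the beginning of every line.
const atLineStarts = lines.match(/^cat/gim);

console.log(caseSensitive);        // [ 'cat' ]
console.log(anyCase);              // [ 'Cat', 'cat', 'CAT' ]
console.log(firstReplaced === "dog\ncat\nCAT");  // true
console.log(atStringStart.length); // 1
console.log(atLineStarts.length);  // 3
```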

String methods which support regular expressions

  • search(regexp)
    • returns the index of the start of the first match, or -1 if no match is found
  • replace(regexp, replacement)
    • if regexp is a string, it is searched for literally and only the first occurrence is replaced
    • the replacement argument may be a string, which can refer to numbered subexpressions of the match ($1, ..., $99), or a function that generates the replacement
  • match(regexp)
    • if argument is not RegExp, it is converted into one
    • if the global property of the RegExp is set, all matches are searched for; null is returned if no match is found, otherwise an array of the matched strings is returned, and its length is the number of matches found
    • if the global property is not set, only a single match is searched for; null is returned if no match is found, otherwise an array is returned in which element 0 is the matched text and the remaining elements are the matched subexpressions; the array also has two properties: "input" contains the input string and "index" contains the start position of the match
  • split(regexp)
    • returns an array of the substrings that are separated by matches of regexp
    • put capturing parentheses in the regexp if you want the matched delimiters returned in the array as well
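
The four String methods above can be sketched together as follows. The sample sentences and all variable names are assumptions made for illustration.

```javascript
// Hypothetical sample input.
const sentence = "The year 1999 came before the year 2000.";

// search(): index of the first match, or -1 when there is none.
const position = sentence.search(/\d{4}/);
const missing = sentence.search(/xyz/);

// replace() with numbered subexpressions: $1 and $2 are the two groups.
const swapped = "Lovelace, Ada".replace(/(\w+), (\w+)/, "$2 $1");

// replace() with a function: it receives the matched text and returns the replacement.
const doubledNumbers = "3 apples and 5 pears".replace(
  /\d+/g,
  (matched) => String(Number(matched) * 2)
);

// match() with the g flag: an array holding every matched string.
const allYears = sentence.match(/\d{4}/g);

// match() without the g flag: one match plus its subexpressions, index and input.
const firstYear = sentence.match(/year (\d{4})/);

// split(): the pieces between the delimiters.
const parts = "a, b,c ,d".split(/\s*,\s*/);

// split() with capturing parentheses: the delimiters are returned too.
const partsWithDelimiters = "1+2-3".split(/([+-])/);

console.log(position, missing);      // 9 -1
console.log(swapped);                // Ada Lovelace
console.log(doubledNumbers);         // 6 apples and 10 pears
console.log(allYears);               // [ '1999', '2000' ]
console.log(firstYear[0], firstYear[1], firstYear.index);  // year 1999 1999 4
console.log(parts);                  // [ 'a', 'b', 'c', 'd' ]
console.log(partsWithDelimiters);    // [ '1', '+', '2', '-', '3' ]
```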

RegExp object

  • constructor
    • RegExp(pattern [, flags])
    • flags can be any combination of g (global), i (ignore case), or m (multiline)
  • methods
    • exec(String)
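      • applies RegExp pattern to String
      • returns null if no match found
      • when the g flag is set, can be called repeatedly to find all matches; each call resumes at lastIndex
      • lastIndex is reset to 0 when no more matches are found
      • set lastIndex to 0 yourself if you stop before the matches run out and want to reuse the RegExp on another String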
    • test(String)
      • returns true if a match is found
      • returns false if no match is found
  • properties
    • source: text of regular expression
    • global: true if the g flag is set (searching for multiple matches)
    • multiline: true if the m flag is set (searching through multiline strings)
    • ignoreCase: true if match is case insensitive
    • lastIndex: position in String where next search will begin
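
Calling exec() repeatedly, as described above, might look like the following sketch. The input string, the `found` array, and the other variable names are assumptions made for illustration.

```javascript
// Build the RegExp from a string; the backslash must be doubled inside a string literal.
const numberPattern = new RegExp("\\d+", "g");

// Hypothetical sample input.
const input = "10 apples, 20 pears, 30 plums";

// With the g flag set, each exec() call resumes where the previous one stopped.
const found = [];
let result;
while ((result = numberPattern.exec(input)) !== null) {
  found.push({
    text: result[0],                // the matched text
    index: result.index,            // where the match starts
    next: numberPattern.lastIndex,  // where the next search will begin
  });
}

// Once exec() has returned null, lastIndex is back at 0.
const lastIndexAfterLoop = numberPattern.lastIndex;

// test() only reports whether a match exists.
const hasDigits = /\d/.test(input);
const hasUppercase = /[A-Z]/.test(input);

console.log(found.map((entry) => entry.text));   // [ '10', '20', '30' ]
console.log(found.map((entry) => entry.index));  // [ 0, 11, 21 ]
console.log(lastIndexAfterLoop);                 // 0
console.log(numberPattern.source, numberPattern.global, numberPattern.ignoreCase, numberPattern.multiline);
console.log(hasDigits, hasUppercase);            // true false
```

If the loop is abandoned before exec() returns null, lastIndex keeps its last value, so it is worth setting it back to 0 before the same RegExp is applied to a different String.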