JavaScript implements a regular expression object, RegExp, based on Perl regular expressions. Regular expressions are part not only of JavaScript and Perl but also of PHP, ASP, most UNIX/Linux commands, and most other server-side scripting languages used for CGI. Consequently, they are both useful and portable.
This is intended to be a cookbook of useful regular expression examples. There are many good tutorials available on the internet and no reason to repeat those efforts. However, for the sake of completeness, the introduction offers a nutshell review of regular expressions in JavaScript.
You can download this JavaScript file with the expressions used on these pages. The expressions are wrapped in functions and modified slightly so they can be used right out of the box.
A regular expression can be created as a literal or with the RegExp constructor, as shown in the following table. The literal form can be used anywhere that accepts a variable holding a RegExp object.
Method | Syntax | Example |
---|---|---|
RegExp Literal | /pattern/flags | var re = /Linux/ig; |
RegExp Object Constructor | new RegExp("pattern","flags") | var re = new RegExp("Linux","ig"); |
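The two forms in the table can be compared side by side. The following is a minimal sketch; the variable names and the sample text are illustrative assumptions, not part of the downloadable file.

```javascript
// Two ways to build the same case-insensitive, global pattern.
var reLiteral = /Linux/ig;
var reObject = new RegExp("Linux", "ig");

// Both forms hold the same pattern text and find the same matches.
var sameSource = reLiteral.source === reObject.source;
var text = "linux, Linux and LINUX";
var literalHits = text.match(reLiteral);
var objectHits = text.match(reObject);

// The constructor takes a string, so a backslash must itself be escaped:
// /\d+/ as a literal becomes "\\d+" as a constructor argument.
var digitsObject = new RegExp("\\d+");
var digitsFound = digitsObject.exec("build 42")[0];
```

The constructor form is mainly useful when the pattern is only known at run time, for example when it is assembled from user input.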
The following table lists the meta-characters that can be used in JavaScript regular expressions. These have special meaning inside a regular expression. Instead of being explicit characters they indicate qualities of the characters or modify the characters being matched by the regular expression. Meta-characters provide for flexible patterns and are expanded to match ordinary characters.
Character | Matches | Example |
---|---|---|
. | Any character except a line break | /h.t/ matches "hat" or "h-t" |
+ | one or more of the preceding term | /ah+x/ matches "ahx" or "ahhhhx" |
* | zero or more of the preceding term | /ah*x/ matches "ax", "ahx", or "ahhhhx" |
? | zero or one of the preceding term | /ah?x/ matches "ax" and "ahx" but not "ahhx" |
\ | Escape changes meaning of next character | See the following. |
\. | a period | /etc\./ matches "etc." |
\\ | back slash | /.+\\.+/ matches "home\my" |
\/ | forward slash | /.+\/.+/ matches "home/my" |
\* | Asterisk | /file\*name/ matches "file*name" |
\+ | Plus sign | /.+ \+ .+/ matches "five + four" |
\? | Question mark | /.+\?/ matches "really?" |
\b | Word boundary | /\bto/ matches "to" in "today" |
\B | Not word boundary | /\Bto/ matches "to" in "stove" |
\d | Digits 0 through 9 | /H\d/ matches "H3" |
\D | Not a digit | /H\D/ matches "HG" |
\s | Single white space | /cross\sbrowser/ matches "cross browser" |
\S | Non white space | /cross\Sbrowser/ matches "cross-browser" |
\w | Letters, numbers and underscore | /1\w/ matches "1b" or "1_" |
\W | Not Letters, numbers and underscore | /1\W/ matches "1%" |
\n | Used as \1, \2, etc. to recall memory sets | See Backreferences and Forward Looking ...table |
[...] | One of set | /th[eo]se/ matches "those" or "these" |
[^...] | Not one of set | /th[^eo]s/ matches "this" |
{n} | Exactly n preceding term | /\d{3}/ matches "123" |
{n,} | n or more of the preceding term | /\d{2,}/ matches "456" but not "4" |
{n,m} | At least n, at most m of the preceding term | /\d{2,4}/ matches "123" |
| | or | /John|Sara/ matches "John" in "Call John and Bob." or "Sara" in "Also get Sara." |
^ | At beginning of line | /^Hi/ matches "Hi" in "Hi Mark" |
$ | At the end of a line | /Mark$/ matches "Mark" in "and Mark" |
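A few rows of the table can be checked directly. The sketch below is only an illustration: `firstMatch` is a hypothetical helper introduced here, and the sample inputs are picked from the table or chosen to exercise it.

```javascript
// Each entry: [pattern, input, expected text of the first match (null if none)].
var metaSamples = [
  [/ah+x/, "ahhhhx", "ahhhhx"],
  [/ah?x/, "ahhx", null],
  [/\bto/, "today", "to"],
  [/\Bto/, "stove", "to"],
  [/th[eo]se/, "those", "those"],
  [/th[^eo]s/, "this", "this"],
  [/\d{2,4}/, "12345", "1234"],
  [/John|Sara/, "Call John and Bob.", "John"],
  [/Mark$/, "and Mark", "Mark"]
];

// Hypothetical helper: returns the matched text, or null when nothing matches.
function firstMatch(re, input) {
  var m = re.exec(input);
  return m ? m[0] : null;
}

// One boolean per sample: did the pattern behave as the table says?
var metaResults = metaSamples.map(function (s) {
  return firstMatch(s[0], s[1]) === s[2];
});
```

Note that /\d{2,4}/ stops after four digits even when more are available, because {n,m} sets an upper bound on the repetition.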
This group includes subsets in "( )" that serve as backreferences and, new to JavaScript, two forward-looking (lookahead) meta-characters that qualify the preceding match by the characters that follow it.
Character | Meaning | Example |
---|---|---|
(...) | Subset or memory, numbered in order of opening parens | /^(\w{5}).*(\1)$/ matches "there ... there" |
(?:...) | Non-capturing subset, not numbered | /(\d{5})(?:\.\d+) ([+-])/ For "12345.99 -", \1 or $1 = "12345" and \2 or $2 = "-" (instead of $3) |
(?=...) | Lookahead to match this pattern | The lookahead is not included in the match |
(?!...) | Lookahead to exclude this pattern | The lookahead is not included in the match |
quantifier? | Non-greedy or lazy | /.*?\s/ applied to "book ruler pencil " matches "book ". match(/.*\s/g) returns ["book ruler pencil "], an array of 1 item; match(/.*?\s/g) returns ["book ","ruler ","pencil "], an array of 3 items |
Regular expression backreferences can be useful in many ways once you understand them. Each pair of parentheses () is numbered from 1, starting with the set that has the leftmost opening parenthesis. The content can be referenced further to the right in the regular expression with \n, where n is the number. Elsewhere in JavaScript, the contents of the parentheses can be read by using $1. See the replace method and the RegExp object later on this page and in the examples.
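The entries above can be tried out as follows. This is a sketch with made-up sample strings; only the patterns taken from the table come from the original text.

```javascript
// Backreference inside the pattern: \1 must repeat what group 1 captured.
var repeated = /^(\w{5}).*\1$/.test("there and back there");

// $1 and $2 in a replacement string refer to the captured groups.
var swapped = "Smith, John".replace(/(\w+), (\w+)/, "$2 $1");

// A non-capturing group (?:...) is skipped when groups are numbered,
// so the sign ends up in group 2 rather than group 3.
var parts = /(\d{5})(?:\.\d+) ([+-])/.exec("12345.99 -");

// Positive lookahead: "Java" only when "Script" follows; "Script" is not part of the match.
var ahead = "JavaScript".match(/Java(?=Script)/)[0];
// Negative lookahead: "Java" only when "Script" does not follow.
var notAhead = /Java(?!Script)/.test("JavaScript");

// Greedy versus lazy quantifiers on the same input.
var greedy = "book ruler pencil ".match(/.*\s/g);
var lazy = "book ruler pencil ".match(/.*?\s/g);
```

Here `swapped` becomes "John Smith", and `parts` holds the full match followed by "12345" and "-".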
Three flags modify the whole expression. Most commonly used are "g" for global matching and "i" for case insensitive matches. The global flag allows multiple replacements in the case of the replace method.
Flag | Meaning | Description |
---|---|---|
g | Global Search | The RegExp searches for a pattern throughout the string, replacing all occurrences or creating an array of all matches. |
i | Ignore Case | The regular expression becomes case insensitive. |
m | Multiline Input | Allows ^ and $ to match at the start and end of each line of text that contains line breaks (for example, multiple lines typed into a textarea), not only at the start and end of the whole string |
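The effect of each flag is easiest to see on one multi-line string. The following sketch uses an invented sample string and variable names.

```javascript
var sample = "Linux first\nlinux second\nLINUX third";

// No flags: only the first case-sensitive match is found (on the second line).
var plain = sample.match(/linux/);

// g finds every match; i ignores case.
var allCases = sample.match(/linux/gi);

// Without m, ^ matches only at the start of the whole string.
var startOnly = sample.match(/^linux/gi);

// With m, ^ also matches right after each line break.
var eachLine = sample.match(/^linux/gim);

// g also makes replace() act on every occurrence instead of only the first.
var replacedOnce = "a-b-c".replace(/-/, "+");
var replacedAll = "a-b-c".replace(/-/g, "+");
```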
Regular expressions are used exclusively on strings; however, they can be applied by way of either the String object or the RegExp object. When using String methods, the RegExp is an argument to the method, whereas when using RegExp methods, the string is the argument. Without the g flag, the String method match() and the RegExp method exec() return the same array.
Method | Description and Example |
---|---|
RegExp.test(string) | Tests if the given string matches the Regexp, and returns a boolean. var YesNo = /sample/.test("Sample text"); YesNo is false (because of case) |
RegExp.exec(string) | Applies the RegExp to the string and returns an array of the match information; null if no match. var List = /s(amp)le/i.exec("Sample text"); List contains ["Sample","amp"] |
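RegExp.compile(pattern, flags) | Changes the pattern of an existing RegExp object. Used in loops where the pattern changes. re.compile("pattern", flags); or re.compile(pattern-var, flags); Note that compile() is deprecated; assigning a new RegExp object to the variable has the same effect. |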
String.match(RegExp) | Similar to exec(), but is a method of the String object, not the RegExp object. var List = "And the key word is: FUN".match(/:\s(\w+)$/); List contains [": FUN","FUN"]. With the g flag, match() returns only the full matches, here [": FUN"]. |
String.search(RegExp) | Returns the index of the beginning of the first match of the RegExp, or -1 if there is no match. var ndx = "Where is a break point".search(/break/); ndx is 11 |
String.replace(RegExp,string|function) | Replaces the first occurrence (or every occurrence with the g flag) of the pattern with the replacement string, returning the edited string. var str = "Real programmers use Windows for work."; str = str.replace(/Windows/,'Linux'); str is "Real programmers use Linux for work." var str = "Ace of spades and King of spades"; str = str.replace(/.*(spades|clubs).*\1/, "a pair of $1."); str is "a pair of spades." |
String.split(RegExp) | Cuts a string into an array, making cuts at pattern matches. var phParts = "(123) 456-7890".split(/\(|\)\s?|-/) phParts contains ["","123","456","7890"] |
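The methods in the table can be exercised together. This is a minimal sketch; the variable names and the function-replacement example are assumptions added for illustration.

```javascript
var line = "And the key word is: FUN";

// test() returns a boolean.
var hasKey = /key/.test(line);

// Without g, match() and exec() return the same array: full match, then groups.
var viaMatch = line.match(/:\s(\w+)$/);
var viaExec = /:\s(\w+)$/.exec(line);

// With g, match() returns only the full matches and drops the groups.
var globalMatch = line.match(/:\s(\w+)$/g);

// search() returns the index of the first match, or -1.
var ndx = "Where is a break point".search(/break/);
var missing = "Where is a break point".search(/pause/);

// replace() also accepts a function, which receives the match and its groups.
var titled = "ace of spades".replace(/\b(\w)(\w*)/g, function (all, first, rest) {
  return first.toUpperCase() + rest;
});

// split() cuts the string at every pattern match.
var phParts = "(123) 456-7890".split(/\(|\)\s?|-/);
```

The leading empty string in `phParts` appears because the input starts with a separator, the opening parenthesis.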
The static (global) RegExp object has properties that contain useful information about the last regular expression executed, whether it was run through a RegExp method or a String method. Not all properties are supported across the different browser brands. These properties are accessed through the static object as RegExp.property or, using the short form, as RegExp.$x for some (such as $_ and $n) and as RegExp['$x'] for all.
Property (Short form) | Description |
---|---|
input ($_) | The source string to which the regular expression was applied |
lastMatch ($&) | Part of the string that matched the pattern of the last operation |
lastParen ($+) | String that matched last parenthesized subset. |
leftContext ($`) | Part of input string that preceded any matched portion |
rightContext ($') | Part of input string that followed the last matched portion |
$1 ... $9 | First 9 parenthesized memory sets (see Backreferences and Forward Looking .... above) |
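As a rough illustration, the static properties can be read immediately after a match. The sketch assumes a runtime that still supports these legacy properties, which, as noted above, is not guaranteed everywhere; the sample string is invented.

```javascript
// Run a match, then read what it left behind on the static RegExp object.
var didMatch = /(\d+)-(\d+)/.test("call 555-1234 now");

var firstGroup = RegExp.$1;          // first parenthesized memory set
var secondGroup = RegExp.$2;         // second parenthesized memory set
var whole = RegExp.lastMatch;        // long form of RegExp["$&"]
var before = RegExp.leftContext;     // long form of RegExp["$`"]
var after = RegExp.rightContext;     // long form of RegExp["$'"]

// The array returned by exec() carries the same information without
// relying on the static object, so it is the more portable choice.
var viaArray = /(\d+)-(\d+)/.exec("call 555-1234 now");
```

Because every later regular expression operation overwrites these values, they should be copied into variables right away, as done here.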