Regular Expressions in JavaScript

Regular expressions can be used in the String.replace() method to clean up and modify user-entered data or data from external sources. This is handy for forms or when working with AJAX or XML while building web applications. The RegExp object enables JavaScript to reformat character data with one line of code where string methods would require several.

Trimming Leading and trailing white space  

Frequently it is necessary to remove leading, trailing or both leading and trailing spaces from character data before using it in code.

In each case, one of the following three functions was used. By putting the regular expressions in functions and in a library (*.js), they are both available throughout your code and in one place for maintenance. Reusing tested code this way reduces programming bugs.

Note that the regular expression to trim both leading and trailing spaces requires the global flag g or it will only remove the leading spaces.

            function ltrim(str){
                return str.replace(/^\s+/, '');
            }
            function rtrim(str) {
                return str.replace(/\s+$/, '');
            }
            function alltrim(str) {
                return str.replace(/^\s+|\s+$/g, '');
            }

The \s meta character matches other white space characters as well, including new lines, tabs, and carriage returns, so these functions remove them too.
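
For comparison, current JavaScript engines also provide built-in trimming methods. The sketch below assumes an engine that supports String.prototype.trim (ES5) and trimStart/trimEnd (ES2019), and checks that they agree with the three expressions above on one made-up sample string. It also shows what happens when the g flag is left off the combined expression.

```javascript
// Sample input (hypothetical): a tab and spaces in front, spaces and a newline behind.
var sample = "\t  some text  \n";

// The three expressions used by ltrim, rtrim, and alltrim.
var left  = sample.replace(/^\s+/, "");
var right = sample.replace(/\s+$/, "");
var both  = sample.replace(/^\s+|\s+$/g, "");

// Built-in counterparts (assumes an ES2019-capable engine).
console.log(left  === sample.trimStart());  // true
console.log(right === sample.trimEnd());    // true
console.log(both  === sample.trim());       // true

// Without the g flag only the first match, the leading run, is removed.
var noGlobal = sample.replace(/^\s+|\s+$/, "");
console.log(noGlobal === left);             // true
```

The regular expression versions remain useful where the built-in methods are not available.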

Padding Strings  

Another common requirement is to pad strings (or strings representing numbers) to a given size with a filler character.

There are two functions: one to pad on the left and another to pad on the right. The process is straightforward. A RegExp object is created using the length value; for a length of 4 it looks like this:

/.{4}$/ for left pad and /^.{4}/ for right pad.

The expression takes 4 (or length) characters from the string to which it is applied. The $ indicates the characters must be from the right end of the string, and the ^ indicates the characters come from the left end. That handles left or right padding respectively. A simple do-while loop creates a string of filler characters of the needed length. Last, the RegExp is applied to the string prefixed or suffixed with the filler string for left pad or right pad respectively.

        function padleft(val, ch, num) {
            var re = new RegExp(".{" + num + "}$");
            var pad = "";
            if (!ch) ch = " ";
            do  {
                pad += ch;
            }while(pad.length < num);
            return re.exec(pad + val)[0];
        }
        function padright(val, ch, num){
            var re = new RegExp("^.{" + num + "}");
            var pad = "";
            if (!ch) ch = " ";
            do {
                pad += ch;
            } while (pad.length < num);
            return re.exec(val + pad)[0];
        }
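
As an alternative sketch, engines that support ES2017 provide String.prototype.padStart and padEnd. Unlike the RegExp functions above, the built-in methods never shorten a string, so the hypothetical wrappers below (padLeftBuiltin and padRightBuiltin are names made up for this illustration) add a slice() call to keep the same truncating behavior.

```javascript
// Hypothetical wrappers (not from the original page) around the built-in
// padStart/padEnd methods, assuming an ES2017-capable engine.
function padLeftBuiltin(val, ch, num) {
    var padded = String(val).padStart(num, ch || " ");
    // padStart never shortens a string, so keep only the last num characters
    // to mirror the truncating behavior of the RegExp version.
    return num > 0 ? padded.slice(-num) : "";
}
function padRightBuiltin(val, ch, num) {
    // Keep only the first num characters for the same reason.
    return String(val).padEnd(num, ch || " ").slice(0, num);
}

console.log(padLeftBuiltin("42", "0", 5));      // 00042
console.log(padRightBuiltin("ab", "*", 5));     // ab***
console.log(padLeftBuiltin("abcdef", "0", 4));  // cdef
console.log("abcdef".padStart(4, "0"));         // abcdef (not truncated)
```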

Center by Padding  

This could be called allPad, but padCenter is a more accurate description. The following regular expressions center a string by padding with a specified character on each side. Since it is possible that the string can't be exactly centered—because there is no half character—there is an option to put the extra character on the right; default is left.

Well, actually the task is accomplished by replacing a substring of a string of pad characters with the new text. Since the pad string is composed of a single repeated character, we are looking to match equal lengths of padding characters on either side of a substring that is the length of the text being padded.

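function padcenter(str, ch, size, extra2right) {
    var pad = "";
    var len = str.length;
    var re;

    if (!ch) ch = " ";   // default to a space, as in padleft and padright

    // Match equal runs of padding (subset 1 and its backreference \1)
    // around a middle subset that is exactly as long as str.
    if (extra2right)
        re = new RegExp("^(.*)(.{" + len + "})(\\1)");
    else
        re = new RegExp("(.*)(.{" + len + "})(\\1)$");

    // Build the pad string; testing pad.length also ends the loop when size is 0.
    do {
        pad += ch;
    } while (pad.length < size);

    // Apply the expression to the pad string, not to str itself.
    return pad.replace(re, "$1" + str + "$3");
}//eof padcenter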

Both regular expressions match the three subsets required: left padding, center substring, and right padding.

The \\1 in the third subset matches exactly the first subset. Two backslashes are required here because the pattern is written as a string literal: the first escapes the second, and only one is passed to the regular expression constructor.

Some combinations of total padded length and string length do not divide evenly. Those leave an extra character on one side. By default, this is on the left. If the extra character should be on the right, the expression is anchored on the left with the ^ caret. Otherwise, the $ dollar sign is used to anchor the expression on the right. The extra character will be on the side opposite the anchor (^ or $).

The replace method's second argument is the value that replaces the string matched by the regular expression; in this case, "$1" + str + "$3". The $1 and $3 are the first and third subsets. The $1 has the same meaning as \1 within the expression: the first subset. And, $2, the middle subset, is the substring being replaced with str.
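
To make the anchoring rule concrete, here is a small trace. It assumes a pad string of seven asterisks and the two-character text "ab" (both values are made up), so one pad character is left over and its position depends on which anchor is used.

```javascript
var pad = "*******";   // seven pad characters; "ab" cannot be centered exactly

// Anchored on the right with $: the leftover character stays on the left.
var extraLeft = pad.replace(new RegExp("(.*)(.{2})(\\1)$"), "$1" + "ab" + "$3");

// Anchored on the left with ^: the leftover character stays on the right.
var extraRight = pad.replace(new RegExp("^(.*)(.{2})(\\1)"), "$1" + "ab" + "$3");

console.log(extraLeft);   // ***ab**
console.log(extraRight);  // **ab***
```

In both cases the first and third subsets are two asterisks each; the seventh asterisk lies outside the match and is left in place.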

Extract the middle of a string  

This is a temporary detour if you're following content development along the page. We'll need to be able to extract a specific sized substring from the middle of a larger string as a fallback option later.

function extractMiddle(str, size, extra2right) {
	var len = Math.floor( (str.length - size)/2);
	var re;   // declared locally so it does not become a global variable

	if (extra2right)
		re = new RegExp("(.{" + len + "})(.{" + size + "})(.*)");
	else
		re = new RegExp("(.*)(.{" + size + "})(.{" + len + "})");

	return str.replace(re, "$2");
}//eof extractMiddle

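The code to extract a substring of a specified length from the middle of a string is similar to the code in Center by Padding. Three differences are visible in the code: the expression is applied to the string itself rather than to a generated string of pad characters; the length on one side is calculated and written into the expression rather than matched with \1; and the replacement is only $2, the middle subset.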
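
As a cross-check, the same extraction can be sketched without a regular expression. The helper name extractMiddleSlice is hypothetical and not part of the original code; it computes the start position directly and uses slice(). The two replace() calls repeat the expressions that extractMiddle builds for an eight-character string and a size of 3.

```javascript
// Hypothetical helper, not from the original page: same result using slice().
function extractMiddleSlice(str, size, extra2right) {
    var spare = str.length - size;      // characters to discard in total
    if (spare <= 0) return str;         // nothing to cut
    // With extra2right the odd leftover character is dropped from the right,
    // so the substring starts after the smaller half.
    var start = extra2right ? Math.floor(spare / 2) : Math.ceil(spare / 2);
    return str.slice(start, start + size);
}

// Expressions built by extractMiddle for "abcdefgh" with size 3 (len is 2).
var viaRegexDefault = "abcdefgh".replace(/(.*)(.{3})(.{2})/, "$2");
var viaRegexRight   = "abcdefgh".replace(/(.{2})(.{3})(.*)/, "$2");

console.log(viaRegexDefault, extractMiddleSlice("abcdefgh", 3, false)); // def def
console.log(viaRegexRight,   extractMiddleSlice("abcdefgh", 3, true));  // cde cde
```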

Center String in String of same character  

This is very much the same as Center by Padding. The difference is that the string in which the first argument will be centered is passed rather than passing a single character and generating the string.

If the string to be centered (inner string) is larger than the string it is to be centered in (outer string), the inner string is cut to the length of the outer string using the extractMiddle function shown above. This matches the behavior of the functions padright and padleft, also shown above, which truncate strings longer than the padded length to the padded length.

function centerInStr(inStr, outStr, extra2right) {
	var re;
	var len = inStr.length;
	var size = outStr.length;
	var rtrnVal;

	if (len <= size) {
		if (extra2right)
			re = new RegExp("^(.*)(.{" + len + "})(\\1)");
		else
			re = new RegExp("(.*)(.{" + len + "})(\\1)$");

		rtrnVal = outStr.replace(re, "$1" + inStr +"$3");
	} else {
		rtrnVal = extractMiddle(inStr, size, extra2right);
	}
	return rtrnVal;
}//eof centerInStr

Since the outer string is composed of a single repeated character, the expression can use the meta character \1 to match the first subset.

For a fuller explanation, see Center by Padding.

Center String in String of Mixed Characters  

This is similar to Center String in String of same character above, but the string in which the text is to be centered has a mix of characters.

Since there are different characters in the outer string, the first and third substrings won't match and the meta character \1 cannot be used. Instead, the code follows the extractMiddle method, and specifies one length.

If the string to be centered (inner string) is larger than the string it is to be centered in (outer string), the inner string is cut to the length of the outer string using the extractMiddle function shown above. This matches the behavior of the functions padright and padleft, also shown above, which truncate strings longer than the padded length.

function centerInStr2(inStr, outStr, extra2right) {
	var inSize = inStr.length;
	var outSize = outStr.length;
	var len = Math.floor( (outSize - inSize) /2);
	var re;
	var rtrnVal;

	if (inSize <= outSize) {
		if (extra2right)
			re = new RegExp("(.{"+len+"})(.{" + inSize + "})(.*)");
		else
			re = new RegExp("(.*)(.{" + inSize + "})(.{"+len+"})");

		rtrnVal = outStr.replace(re, "$1" + inStr + "$3");
	} else {
		rtrnVal = extractMiddle(inStr, outSize, extra2right);
	}

	return rtrnVal;
}//eof centerInStr2

Insert String in Middle of Another String  

This last example of inserting a substring into another string doesn't replace characters but returns a string with the combined length of the original strings.

This has elements of the methods shown above. All of the methods centering a string in another string are very similar and should be reviewed if this example is unclear.

The difference in this example is that the regular expression divides the outer string into two subsets. It returns a string composed of the two subsets and the inserted string. The full string into which the text is inserted is matched by the regular expression and is replaced with the larger replacement value: "$1" + inStr + "$2".

function centerInStr3(inStr, outStr, extra2right) {
    var outSize = outStr.length;
	var inSize = inStr.length;
    var len = Math.floor(outSize/2);
	var re;

	if (extra2right)
		re = new RegExp("(.{" + len + "})(.*)");
	else
		re = new RegExp("(.*)(.{" + len + "})");

	return outStr.replace(re, "$1" + inStr + "$2");
}//eof centerInStr3

Convert Backslash to Forward Slash  

One of the trickiest characters in JavaScript is the backslash (\). The other is the forward slash. The reason is that these characters have special meaning. The backslash is an escape character that converts the following character to something else, and the pair is taken as one character. An example is \n for a new line character. Here is a JavaScript regular expression to convert a backslash to a forward slash. The principle applies when dealing with special characters in general.

The regular expression is simple. The trick is to use a double backslash: the first one escapes the second, causing it to be taken literally.

        function back2forward(dataStr) {
            return dataStr.replace(/\\/g, "/");
        }

The regular expression has the global flag (g) to convert all the backslashes. Without the global flag, only the first one is converted.
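
The escaping rule applies to string literals as well as to regular expressions. In the sketch below, the Windows-style path is a made-up value; each backslash is typed twice in the source, but the string itself holds a single backslash at each position. The split/join line is shown only as an assumed alternative for comparison.

```javascript
// Each backslash is written twice in the literal; the string holds one.
var winPath = "C:\\Users\\demo\\file.txt";   // hypothetical path

var viaRegex  = winPath.replace(/\\/g, "/");    // global: every backslash
var firstOnly = winPath.replace(/\\/, "/");     // no g flag: first one only
var viaSplit  = winPath.split("\\").join("/");  // alternative without a RegExp

console.log(viaRegex);   // C:/Users/demo/file.txt
console.log(firstOnly);  // C:/Users\demo\file.txt
console.log(viaSplit);   // C:/Users/demo/file.txt
```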

Convert Forward Slash to Backslash  

The same principle applies here as the previous example: use a backslash to escape the special character. Escaping a special character by preceding it with a backslash removes its special meaning.

        function forward2back(dataStr) {
            return dataStr.replace(/\//g, "\\");
        }

Convert carriage returns - line feeds to HTML <br />  

Replace carriage returns or newlines in text entered by users with <br /> for redisplay as HTML. This is useful in BBS, forum, and guest book applications.

The converted data should have one and only one <br /> for each time the user pressed the Enter key.

This is tricky because the character to be replaced is not the same character or number of characters on Mac OS, Linux/Unix, and Microsoft operating systems. Classic Mac OS (before OS X) uses ASCII 13, a carriage return; Linux/Unix and current macOS use ASCII 10, a line feed, sometimes called a new line; and Microsoft Windows uses both, as CR LF.

This example uses a single RegExp, shown below. Notice that the two-character pattern \r\n must come first in the alternation; otherwise, there will be two line breaks for each time the user pressed Enter on a Windows client. The example uses the first option shown, which combines \r\n with the set [\r\n] in an OR pattern. The second option, which uses three OR choices, works as well.

Also, note this expression requires the global flag g to process more than one replacement.

    function return2br(dataStr) {
        return dataStr.replace(/(\r\n|[\r\n])/g, "<br />");
    }
or
    function return2br(dataStr) {
        return dataStr.replace(/(\r\n|\r|\n)/g, "<br />");
    }

This could also have been done using two expressions and converting the data twice, but the conversion for Microsoft style breaks has to be done first.

Convert MS new lines CRLF
    function ms_return2br(dataStr) {
        return dataStr.replace(/\r\n/g, '<br />');
    }
Convert non-MS newlines either CR or LF
    function return2br(dataStr) {
        return dataStr.replace(/[\r\n]/g, "<br />");
    }
or
    function return2br(dataStr) {
        return dataStr.replace(/(\r|\n)/g, "<br />");
    }
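
The importance of ordering can be seen by running the alternation both ways on a sample that mixes all three line-break styles. The sample string is made up, and the compact pattern in the last function is an assumed equivalent offered for comparison, not code from the original page.

```javascript
var mixed = "one\r\ntwo\rthree\nfour";   // CRLF, CR, and LF in one string

// Two-character pattern first: one <br /> per line break.
var correct = mixed.replace(/(\r\n|\r|\n)/g, "<br />");

// Single characters first: CRLF is matched as two separate breaks.
var doubled = mixed.replace(/(\r|\n|\r\n)/g, "<br />");

// Assumed compact equivalent: a CR with an optional LF, or a lone LF.
function return2brCompact(dataStr) {
    return dataStr.replace(/\r\n?|\n/g, "<br />");
}

console.log(correct);  // one<br />two<br />three<br />four
console.log(doubled);  // one<br /><br />two<br />three<br />four
console.log(return2brCompact(mixed) === correct);  // true
```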

Removing Formatting from Phone and SSN  

Phone numbers and Social Security Numbers are customarily written with separator characters. In a variety of situations these need to be removed for processing or storage in a database. This next regular expression will remove parentheses, periods, dashes, commas, and white space.

Just one regular expression was used, with a set of characters to be replaced and the global flag. Two options are shown below: itemizing characters to be removed or removing any non-digit. This example uses the first option.

    Itemize characters to remove
            function cleanString (str) {
                return str.replace(/[\(\)\.\-\s,]/g, "");
            }
    Remove all non-digits
            function cleanString (str) {
                return str.replace(/[^\d]/g, "");
            }
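
The two options agree on ordinary input but differ when the data contains anything outside the itemized set. The sketch below uses the hypothetical names stripSeparators and digitsOnly so both versions can be run side by side; the sample numbers are made up.

```javascript
// Option 1: remove only the itemized separator characters.
function stripSeparators(str) {
    return str.replace(/[\(\)\.\-\s,]/g, "");
}
// Option 2: remove everything that is not a digit.
function digitsOnly(str) {
    return str.replace(/[^\d]/g, "");
}

console.log(stripSeparators("(555) 123-4567"));    // 5551234567
console.log(digitsOnly("(555) 123-4567"));         // 5551234567
console.log(stripSeparators("123-45-6789"));       // 123456789

// A letter is not in the itemized set, so only option 2 removes it.
console.log(stripSeparators("555.123.4567 x89"));  // 5551234567x89
console.log(digitsOnly("555.123.4567 x89"));       // 555123456789
```

Itemizing keeps unexpected characters visible, which can help validation; removing all non-digits guarantees a numeric result.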

Convert Alpha-numeric Phone Number to All Numeric  

This example is based on a request from a site visitor. He needed code to convert alpha-numeric phone numbers to all numbers; e.g., 1.800.FUN.SITE to 1.800.386.7483. Letters represent numbers between 2 and 9.

The key here is that the String.replace() method can take a function as the replacement argument. This increases the complexity of the transformations that can be done.

function alpha2numericPhone(phoneStr) {
    var newStr = phoneStr.replace(/[a-zA-Z]/g, alpha2number);
    
    return checkReplaceParm(newStr);
    function alpha2number(char) {
        var rtrnVal = null;
        switch (char.toLowerCase()) {
        case "a": case "b": case "c":
            rtrnVal = "2";
            break;
        case "d": case "e": case "f":
            rtrnVal = "3";
            break;
        case "g": case "h": case "i":
            rtrnVal = "4";
            break;
        case "j": case "k": case "l":
            rtrnVal = "5";
            break;
        case "m": case "n": case "o":
            rtrnVal = "6";
            break;
        case "p": case "q": case "r": case "s":
            rtrnVal = "7";
            break;
        case "t": case "u": case "v":
            rtrnVal = "8";
            break;
        case "w": case "x": case "y": case "z":
            rtrnVal = "9";
            break;
        }
        return rtrnVal;
    }
}

The regular expression /[a-zA-Z]/g finds all the letters in the string. The trick is using a function as the replacement value in the call to the replace method. The switch case logic is trivial. It is just necessary to know that the first argument passed by replace to alpha2number is the matched string.

In this example, the matched string is the character in which we are interested. If the RegExp were more complex, the first argument would be the full matched string, followed by each parenthetical match, the offset of the match, and the full string. The offset of the match is always the second to last argument and the last argument is always the full string.

Whatever is in the first argument is what will be replaced. Parenthetical arguments allow more complex logic in the conversion by parsing the matched string.
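
The argument order can be seen by recording what replace() passes to the function. This is a minimal sketch: the input string and the parameter names (match, letters, digits, offset, whole) are made up for illustration, and the expression has two parenthetical subsets so that five arguments arrive on each call.

```javascript
var calls = [];

var result = "ab12cd345".replace(/([a-z]+)(\d+)/g,
    function (match, letters, digits, offset, whole) {
        // Record the arguments: full match, each subset, then the offset.
        calls.push([match, letters, digits, offset, arguments.length]);
        // The return value replaces the full match (the first argument).
        return digits + letters;
    });

console.log(result);    // 12ab345cd
console.log(calls[0]);  // [ 'ab12', 'ab', '12', 0, 5 ]
console.log(calls[1]);  // [ 'cd345', 'cd', '345', 4, 5 ]
```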

In this example, alpha2number is a nested function. That helps encapsulate all the code for this task, since alpha2number is not likely to be used outside of alpha2numericPhone, but it is not necessary.

An additional function checkReplaceParm is used to test that the browser supports using a function as an argument to the replace method. Very old browsers will replace the character with the function string rather than executing the function.

   function checkReplaceParm(str) {
        /* Check browser supports functions in replace */
        if (/^\s*function\s+alpha2number/.test(str)) {
            alert("This browser does not support using a function as a parameter for replace.");
            return "";
        }
        else {
            return str;
        }
    }

This test is less likely to be needed today than it would have been a few years ago. I often leave it out, but that depends on the browsers you need to support.

Capitalize Each Word in a String  

This example capitalizes the first letter in every word in a string. The take home on this is not how to capitalize every word in a string (although that can be useful), but that the String.replace() method can have a function as its second argument. This greatly expands the complexity of the transformations that can be performed.

For the example, the function that actually changes the letter's case was declared as a normal nested function. However, it could have been an anonymous function initialized within the replace function call. The recipe, Replace tag brackets, demonstrates this. Also, note the use of the word boundary meta-character \b to pick out the beginning of words.

    function cnvrt2Upper(str) {
        return str.toLowerCase().replace(/\b[a-z]/g, cnvrt);
        function cnvrt() {
            return arguments[0].toUpperCase();
        }
    }
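
One behavior of \b worth knowing: an apostrophe is not a word character, so a word boundary also falls between an apostrophe and the letter after it. The sketch below shows the effect on a made-up sentence and, as an assumed alternative rather than the original code, an expression that only capitalizes a letter at the start of the string or after white space.

```javascript
function up(m) {
    return m.toUpperCase();   // upper-casing white space leaves it unchanged
}

var text = "don't stop me now";

// \b also sits between the apostrophe and the t.
var withBoundary = text.replace(/\b[a-z]/g, up);

// Assumed alternative: a letter at the start or after white space.
var afterSpace = text.replace(/(^|\s)[a-z]/g, up);

console.log(withBoundary);  // Don'T Stop Me Now
console.log(afterSpace);    // Don't Stop Me Now
```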

Convert String to Title  

This example converts a string to title case (i.e. the first letter of each word is capitalized except if the word is an article or preposition unless it is the first word, which is always capitalized). The take home on this is how to nest replace method calls.

Also, the second to last argument passed to the function by the replace() method holds the position where the substring was found. The first argument is the matched string (in this case, each word), followed by the strings matched by parenthesized subexpressions (not used here). The last argument is the original string.

Note the use of the word boundary meta-character \b to pick out words, and how the position of the word in the string is tested by evaluating the second to last item of the arguments object.

    function cnvrt2title(str) {
        return str.toLowerCase().replace(/\b\w+\b/g, cnvrt);
        function cnvrt() {
            if (arguments[arguments.length -2] == 0)
                return arguments[0].replace(/^[a-z]/, cnvrt2);
            else if (/^(a|about|after|an|and|at|by|for|from|in|into|nor|of|on|onto|over|the|to|up|with|within)$/.test(arguments[0]) )
                return arguments[0];
            else
                return arguments[0].replace(/^[a-z]/, cnvrt2);
        }
        function cnvrt2() {
            return arguments[0].toUpperCase();
        }
    }

The first cnvrt function determines if this is the first word by testing the value of the second to last item in the arguments object. It then tests if the word is in a list of those that should not be converted. Here a regular expression is used since this is a section on regular expressions, but any method of lookup can be used. If the word should be capitalized, then a function similar to the previous example is called.

Using a list of words is not exact since English rules for capitalization depend on how the word is used. But it is close enough.

Parenthetical subexpressions can be used to eliminate the need for the second replace method call using the following code.

function cnvrt2title(str) {
    var re = new RegExp(/^(a|about|after|an|and|at|by|for|from|in|into|nor|of|on|onto|over|the|to|up|with|within)$/);        
    return str.toLowerCase().replace(/\b([a-z])(\w*)\b/g, cnvrt);
    function cnvrt() {
        if (re.test(arguments[0]) && arguments[arguments.length-2])
            return arguments[0];
        else
            return arguments[1].toUpperCase() + arguments[2];
    }
}

The key changes are the two parenthesized groups in the expression and the use of arguments[1] and arguments[2] in cnvrt. The first character and the rest of the word are parsed by the parentheses.

There are some efficiency gains in this last example because the regular expression is created only once per call to cnvrt2title. The nested function has access to the local variables of the parent function. The performance improvement may be minimal in most cases, but if the transformation is called many times it can add up.
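
The same idea can be written with named parameters instead of the arguments object, which some readers find easier to follow. This is a sketch under stated assumptions: the function name toTitle, the parameter names, and the shortened word list are made up for illustration and are not the original code.

```javascript
// Abbreviated list of words left in lower case (hypothetical subset).
var SMALL_WORDS = /^(a|an|and|at|by|for|in|of|on|the|to)$/;

function toTitle(str) {
    return str.toLowerCase().replace(/\b([a-z])(\w*)\b/g,
        function (word, first, rest, offset) {
            // offset is 0 for the first word, which is always capitalized.
            if (offset > 0 && SMALL_WORDS.test(word)) {
                return word;
            }
            return first.toUpperCase() + rest;
        });
}

console.log(toTitle("the lord of the rings"));  // The Lord of the Rings
console.log(toTitle("A TALE OF TWO CITIES"));   // A Tale of Two Cities
```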

Replace tag brackets  

HTML entered in user input fields can create problems when redisplayed on a Web page. It would be nice to have a way to change the brackets into entities when submitting the form. The following function does just that: it replaces "<" and ">" with "&lt;" and "&gt;", respectively. You can also choose to do this on the server with the same or similar regular expression.

This example uses the same technique as the previous example. The replacement value is a function rather than a string.

    function html2entity(str){
        return str.replace(/[<>]/g,
            function(s){return (s == "<")? "&lt;" :"&gt;"});
    }
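
The same technique extends to more characters by looking the match up in a table. The sketch below is an assumed extension, not the original function: the name html2entityAmp is hypothetical, and it adds the ampersand, which also has special meaning in HTML, to the set.

```javascript
// Hypothetical variant: a lookup table keyed by the matched character.
function html2entityAmp(str) {
    var entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };
    return str.replace(/[&<>]/g, function (s) {
        return entities[s];   // s is always one of the three keys
    });
}

console.log(html2entityAmp("<b>Tom & Jerry</b>"));
// &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```

Because the whole string is processed in a single pass, the ampersands in the generated entities are not converted a second time.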

Inverting words or characters  

This next example uses a data parsing technique to invert the order of names and add or remove the comma.

Both expressions are based on some assumptions about the data. However, notice that the last_first expression is sensitive to having any character other than white space between the two words, while first_first lets the comma be optional.

        function first_first (name) {
            return name.replace(/^([^,]*)[,\s]+(\w+)/, "$2 $1");
        }
        function last_first (name) {
            return name.replace(/(.+)\s+([\w']+)$/, "$2, $1");
        }
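
A few sample calls show these assumptions at work. The sketch repeats the two functions so that it runs on its own; the names passed in are made up.

```javascript
function first_first(name) {
    return name.replace(/^([^,]*)[,\s]+(\w+)/, "$2 $1");
}
function last_first(name) {
    return name.replace(/(.+)\s+([\w']+)$/, "$2, $1");
}

console.log(first_first("Smith, John"));      // John Smith
console.log(first_first("Smith John"));       // John Smith (comma optional)
console.log(last_first("John Smith"));        // Smith, John
console.log(last_first("Mary Ann O'Brien"));  // O'Brien, Mary Ann

// No white space between the words: the expression does not match,
// so the string comes back unchanged.
console.log(last_first("John-Smith"));        // John-Smith
```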

This last piece of code is our segue into the next section: Parsing Data with Regular Expressions.