Using Regular Expressions
Regular expressions (text strings that define a pattern) are an old acquaintance for seasoned programmers who have worked with UNIX shells and for Web developers who have worked with Perl or JavaScript. Although Perl and JavaScript have natively supported regular expressions for a long time, regular expressions are new to VBScript. Microsoft added support for regular expressions in VBScript 5.0 and is expanding their capabilities in VBScript 5.5. (A free beta version of VBScript 5.5 is available from http://msdn.microsoft.com/scripting/.)
The introduction of regular expressions to VBScript is an important advancement. To use regular expressions, you first must understand what they are in general and how they work in VBScript. With this basic understanding, you can use regular expressions to perform advanced text processing, such as searching for text and then replacing or modifying it.
Understanding Regular Expressions
Using special functions, you apply a regular expression to a given piece of text and check whether the text matches the pattern. To use a regular expression, you first set the pattern, then test whether the text matches that pattern. The test returns a Boolean value: True if a match occurs and False if it doesn't.
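The set-then-test sequence can be sketched in a few lines of VBScript. This is a minimal illustration rather than a complete program: it assumes the script runs under Windows Script Host (hence WScript.Echo), and the pattern and sample strings are made up for the example.

```vbscript
Option Explicit

Dim re, sampleText

' Create the regular expression object (available in VBScript 5.0 and later).
Set re = New RegExp

' Step 1: set the pattern. Here the pattern is just a constant string.
re.Pattern = "script"
re.IgnoreCase = True      ' treat "Script" and "script" as the same

' Step 2: test whether the text matches the pattern.
' Test returns True if the pattern is found anywhere in the text.
sampleText = "Windows Script Host"      ' illustrative sample text
If re.Test(sampleText) Then
    WScript.Echo "Match"
Else
    WScript.Echo "No match"
End If
```

In an ASP page or another host, the WScript.Echo calls would need to be replaced with that host's own output method.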
You're probably using simple regular expressions already. When you type an MS-DOS command such as
Dir *.*
you're listing all the files and directories that match the *.* pattern. Thus, *.* is a simple regular expression that encompasses all the filenames formed by two strings (of any length and format) that are separated by a dot. The asterisk in this regular expression is a metacharacter. Metacharacters are characters that have a special meaning within a regular expression syntax. Metacharacters play roughly the same role as keywords in scripting languages.
The MS-DOS command prompt supports a limited set of regular expressions commonly called wildcard expressions. The MS-DOS wildcard expressions include only two metacharacters: * (represents a sequence of filename characters of any length) and ? (represents a single filename character).
In applications, you typically use regular expressions that consist of a combination of metacharacters and constant strings. For example, consider the MS-DOS command
Dir a*.exe
which lists all the filenames that begin with the letter a and end with the .exe extension. The regular expression a*.exe consists of one metacharacter (*) and two constant strings (a and .exe). The MS-DOS command
Dir a?b*.exe
uses a regular expression that has two metacharacters (? and *) and three constant strings (a, b, and .exe). When you use this regular expression with the Dir command, you receive a list of filenames that begin with the letter a, have the letter b as the third character, and end with the .exe extension, such as abb.exe or axb.exe.
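To connect these wildcard examples to VBScript, the sketch below translates a wildcard expression into an approximately equivalent VBScript pattern and tests a few filenames against it. The helper name WildcardToPattern and the sample filenames are hypothetical, introduced only for illustration. The translation relies on standard regular expression metacharacters: a dot matches any single character, .* matches any run of characters, ^ and $ anchor the pattern to the start and end of the text, and a backslash makes the following character literal. It is an approximation; the command prompt's own wildcard matching has additional quirks that the sketch ignores.

```vbscript
Option Explicit

' Hypothetical helper: converts a simple wildcard expression
' into an approximately equivalent regular expression pattern.
Function WildcardToPattern(wildcard)
    Dim i, ch, result
    result = "^"                              ' anchor at start of text
    For i = 1 To Len(wildcard)
        ch = Mid(wildcard, i, 1)
        Select Case ch
            Case "*"
                result = result & ".*"        ' any run of characters
            Case "?"
                result = result & "."         ' exactly one character
            Case ".", "\", "+", "(", ")", "[", "]", "{", "}", "^", "$", "|"
                result = result & "\" & ch    ' escape so it is taken literally
            Case Else
                result = result & ch          ' constant character
        End Select
    Next
    WildcardToPattern = result & "$"          ' anchor at end of text
End Function

Dim re, fileName
Set re = New RegExp
re.IgnoreCase = True                          ' filenames are not case-sensitive
re.Pattern = WildcardToPattern("a?b*.exe")    ' yields ^a.b.*\.exe$

' Illustrative filenames; only the first, second, and fourth should match.
For Each fileName In Array("abb.exe", "axb.exe", "ab.exe", "axbtool.exe", "axb.txt")
    If re.Test(fileName) Then
        WScript.Echo fileName & " matches"
    Else
        WScript.Echo fileName & " does not match"
    End If
Next
```

The helper escapes the dot in .exe because an unescaped dot is itself a metacharacter in a regular expression, whereas in a wildcard expression it is a constant character.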
As these examples show, metacharacters are key to regular expressions. Each metacharacter has a special meaning that affects the way in which the underlying regular expression processor applies the pattern to the text you're checking.