July 24, 2000 12:00 AM

Understanding VBScript: Using Regular Expressions

Windows IT Pro
InstantDoc ID #9170
Downloads
9170.zip

Searching and Modifying Text
Regular expressions are handy for searching and modifying text. For example, suppose you have text that contains URLs expressed in the short form without the protocol prefix (e.g., www.expoware.com instead of http://www.expoware.com) and you want to automatically insert the http:// prefix in all of them. Suppose the text you're searching is "If you comply with regular expression, go to www.regexp.com or www.re.com". To trap all the URLs, you can set the pattern

regexp.Pattern = "www\.\w+\.\w+"

The www\.\w+\.\w+ pattern traps all the substrings that start with the literal www. prefix and are followed by two dot-separated words. The dot after www is escaped because an unescaped dot matches any character, not just a period. (This example doesn't consider URLs with more dots, such as www.xxx.co.uk. I'll discuss these URLs next month.)
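
To see what a pattern like this does and doesn't match, a quick check with the RegExp object's Test method can help. The following is only a minimal sketch: the MatchesPattern helper and the sample strings are made up for illustration, and the WScript.Echo lines assume the script runs under Windows Script Host.

```vbscript
' Hypothetical helper: returns True when the pattern matches
' anywhere in the candidate string.
Function MatchesPattern(pattern, candidate)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    MatchesPattern = re.Test(candidate)
End Function

' A short-form URL matches the pattern with the escaped dot.
WScript.Echo MatchesPattern("www\.\w+\.\w+", "www.regexp.com")

' With the dot escaped, "wwwXregexp.com" does not match...
WScript.Echo MatchesPattern("www\.\w+\.\w+", "wwwXregexp.com")

' ...but with an unescaped dot it does, because "." matches the "X".
WScript.Echo MatchesPattern("www.\w+\.\w+", "wwwXregexp.com")
```

The three calls evaluate to True, False, and True, respectively, which is why escaping the dot after www gives the stricter match.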

At this point, you might be tempted to use the Replace method to add the http:// prefix. However, this method won't work. Although the Replace method would correctly find all the matching strings, it would replace them with the same constant string (i.e., the same URL). Instead, you need to modify, not replace, each matching string. To modify text, you use RegExp's Execute method.
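
The problem with a constant replacement string is easy to reproduce. This sketch assumes a hypothetical ReplaceAllWithConstant helper and a shortened version of the sample text; it isn't part of the article's listings.

```vbscript
' Hypothetical helper: replaces every match of the pattern
' with one fixed string.
Function ReplaceAllWithConstant(text, pattern, replacement)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True   ' replace every match, not only the first
    ReplaceAllWithConstant = re.Replace(text, replacement)
End Function

Dim sample
sample = "go to www.regexp.com or www.re.com"

' Both URLs are overwritten with the same constant URL.
WScript.Echo ReplaceAllWithConstant(sample, "www\.\w+\.\w+", "http://www.expoware.com")
```

Both www.regexp.com and www.re.com come back as http://www.expoware.com, so the original host names are lost.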

The Execute method searches a specified string for the pattern and returns the matches it finds in a Matches collection object. To find every match rather than only the first one, set the RegExp object's Global property to True. A Match object represents each match in the collection. The Match object has three properties:

  • Value: returns the matching substring
  • Length: returns the matching substring's length
  • FirstIndex: returns the matching substring's zero-based position within the original text

As Listing 2 shows, you can use the Execute method to search the text and capture the matching URLs in the Matches collection. You can then use a For Each...Next statement to walk through each Match object (i.e., each URL) in the Matches collection and read the Match object's Value property to obtain the URL.
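
Listing 2 isn't reproduced here, so the following is only a sketch of the Execute-and-walk step. The DescribeMatches function and its output format are assumptions made for illustration.

```vbscript
' Hypothetical helper: runs Execute and collects
' "FirstIndex:Length:Value" for every match, separated by "|".
Function DescribeMatches(text, pattern)
    Dim re, matches, m, result
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True   ' without this, Execute stops after the first match
    Set matches = re.Execute(text)

    result = ""
    For Each m In matches
        If result <> "" Then result = result & "|"
        result = result & m.FirstIndex & ":" & m.Length & ":" & m.Value
    Next
    DescribeMatches = result
End Function

WScript.Echo DescribeMatches("go to www.regexp.com or www.re.com", "www\.\w+\.\w+")
```

For this sample text, the function returns 6:14:www.regexp.com|24:10:www.re.com. The first URL starts at zero-based position 6 because six characters ("go to ") precede it.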

At this point, you can apply VBScript's Replace function (not the RegExp object's Replace method) to each match. However, the Replace function has a quirk: When you pass it a start position, the string it returns begins at that position, so any characters that precede the start position are dropped. In other words, unless the match sits at the very beginning of the text, you'll receive only the tail of the string and not the entire string. For this reason, you must save the characters that the Replace function will drop before you use that function. As callout A in Listing 2 shows, you use the Left function with the FirstIndex property to extract those characters and assign them to the temp variable. You then concatenate the temp variable and the string that the Replace function returns to obtain the full text. Without regular expressions, you can't obtain this result so easily.
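
The save-and-concatenate technique can be sketched as follows. This isn't Listing 2 itself: the AddHttpPrefix function and the head, tail, and offset variables are hypothetical, and the offset bookkeeping is an assumption added here because each inserted prefix shifts the later matches to the right.

```vbscript
' Hypothetical helper: inserts "http://" in front of every match.
Function AddHttpPrefix(text, pattern)
    Dim re, matches, m, result, offset, pos, head, tail
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True
    Set matches = re.Execute(text)

    result = text
    offset = 0   ' characters inserted so far
    For Each m In matches
        ' Zero-based position of this match in the modified string.
        pos = m.FirstIndex + offset
        ' Save the characters that Replace drops when given a start position.
        head = Left(result, pos)
        ' Replace one occurrence, starting at the match (start is 1-based).
        tail = Replace(result, m.Value, "http://" & m.Value, pos + 1, 1)
        result = head & tail
        offset = offset + Len("http://")
    Next
    AddHttpPrefix = result
End Function

WScript.Echo AddHttpPrefix("go to www.regexp.com or www.re.com", "www\.\w+\.\w+")
```

For the sample text, the function returns go to http://www.regexp.com or http://www.re.com. Note that this sketch would also prefix a URL that already starts with http://, because the pattern matches the www. portion regardless of what precedes it.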

What's Next
Data validation is another area in which you can exploit the full power of regular expressions. Each time you need users to enter formatted data, you can define the input mask as a pattern and have RegExp parse the data for you. Next month, I'll show you how to enhance the InputBox function to make it support regular expressions and automatic data validation. I'll also show you how to use runtime code evaluation with text processing to create an improved version of the Replace function. With this subroutine, you can use pattern matching to identify candidates for replacement and runtime code evaluation to execute special code on each match.


Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.