July 09, 2003 12:00 AM

The Scripting Dictionary Makes It Easy

Look it up
Windows IT Pro
InstantDoc ID #39312
Downloads
39312.zip

Differences Between Files
Systems administrators often need to compare two files. For example, you might have one file that contains unique usernames and unique machine names and a second file that contains unique machine names and IP addresses, and you want to find the machines in file 1 that don't have an IP address in file 2 and the machines in file 2 that don't have a corresponding username in file 1. Or you might have a case like mine, in which two files, each holding hundreds of thousands of lines of key and value data, were supposed to contain the same values but were unsorted and had missing data, which made them difficult to compare. Let's take a look at how you tackle these kinds of problems.

The second problem is easier to solve, so let's look at that one first. You can't just sort the files and compare them line by line because after you run into a missing line in one of the files, every subsequent line registers as a mismatch. You could work around this problem by writing a routine that, when it encounters a missing line, skips that line and checks the next one. You could also read the files into two arrays and use nested loops to check every item in array 1 to see whether it's in array 2, but the Dictionary object lets you write much simpler code.
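To make the contrast concrete, here is a minimal sketch of both approaches. It is not taken from the article's listings; the array names and sample keys are hypothetical, and the output goes to the console for brevity.

```vbscript
Option Explicit
Dim arrData1, arrData2, dicData2, i, j, bolFound

' Hypothetical sample keys (not from the article's input files)
arrData1 = Array("alpha", "beta", "gamma")
arrData2 = Array("alpha", "gamma")

' Array approach: an inner loop scans arrData2 for every entry in arrData1
For i = 0 To UBound(arrData1)
    bolFound = False
    For j = 0 To UBound(arrData2)
        If arrData1(i) = arrData2(j) Then
            bolFound = True
            Exit For
        End If
    Next
    If Not bolFound Then WScript.Echo arrData1(i) & " is missing (array scan)"
Next

' Dictionary approach: one Exists call replaces the inner loop
Set dicData2 = CreateObject("Scripting.Dictionary")
For j = 0 To UBound(arrData2)
    dicData2.Add arrData2(j), ""
Next
For i = 0 To UBound(arrData1)
    If Not dicData2.Exists(arrData1(i)) Then
        WScript.Echo arrData1(i) & " is missing (Dictionary lookup)"
    End If
Next
```

Run it with cscript.exe so that each WScript.Echo call prints to the console rather than opening a message box.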

To start, create two Dictionaries and populate them with the contents of the files, which are comma-separated value (CSV) files in this case. Then, check every item in Dictionary 1 to see whether it's in Dictionary 2, and vice versa, writing the results to a text file in both cases. The script in Listing 5 starts by defining three constants for the files (two input .csv files and one text file for the results) and the two usual FileSystemObject constants for reading and writing to files. After declaring the variables, the script creates the Scripting::FileSystemObject object and opens the three files: two for reading and one for writing. The script then creates each Dictionary in turn and uses two While...Wend loops to go through the two input files, reading one line at a time into the strLine variable until it reaches the end of the file. Inside each loop, the script uses the VBScript Split function with the comma as a separator to pass the key and item to the Dictionary::Add method. After each While loop has finished, the script closes the relevant file. Now that the Dictionaries are populated, the script needs to compare them.
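The population step might look roughly like the following sketch for one input file. It is not Listing 5 itself. It assumes that Input1.csv sits in the current folder, that the key is in the first field and the item in the second, and the arrFields variable is a hypothetical addition. The Exists guard is also an addition: Dictionary::Add raises a runtime error when a key is already present, so input that repeats a key would otherwise stop the script.

```vbscript
Option Explicit

Const ForReading = 1
Const INPUT_FILE = "Input1.csv"   ' assumption: file is in the current folder

Dim fso, filInput, dicData, strLine, arrFields

Set fso = CreateObject("Scripting.FileSystemObject")
Set dicData = CreateObject("Scripting.Dictionary")
Set filInput = fso.OpenTextFile(INPUT_FILE, ForReading)

While Not filInput.AtEndOfStream
    strLine = filInput.ReadLine
    arrFields = Split(strLine, ",")
    ' Skip blank or malformed lines (assumption: at least two fields per line)
    If UBound(arrFields) >= 1 Then
        ' Add fails on a duplicate key, so keep only the first occurrence
        If Not dicData.Exists(arrFields(0)) Then
            dicData.Add arrFields(0), arrFields(1)
        End If
    End If
Wend
filInput.Close

WScript.Echo dicData.Count & " keys loaded from " & INPUT_FILE
```

Whether to keep the first occurrence of a repeated key, overwrite it, or report it depends on your data; the sketch silently keeps the first one.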

A For Each...Next loop goes through dicData1 first, pulling out each key into the strKey variable in turn. The Dictionary::Exists method then checks whether the key from dicData1 is also in dicData2. If the key doesn't exist in dicData2, the loop writes a line to the result file stating that the key exists only in dicData1. If the key exists in both Dictionaries, the loop uses the Dictionary::Item property with strKey as the argument to compare the items. If the items are the same, the loop takes no action. However, if they're different, the loop prints that fact.
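As a rough illustration of this first loop, the sketch below fills two Dictionaries with hypothetical sample data instead of reading files, and echoes to the console instead of writing to a result file. The message wording is an assumption, not the article's.

```vbscript
Option Explicit
Dim dicData1, dicData2, strKey

Set dicData1 = CreateObject("Scripting.Dictionary")
Set dicData2 = CreateObject("Scripting.Dictionary")

' Hypothetical sample data standing in for the two input files
dicData1.Add "alpha", "1"
dicData1.Add "beta", "2"
dicData1.Add "gamma", "3"
dicData2.Add "alpha", "1"
dicData2.Add "beta", "9"

For Each strKey In dicData1.Keys
    ' Test with Exists first: reading Item for a key that isn't there
    ' would silently add that key to dicData2
    If Not dicData2.Exists(strKey) Then
        WScript.Echo strKey & " exists only in dicData1"
    ElseIf dicData1.Item(strKey) <> dicData2.Item(strKey) Then
        WScript.Echo strKey & " has different items: " & _
            dicData1.Item(strKey) & " vs. " & dicData2.Item(strKey)
    End If
Next
```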

After the first loop has checked every key in dicData1 to see whether it's also in dicData2, a second loop checks whether any keys exist in dicData2 that don't exist in dicData1. If the loop finds any such keys, it prints them to the result file. Finally, the script uses the Dictionary::Count property to print a line count of both input files and closes the result file. You can modify the output text as you see fit—even printing the items out if you want to. You can download two small sample .csv files—Input1.csv and Input2.csv—from the Code Library on the Windows Scripting Solutions Web site (http://www.winscriptingsolutions.com, InstantDoc ID 39312).
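The reverse check and the closing counts could be sketched as follows, again with hypothetical sample data and console output in place of the result file.

```vbscript
Option Explicit
Dim dicData1, dicData2, strKey

Set dicData1 = CreateObject("Scripting.Dictionary")
Set dicData2 = CreateObject("Scripting.Dictionary")

' Hypothetical sample data standing in for the two input files
dicData1.Add "alpha", "1"
dicData2.Add "alpha", "1"
dicData2.Add "delta", "4"

' Only missing keys matter here; the first loop has already
' compared the items of keys that appear in both Dictionaries
For Each strKey In dicData2.Keys
    If Not dicData1.Exists(strKey) Then
        WScript.Echo strKey & " exists only in dicData2"
    End If
Next

' Count returns the number of key/item pairs in each Dictionary
WScript.Echo "Entries loaded from input 1: " & dicData1.Count
WScript.Echo "Entries loaded from input 2: " & dicData2.Count
```

The counts match the line counts of the input files only when every line supplied a unique key.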

The script that Listing 6 shows solves the first problem I mentioned at the start of this section. The Users.csv file contains username and PC name pairs, and the Clients.csv file contains PC name and IP address pairs. Apart from the script using different variables (e.g., dicUsers instead of dicData1), the script's set of For Each...Next loops is similar to that of the script in Listing 5. This time, however, the key of dicClients is the item of dicUsers, so the loops are modified to take account of that. The first loop retrieves each username (strKey) and each PC (strItem) in the Users Dictionary and checks whether strItem exists in the Clients Dictionary as a key. If not, the loop writes that fact to the result file. The second loop walks the Clients Dictionary. This time, each client (strKey) should appear as an item in the Users Dictionary. Because the Exists method checks only keys, the loop uses the Dictionary::Items method to retrieve an array of all the items in the Users Dictionary. The loop then goes through all the items (strItem) in the array to determine whether strKey matches any of them. If it does, the loop sets the variable bolFound to TRUE. (I previously initialized it to FALSE.) After the loop has checked the client against every item in the Users Dictionary, if the bolFound variable is still FALSE, the loop writes out that the PC exists only in the Clients Dictionary. You can download sample Clients.csv and Users.csv files from the Code Library.
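A minimal sketch of the two loops follows. It is not Listing 6. The usernames, PC names, and IP addresses are hypothetical, output goes to the console, and the Exit For statement is an addition that stops the inner scan as soon as a match turns up.

```vbscript
Option Explicit
Dim dicUsers, dicClients, strKey, strItem, bolFound

Set dicUsers = CreateObject("Scripting.Dictionary")
Set dicClients = CreateObject("Scripting.Dictionary")

' Hypothetical sample data: username -> PC name, PC name -> IP address
dicUsers.Add "asmith", "PC01"
dicUsers.Add "bjones", "PC02"
dicClients.Add "PC01", "10.0.0.1"
dicClients.Add "PC03", "10.0.0.3"

' Loop 1: each item in Users should be a key in Clients
For Each strKey In dicUsers.Keys
    strItem = dicUsers.Item(strKey)
    If Not dicClients.Exists(strItem) Then
        WScript.Echo strItem & " (user " & strKey & ") has no entry in Clients"
    End If
Next

' Loop 2: each key in Clients should be an item in Users.
' Exists tests keys only, so scan the array that Items returns
For Each strKey In dicClients.Keys
    bolFound = False
    For Each strItem In dicUsers.Items
        If strItem = strKey Then
            bolFound = True
            Exit For
        End If
    Next
    If Not bolFound Then
        WScript.Echo strKey & " exists only in Clients"
    End If
Next
```

The inner scan makes the second loop slower than a keyed lookup. If that matters for large files, one option is to build a third Dictionary keyed on PC name from the Users data and call Exists on it instead.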

I leave you with a quick statistic: In my testing, I found that using Dictionaries was eight times faster than using arrays. When you're comparing tens of thousands of entries, that kind of performance improvement really makes a difference.


Comments
  • Jim Hunter
    9 years ago
    Jul 16, 2003

    When running the following code from this article:

    25: While Not filInput1.AtEndOfStream
    26: strLine = filInput1.ReadLine
    27: dicData1.Add Split(strLine,",")(0), Split(strLine,",")(3)
    28: Wend

    I get the error: _05.vbs(27, 2) Microsoft VBScript runtime error: This key is already associated with an element of this collection. Any ideas on why I get this error? The line numbers were added by me after running the code. I used a comma-delimited file with a .txt extension, not .csv.


