Last month, I went over some of the basics of Microsoft Index Server. I showed you how the default Web catalog is created, what it does, and how you can add and remove directories from its scope. The Microsoft Management Console (MMC) Index Server Services (ISS) snap-in lets you add directories to existing catalogs and create new catalogs. Index Server also has a set of HTML administration pages that give you a wide range of virtual root information and index statistics on the default Web catalog. In addition, Index Server includes several example search pages, ranging from basic forms to complex Microsoft SQL Server ad hoc query builders, that you can use as templates for building your own search pages.
This month, I build on the information in last month's article. I show you how to
- Build a new custom catalog for a Web domain of your choice.
- Add directories to the catalog that are outside that Web domain and exclude directories that are inside the Web domain.
- Modify one of Index Server's example search pages, Query.asp, to search the domain using your new catalog.
Building a Web Catalog
Before you begin building a Web catalog, you must first open the IIS and Index Server snap-ins. You can open the IIS snap-in in MMC by choosing Start, Programs, Windows NT 4.0 Option Pack, Microsoft Internet Information Server, Internet Service Manager. To open the Index Server snap-in, choose Start, Programs, Windows NT 4.0 Option Pack, Microsoft Index Server, Index Server Manager. The Index Server snap-in opens in MMC. Next, choose Stop from the Action menu to stop Index Server. (If you've upgraded to Index Server 2.0 and MMC 1.1, you administer Index Server from its instance in MMC, as I do in this article.) Index Server's short name in the system is Content Index (CI), so the snap-in is called Ciadmin. You'll see references to CI in your event logs every time Index Server updates an index.
To begin catalog building, right-click Index Server on Local Machine and select New, Catalog, as Figure 1 shows. The Add Catalog dialog box, which Figure 2 shows, appears. Enter a name for your new catalog, and select a directory location in which to store the catalog. This directory is important because the CI internal parameters use the directory, not the catalog name, for searches. When you enter the information and click OK, you see a message that the catalog will remain offline until you restart Index Server. After you restart Index Server, the catalog entry appears in Ciadmin. The problem is that by default, the catalog is pointing at (tracking) the default Web site on the server machine, just like the Web catalog.
To change the Web domain that the catalog is tracking, right-click the new catalog and select Properties. Choose the Web tab, which Figure 3 shows. By default, the Track Virtual Roots check box and the default Web site are selected. Click the drop-down list box, and select the virtual root of your choice. The catalog Properties dialog box contains two other tabs: Location and Generation. The Location tab shows where the catalog is stored, but you can't change the location there. You must delete the catalog and recreate it if you want to change its location.
The Generation tab, which Figure 4 shows, lets you specify whether you want to filter files with unknown extensions. In Index Server terms, filtering a file means extracting its content for the index. If you select the Filter Files with unknown extensions check box, the Indexer attempts to index every file it finds in every directory in the scope. If this check box is clear, the Indexer ignores files whose extensions aren't registered. The other option on this tab is Generate characterizations. A characterization (also called an abstract) is the short piece of text, up to a maximum size that you set, that appears under the document title in the search results. Index Server draws this text from different places depending on the document filter you're using. For example, in an .html document, the filter populates the characterization from the Description meta tag. If no Description meta tag exists, the results can be unpredictable depending on the actual HTML in the document. In general, if the document property or HTML element doesn't exist, the Indexer simply takes the maximum number of characters from the beginning of the document. Similarly, Index Server takes the title from the <title> element in the .html document or the Title property in a Microsoft Word document.
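To make the fallback behavior concrete, here is a minimal sketch in Python of the rule just described: use the Description meta tag if it exists, otherwise take the first characters of the body text. This is an illustration only, not Index Server's HTML filter. The function name characterize and the 320-character limit are assumptions chosen for the example.

```python
from html.parser import HTMLParser

MAX_CHARS = 320  # assumed limit, for illustration only


class _Extractor(HTMLParser):
    """Collects the title, the Description meta tag, and the visible text."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.title = ""
        self.text_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text_parts.append(data)


def characterize(html, max_chars=MAX_CHARS):
    """Return (title, abstract) using the fallback rule described above."""
    parser = _Extractor()
    parser.feed(html)
    parser.close()
    if parser.description:
        abstract = parser.description
    else:
        # No Description meta tag: fall back to the start of the document text,
        # with runs of whitespace collapsed to single spaces.
        abstract = " ".join(" ".join(parser.text_parts).split())
    return parser.title.strip(), abstract[:max_chars]
```

A page with a Description meta tag gets that text as its abstract. A page without one gets whatever happens to come first in the body, which is why adding the meta tag to your documents gives more predictable results.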
Excluding and Adding Directories
To refine your catalog, you can exclude or add directories. You can exclude subdirectories and virtual directories from the catalog's scope in a couple of ways. The first way is through the IIS snap-in in MMC; the second way is through the Ciadmin snap-in.
Excluding a directory through the IIS snap-in. To exclude a directory through the IIS snap-in, select a directory under the virtual root for which you built the catalog but that you don't want indexed (e.g., Search). Right-click the directory, and select Properties. From the Directory tab of the Properties dialog box, clear the Index this directory check box. This method is effective, but it's sometimes tedious to test because you have to view the properties for each subdirectory to see whether it's excluded.
Excluding a directory through the Ciadmin snap-in. Click the plus sign (+) next to your new catalog name, then right-click the Directories folder. Select Add Directory to bring up the Add Directory dialog box, which Figure 5 shows.
In this example, I've added three directories to the catalog. The directory that Figure 5 shows is unique because it's an excluded directory. (Notice that I've selected the Exclude option under Type.) By default, Index Server would index the d:\Ideva\cybercash subdirectory as part of the Ideva domain, but Exclude tells CI not to index this subdirectory. As a result, I can come to one screen (the Ciadmin console, which Figure 6 shows) and view the entire scope. The other two directories in Figure 6 (i.e., g:\testersparadise and d:\TechArticles) are directories that are outside this Web domain. TechArticles is on the same drive in the server, but it isn't a subdirectory or virtual directory of the Ideva domain. Testersparadise, a Web domain on a different server, is identified by its Alias (UNC) name, where UNC stands for Universal Naming Convention. Alias (UNC), which is optional, is the server name and path for this directory (e.g., \\hudson\dtestersparadise). What you enter in the Alias field is returned to the client who is executing a query and gets a hit from this directory. Alias (UNC) is useful in an intranet setting in which a user needs direct access to this directory. On the Internet, I consider this information private and don't include it. Don't worry: Users can't see documents or even references to documents that they don't have permission to see. (I'll talk more about security at the end of the article.)
When you're finished adding directories, you must stop and restart Index Server so that it can update all its indexes. After the restart, close the Ciadmin MMC console and reopen it to force it to refresh its directory listings. When everything is updated, you'll see your new catalog and the directories. The Ideva Web domain shows up as the folder with the globe in Figure 6.
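One way to reason about the resulting scope is as a list of included and excluded directories in which the most specific entry wins. The following Python sketch models that idea with the directories from this example. The longest-match rule, the d:\Ideva root path, and the is_indexed helper are assumptions for illustration, not a description of CI's internal algorithm.

```python
from pathlib import PureWindowsPath

# (directory, included?) pairs mirroring the example catalog.
# d:\Ideva as the domain root is assumed from the cybercash subdirectory path.
SCOPE = [
    (PureWindowsPath(r"d:\Ideva"), True),
    (PureWindowsPath(r"d:\Ideva\cybercash"), False),  # Exclude entry
    (PureWindowsPath(r"d:\TechArticles"), True),
    (PureWindowsPath(r"g:\testersparadise"), True),
]


def is_indexed(file_path, scope=SCOPE):
    """Return True if the most specific matching scope entry is an include."""
    path = PureWindowsPath(file_path)
    best = None
    for directory, included in scope:
        # Windows path comparison is case-insensitive.
        if directory == path or directory in path.parents:
            if best is None or len(directory.parts) > len(best[0].parts):
                best = (directory, included)
    return best is not None and best[1]
```

With this model, d:\Ideva\index.htm is in scope, d:\Ideva\cybercash\order.asp is not, and a file on a drive that the catalog doesn't mention is never indexed.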
Inheritance overrides. Here's something to note: In the IIS snap-in in MMC, I selected the Index this directory check box on the Home Directory tab of the Properties dialog box for the Ideva domain. If I hadn't selected this check box in the IIS snap-in, the Ideva site wouldn't show up in the directories list in the ideva_index catalog; only the directories that I added would appear. See "The Basics of Index Server," July 2000 for information about this setting and about the Inheritance Overrides dialog box, which Figure 7 shows, that opens in the IIS snap-in when you change this setting. Inheritance Overrides gives you the opportunity to change the default Excluded status on any specially marked directories in the domain, such as the Microsoft FrontPage private directory _vti_bin.
Modifying Query.asp
Index Server includes some nice sample search pages. You'll find several versions of these sample pages in the various sample sites that Microsoft has shipped (e.g., EXAir). The easiest way to find them is to look in the default Web site in the \iissamples\issamples folder. The sample pages can be hard to understand because they assume that a server has only one Web domain (the default Web site), and that domain is all you can search with them. Virtually no documentation exists about how to set up one of the fancy search pages to use a specific catalog. However, I'll tell you what I found out.
In the beginning (Index Server 1.x, when Microsoft wrote the NT 4.0 Option Pack documentation), each server machine had only one Web domain on it. As a result, the default Web catalog and the Registry default values worked perfectly. But how do you tell Index Server to look at a different catalog? The magic missing parameter is called CICatalog. (See the sidebar, "Additional Resources," page 12 for more information about CI parameters and Registry settings.)
When Microsoft wrote the sample pages, Active Server Pages (ASP) was in its infancy, all script was VBScript, and people used primarily Internet Server API (ISAPI) to interact with IIS. At that time, the preferred method for searching a catalog was to send queries to Index Server through Internet Data Query (.idq) files. You then used an HTML Extension (.htx) file to format the output back to the user. Not surprisingly, Query.asp has no reference to the CICatalog parameter. Because everyone uses ASP these days, I decided to make the Query.asp example work in my domain with my new catalog.
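For comparison, here is roughly where the catalog parameter sits in the older .idq approach. The Python sketch below only renders the text of a [Query] section. The parameter names are written as I recall them from the Index Server documentation (which spells the catalog parameter CiCatalog), and the catalog directory d:\ideva_catalog, the template path, and the render_idq helper are hypothetical values for illustration. Note that the catalog is identified by its directory, not by its name.

```python
def render_idq(catalog_dir, template, max_per_page=10):
    """Render the text of an .idq [Query] section for a specific catalog.

    catalog_dir is the directory that holds the catalog (hypothetical here),
    and template is the virtual path of the .htx file that formats results.
    """
    lines = [
        "[Query]",
        f"CiCatalog={catalog_dir}",  # catalog directory, not catalog name
        "CiColumns=filename,size,rank,characterization,vpath,DocTitle,write",
        "CiRestriction=%CiRestriction%",  # filled in from the search form
        "CiMaxRecordsInResultSet=150",
        f"CiMaxRecordsPerPage={max_per_page}",
        "CiScope=/",
        "CiFlags=DEEP",
        f"CiTemplate={template}",
        "CiSort=rank[d]",
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_idq(r"d:\ideva_catalog", "/search/results.htx"))
```

Without the CiCatalog line, a query of this kind falls back to the default catalog, which is why the shipped samples search only the default Web site.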