Query Forms
Basic Querying Features
Basic Indexing Features
Support for Multiple Languages
Microsoft Index Server is the Microsoft content-indexing and searching solution for Microsoft Internet Information Server (IIS) and Peer Web Services (PWS). An add-on module for IIS and PWS, Microsoft Index Server is designed to index the full text and properties of documents on an IIS-based (or PWS-based) server. Index Server can index documents on corporate intranets as well as documents on any drive accessible through a uniform naming convention (UNC) path on the Internet.
Clients can formulate queries by using any World Wide Web (WWW) browser to fill in the fields of a simple Web query form. The Web server forwards the query form to the query engine, which finds the pertinent documents and returns the results to the client formatted as a Web page.
Unlike many content indexing systems, Index Server can index the text and properties of formatted documents, such as those created by Microsoft® Word or Microsoft® Excel. This feature lets you publish existing documents on your intranet Web without converting them to HyperText Markup Language (HTML).
Users submit their queries by filling out fields in a form. With Index Server, the administrator for a Web server can create customized forms to help users find documents at the local site. The administrator can modify the form so that the user can search by contents or by other document properties, such as author or subject. The administrator creates a query form with standard HTML, and the form becomes little more than a Web page itself. Any user who knows how to create Web pages with HTML can put together a simple query form in minutes.
These are the basic features of a query:
Scope: The scope tells the query engine where to look when searching. It describes the set of documents within the corpus that will be searched.
Restriction: The restriction tests whether a document should be returned. A restriction is a set of terms that can be combined by various operators.
Result set: The result set defines the information to return from a query.
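To make the three parts concrete, here is a minimal Python sketch that models a query as a scope, a restriction, and a list of result columns, and runs it over an in-memory list of documents. The `Document`, `Query`, and `run_query` names and the dictionary-of-properties representation are hypothetical; this is an illustration of the concepts, not Index Server's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    """Hypothetical in-memory document: a physical path plus its properties."""
    path: str
    props: dict


@dataclass
class Query:
    scope: str                               # directory to search, e.g. D:\Docs
    restriction: Callable[[Document], bool]  # returns True if a document is a hit
    columns: list                            # property names to return per hit


def run_query(corpus, query):
    """Apply the scope, then the restriction, then project the requested columns."""
    # Compare against the scope with a trailing separator so that D:\Docs
    # does not accidentally match D:\Docs2.
    prefix = query.scope.rstrip("\\").lower() + "\\"
    results = []
    for doc in corpus:
        if not doc.path.lower().startswith(prefix):
            continue  # outside the scope
        if not query.restriction(doc):
            continue  # fails the restriction
        results.append({name: doc.props.get(name) for name in query.columns})
    return results
```

For example, `Query(r"D:\Docs", lambda d: d.props.get("author") == "Kim", ["filename", "size"])` returns the file name and size of every document under D:\Docs whose author property is Kim.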
In addition to the basic features, other features let you control how results are returned and displayed, for example, how results are sorted. You can also:
A query scope specifies the set of documents to be searched. Typically, a scope is specified by a directory path on a storage volume, such as D:\Docs. IIS and PWS Web sites correspond to virtual roots, each of which points to a collection of documents.
Index Server indexes documents based on sites. An administrator can index all the sites on a server, or select a subset of sites to index. Queries can be run against multiple sites, against a single site, or even against a single physical directory within a site.
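Scoping a query to a site or to a single physical directory comes down to a path-containment test. The sketch below is a hypothetical helper built on Python's `pathlib.PureWindowsPath`; the `deep` flag (a whole directory subtree versus one directory only) is an illustrative assumption, not a setting described in this overview.

```python
from pathlib import PureWindowsPath


def in_scope(doc_path: str, scope: str, deep: bool = True) -> bool:
    """Return True if doc_path falls inside the scope directory.

    PureWindowsPath compares paths case-insensitively, as Windows does.
    """
    doc = PureWindowsPath(doc_path)
    root = PureWindowsPath(scope)
    if deep:
        # Deep: the scope directory and every subdirectory beneath it.
        return root in doc.parents
    # Shallow: only files that sit directly in the scope directory.
    return doc.parent == root
```

Using real path objects rather than string prefixes avoids false matches such as D:\Docs2 being treated as part of D:\Docs.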
You can query against the contents of Web pages and other documents served by IIS (or PWS) and Index Server. The types of documents you can query include HTML, Microsoft® Word, Microsoft® Excel, Microsoft® PowerPoint®, and plain text documents. Other document types are not supported by Index Server directly, but a content filter can extend the list of supported document types. A content filter reads a proprietary document format and emits textual words, which are indexed by Index Server. For more information on content filters, contact Microsoft and ask about the IFilter interface.
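The content-filter idea, a per-format component that turns a document into a stream of words for the indexer, can be sketched as a registry keyed by file extension. This is a hypothetical Python analogy only: the real extension point is the IFilter COM interface, whose methods are not modeled here, and the `register_filter` and `words_for` names are invented for illustration.

```python
import os
import re
from typing import Callable, Dict, Iterator

# Hypothetical registry: lowercase file extension -> filter function.
FILTERS: Dict[str, Callable[[bytes], Iterator[str]]] = {}


def register_filter(extension: str):
    """Decorator that registers a filter for one file extension."""
    def decorator(func):
        FILTERS[extension.lower()] = func
        return func
    return decorator


@register_filter(".txt")
def plain_text_filter(data: bytes) -> Iterator[str]:
    # Plain text: decode the bytes and emit each run of word characters.
    yield from re.findall(r"\w+", data.decode("utf-8", errors="replace"))


def words_for(filename: str, data: bytes) -> list:
    """Return the words to index, or an empty list if no filter handles the type."""
    extension = os.path.splitext(filename)[1].lower()
    filter_func = FILTERS.get(extension)
    return list(filter_func(data)) if filter_func else []
```

Supporting a new proprietary format then means registering one more filter; the indexer itself does not change.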
With Index Server you can search for multiple words and phrases within documents as well as words and phrases near other words and phrases. Index Server also provides free-text queries. With free-text queries, you can enter any set of words or phrases, or even a complete sentence, as the query restriction. Index Server will examine this text, identify all the nouns and noun phrases, and post a query using those terms. For example, assume you typed the following free-text query:
The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced no evidence that any irregularities took place.
Index Server identifies the following words and noun phrases:
Words: Fulton, county, grand, jury, Friday, investigation, Atlanta, recent, primary, election, produce, evidence, irregularity
Phrases: Fulton county grand jury, primary election, grand jury, Atlanta's recent primary election
These words and phrases are combined into a restriction, weighted for proper ranking, and posted as a query against the corpus.
Note: A free-text query is preceded by $contents.
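The final step, combining the extracted words and phrases into a weighted restriction, can be illustrated with a toy scorer. The weighting rule here (a term counts once per word it contains, so phrases outrank single words) and the `build_restriction` and `score` names are assumptions; the source does not say how Index Server weights terms, and this sketch performs no noun-phrase analysis or stemming.

```python
import re


def build_restriction(words, phrases):
    """Map each lowercase term to a hypothetical weight: its word count."""
    return {term.lower(): len(term.split()) for term in [*words, *phrases]}


def score(text, restriction):
    """Sum the weights of the restriction terms that occur in the text."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    total = 0
    for term, weight in restriction.items():
        if re.search(r"\b" + re.escape(term) + r"\b", normalized):
            total += weight
    return total
```

With the terms grand, jury, grand jury, and so on from the example sentence, a document mentioning "grand jury" scores higher than one containing only "jury", which is the ranking behavior the weighting is meant to produce.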
In addition to querying contents, users can query properties stored on objects. These properties include file size, creation and modification dates, file names, authors, and so on. Clients can query both textual properties (file name and author, for example) and numerical properties (size and modification date, for example). Clients can also query all ActiveX properties, including custom properties on Microsoft Office documents.
You can use the standard comparison operators in queries. These include =, >, <, >=, <=, and != (not equal) for numeric and textual properties. In addition, for textual properties all the content query functionality is available. Properties can be compared only to constants; you cannot compare one property to another in the first release of Index Server. With Boolean operators (AND, OR, and NOT) and parentheses, you can freely mix restriction terms.
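A restriction that mixes comparisons with AND, OR, NOT, and parentheses is naturally a small expression tree. The Python sketch below is a hypothetical model (the class names are invented, and a document's properties are a plain dictionary); each comparison pits a property against a constant, matching the first-release limitation described above.

```python
import operator
from dataclasses import dataclass
from typing import Any, Protocol

# The comparison operators listed above, mapped to Python functions.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


class Restriction(Protocol):
    def matches(self, props: dict) -> bool: ...


@dataclass
class Compare:
    prop: str
    op: str
    value: Any  # always a constant; property-to-property comparison is not modeled

    def matches(self, props: dict) -> bool:
        # A document that lacks the property never satisfies the comparison.
        return self.prop in props and OPERATORS[self.op](props[self.prop], self.value)


@dataclass
class And:
    left: Restriction
    right: Restriction

    def matches(self, props: dict) -> bool:
        return self.left.matches(props) and self.right.matches(props)


@dataclass
class Or:
    left: Restriction
    right: Restriction

    def matches(self, props: dict) -> bool:
        return self.left.matches(props) or self.right.matches(props)


@dataclass
class Not:
    inner: Restriction

    def matches(self, props: dict) -> bool:
        return not self.inner.matches(props)
```

Parentheses in a written query correspond to nesting here: `size > 1000 AND (author = Lee OR NOT filename = draft.doc)` becomes `And(Compare("size", ">", 1000), Or(Compare("author", "=", "Lee"), Not(Compare("filename", "=", "draft.doc"))))`.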
Index Server supports fuzzy queries, which can use simple wildcards (like those in MS-DOS®) or regular expressions (like those in UNIX®) to match textual properties. Content queries support simple prefix matching (for example, dog* returns dogmatic and doghouse). Index Server also supports linguistic stemming, which matches inflected and base forms of query words. (For example, swim** is expanded to swimming, swam, swum, and so on.)
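The three pattern styles differ in what they match. The Python sketch below approximates them with the standard `fnmatch` and `re` modules; the function names are hypothetical, Python's regular-expression dialect is not identical to UNIX tools or to Index Server, and linguistic stemming (swim**) is not shown because it requires language-specific data.

```python
import fnmatch
import re


def wildcard_match(value: str, pattern: str) -> bool:
    """MS-DOS-style wildcard test: * matches any run, ? matches one character.

    fnmatch also treats [...] as a character class, which MS-DOS does not.
    Both sides are lowercased to mimic case-insensitive matching (an assumption).
    """
    return fnmatch.fnmatchcase(value.lower(), pattern.lower())


def regex_match(value: str, pattern: str) -> bool:
    """Regular-expression test against the whole property value."""
    return re.fullmatch(pattern, value) is not None


def prefix_match(words, query: str):
    """Content prefix query such as dog*: keep words that start with the stem."""
    stem = query.rstrip("*").lower()
    return [w for w in words if w.lower().startswith(stem)]
```

Here `prefix_match(["dogmatic", "doghouse", "cat", "hotdog"], "dog*")` keeps dogmatic and doghouse but not hotdog, since only the start of each word is compared.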
Although Index Server does not support true natural language processing, it supports free-text mode.
Index Server assembles query hits into result sets, which are returned to the client. The administrator can limit the maximum number of hits returned to the client. For example, a result set of 200 hits can be returned to the client in 10 pages of 20 hits each. The query form determines the number of hits returned per page, but you can configure a form to let the client specify that number.
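The 200-hit example works out to 200 / 20 = 10 pages. A minimal Python sketch of that arithmetic, assuming the hits are already ranked in a list; `paginate`, `per_page`, and `max_hits` are hypothetical names standing in for the form setting and the administrator's limit.

```python
def paginate(hits, per_page, max_hits=None):
    """Split hits into pages, after applying an optional cap on the total."""
    if max_hits is not None:
        hits = hits[:max_hits]  # the administrator's limit on returned hits
    return [hits[start:start + per_page] for start in range(0, len(hits), per_page)]
```

Applying the cap before paging means that even if 250 documents match, a limit of 200 still yields exactly 10 full pages of 20.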
In addition to sorting by rank, Index Server can sort query results according to any document property.
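Sorting by rank and then by another property, or by any single property, is a multi-key sort. A minimal Python sketch, assuming each hit is a dictionary that carries a hypothetical `rank` value alongside its properties; `sort_results` is likewise an invented name.

```python
def sort_results(rows, keys):
    """Sort rows by several properties.

    keys is a list of (property, descending) pairs, most significant first.
    Python's sort is stable, so sorting by the least significant key first
    and the most significant key last yields the combined order.
    """
    for prop, descending in reversed(keys):
        rows = sorted(rows, key=lambda row: row[prop], reverse=descending)
    return rows
```

For instance, `sort_results(rows, [("rank", True), ("size", False)])` puts the highest-ranked hits first and breaks ties by ascending size.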
If the corpus is stored on a Windows NT File System (NTFS) volume, Index Server respects all security restrictions: access control list (ACL) checking is performed. In a result set, a user can never see a document reference if the ACL on that object prohibits read access to that client.
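The effect of ACL checking on a result set amounts to a filter applied before any hit reaches the client. Real NTFS ACLs hold access control entries for users and groups and are evaluated by Windows; the sketch below replaces them with a hypothetical path-to-readers map (and an invented `visible_hits` function) just to show the filtering step.

```python
def visible_hits(hits, user, read_acl):
    """Drop every hit the user is not allowed to read.

    read_acl maps a document path to the set of users granted read access.
    A document with no entry is treated as unreadable (a deliberately
    conservative assumption for this sketch).
    """
    return [path for path in hits if user in read_acl.get(path, set())]
```

Filtering on the server side means a denied document is not merely hidden in the page: its reference never leaves the server.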
If allowed, the client can specify which properties to return in a result set (that is, the columns in the result set). Any property that is valid in a query restriction is valid as a result column. However, the administrator can restrict the properties a query returns.
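Client-chosen columns combined with an administrator restriction can be sketched as a projection checked against an allow-list. The `select_columns` function and the allow-list are hypothetical; the source does not describe how the restriction is configured.

```python
def select_columns(props, requested, allowed):
    """Project one document's properties onto the requested result columns.

    allowed is the administrator's (hypothetical) allow-list; asking for
    any other column is rejected rather than silently ignored.
    """
    blocked = [name for name in requested if name not in allowed]
    if blocked:
        raise PermissionError(f"columns not permitted: {', '.join(blocked)}")
    return {name: props.get(name) for name in requested}
```

Rejecting a disallowed column outright, instead of quietly dropping it, is a design choice that makes a misconfigured query form fail visibly.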
In addition to returning properties stored with the document, Index Server can generate document abstracts, which can also be returned in a result set. An abstract briefly summarizes the content of a document. A document abstract can also be part of a query restriction.
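The source does not say how Index Server builds abstracts. As a simple stand-in, the sketch below takes the opening text of a document and truncates it at a word boundary; `make_abstract` and the 60-character default are hypothetical.

```python
def make_abstract(text: str, max_chars: int = 60) -> str:
    """Return the first max_chars characters of text, cut at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars + 1)
    if cut <= 0:
        cut = max_chars  # a single very long word: fall back to a hard cut
    return text[:cut] + "..."
```

Because an abstract is just another text value, it can be returned as a result column or searched like any other textual property.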
IIS and PWS already log all traffic moving between a client and the server. Standard IIS and PWS logging picks up query information such as the querying IP address and the queries posted to the server.
These are the basic features of an index:
Indexes are controlled on each virtual path. An index is built over a set of directories (and their child directories). By default, you can incrementally refresh an index; that is, refresh it by indexing only changed files. Index Server does not need to re-index all the documents to pick up a few changes.
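Incremental refresh depends on knowing which files changed since they were last indexed. A minimal Python sketch, assuming change detection by comparing last-write timestamps (Index Server's actual change tracking is not described here); `plan_refresh` is a hypothetical name, and in practice the current timestamps might come from `os.stat(path).st_mtime`.

```python
def plan_refresh(current, indexed):
    """Work out an incremental refresh.

    current: path -> last-write time as it is now on disk.
    indexed: path -> last-write time recorded when the file was last indexed.
    Returns (paths to re-index, paths to drop from the index).
    """
    changed = sorted(path for path, mtime in current.items() if indexed.get(path) != mtime)
    deleted = sorted(path for path in indexed if path not in current)
    return changed, deleted
```

Only the files in the first list are passed to the filters and word breakers again; everything else keeps its existing index entries.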
With Index Server, a number of different performance monitors help administrators optimize their query service. These monitors measure criteria such as the number of documents that need to be indexed and how fast queries are being processed.
By design, Index Server requires little if any maintenance. Once it is set up, all operations are automatic, including updates, index creation and optimization, and even crash recovery after a power failure or index corruption. Index Server was designed from the start to work in mission-critical environments where the server must be running 24 hours a day, 7 days a week.
Most Web pages today are authored in English, but many documents are not. Because IIS and PWS can serve documents in many languages, multilingual indexing and querying are standard in Index Server. The query system was built with localization in mind. It is completely modular and can dynamically load and unload language-specific utilities, including word breakers, stemmers, and normalizers. These linguistic components are available for several languages.
Index Server can index multilingual documents and switch between languages as required (for example, index an English paragraph, index a French paragraph, and switch back to English). All index information is stored as Unicode characters, and all queries are converted to Unicode before they are processed.
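The conversion step can be shown with Python's codecs: a query that arrives as bytes in a legacy Windows code page is decoded to a Unicode string before any matching takes place. The function name and the NFC normalization step are assumptions for illustration; the source only states that queries are converted to Unicode.

```python
import unicodedata


def query_to_unicode(raw: bytes, codepage: str) -> str:
    """Decode a query submitted in a legacy code page into Unicode text.

    NFC normalization is an extra step assumed here so that a precomposed
    character and its base-plus-combining-mark spelling compare equal.
    """
    return unicodedata.normalize("NFC", raw.decode(codepage))
```

Once both the index and the query are Unicode, a French query typed in code page 850 and a document saved in code page 1252 can match the same word.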
© 1996 by Microsoft Corporation. All rights reserved.