<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Microsoft Index Server Guide: Understanding Index Server</title>
<meta name="FORMATTER" content="Microsoft FrontPage 1.1">
<meta name="GENERATOR" content="Microsoft FrontPage 1.1">
</head>
<body bgcolor="#FFFFFF">
<!--Headerbegin--><p align=center><a name="TOP"><img src="onepix.gif" alt="Space" align=middle width=1 height=1></a> <a href="default.htm#Top"><img src="toc.gif" alt=" Contents" align=middle border=0 width=89 height=31></a> <a href="front.htm"><img src="previous.gif" alt="Previous" align=middle border=0 width=32 height=31></a> <a href="install.htm"><img src="next.gif" alt="Next" align=middle border=0 width=32 height=31></a> </p>
<hr>
<!--Headerend--><p><a name="UnderstandingIndexServer"><font size=6><strong>Understanding Index Server</strong></font></a></p>
<p><!--Chaptoc--></p>
<blockquote>
<p><a href="intro.htm#QueryForms">Query Forms</a> <br>
<a href="intro.htm#BasicQueryingFeatures">Basic Querying Features</a> <br>
<a href="intro.htm#BasicIndexingFeatures">Basic Indexing Features</a> <br>
<a href="intro.htm#SupportforMultipleLanguages">Support for Multiple Languages</a> <br>
</p>
</blockquote>
<hr>
<!--ChaptocEnd--><p>Microsoft Index Server is the Microsoft content-indexing and searching solution for Microsoft Internet Information Server
(IIS) and Peer Web Services (PWS). An add-on module for IIS and PWS, Microsoft Index Server is designed to index the
full text and properties of documents on an IIS-based (or PWS-based) server. Index Server can index documents both for
corporate intranets and on any drive accessible through a Universal Naming Convention (UNC) path on the Internet.</p>
<p>Clients can formulate queries by using any World Wide Web (WWW) browser to fill in the fields of a simple Web query form.
The Web server forwards the query to the query engine, which finds the pertinent documents and returns the results to
the client formatted as a Web page.</p>
<p>Unlike many content-indexing systems, Index Server can index the text and properties of formatted documents, such as those
created by Microsoft® Word or Microsoft® Excel. This feature lets you publish existing documents on your intranet Web site
without converting them to Hypertext Markup Language (HTML).</p>
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="QueryForms">Query Forms</a></h1>
<p>Users submit their queries by filling out fields in a form. With Index Server, the administrator of a Web server can create
customized forms to help users find documents at the local site. The administrator can design the form so that users can
search by contents or by other document properties, such as author or subject. Because the administrator creates a query form with
standard HTML, the form is little more than a Web page itself. Anyone who knows how to create Web pages
with HTML can put together a simple query form in minutes.</p>
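<p>A form submitted with the GET method sends its fields to the Web server as a URL-encoded query string. The Python sketch below reproduces only that encoding step with the standard <code>urllib.parse</code> functions. The field names <code>restriction</code> and <code>scope</code> and the action path <code>/query</code> are hypothetical placeholders, not the names that an actual Index Server query form uses.</p>

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical field names; a real query form defines its own.
form_fields = {
    "restriction": "@DocAuthor Sally",
    "scope": "/docs",
}

# A GET submission appends the encoded fields to the form's action URL.
query_string = urlencode(form_fields)
request_url = "/query?" + query_string  # "/query" is a placeholder action URL
print(request_url)  # /query?restriction=%40DocAuthor+Sally&scope=%2Fdocs

# The server side recovers the original field values by decoding the string.
decoded = {name: values[0] for name, values in parse_qs(query_string).items()}
print(decoded == form_fields)  # True
```

<p>Spaces become <code>+</code> and reserved characters such as <code>@</code> and <code>/</code> are percent-encoded, so query text typed into a form arrives intact at the server.</p>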
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="BasicQueryingFeatures">Basic Querying Features</a></h1>
<p>These are the basic features of a query:</p>
<ul>
<li>Scope</li>
<li>Restriction</li>
<li>Result set</li>
</ul>
<p>The <em>scope</em> tells the query engine where to look when searching: it describes the set of documents within the <a href="glossary.htm#Corpus">corpus</a> that will be
searched. The <em>restriction</em> tests whether a document should be returned; it is a set of terms that can be combined by
various operators. The <em>result set</em> defines the information that a query returns.</p>
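<p>The three parts can be pictured as stages applied in order: the scope narrows the corpus, the restriction tests each remaining document, and the result set keeps only the requested information. The Python sketch below models that flow over a small in-memory corpus. The <code>Query</code> class, the document dictionaries, and their property names are hypothetical, and the sketch does not reflect how Index Server stores or evaluates queries.</p>

```python
from dataclasses import dataclass
from typing import Callable

Document = dict  # a document is modeled as a mapping of property name to value

@dataclass
class Query:
    scope: str                               # directory prefix to search under
    restriction: Callable[[Document], bool]  # test applied to each document
    columns: list                            # properties to return for each hit

    def run(self, corpus):
        # Simplified scope test: a plain path-prefix comparison.
        in_scope = [d for d in corpus if d["path"].startswith(self.scope)]
        hits = [d for d in in_scope if self.restriction(d)]
        # The result set keeps only the requested columns of each hit.
        return [{c: d[c] for c in self.columns} for d in hits]

corpus = [
    {"path": "D:\\Docs\\plan.doc", "author": "Sally", "size": 2048},
    {"path": "D:\\Docs\\notes.txt", "author": "Tom", "size": 512},
    {"path": "E:\\Archive\\old.doc", "author": "Sally", "size": 4096},
]

query = Query(scope="D:\\Docs\\",
              restriction=lambda d: d["author"] == "Sally",
              columns=["path", "size"])
print(query.run(corpus))  # [{'path': 'D:\\Docs\\plan.doc', 'size': 2048}]
```

<p>The document under <code>E:\Archive</code> also satisfies the restriction but is never tested, because it lies outside the scope.</p>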
<p>In addition to these basic features, other features let you control how results are returned and displayed (for example, how results
are sorted). You can also:</p>
<ul>
<li>Limit queries to specific scopes</li>
<li>Search for words and phrases within document contents</li>
<li>Search for words or phrases near another word or phrase</li>
<li>Search for words and phrases within textual properties (for example, @DocAuthor Sally)</li>
<li>Compare properties against a constant by using &lt;, &lt;=, =, &gt;=, or &gt; (for example, DATE &gt; 1/1/95)</li>
<li>Apply the Boolean operators <strong>AND</strong>, <strong>OR</strong>, and <strong>NOT</strong></li>
<li>Search with wildcards (for example, “*” and “?”) and <a href="glossary.htm#regex">regular expressions</a></li>
<li>Fully integrate searches with the Windows NT security model</li>
<li>Rank hits by quality</li>
<li>Return specified property data</li>
</ul>
<h2>Scope</h2>
<p>A query scope specifies the set of documents to be searched. Typically, a scope is specified by a directory path on a
storage volume, such as D:\Docs. IIS and PWS Web sites correspond to virtual roots, each of which points to a collection of documents.</p>
<p>Index Server indexes documents based on sites. An administrator can index all the sites on a server or select a subset of sites
to index. Queries can be run against multiple sites, against a single site, or even against a single physical directory within a site.</p>
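<p>Deciding whether a document falls inside a scope comes down to comparing its path with the scope's directory. The Python sketch below uses the standard <code>pathlib.PureWindowsPath</code> class to test containment. The <code>in_scope</code> function and its <code>deep</code> switch (search a directory together with its subdirectories, or that directory only) are hypothetical illustrations, not Index Server's implementation.</p>

```python
from pathlib import PureWindowsPath

def in_scope(document_path: str, scope: str, deep: bool = True) -> bool:
    """Return True if document_path lies within the scope directory.

    With deep=True, documents in subdirectories of the scope also match;
    with deep=False, only documents directly inside the scope match.
    PureWindowsPath compares paths case-insensitively.
    """
    doc = PureWindowsPath(document_path)
    root = PureWindowsPath(scope)
    if deep:
        return root in doc.parents
    return doc.parent == root

print(in_scope("D:\\Docs\\Reports\\q1.doc", "D:\\Docs"))              # True
print(in_scope("D:\\Docs\\Reports\\q1.doc", "D:\\Docs", deep=False))  # False
print(in_scope("D:\\DocsOld\\q1.doc", "D:\\Docs"))                    # False
```

<p>Comparing whole path components rather than raw string prefixes keeps a directory such as D:\DocsOld from matching a scope of D:\Docs.</p>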
<h2>Restriction</h2>
<p>You can query the contents of Web pages and other documents served by IIS (or PWS) and indexed by Index Server. The types
of documents you can query include HTML, Microsoft® Word, Microsoft® Excel, Microsoft® PowerPoint®, and plain-text
documents. Index Server does not support other document types directly, but a content filter can extend the list of
supported document types. A content filter reads a proprietary document format and emits textual words, which Index Server
then indexes. For more information on content filters, contact Microsoft and ask about the IFilter interface.</p>
<p>With Index Server, you can search for multiple words and phrases within documents, as well as for words and phrases near other
words and phrases. Index Server also provides free-text queries. With <em>free-text queries</em>, you can enter any set of words or
phrases, or even a complete sentence, as the query restriction. Index Server examines this text, identifies its significant words (in their
base forms) and its noun phrases, and posts a query using those terms. For example, assume you typed the following free-text query:</p>
<blockquote>
<p><em>The Fulton County Grand Jury said Friday an investigation of Atlanta’s recent primary election
produced no evidence that any irregularities took place.</em></p>
</blockquote>
<p>Index Server identifies the following words and noun phrases:</p>
<blockquote>
<p><strong>Words: </strong>Fulton, county, grand, jury, Friday, investigation, Atlanta, recent, primary, election, produce,
evidence, irregularity</p>
<p><strong>Phrases: </strong>Fulton county grand jury, primary election, grand jury, Atlanta’s recent primary election</p>
</blockquote>
<p>These words and phrases are combined into a restriction, weighted for proper ranking, and posted as a query against the
corpus.</p>
<p><strong>Note</strong>   The free-text query is preceded by <em>$contents</em>.</p>
<h3>Property Restrictions</h3>
<p>In addition to querying contents, users can query properties stored on objects. These properties include file size, creation and
modification dates, file names, authors, and so on. Clients can query both textual properties (such as file name and author)
and numeric properties (such as size and modification date). Clients can also query all ActiveX™ properties,
including custom properties on Microsoft Office documents.</p>
<p>You can use the standard comparison operators in queries: =, &gt;, &lt;, &gt;=, &lt;=, and != (not equal), for both numeric and
textual properties. In addition, all the content-query functionality is available for textual properties. Properties can be compared
only to constants; in the first release of Index Server, you cannot compare one property to another. With the Boolean operators
(<strong>AND</strong>, <strong>OR</strong>, and <strong>NOT</strong>) and parentheses, you can freely mix restriction terms.</p>
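<p>To make these rules concrete, the Python sketch below evaluates property restrictions built only from comparisons against constants and combined with AND, OR, and NOT. The restriction is modeled as nested tuples; the tuple format, the <code>matches</code> function, and the property names are hypothetical and are not Index Server's query syntax.</p>

```python
import operator
from datetime import date

# Comparison operators permitted in a property restriction.
COMPARISONS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def matches(doc: dict, restriction: tuple) -> bool:
    """Evaluate a restriction tree against one document's properties.

    A leaf is ("cmp", property, op, constant): the right-hand side is always
    a constant, never another property. Inner nodes are ("and", a, b),
    ("or", a, b), and ("not", a).
    """
    kind = restriction[0]
    if kind == "cmp":
        _, prop, op, constant = restriction
        return COMPARISONS[op](doc[prop], constant)
    if kind == "and":
        return matches(doc, restriction[1]) and matches(doc, restriction[2])
    if kind == "or":
        return matches(doc, restriction[1]) or matches(doc, restriction[2])
    if kind == "not":
        return not matches(doc, restriction[1])
    raise ValueError(f"unknown restriction node: {kind!r}")

doc = {"author": "Sally", "size": 2048, "write": date(1996, 3, 1)}

# (author = Sally AND size > 1000) AND NOT (write < 1/1/95)
restriction = ("and",
               ("and", ("cmp", "author", "=", "Sally"),
                       ("cmp", "size", ">", 1000)),
               ("not", ("cmp", "write", "<", date(1995, 1, 1))))
print(matches(doc, restriction))  # True
```

<p>Nesting the tuples plays the role of parentheses in a written query, so any mix of terms can be expressed.</p>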
<h3>Fuzzy Queries</h3>
<p>Index Server supports fuzzy queries, which can contain simple wildcards (such as those in MS-DOS®) and can match <a href="glossary.htm#regex">regular expressions</a>
(as in UNIX®) against textual properties. Content queries support simple prefix matching (for example, “dog*”
returns “dogmatic” and “doghouse”). Index Server also supports linguistic stemming, which matches both the inflected and base
forms of query words. (For example, “swim**” is expanded to “swimming,” “swam,” “swum,” and so on.)</p>
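<p>The matching rules above can be modeled over a small list of indexed words. The Python sketch below uses <code>str.startswith</code> for prefix matching, the standard <code>fnmatch.fnmatchcase</code> function for MS-DOS-style <code>*</code> and <code>?</code> wildcards, and <code>re.fullmatch</code> for regular expressions. The prefix and base word are passed without the <code>*</code> or <code>**</code> suffixes used in queries, and the stem table is a hypothetical stand-in for a real linguistic stemmer.</p>

```python
import re
from fnmatch import fnmatchcase

indexed_words = ["dog", "dogmatic", "doghouse", "hotdog",
                 "swim", "swimming", "swam", "swum", "swimmer"]

def prefix_match(prefix: str, words):
    """Prefix query: "dog*" matches every word that starts with "dog"."""
    return [w for w in words if w.startswith(prefix)]

def wildcard_match(pattern: str, words):
    """MS-DOS-style wildcards: * matches any run of characters, ? exactly one."""
    return [w for w in words if fnmatchcase(w, pattern)]

def regex_match(pattern: str, words):
    """Regular-expression match against the whole word."""
    return [w for w in words if re.fullmatch(pattern, w)]

# Hypothetical stem table; a real stemmer derives these forms linguistically.
STEMS = {"swim": {"swim", "swims", "swimming", "swam", "swum"}}

def stem_match(base: str, words):
    """Stemming query: "swim**" matches inflected forms of "swim"."""
    forms = STEMS.get(base, {base})
    return [w for w in words if w in forms]

print(prefix_match("dog", indexed_words))       # ['dog', 'dogmatic', 'doghouse']
print(wildcard_match("*dog", indexed_words))    # ['dog', 'hotdog']
print(regex_match(r"sw[ia]m+", indexed_words))  # ['swim', 'swam']
print(stem_match("swim", indexed_words))        # ['swim', 'swimming', 'swam', 'swum']
```

<p>Note that "swimmer" matches the prefix "swim" but not the stem table, because it is a different word (a noun) rather than an inflected form of the verb.</p>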
<p>Although Index Server does not support true natural language processing, it supports free-text mode.</p>
<h2>Result Sets</h2>
<p>Index Server assembles query hits into result sets, which are returned to the client. The administrator can limit the maximum
number of hits returned to the client. For example, a result set of 200 hits can be returned to the client in 10 pages of 20 hits
each. The query form determines the number of hits returned per page, but you can configure a form to let the client specify that
number.</p>
<p>In addition to sorting by rank, Index Server can sort query results according to any document property.</p>
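<p>Paging is simple arithmetic: the number of pages is the hit count divided by the page size, rounded up, so 200 hits at 20 per page yield 10 pages. The Python sketch below sorts hits by a chosen property (rank by default), caps the result set at an administrator-style maximum, and returns one page. The <code>page_of_results</code> function, its <code>max_hits</code> cap, and the hit fields are hypothetical illustrations.</p>

```python
import math

def page_of_results(hits, page, per_page=20, sort_key="rank",
                    descending=True, max_hits=200):
    """Return (page_hits, page_count) for a 1-based page number.

    Hits are sorted on sort_key, truncated to max_hits, and split into
    pages of per_page hits each.
    """
    ordered = sorted(hits, key=lambda h: h[sort_key], reverse=descending)
    capped = ordered[:max_hits]
    page_count = math.ceil(len(capped) / per_page)
    start = (page - 1) * per_page
    return capped[start:start + per_page], page_count

# 250 synthetic hits with distinct ranks; the cap keeps the best 200.
hits = [{"doc": f"doc{i}.htm", "rank": i, "size": 1000 - i} for i in range(250)]

first, pages = page_of_results(hits, page=1)
print(pages, len(first), first[0]["rank"])  # 10 20 249
last, _ = page_of_results(hits, page=10)
print(last[-1]["rank"])                     # 50
```

<p>Passing another property as <code>sort_key</code> (for example, <code>"size"</code> with <code>descending=False</code>) sorts by that property instead of rank.</p>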
<p>If the corpus is stored on a Windows NT File System (NTFS) volume, Index Server respects all security restrictions by performing
access control list (ACL) checking. A user never sees a reference to a document in a result set if the ACL on that document
prohibits read access to that user.</p>
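<p>Conceptually, the ACL check is a filter applied to every hit before the result set reaches the user. The Python sketch below models ACLs as a hypothetical mapping from document path to the set of users granted read access. Real NTFS ACLs hold access-control entries for users and groups and are evaluated by Windows NT, not by code like this.</p>

```python
# Hypothetical ACL table: document path -> users with read access.
READ_ACL = {
    "D:\\Docs\\public.htm": {"alice", "bob"},
    "D:\\Docs\\salaries.xls": {"alice"},
}

def visible_hits(hits, user):
    """Drop every hit the user may not read, so its path is never revealed.

    A document with no ACL entry is treated as unreadable (deny by default).
    """
    return [h for h in hits if user in READ_ACL.get(h, set())]

hits = ["D:\\Docs\\public.htm", "D:\\Docs\\salaries.xls"]
print(visible_hits(hits, "bob"))    # ['D:\\Docs\\public.htm']
print(visible_hits(hits, "alice"))  # ['D:\\Docs\\public.htm', 'D:\\Docs\\salaries.xls']
```

<p>Filtering before the results are formatted means a forbidden document contributes nothing to the page, not even its file name.</p>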
<p>If allowed, the client can specify which properties to return in a result set (that is, the columns of the result set). Any
property that is valid in a query restriction is also valid as a result column. However, the administrator can restrict the properties
that a query returns.</p>
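<p>Selecting result columns amounts to projecting each hit onto the requested properties, limited by what the administrator permits. The Python sketch below assumes a hypothetical <code>project</code> helper and an administrator policy expressed as a set of allowed property names.</p>

```python
def project(hits, requested, allowed):
    """Return only the requested columns that the administrator allows.

    requested: columns the client asked for, in display order.
    allowed: the administrator's set of returnable properties.
    Disallowed columns are silently dropped.
    """
    columns = [c for c in requested if c in allowed]
    return [{c: hit.get(c) for c in columns} for hit in hits]

hits = [{"filename": "plan.doc", "author": "Sally", "size": 2048, "path": "D:\\Docs"}]
allowed = {"filename", "author", "size"}  # hypothetical administrator policy
print(project(hits, ["filename", "path", "size"], allowed))
# [{'filename': 'plan.doc', 'size': 2048}]
```

<p>Dropping disallowed columns keeps the query working; rejecting the whole query would be a stricter alternative.</p>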
<p>In addition to returning properties stored with the document, Index Server can generate document abstracts, which can also be
returned in a result set. An abstract briefly summarizes the content of a document. A document abstract can also be part of a
query restriction.</p>
<h2>Logging</h2>
<p>IIS and PWS already log all traffic between a client and the server. Standard IIS and PWS logging records query
information such as the IP address of the querying client and the queries posted to the server.</p>
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="BasicIndexingFeatures">Basic Indexing Features</a></h1>
<p>These are the basic features of an index:</p>
<ul>
<li>Full-text search in Web pages</li>
<li>Full-text search in formatted data such as Microsoft Word or Microsoft Excel documents</li>
<li>Incremental refreshing of indexes</li>
<li>Control of indexing for each virtual path</li>
<li>Indexing of property values</li>
<li>Indexing of text regardless of language</li>
<li>Automatic index updates</li>
<li>Performance monitoring</li>
<li>Zero-maintenance design and 24-hour reliability</li>
<li>Multithreading to take advantage of symmetric multiprocessing (SMP) computers</li>
</ul>
<p>Indexing is controlled for each virtual path. An index is built over a set of directories (and their subdirectories). By default,
an index is refreshed incrementally; that is, only changed files are re-indexed. Index Server does not need to
re-index all the documents to pick up a few changes.</p>
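<p>One way to picture an incremental refresh is to remember each file's last-modified time from the previous pass, re-index only new files and files whose time has changed, and drop entries for files that have disappeared. The Python sketch below does that over a directory tree with <code>os.walk</code> and <code>os.stat</code>. The <code>refresh</code> function and its callbacks are hypothetical; the sketch illustrates the idea only and is not Index Server's change-tracking mechanism.</p>

```python
import os
import tempfile

def refresh(root, seen_mtimes, index_file, remove_file):
    """Incrementally refresh an index over root and its subdirectories.

    seen_mtimes maps each previously indexed path to the modification time
    (in nanoseconds) recorded when it was indexed, and is updated in place.
    Only new or changed files are passed to index_file; files that have
    disappeared since the last pass are passed to remove_file.
    Returns the list of paths that were (re)indexed.
    """
    current = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            current[path] = os.stat(path).st_mtime_ns

    changed = [p for p, mtime in current.items() if seen_mtimes.get(p) != mtime]
    for path in changed:
        index_file(path)
    for path in set(seen_mtimes) - set(current):
        remove_file(path)

    seen_mtimes.clear()
    seen_mtimes.update(current)
    return changed

def ignore(path):
    """Stand-in callback; a real indexer would read or purge the file here."""

with tempfile.TemporaryDirectory() as root:
    for name in ("a.htm", "b.htm"):
        with open(os.path.join(root, name), "w") as f:
            f.write("<p>hello</p>")
    seen = {}
    print(len(refresh(root, seen, ignore, ignore)))  # 2: first pass indexes everything
    print(len(refresh(root, seen, ignore, ignore)))  # 0: nothing has changed
    # Give a.htm a different modification time to simulate an edit.
    os.utime(os.path.join(root, "a.htm"), ns=(1_000_000_000, 1_000_000_000))
    print(len(refresh(root, seen, ignore, ignore)))  # 1: only a.htm is re-indexed
```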
<p>Index Server provides a number of performance monitors that help administrators optimize their query service. These
monitors measure criteria such as the number of documents that need to be indexed and how fast queries are being processed.</p>
<p>By design, Index Server requires little if any maintenance. Once it is set up, all operations are automatic, including updates, index
creation and optimization, and even crash recovery after a power failure or if the index becomes corrupted. Index Server was
designed from the start to work in mission-critical environments where the server must be running 24 hours a day, 7 days a
week.</p>
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="SupportforMultipleLanguages">Support for Multiple Languages</a></h1>
<p>Most Web pages today are authored in English, but many documents are not. Because IIS and PWS can serve documents in many languages,
multilingual indexing and querying are standard features of Index Server. The query system was built with localization
in mind. It is completely modular and can dynamically load and unload language-specific utilities, including word
breakers, stemmers, and normalizers. These linguistic components are available for several languages.</p>
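<p>The modular design can be pictured as a registry that maps a language to its linguistic components, loads each component the first time text in that language appears, and can unload it again. The Python sketch below models this with hypothetical word-breaker functions and factories; the splitting rules are deliberately simple assumptions, not Index Server's word breakers.</p>

```python
import re
from typing import Callable, Dict, List

WordBreaker = Callable[[str], List[str]]

def _english_breaker(text: str) -> List[str]:
    # Hypothetical rule: words are runs of letters, digits, or apostrophes.
    return re.findall(r"[\w']+", text.lower())

def _french_breaker(text: str) -> List[str]:
    # Hypothetical rule: also split elided articles such as l' and d'.
    return [w for w in re.split(r"[^\w]+", text.lower()) if w]

# Factories stand in for language components that are loaded on demand.
AVAILABLE = {"en": lambda: _english_breaker, "fr": lambda: _french_breaker}

class LanguageRegistry:
    def __init__(self):
        self.loaded: Dict[str, WordBreaker] = {}

    def breaker(self, lang: str) -> WordBreaker:
        if lang not in self.loaded:  # load the component on first use
            self.loaded[lang] = AVAILABLE[lang]()
        return self.loaded[lang]

    def unload(self, lang: str) -> None:
        self.loaded.pop(lang, None)

registry = LanguageRegistry()
paragraphs = [("en", "Index the text."), ("fr", "L'index du texte."),
              ("en", "Back to English.")]
for lang, text in paragraphs:
    print(lang, registry.breaker(lang)(text))
# en ['index', 'the', 'text']
# fr ['l', 'index', 'du', 'texte']
# en ['back', 'to', 'english']
```

<p>Because each paragraph is broken with the component for its own language, the same registry also lets an indexer switch languages paragraph by paragraph.</p>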
<p>Index Server can index multilingual documents and switch between languages as required (for example, it can index an English
paragraph, then a French paragraph, and then switch back to English). All index information is stored as Unicode characters, and
all queries are converted to Unicode before they are processed.</p>
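<p>Storing everything as Unicode means that text arriving in different legacy character sets ends up as the same sequence of characters. The Python sketch below decodes a query and a document word from two different code pages (Windows 1252 and OEM 850, both built into Python's codecs) and normalizes them to Unicode Normalization Form C before comparing. The helper name and the choice of NFC are assumptions for illustration, not a description of Index Server's internal conversion.</p>

```python
import unicodedata

def to_unicode(raw: bytes, codepage: str) -> str:
    """Decode bytes from a legacy code page and normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode(codepage))

# The word "café" as stored by two different legacy character sets.
query_bytes = b"caf\xe9"     # Windows code page 1252: é is 0xE9
document_bytes = b"caf\x82"  # OEM code page 850: é is 0x82

query = to_unicode(query_bytes, "cp1252")
document_word = to_unicode(document_bytes, "cp850")
print(query == document_word)  # True

# A decomposed form (e followed by a combining acute accent) normalizes to the same string.
print(to_unicode("cafe\u0301".encode("utf-8"), "utf-8") == query)  # True
```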
<!--Footerbegin--><hr>
<p align=center><a href="default.htm#Top"><img src="toc.gif" alt=" Contents" align=middle border=0 width=89 height=31></a> <a href="front.htm"><img src="previous.gif" alt="Previous" align=middle border=0 width=32 height=31></a> <a href="#TOP"><img src="up_end.gif" alt="To Top" align=middle border=0 width=32 height=31></a> <a href="install.htm"><img src="next.gif" alt="Next" align=middle border=0 width=32 height=31></a> </p>
<hr>
<p align=center><em>© 1996 by Microsoft Corporation. All rights reserved.<!--Footerend--></em></p>
</body>
</html>