windows-nt/Source/XPSP1/NT/inetsrv/query/fsci/dll/strategy.txt

2020-09-26 03:20:57 -05:00
TYPES OF RESTRICTIONS
1) Content Restriction
Input: <property>, <text>, <fuzzy level>
Matches documents which contain <text> in <property>.
<property> may be any textual OLE property or a special property. The
special properties include CONTENTS (the main body of the document), ALL
(search all properties), and user-defined PSEUDO-PROPERTIES (text
distinguished for purposes of content search).
<fuzzy level> describes how exactly <text> has to match the document.
Fuzzy level 0 is exact match. Fuzzy level 1 is prefix match (each word
is treated as a prefix). Fuzzy level 2 is morphological stemming ("run"
would match "run", "running", "ran", etc.).
The result of a content query may be out-of-date, because the content
index is updated lazily.
2) Property Restriction
Input: <property>, <relop>, <value>
Matches documents where <property> <relop> <value>.
<property> must be a true OLE property or one of a few special
properties that are valid only in query results. The special properties
are RANK (how well the restriction matches the object), HITCOUNT (number
of content index 'hits'), and RANK VECTOR (for use with vector
restriction).
<relop> is one of: <, <=, =, !=, >=, >, SOME OF, and ALL OF. The last
two are bitwise operations valid only for integer types. In C++ syntax,
SOME OF is (<property> & <value>) != 0, and ALL OF is (<property> &
<value>) == <value>.
<value> is a STGVARIANT.
The result of a property query always reflects the last saved state of
all objects.
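The relational operators of a property restriction can be sketched in
C++ as follows. This is only an illustration for an integer-typed
property: the names RelOp and Matches are hypothetical and do not come
from the query engine, and a real <value> would be a STGVARIANT rather
than a plain integer. SOME OF and ALL OF use the bitwise forms given
above.

```cpp
#include <cstdint>

// Hypothetical enumeration of the <relop> values listed in the text.
enum class RelOp { LT, LE, EQ, NE, GE, GT, SomeOf, AllOf };

// Evaluate <property> <relop> <value> for an unsigned integer property.
// SomeOf and AllOf are the bitwise operations described in the text.
bool Matches(std::uint32_t property, RelOp op, std::uint32_t value) {
    switch (op) {
    case RelOp::LT:     return property <  value;
    case RelOp::LE:     return property <= value;
    case RelOp::EQ:     return property == value;
    case RelOp::NE:     return property != value;
    case RelOp::GE:     return property >= value;
    case RelOp::GT:     return property >  value;
    case RelOp::SomeOf: return (property & value) != 0;
    case RelOp::AllOf:  return (property & value) == value;
    }
    return false;  // unreachable for valid RelOp values
}
```

For example, with <property> = 0x6 and <value> = 0x3, SOME OF matches
(one of the requested bits is set) but ALL OF does not (bit 0x1 is
missing).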
TYPES OF INDEXES
1) Content Index
The content index is a mapping of <property>,<words> back to the
documents which contain <words> in <property>.
There is no scoping within the content index.
The content index is lazily updated. It may be out-of-date.
2) Value Index
A value index is a mapping from <property>,<range of values> back to the
documents which have a value within <range of values> for the
<property>.
In other words, the possible range of values for a data type
(VT_FILETIME, VT_I4, etc) is divided into "buckets". Every possible
value falls into one of these buckets. Note that the mapping is from
bucket to document, not value to document. A search for SIZE == 500
might map to a bucket from 250 to 525. So the result of index lookup
would be all files from SIZE 250 to 525, not just those having SIZE ==
500.
There is no scoping within a value index.
Value indices can be used in conjunction with the content index. They
are lazily updated with the same frequency as the content index.
No administration is necessary to set up value indices. All properties
are value indexed except for a few hard-coded exceptions. This may
change in the future.
3) View Index
A view index is a B-Tree. It contains a complete sorted list of files
for a single directory. Besides key columns, the view can contain
additional unsorted columns. These improve retrieval efficiency but
have less effect on query efficiency.
View indices must be created by an administrator.
4) Directory Index
Listed for completeness. This is a view index on the filename property.
It is always available.
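The bucket mapping described for value indices can be sketched in C++
as follows. This is a minimal illustration, not the actual index
layout: the Doc and SizeValueIndex types, the use of std::map keyed by
each bucket's lower bound, and the FilterEqual step are all assumptions
made for the example. The text says only that a lookup returns the
whole bucket; filtering the candidates afterwards is one plausible way
to obtain the exact matches.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical document record with a single SIZE property.
struct Doc {
    int id;
    std::uint64_t size;
};

// Hypothetical value index on SIZE. Each map key is the inclusive lower
// bound of a bucket; the bucket extends up to the next key.
class SizeValueIndex {
public:
    explicit SizeValueIndex(const std::vector<std::uint64_t>& lowerBounds) {
        for (std::uint64_t lb : lowerBounds) buckets_[lb];  // empty bucket
    }

    // File the document under the bucket that contains its SIZE.
    void Add(const Doc& doc) {
        auto it = buckets_.upper_bound(doc.size);
        if (it == buckets_.begin()) return;  // below the lowest bucket
        std::prev(it)->second.push_back(doc);
    }

    // Return every document in the bucket containing `value`, not just
    // the documents whose SIZE equals `value`.
    std::vector<Doc> Lookup(std::uint64_t value) const {
        auto it = buckets_.upper_bound(value);
        if (it == buckets_.begin()) return {};
        return std::prev(it)->second;
    }

private:
    std::map<std::uint64_t, std::vector<Doc>> buckets_;
};

// Assumed post-filter: keep only candidates with SIZE == value.
std::vector<Doc> FilterEqual(const std::vector<Doc>& candidates,
                             std::uint64_t value) {
    std::vector<Doc> exact;
    for (const Doc& d : candidates)
        if (d.size == value) exact.push_back(d);
    return exact;
}
```

With bucket lower bounds 0, 250 and 526, Lookup(500) returns every
document whose SIZE lies between 250 and 525, and FilterEqual then
keeps only those with SIZE == 500.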
RULES FOR MATCHING QUERY WITH INDEX (in order of precedence)
1) If a query contains a content restriction, use the content index,
adding value indices if appropriate.
2) If one or more properties of a property restriction are used in the
sort order of a view index, and the query is shallow, then use the view
index.
Note that properties of the view must be used in order. A view on SIZE
and FILENAME could be used for queries involving SIZE, and queries
involving both SIZE and FILENAME, but not for queries involving just
FILENAME.
If more than one view is applicable, then the view with the most sort
keys appearing in the restriction is used. Thus, given two views, (SIZE,
FILENAME) and (SIZE, ATTRIBUTES), a query for SIZE and FILENAME would
use the former.
3) If one or more properties of a property restriction are value
indexed, and the value index is reasonably up-to-date, and the query is
shallow, then use value indexing.
4) If none of rules 1, 2, and 3 applies, or if the volume is downlevel
(not Ofs), then use the directory index (i.e. enumeration).
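The precedence rules above can be sketched in C++ as follows. All names
here (Strategy, Query, View, UsableKeys, BestView, ChooseStrategy) are
hypothetical, and several points are assumptions rather than statements
from the text: "reasonably up-to-date" is reduced to a boolean flag, a
tie between equally good views goes to the first one, and a downlevel
volume is treated as overriding rules 1 through 3, which is only one
reading of rule 4.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class Strategy { ContentIndex, ViewIndex, ValueIndex, DirectoryIndex };

// Hypothetical summary of the facts the rules depend on.
struct Query {
    bool hasContentRestriction;
    bool shallow;
    bool downlevelVolume;
    bool valueIndexUpToDate;                     // assumed boolean flag
    std::vector<std::string> restrictedProps;    // property restrictions
    std::vector<std::string> valueIndexedProps;  // value-indexed properties
};

// A view is described by its sort keys, in order.
using View = std::vector<std::string>;

// Count the leading sort keys of the view that appear in the restriction.
// Keys must be used in order, so counting stops at the first missing key.
std::size_t UsableKeys(const View& view,
                       const std::vector<std::string>& props) {
    std::size_t n = 0;
    for (const std::string& key : view) {
        if (std::find(props.begin(), props.end(), key) == props.end()) break;
        ++n;
    }
    return n;
}

// Index of the view with the most usable keys, or -1 if none applies.
int BestView(const std::vector<View>& views,
             const std::vector<std::string>& props) {
    int best = -1;
    std::size_t bestKeys = 0;
    for (std::size_t i = 0; i < views.size(); ++i) {
        std::size_t k = UsableKeys(views[i], props);
        if (k > bestKeys) {
            bestKeys = k;
            best = static_cast<int>(i);
        }
    }
    return best;
}

Strategy ChooseStrategy(const Query& q, const std::vector<View>& views) {
    // Assumption: a downlevel volume overrides the other rules.
    if (q.downlevelVolume) return Strategy::DirectoryIndex;
    // Rule 1: a content restriction selects the content index.
    if (q.hasContentRestriction) return Strategy::ContentIndex;
    // Rule 2: shallow query with an applicable view.
    if (q.shallow && BestView(views, q.restrictedProps) >= 0)
        return Strategy::ViewIndex;
    // Rule 3: shallow query on a value-indexed property, index fresh enough.
    if (q.shallow && q.valueIndexUpToDate) {
        for (const std::string& p : q.restrictedProps) {
            if (std::find(q.valueIndexedProps.begin(),
                          q.valueIndexedProps.end(),
                          p) != q.valueIndexedProps.end())
                return Strategy::ValueIndex;
        }
    }
    // Rule 4: fall back to enumeration through the directory index.
    return Strategy::DirectoryIndex;
}
```

Given the views (SIZE, FILENAME) and (SIZE, ATTRIBUTES), a shallow
query restricted on SIZE and FILENAME selects the first view, while a
shallow query restricted on FILENAME alone matches neither view and
falls through to rule 3 or rule 4.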