TYPES OF RESTRICTIONS

1) Content Restriction

Input: <property>, <text>, <fuzzy level>

Matches documents which contain <text> in <property>.

<property> may be any textual Ole property or a special property. The
special properties include CONTENTS (the main body of the document), ALL
(search all properties), and user-defined PSEUDO-PROPERTIES (text
distinguished for purposes of content search).

<fuzzy level> describes how exactly <text> has to match the document.
Fuzzy level 0 is exact match. Fuzzy level 1 is prefix match (each word
is treated as a prefix). Fuzzy level 2 is morphological stemming (run
would match run, running, ran, etc.).

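The three fuzzy levels can be illustrated with a small sketch for a
single query word. This is not the actual matcher: the function names
are hypothetical, and the STEMS table is a toy stand-in for a real
morphological stemmer.

```python
# Toy stem table (assumption): a real system would use a morphological
# stemmer rather than a hard-coded lookup.
STEMS = {"running": "run", "ran": "run", "runs": "run"}


def stem(word):
    """Map a word to its base form using the toy table above."""
    return STEMS.get(word, word)


def word_matches(query_word, doc_word, fuzzy_level):
    """Return True if doc_word satisfies query_word at fuzzy_level."""
    if fuzzy_level == 0:
        # Level 0: exact match.
        return query_word == doc_word
    if fuzzy_level == 1:
        # Level 1: the query word is treated as a prefix.
        return doc_word.startswith(query_word)
    if fuzzy_level == 2:
        # Level 2: compare morphological stems.
        return stem(query_word) == stem(doc_word)
    raise ValueError("unknown fuzzy level: %r" % (fuzzy_level,))
```

With this sketch, "run" matches "running" at levels 1 and 2 but not at
level 0, and matches "ran" only at level 2.
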
The result of a content query may be out-of-date.

2) Property Restriction

Input: <property>, <relop>, <value>

Matches documents where <property> <relop> <value>.

<property> must be a true Ole property, or one of a few special
properties that are valid only in query results. The special properties
are RANK (how well the restriction matches the object), HITCOUNT (number
of content index 'hits'), and RANK VECTOR (for use with vector
restriction).

<relop> is one of: <, <=, =, !=, >=, >, SOME OF, and ALL OF. The last
two are bitwise operations valid only for integer types. In C++ syntax,
SOME OF is (<property> & <value>) != 0, and ALL OF is (<property> &
<value>) == <value>.

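As a rough illustration of the relational operators, the sketch below
evaluates a single property restriction. The bitwise expressions are
the ones given above; the function names and the dispatch table are
assumptions made for the example, not part of the query interface.

```python
import operator


def some_of(prop, value):
    """SOME OF: at least one bit of value is set in prop."""
    return (prop & value) != 0


def all_of(prop, value):
    """ALL OF: every bit of value is set in prop."""
    return (prop & value) == value


# Hypothetical dispatch table from <relop> to a comparison function.
RELOPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
    "SOME OF": some_of, "ALL OF": all_of,
}


def property_matches(prop, relop, value):
    """Evaluate <property> <relop> <value> for one document."""
    bitwise = relop in ("SOME OF", "ALL OF")
    if bitwise and not (isinstance(prop, int) and isinstance(value, int)):
        raise TypeError("SOME OF and ALL OF are valid only for integers")
    return RELOPS[relop](prop, value)
```

For example, with a property value of 0x3, SOME OF 0x6 holds because
bit 0x2 is shared, while ALL OF 0x6 does not because bit 0x4 is missing.
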
<value> is a STGVARIANT.

The result of a property query always reflects the last saved state of
all objects.

TYPES OF INDEXES

1) Content Index

The content index is a mapping of <property>,<words> back to the
documents which contain <words> in <property>.

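The mapping described above has the shape of an inverted index. A
minimal in-memory sketch, assuming whitespace tokenization and a
hypothetical ContentIndex class (neither is taken from the real
implementation):

```python
from collections import defaultdict


class ContentIndex:
    """Toy inverted index: (property, word) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, prop, text):
        """Index every whitespace-separated word of text under prop."""
        for word in text.split():
            self._postings[(prop, word)].add(doc_id)

    def lookup(self, prop, word):
        """Return the documents which contain word in prop."""
        return set(self._postings.get((prop, word), ()))
```

Usage: after add("a.doc", "CONTENTS", "quick brown fox") and
add("b.doc", "CONTENTS", "lazy brown dog"), lookup("CONTENTS", "brown")
returns both documents.
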
There is no scoping within the content index.

The content index is lazily updated. It may be out-of-date.

2) Value Index

A value index is a mapping from <property>,<range of values> back to the
documents which have a value within <range of values> for the
<property>.

In other words, the possible range of values for a data type
(VT_FILETIME, VT_I4, etc.) is divided into "buckets". Every possible
value falls into one of these buckets. Note that the mapping is from
bucket to document, not value to document. A search for SIZE == 500
might map to a bucket from 250 to 525, so the result of the index lookup
would be all files with SIZE from 250 to 525, not just those having
SIZE == 500.

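The bucket behaviour can be sketched as follows. The ValueIndex class,
the bucket boundaries, and the final filtering step are all assumptions
made for illustration; the text above only states that the lookup
returns everything in the bucket.

```python
from bisect import bisect_right
from collections import defaultdict


class ValueIndex:
    """Toy value index for one property: bucket -> set of document ids."""

    def __init__(self, lower_bounds):
        # Bucket i covers [bounds[i], bounds[i + 1]); the last bucket
        # is open-ended. The boundaries are hypothetical.
        self._bounds = sorted(lower_bounds)
        self._buckets = defaultdict(set)

    def _bucket_of(self, value):
        i = bisect_right(self._bounds, value) - 1
        if i < 0:
            raise ValueError("value below the lowest bucket")
        return i

    def add(self, doc_id, value):
        self._buckets[self._bucket_of(value)].add(doc_id)

    def candidates(self, value):
        """Return every document in the bucket that value falls into."""
        return set(self._buckets.get(self._bucket_of(value), ()))


def size_equals(index, sizes, value):
    """Narrow the bucket's candidates to exact matches (assumed step)."""
    return {doc for doc in index.candidates(value) if sizes[doc] == value}
```

With boundaries [0, 250, 526], a lookup for SIZE == 500 returns every
document whose SIZE is between 250 and 525; only the second pass
reduces that set to the exact matches.
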
There is no scoping within a value index.

Value indices can be used in conjunction with the content index. They
are lazily updated with the same frequency as the content index.

There is no administration necessary to set up value indices. All
properties are value indexed except a few hard-coded exceptions. This
may change in the future.

3) View Index

A view index is a B-Tree. It contains a complete sorted list of files
for a single directory. Besides key columns, the view can contain
additional unsorted columns. These improve retrieval efficiency but
have less effect on query efficiency.

View indices must be created by an administrator.

4) Directory Index

Listed for completeness. This is a view index on the filename property.
It is always available.

RULES FOR MATCHING QUERY WITH INDEX (in order of precedence)

1) If a query contains a content restriction, use the content index,
adding value indices if appropriate.

2) If one or more properties of a property restriction are used in the
sort order of a view index, and the query is shallow, then use the view
index.

Note that properties of the view must be used in order. A view on SIZE
and FILENAME could be used for queries involving SIZE, and queries
involving both SIZE and FILENAME, but not for queries involving just
FILENAME.

If more than one view is applicable, the view with the most sort keys
appearing in the restriction is used. Thus, given two views, (SIZE,
FILENAME) and (SIZE, ATTRIBUTES), a query on SIZE and FILENAME would
use the former.

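The ordering rule and the tie-break between views can be sketched as
below. This reads "used in order" as "a leading run of the view's sort
keys must appear in the restriction", which is one interpretation of
the text above. The function names are hypothetical, and the choice of
the first listed view on a tie is an assumption.

```python
def usable_key_count(view_keys, restricted_props):
    """Count the leading sort keys of a view that the query restricts."""
    count = 0
    for key in view_keys:
        if key not in restricted_props:
            break
        count += 1
    return count


def choose_view(views, restricted_props):
    """Pick the applicable view with the most usable keys, or None."""
    best, best_count = None, 0
    for view_keys in views:
        n = usable_key_count(view_keys, restricted_props)
        if n > best_count:
            best, best_count = view_keys, n
    return best
```

Under this reading, a view on (SIZE, FILENAME) has one usable key for a
query on SIZE, two for a query on SIZE and FILENAME, and none for a
query on FILENAME alone, so it is not applicable in the last case.
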
3) If one or more properties of a property restriction are value
indexed, and the value index is reasonably up-to-date, and the query is
shallow, then use value indexing.

4) If 1, 2, and 3 do not apply, or if the volume is downlevel (not Ofs),
then use the directory index (i.e. enumeration).
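
Taken together, the four rules amount to a short decision procedure. A
sketch, assuming the query has already been reduced to a handful of
boolean facts; the function and parameter names are hypothetical.
Checking the downlevel case first is one reading of rule 4, and the
"adding value indices if appropriate" part of rule 1 is left out.

```python
def choose_index(has_content_restriction, shallow, view_applies,
                 value_indexed, value_index_fresh, downlevel):
    """Return which index the rules above would select."""
    if downlevel:
        # Rule 4, second clause: a downlevel volume always enumerates.
        return "directory"
    if has_content_restriction:
        # Rule 1: content restrictions go to the content index.
        return "content"
    if view_applies and shallow:
        # Rule 2: a view index whose sort keys cover the restriction.
        return "view"
    if value_indexed and value_index_fresh and shallow:
        # Rule 3: a reasonably up-to-date value index.
        return "value"
    # Rule 4, first clause: nothing else applies.
    return "directory"
```

Note that in this sketch a deep (non-shallow) property query falls
through to the directory index even when a view or value index exists
on the property.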