Names Section
Query Section
Effect of Parameters on Query Performance
Internet Data Query files (files with an .idq extension) for Microsoft Index Server (together with the form parameters) specify the query that Microsoft Index Server will run. The .idq file is divided into two sections, the names section and the query section. The names section is optional, and need not be supplied for standard queries.
Note: All paths to .idq files must be full path names from a virtual root, not relative paths or physical paths. In other words, every path must start with a slash (/) and cannot contain . or .. components. See the following examples:
Valid Paths
/scripts/myquery.idq
/scripts/samples/search/query.idq
Invalid Paths
c:\inetsrv\scripts\myquery.idq
scripts/query.idq
/samples/../scripts/query.idq
The .idq files cannot be on a virtual root pointing to a remote Uniform Naming Convention (UNC) share.
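The path rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of Index Server; the function name is hypothetical, and rejecting empty components (such as a doubled slash) and requiring the .idq extension are assumptions made here.

```python
def is_valid_idq_path(path: str) -> bool:
    """Return True if `path` looks like a full virtual-root path to an .idq file."""
    # A virtual-root path must start with a forward slash.
    if not path.startswith("/"):
        return False
    # Physical paths use backslashes or drive letters; reject both.
    if "\\" in path or ":" in path:
        return False
    # No "." or ".." components are allowed anywhere in the path.
    # Empty components (for example "//") are also rejected: an assumption.
    components = path.split("/")[1:]
    if any(part in ("", ".", "..") for part in components):
        return False
    # Assumption: the target file carries the .idq extension.
    return path.lower().endswith(".idq")


if __name__ == "__main__":
    for candidate in ("/scripts/myquery.idq", "scripts/query.idq"):
        print(candidate, is_valid_idq_path(candidate))
```

A check like this cannot detect whether a virtual root points to a remote UNC share; that depends on server configuration.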
The names section of the Internet Data Query file defines nonstandard column names that can be referred to in the query. The columns refer to ActiveX properties that have been created in document files with IPropertyStorage, or in the Microsoft® Office summary and custom properties. The globally unique identifier (GUID) for Microsoft Office is 0xF29F85E0,0x4FF9,0x1068,0xAB9108002B27B3D9. The following sample defines a few of the ActiveX Summary Information properties:
[Names]
#Property set for OLE document properties
DocTitle = F29F85E0-4FF9-1068-AB91-08002B27B3D9 2
DocSubject( DBTYPE_STR|DBTYPE_BYREF ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 3
DocAuthor( DBTYPE_STR|DBTYPE_BYREF ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 4
DocEditTime( DBTYPE_DATE ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 0xa
DocLastPrinted( DBTYPE_DATE ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 0xb
DocPageCount( DBTYPE_I4 ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 0xe
DocWordCount( DBTYPE_I4 ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 0xf
SalesRegion( DBTYPE_WSTR | DBTYPE_BYREF ) = D5CDD505-2E9C-101B-9397-08002B2CF9AE "SalesRegion"
Within the section, any blank line or any line beginning with a number sign (#) is ignored. Every other line consists of a friendly name, optionally followed by a datatype in parentheses, followed by an equal sign (=), then a GUID identifying the property set for the column, followed by either a number giving the PROPID or a string giving the property name. If no datatype is provided, DBTYPE_WSTR is assumed.
The friendly name is the token used in query restrictions, sort specifications, and so on. Multiple friendly names can point to the same property. For example, the friendly name Author might be replaced by Auteur if an author property is to be shown to a French audience. Friendly names cannot contain spaces or special characters such as angle brackets, equal signs, exclamation points, commas, periods, and asterisks (>=<!,.*).
The GUID together with the PROPID or property name identifies the property within the ActiveX property namespace. See the Win32 Software Development Kit (SDK) for more information on ActiveX properties. The PROPID may be specified as a decimal (base 10) or hexadecimal (base 16) number. In the latter case, the number must be preceded by 0x. Property names must be enclosed in quotation marks. For example, 10 (a PROPID) is not the same as "10" (a property name).
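To make the line syntax concrete, the following Python sketch parses a single names-section line into its parts. It is a minimal illustration under stated assumptions, not the parser Index Server uses: the NameEntry type, the function name, and the regular expression are all hypothetical, and the GUID is only checked for length and character set.

```python
import re
from typing import NamedTuple, Optional, Union


class NameEntry(NamedTuple):
    friendly_name: str
    datatype: str
    guid: str
    prop: Union[int, str]  # int means PROPID, str means property name


_LINE = re.compile(
    r"""^\s*(?P<name>[^\s()=]+)\s*             # friendly name
        (?:\(\s*(?P<type>[^)]*?)\s*\))?\s*     # optional datatype in parentheses
        =\s*(?P<guid>[0-9A-Fa-f-]{36})\s+      # property-set GUID
        (?P<prop>"[^"]*"|0[xX][0-9A-Fa-f]+|\d+)\s*$   # "name", hex or decimal PROPID
    """,
    re.VERBOSE,
)


def parse_names_line(line: str) -> Optional[NameEntry]:
    """Parse one line; return None for blank lines and # comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a valid names-section line: {line!r}")
    # DBTYPE_WSTR is assumed when no datatype is given.
    datatype = re.sub(r"\s+", "", match.group("type") or "") or "DBTYPE_WSTR"
    raw = match.group("prop")
    if raw.startswith('"'):
        prop: Union[int, str] = raw[1:-1]      # quoted: a property name
    elif raw.lower().startswith("0x"):
        prop = int(raw, 16)                    # hexadecimal PROPID
    else:
        prop = int(raw, 10)                    # decimal PROPID
    return NameEntry(match.group("name"), datatype, match.group("guid"), prop)


if __name__ == "__main__":
    print(parse_names_line(
        "DocEditTime( DBTYPE_DATE ) = F29F85E0-4FF9-1068-AB91-08002B27B3D9 0xa"))
```

Keeping the PROPID as an integer and the property name as a string preserves the distinction between 10 and "10" described above.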
The datatype is used during restriction parsing to correctly interpret user input. The following table lists the datatypes supported, their equivalent ActiveX mnemonics, and any formatting restrictions.
Datatype | ActiveX mnemonics | Formatting restrictions |
---|---|---|
DBTYPE_I1 | VT_I1 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number, for example, 0x3F8. |
DBTYPE_UI1 | VT_UI1 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_I2 | VT_I2 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_UI2 | VT_UI2 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_I4 | VT_I4 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_UI4 | VT_UI4 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_I8 | VT_I8 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_UI8 | VT_UI8 | Integer. Expressed in either decimal (base 10) or hexadecimal (base 16) notation. The latter requires 0x before the number. |
DBTYPE_R4 | VT_R4 | Real number. Can be expressed in scientific notation. |
DBTYPE_R8 | VT_R8 | Real number. Can be expressed in scientific notation. |
DBTYPE_CY | VT_CY | Currency. Expressed as two integers, separated by a period, for example, 100.55. Cannot be preceded by $, ¥, £, and so on. This datatype does not specify the currency format. |
DBTYPE_DATE | VT_DATE | Date. Expressed as an absolute in two forms: yyyy/mm/dd and yyyy/mm/dd hh:mm:ss. Also expressed as a relative date: -#y, -#m, -#w, -#d, -#h, -#n, -#s where the letters correspond to year, month, week, day, hour, minute and second, respectively. Positive relative dates into the future are not supported. |
DBTYPE_BOOL | VT_BOOL | Boolean. Expressed as TRUE or FALSE. |
DBTYPE_STR | VT_LPSTR | String. Any input accepted. |
DBTYPE_WSTR | VT_LPWSTR | Unicode string. Any input accepted. |
DBTYPE_BSTR | VT_BSTR | Basic string. Any input accepted. |
DBTYPE_GUID | VT_CLSID | GUID (Globally Unique IDentifier). Expressed as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. |
DBTYPE_BYREF | (not applicable) | Older operator. Should be added to strings. For example: DBTYPE_WSTR \| DBTYPE_BYREF. |
DBTYPE_VECTOR | VT_VECTOR | Older operator. Vector properties are fully supported. |
VT_FILETIME | VT_FILETIME | Expressed as an absolute in two forms: yyyy/mm/dd and yyyy/mm/dd hh:mm:ss. Also expressed as a relative date: -#y, -#m, -#w, -#d, -#h, -#n, -#s where the letters correspond to year, month, week, day, hour, minute and second, respectively. Positive relative dates into the future are not supported. |
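
The date formats listed for DBTYPE_DATE and VT_FILETIME can be recognized as in the following Python sketch. It only classifies the input; it does not reproduce how Index Server evaluates a relative date. The function name and return shape are hypothetical, and treating the unit letters as lowercase only is an assumption.

```python
import re
from datetime import datetime

_RELATIVE = re.compile(r"^-(\d+)([ymwdhns])$")
_UNITS = {
    "y": "year", "m": "month", "w": "week", "d": "day",
    "h": "hour", "n": "minute", "s": "second",
}


def parse_date_restriction(text: str):
    """Classify a date value as ("relative", count, unit) or ("absolute", datetime)."""
    text = text.strip()
    match = _RELATIVE.match(text)
    if match:
        # Relative dates always point into the past, for example -1d.
        return ("relative", int(match.group(1)), _UNITS[match.group(2)])
    # Absolute dates: yyyy/mm/dd hh:mm:ss or yyyy/mm/dd.
    for fmt in ("%Y/%m/%d %H:%M:%S", "%Y/%m/%d"):
        try:
            return ("absolute", datetime.strptime(text, fmt))
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {text!r}")


if __name__ == "__main__":
    print(parse_date_restriction("-1d"))
    print(parse_date_restriction("1996/07/04 12:30:00"))
```

A positive relative value such as +1d raises ValueError here, matching the statement that relative dates into the future are not supported.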
The standard friendly names are always available, even if they are not explicitly defined in the names section. See List of Property Names on the Query Language page. For other Microsoft Office properties, see the Microsoft Office Software Development Kit (SDK). For properties available with other products, see the documentation from each independent software vendor.
The HTML filter extracts text from the content field of a meta element. For example, if an HTML file has this line:
<META NAME="DESCRIPTION" CONTENT="Sample query form for Microsoft Index Server">
Then a user can query the information in the content field, namely Sample query form for Microsoft Index Server, by using the HTML meta property. The GUID for the meta property is D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 and the property name is specified by the name field, or the HTTP-EQUIV field. In the above example, the property name is DESCRIPTION. Thus a friendly name, say MetaDescription, for the meta property can be defined as
MetaDescription(DBTYPE_WSTR) = D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 description
The GUID for the meta property is a registry parameter located at:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\HtmlFilter\MetaTagClsid
The query section of the .idq file specifies parameters that will be used in the query. It can refer to form variables and can include conditional expressions to set a variable to alternative values depending upon some condition. The section begins with a [Query] tag, and is followed by a set of parameters. Here is a simple .idq file:
[Query]
CiScope=/
CiColumns=FileName
CiRestriction=#filename *.*
CiTemplate=/Scripts/Template.htx
The preceding four parameters are required. In many cases, one or more parameters will be passed down from a form. Here is a very simple form:
<FORM ACTION="/scripts/simple.idq" METHOD="GET">
Query : <INPUT TYPE="TEXT" NAME="Restriction" SIZE="60" MAXLENGTH="100" VALUE="">
<INPUT TYPE="SUBMIT" VALUE="Execute Query">
</FORM>
This form can work with the following .idq file to pass parameters through from the user:
[Query]
CiScope=/
CiColumns=FileName
CiRestriction=%Restriction%
CiTemplate=/Scripts/Template.htx
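The effect of the %Restriction% placeholder can be pictured with a small Python sketch. This is a toy model of the substitution, not Index Server's implementation: the function name is hypothetical, conditional expressions are not handled, and replacing an undefined variable with an empty string is an assumption.

```python
import re


def substitute_parameters(idq_text: str, form: dict) -> str:
    """Replace each %name% placeholder with the matching form variable."""
    def replace(match):
        # Assumption: a variable missing from the form becomes an empty string.
        return form.get(match.group(1), "")
    return re.sub(r"%(\w+)%", replace, idq_text)


if __name__ == "__main__":
    idq = "CiRestriction=%Restriction%"
    print(substitute_parameters(idq, {"Restriction": "ActiveX"}))
```

With the form shown earlier, typing ActiveX into the text box yields the line CiRestriction=ActiveX in this model.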
Conditional expressions can also be used in .idq files in exactly the same manner as in .htx files. In addition to the four parameters shown earlier, there are many other optional parameters. Common additions include CiSort and CiForceUseCi. See the full list of additions.
Warning: Be careful when substituting parameters for the CiTemplate parameter, because you could unintentionally allow files in execute-only scripts directories to be sent over the network. For example, if an .idq file contained the line
CiTemplate=%CiTemplate%
a client could send a URL that contained the following line in the query string:
CiTemplate=/scripts/mysecretfile.pl
With this string, an unauthorized user could read the contents of a confidential file.
It is better to switch among different .htx files by passing only the base name of the file and adding the script directory and file name extension in the parameter substitution. The following file, Sample.idq, shows how to do this:
[Query]
CiRestriction=%q%
CiTemplate=/scripts/%t%.htx
CiSort=%s%
CiScope=/
The query can be executed with a URL like the following:
http://computername/scripts/sample.idq?q=ActiveX&t=form1
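A client that builds such URLs might look like the following Python sketch. The function name and the base-name pattern are hypothetical; the check only illustrates the idea of passing a bare template name, and the actual protection comes from fixing the directory and extension in the .idq file.

```python
import re
from urllib.parse import urlencode

# Assumption: template base names use only letters, digits, "_" and "-".
_SAFE_BASE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def build_query_url(host: str, query: str, template: str, sort: str = "") -> str:
    """Build a URL for Sample.idq from a restriction and a template base name."""
    if not _SAFE_BASE_NAME.match(template):
        raise ValueError(f"template must be a bare base name: {template!r}")
    params = {"q": query, "t": template}
    if sort:
        params["s"] = sort
    # urlencode escapes reserved characters in the parameter values.
    return f"http://{host}/scripts/sample.idq?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("computername", "ActiveX", "form1"))
    # prints http://computername/scripts/sample.idq?q=ActiveX&t=form1
```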
The fastest query is a sequential query that uses the content index. Certain parameter settings will force the query engine to use a less efficient method to resolve the query. To guarantee fast queries, set CiSort to nothing (or descending by rank), set CiForceUseCi to TRUE, and do not reference CiMatchedRecordCount, CiRecordsNextPage, or CiTotalNumberPages in the .htx template.
Note: A Uniform Resource Locator (URL) or a form-based query can send up to 4 kilobytes (K) of data. If a query larger than 4K is sent, the behavior is unpredictable. The query size includes all variables sent from the browser to the .idq file.
A query can be executed sequentially (results fetched as needed) or it can be executed nonsequentially (results cached on the server). A sequential query requires fewer server resources, but also has some limitations. Backwards scrolling (CiBookmarkSkipCount < 0) will re-execute the query and scroll forward to the specified position. Sequential queries cannot refer to the following variables: CiMatchedRecordCount, CiRecordsNextPage, and CiTotalNumberPages.
Either of the following actions will force a query to be nonsequential:
Referring to the CiMatchedRecordCount, CiRecordsNextPage or CiTotalNumberPages variables in the .htx page.
Setting the CiSort parameter to a sort other than the native order of the result. CiSort can safely be set to sort ascending on WorkId (CiSort=WorkId[a]) or descending on Rank (CiSort=Rank[d]).
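The two conditions above can be expressed as a simple check. The following Python sketch is a rough lint, not something Index Server provides: the function name is hypothetical, the comparison is case-insensitive by assumption, and a plain text search of the .htx page will also flag variable names that appear only in comments.

```python
import re

NONSEQUENTIAL_VARIABLES = (
    "CiMatchedRecordCount", "CiRecordsNextPage", "CiTotalNumberPages",
)
# Sorts that match the native order of the result, plus "no sort".
NATIVE_SORTS = {"", "workid[a]", "rank[d]"}


def nonsequential_reasons(ci_sort: str, htx_text: str) -> list:
    """Return the reasons a query would be forced to run nonsequentially."""
    reasons = []
    normalized = re.sub(r"\s+", "", ci_sort or "").lower()
    if normalized not in NATIVE_SORTS:
        reasons.append(f"CiSort={ci_sort} is not the native result order")
    lowered = htx_text.lower()
    for variable in NONSEQUENTIAL_VARIABLES:
        if variable.lower() in lowered:
            reasons.append(f"{variable} is referenced in the .htx page")
    return reasons


if __name__ == "__main__":
    print(nonsequential_reasons("FileName[a]", "<%CiTotalNumberPages%>"))
```

An empty list means neither condition was detected, so the query can remain sequential under this model.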
Executing queries that must be enumerated can also slow down performance. Most queries are resolved by using the content index, but certain conditions force the query engine to recursively search the disk to locate matching files. These queries include:
Regular expressions on properties other than FileName which begin with a wildcard.
Property value queries when CiForceUseCi is set to FALSE and the index is not up-to-date.
Property value queries involving regular expressions with a wildcard prefix on a property other than FileName (for example, #DocAuthor *son).
Property value queries involving regular expressions that start and end with wildcards (for example, #filename *sample*).
Certain property value queries involving OR (such as @write > -1d OR @create > -1d).
Queries can be forced to use the content index by setting CiForceUseCi to TRUE in the .idq file. The query engine will always use the content index, but query results may be out-of-date for recently modified files. If the content index was used for a query, and some files on disk have been modified more recently than their contents have been filtered, the built-in variable CiOutOfDate will be set to the value 1. In some cases, a query is simply too complex to be resolved solely through use of the content index. In these cases, the built-in variable CiQueryIncomplete will be set to 1. Content queries can always use the content index, so their results can be out of date at any time.
Special support has been put in Index Server to optimize content queries that are sorted descending by rank (CiSort = Rank[d]). For such queries, minimal information can be retrieved from the index before additional property and security tests are performed. However, if the total number of results matching the query is greater than CiMaxRecordsInResultSet, then additional testing must be performed during index retrieval to remove items from this set that fail additional property and security tests. This frees up space in the result set for items matching the full query. This processing uses up resources, and can be deferred by setting CiDeferNonIndexedTrimming to TRUE. The query will then pick CiMaxRecordsInResultSet items first, and trim those. The end result may be a number of matching items less than CiMaxRecordsInResultSet. For queries with the scope set to the entire corpus, on a server with little or no security, you can consider setting CiDeferNonIndexedTrimming to TRUE to improve performance.
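The difference between the two trimming strategies can be seen in a toy model. The Python sketch below is a simplification written for illustration and says nothing about Index Server internals: both function names are hypothetical, ranked_hits stands for index hits already in descending rank order, and passes_tests stands for the additional property and security tests.

```python
def eager_trimming(ranked_hits, passes_tests, max_records):
    """Test hits during retrieval, so the result set fills with full matches."""
    results = []
    for hit in ranked_hits:
        if passes_tests(hit):
            results.append(hit)
            if len(results) == max_records:
                break
    return results


def deferred_trimming(ranked_hits, passes_tests, max_records):
    """Pick the first max_records hits from the index, then trim them."""
    candidates = ranked_hits[:max_records]
    return [hit for hit in candidates if passes_tests(hit)]


if __name__ == "__main__":
    hits = ["a", "b", "c", "d", "e"]          # already sorted by rank
    allowed = {"a", "c", "e"}                 # hits passing the extra tests
    print(eager_trimming(hits, allowed.__contains__, 3))     # ['a', 'c', 'e']
    print(deferred_trimming(hits, allowed.__contains__, 3))  # ['a', 'c']
```

In this model the deferred strategy tests at most max_records hits but can return fewer matches, which is why it suits servers where few hits are expected to fail the additional tests.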
© 1996 by Microsoft Corporation. All rights reserved.