This directory and its subdirectories contain source and binary files
for the statistics support packages, which can be run across multiple
platforms.

The directory is organized as follows:

1) .\
      a) stat.c        (single C source)
      b) makefile.rst  (common for Windows, OS/2 16 bit, and OS/2 32 bit)
      c) makefile      (for Windows NT)
      d) sources       (sources file for Windows NT)
      e) The header file, teststat.h, required for building the
         dlls is under ..\inc\.

2) .\win     (FOR WINDOWS)
      a) .\src  (contains the remaining .asm file and the module
                 def file)
      b) .\bin  (the binary statwin.dll file)

3) .\WIN32   (FOR WIN32 APPS)
      a) .\src  (contains the .def file, plus an i386 sub-dir and a
                 mips sub-dir, each containing an asm file)
      b) .\bin  (the binary file)

4) .\os2286  (FOR 16 bit OS/2 Cruiser and Sloop apps)
      a) .\src  (the module def file)
      b) .\bin  (the binary stat286.dll file)

5) .\os2386  (FOR 32 bit OS/2 Cruiser apps)
      a) .\src  (the module def file)
      b) .\bin  (the binary stat386.dll file)

**********************************************************************

To build an application that uses a statistics DLL:
--------------------------------------------------

To use one of the above binaries, please read the USAGE NOTES at the
end of this document. Copy the teststat.h file from ..\inc\ to the
directory where you are building your application. Copy the relevant
.dll to your libpath.

You must define the type of system you are building your application
for, since the header file uses some special types that depend on the
system. When compiling your application, add the following flag:

      -DXXX

where XXX stands for one of:

      WIN    - for Windows applications
      OS2286 - for 16 bit OS/2 applications
      OS2386 - for 32 bit OS/2 applications
      WIN32  - for Win32 applications
**********************************************************************

To build one of the dlls:
------------------------

If building a Windows, OS/2 16 bit or OS/2 32 bit dll:
-----------------------------------------------------

a) Copy the stat.c file found under this directory and teststat.h
   from ..\inc\. to a local directory. Copy the .asm file from
   win\src if building for WIN.

b) Also copy "makefile.rst" from here to the same directory.

c) From ???\src copy the remaining files to the local directory,
   where ??? represents win, os2286 or os2386.

d) Edit "makefile.rst" to define the system that you are making the
   dll for. E.g., if you are making the dll for Windows, remove the
   comment sign (#) from the line "WIN=TRUE" in the makefile and
   ensure that the other system defines (OS2286 and OS2386) are
   commented out.

e) Type "nmake -f makefile.rst" and the dll will be created for you.
   (Ensure that your development environment is set up for the right
   system.)

If building the Win32 dll:
-------------------------

a) Copy the stat.c, makefile and sources files found under this
   directory and teststat.h from ..\inc\. to a local directory.

b) tc the win32\src directory to your local directory. This will
   create an i386 (or mips) sub-directory containing an asm file on
   your local machine.

c) From the directory where you have your sources file, type
   "build -xxx statw32" from the command line, where xxx represents
   your target system. It is 386 by default.

d) A binary file "statw32.dll" will be created along with the .obj
   file under .\xxx\obj, where xxx is your target system. It is i386
   by default.

In case you have any questions, or if you run into any problems,
contact vaidy (936-7812).

*****************************************************************

USAGE NOTES
-----------

This is the user's guide to TestStat.dll, the statistical package.
In case of questions contact vaidy (936-7812).
This document describes each of the functions available through this
module and then demonstrates their use through an example.

This module provides basic statistical routines which can be used to
compute average, min, max, standard deviation, and statistical
convergence. Statistical convergence of the average is determined by
the number of test iterations required for the average to converge to
a "stable" value. The number of iterations required is computed on the
fly as the data is collected, so that the caller is informed when
enough data has been collected (the average is stable). Stable
averages obtained in this way can be compared to other stable averages
obtained under different experimental conditions with known confidence
levels. Notes in this document describe the meanings of stability and
confidence more formally.

In addition to the functionality described above, this module also
provides routines that generate random numbers. Routines are provided
that return a random number within a specified range, uniformly
distributed random numbers (in the range 0 to 1, or 0 to 65535), and
normally distributed numbers that satisfy a given mean and standard
deviation.

1) TestStatOpen:
   -------------

   Description:

      Allocates an instance data array for the data set and other
      global data structures required by the high level functions.

      USHORT FAR PASCAL TestStatOpen (
                           USHORT usMinIterations,
                           USHORT usMaxIterations
                           );

      usMinIterations - The minimum number of iterations that the
                        calling application has to run before the
                        convergence algorithm may be used.

      usMaxIterations - The maximum number of iterations that the
                        test program may run. The maximum acceptable
                        value is 64K. An internal data array of
                        usMaxIterations ULONGs is allocated. The
                        caller should bear this in mind when setting
                        this parameter.

   Remarks:

      This routine should be called before the first call to
      TestStatInit.
      If usMinIterations is zero, an error code is returned. If
      usMinIterations is greater than usMaxIterations, an error code
      is returned. If usMinIterations is equal to usMaxIterations,
      TestStatConverge will return TRUE after that many iterations.

      This function frees the caller from the responsibility of
      allocating any data storage or doing any book-keeping.

   Return Value:

      0 if the call succeeded. Otherwise, an error code indicating
      the failure. The error code may be one of:

         STAT_ERROR_ILLEGAL_MIN_ITER
         STAT_ERROR_ILLEGAL_MAX_ITER
         STAT_ERROR_ALLOC_FAILED

   See also:

      TestStatInit, TestStatConverge, TestStatValues, TestStatClose.

2) TestStatInit:
   -------------

   Description:

      Initializes variables required by the convergence and
      statistics routines.

      VOID FAR PASCAL TestStatInit (
                         VOID
                         );

   Remarks:

      This routine should be called before the first call to
      TestStatConverge and after each call to TestStatValues, if you
      want to converge on a new set of data.

   Return Value:

      None.

   See also:

      TestStatOpen, TestStatClose, TestStatConverge, TestStatValues.

3) TestStatConverge:
   -----------------

   Description:

      Automatically computes the number of iterations required for
      95% confidence in the data obtained.

      BOOL FAR PASCAL TestStatConverge (
                         ULONG ulNewDataPoint
                         );

      ulNewDataPoint - The data point obtained for the current
                       iteration.

   Remarks:

      This routine should be called for each iteration of the test.
      The first call to this routine should be preceded by a call to
      TestStatInit. The test program should check the return value
      and should stop the test as soon as TRUE is returned.

      In making tests of significance, errors will sometimes be
      encountered in the results concerning the hypothesis tested.
      The hypothesis is that the difference between the actual mean
      in one experiment and the actual mean in a second experiment is
      less than a specified value. This difference is expressed as a
      percentage of the first experiment's mean. We call this
      difference the "precision" of the comparison.
      If the assumption is true and the results of the tests lead one
      to believe that it is false, the condition is described as a
      TYPE I error. If the assumption is false and the test results
      show that the two means are within the prescribed difference,
      the condition is described as a TYPE II error. The probability
      of a TYPE I error is set by the significance level of the test.
      Choosing a small probability for one type of error increases
      the probability of the other type.

      The routines in this module operate on the following set of
      assumed parameters: 95% confidence that if the means differ by
      less than 5% they are really the same, and 85% confidence that
      if the means differ by more than 5% they are really different.
      The algorithm in this module uses these assumptions to
      determine the number of iterations needed to achieve these
      levels of confidence.

      The reason for emphasizing TYPE II error is as follows. A
      TYPE I error indicates that the means differ when, in fact,
      they are the same. If they appear to differ, we will usually
      explore why, and in doing so will discover that they are not
      really different after all. A TYPE II error, on the other hand,
      means that the results show no difference whereas the means
      really are different. This is to be avoided, since if the means
      don't differ from one run to the next, we are unlikely to look
      further into the problem.

      When additional iterations are forced by a high
      usMinIterations, the resulting precision will usually be less
      than 5%. Conversely, when usMaxIterations is reached without
      converging, the precision will be greater than 5%. The
      precision returned by TestStatValues indicates how meaningful a
      comparison of two means will be.

   Return Value:

      FALSE if further iterations are required for the test to
      converge or usMinIterations has not been reached.
      TRUE if already converged or the maximum limit on iterations
      has been reached.

   See also:

      TestStatOpen, TestStatInit, TestStatValues, TestStatClose.
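   Illustration (not part of the DLL):

      The relationship between precision, confidence and iteration
      count described above can be sketched in C. This is a minimal
      sketch only, assuming the textbook normal-approximation rule

         n >= ((z1 + z2) * SDev / (precision * Mean))^2

      The algorithm actually used by TestStatConverge is not given in
      this document. The names SketchStats, sketch_init, sketch_add
      and sketch_converged, and the z values (1.440 for a two-sided
      15% TYPE I probability, 1.645 for a one-sided 5% TYPE II
      probability), are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical running-statistics state; not the DLL's internal layout. */
typedef struct {
    unsigned long count;  /* number of data points seen so far       */
    double mean;          /* running mean                             */
    double m2;            /* sum of squared deviations from the mean  */
} SketchStats;

static void sketch_init(SketchStats *s)
{
    s->count = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Welford's one-pass update of the mean and squared deviations. */
static void sketch_add(SketchStats *s, double x)
{
    double delta = x - s->mean;
    s->count++;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Returns nonzero once enough points have been collected, or once
   max_iter points have been seen (mirroring the documented TRUE case). */
static int sketch_converged(const SketchStats *s,
                            unsigned long min_iter,
                            unsigned long max_iter)
{
    const double z_type1 = 1.440;    /* assumed: two-sided, 15% TYPE I  */
    const double z_type2 = 1.645;    /* assumed: one-sided, 5% TYPE II  */
    const double precision = 0.05;   /* 5% of the mean                  */
    double variance, z, tol;

    assert(min_iter <= max_iter);
    if (s->count >= max_iter)
        return 1;
    if (s->count < min_iter || s->count < 2)
        return 0;

    variance = s->m2 / (double)(s->count - 1);
    z = z_type1 + z_type2;
    tol = precision * s->mean;

    /* n >= (z * sdev / tol)^2, rearranged to avoid a square root
       and a division by a zero mean. */
    return (double)s->count * tol * tol >= z * z * variance;
}
```

      A caller would invoke sketch_add once per iteration and stop as
      soon as sketch_converged returns nonzero, in the same way that
      the test program stops when TestStatConverge returns TRUE.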
4) TestStatValues:
   ---------------

   Description:

      Automatically computes a number of useful statistical values
      for a given set of data.

      VOID FAR PASCAL TestStatValues (
                         PSZ      pszOutputString,
                         USHORT   usOutlierFactor,
                         PULONG * pulDataArray,
                         PUSHORT  pcusElementsInArray,
                         PUSHORT  pcusDiscardedValues
                         );

      pszOutputString     - A pointer to a string buffer to which
                            output data is returned. The buffer
                            should be at least 81 bytes long. The
                            string will be a null-terminated ASCII
                            string.

      usOutlierFactor     - Factor that defines the range of
                            acceptable data values. A value of zero
                            disables this factor, and all data will
                            be considered valid.

      pulDataArray        - A pointer to the data array. If a nonzero
                            outlier factor is given, this array has
                            as many elements as there were good data
                            points in the data set. Otherwise, it
                            contains all the data points.

      pcusElementsInArray - Pointer to the number of elements in the
                            array pointed to by pulDataArray.

      pcusDiscardedValues - Pointer to the number of data points
                            discarded based upon the outlier factor.

   Remarks:

      This routine should be called only once for each test, normally
      after TestStatConverge has returned TRUE. Any call to this
      routine should be followed by a call to TestStatInit before the
      next call to TestStatConverge, if you want to converge on a new
      set of data. The outlier factor decides the range of acceptable
      values in the data set.

      The format of the returned string will be (as in C):

         "%4u %10lu %10lu %10lu %6u %5u %10lu %4u %2u "

      These represent the mode number, the mean, the minimum, the
      maximum, the number of iterations completed, the precision, the
      standard deviation, the number of points discarded, and the
      outlier factor for the data set. The mode number will always be
      zero.

      The precision will be 5% if the test results converged before
      the limit on the maximum iterations was reached. Otherwise, it
      is the precision of the results gathered.
      The precision value in this case assumes confidence levels of
      85% and 95% against Type I and Type II errors respectively.

      The outlier factor, together with the standard deviation,
      identifies abnormal data points. Any data point that does not
      satisfy:

         [Mean - (SDev * OF)] < Data Point < [Mean + (SDev * OF)],

      where SDev is the standard deviation computed with good data
      points and OF is the outlier factor, is left out of the
      statistics computation. The standard deviation is recomputed
      and this process is repeated until there are no abnormal data
      entries in the data set. The number of outliers that were
      discarded is also returned to the calling program. To skip the
      outlier factor and this process of elimination, set the outlier
      factor to zero. Otherwise, the outlier factor should be at
      least 2 in order for the results to be meaningful.

   Return Value:

      None.

   See also:

      TestStatOpen, TestStatClose, TestStatInit, TestStatConverge.

5) TestStatClose:
   --------------

   Description:

      Deallocates instance data structures and all memory allocated
      by TestStatOpen and TestStatInit.

      VOID FAR PASCAL TestStatClose (
                         VOID
                         );

   Remarks:

      This routine should be called after the last call to
      TestStatValues. A call to this routine must be followed by
      calls to TestStatOpen and TestStatInit, in that order, before
      the application calls TestStatConverge and TestStatValues
      again.

   Return Value:

      None.

   See also:

      TestStatOpen, TestStatInit, TestStatConverge, TestStatValues.

------------------------------------------------------------------
Usage of Statistical routines for convergence and values: TestApp
------------------------------------------------------------------

#define MIN_ITERATION  3
#define MAX_ITERATION  200
#define OUTLIER_FACTOR 4

Body of test application
{
    USHORT usMinIteration = MIN_ITERATION;
    USHORT usMaxIteration = MAX_ITERATION;
    ULONG  ulDataForCurrentIter;
    ULONG  far *pulDataArray;  // make sure you have the "far" for 16 bit.
    char   chOutputBufferForString [81];
    USHORT usOutlierFactor = OUTLIER_FACTOR;
    USHORT cusDiscardedValues;
    USHORT cusElementsInArray;
      :
      :
    // TestStatOpen returns 0 on success and an error code on failure.
    if (TestStatOpen (usMinIteration, usMaxIteration) != 0) {
        // Data array could not be allocated.
        // Cannot do convergence/statistics routines;
        // check the parameters to the call.
    }

    do {    // for each test, or if need to run convergence again
        TestStatInit ();    // initialize test variables
          :
        do {    // convergence loop; repeat until TRUE is returned
            // Start the timer;
            // Test operation;
            // Stop the timer;
            ulDataForCurrentIter = // get the elapsed time for
                                   // the operation;
        } while (!TestStatConverge (ulDataForCurrentIter));

        // The data set has converged. Call the statistics
        // routine for the values and output data.
        TestStatValues (chOutputBufferForString,
                        usOutlierFactor,
                        &pulDataArray,
                        &cusElementsInArray,
                        &cusDiscardedValues);

        // chOutputBufferForString now holds all the data.
        // cusDiscardedValues has the number of discarded values.
    } while (/* more tests, or need to converge on a new data set */);
      :
      :
    TestStatClose ();
      :
}

-------------------------------------------------------------------

Random Number Generation Routines:

6) TestStatUniRand:
   ----------------

   Description:

      Returns a number within the range of 0 to 1 based on a starting
      seed.

      double FAR PASCAL TestStatUniRand (
                           VOID
                           );

   Remarks:

      When called repeatedly, this routine returns a set of uniformly
      distributed numbers between 0 and 1.

      TestStatUniRand makes use of the multiplicative congruential
      algorithm discussed in Knuth, Vol. II, Chapter 3. A starting
      seed is chosen along with a multiplier and a modulus. The seed
      for the next iteration is computed from these values as
      follows:

         Temp Value = X * A,

      where X is the current seed value and A is the multiplier. The
      remainder of the division of this value by the modulus is
      determined. This will be the seed for the next iteration. This
      value is divided by the modulus to obtain a normalized value
      (that lies between 0 and 1).
      This normalized value is returned to the caller.

      Through experiments, Sullivans, W. L has determined that a good
      set of values is returned by selecting one of the following 9
      values as the starting seed:

         32347753, 52142147, 52142123, 53214215, 23521425,
         42321479, 20302541, 32524125, 42152159.

      TestStatUniRand uses 32347753 as the starting seed. A good set
      of values, as mentioned above, means that for the given seed it
      takes a very large number of iterations before the set of
      returned values is repeated.

      The following values have been chosen for the multiplier and
      the modulus by M.C. Pike and I.D. Hill (reference):

         Multiplier - 3125
         Modulus    - 67108864

   Return Value:

      A double float between 0 and 1.

   See also:

      TestStatShortRand, TestStatRand, TestStatNormDist.

7) TestStatShortRand:
   ------------------

   Description:

      Returns a number within the range of 0 to 65535 based on a
      starting seed.

      USHORT FAR PASCAL TestStatShortRand (
                           VOID
                           );

   Remarks:

      When called repeatedly, this routine returns a set of uniformly
      distributed numbers between 0 and 65535.

      TestStatShortRand makes use of the multiplicative congruential
      algorithm discussed in Knuth, Vol. II, Chapter 3. A starting
      seed is chosen along with a multiplier and a modulus. The seed
      for the next iteration is computed from these values as
      follows:

         Temp Value = X * A,

      where X is the current seed value and A is the multiplier. The
      remainder of the division of this value by the modulus is
      determined. This will be the seed for the next iteration. This
      value is multiplied by 65535 and divided by the modulus to
      obtain a value between 0 and 65535. This value is returned to
      the caller.

      Through experiments, Sullivans, W. L has determined that a good
      set of values is returned by selecting one of the following 9
      values as the starting seed:

         32347753, 52142147, 52142123, 53214215, 23521425,
         42321479, 20302541, 32524125, 42152159.

      TestStatShortRand uses 32347753 as the starting seed.
      A good set of values, as mentioned above, means that for the
      given seed it takes a very large number of iterations before
      the set of returned values is repeated.

      The following values have been chosen for the multiplier and
      the modulus by M.C. Pike and I.D. Hill (reference):

         Multiplier - 3125
         Modulus    - 67108864

   Return Value:

      A USHORT between 0 and 65535.

   See also:

      TestStatUniRand, TestStatRand, TestStatNormDist.

8) TestStatRand:
   -------------

   Description:

      Returns a uniformly distributed random number within a
      specified range.

      ULONG FAR PASCAL TestStatRand (
                          ULONG ulLower,
                          ULONG ulUpper
                          );

      ulLower - Specifies the lower boundary of the desired random
                number. Should be at least 1.

      ulUpper - Specifies the upper boundary of the desired random
                number. May not exceed 67108863.

   Remarks:

      TestStatRand calls TestStatNorm to obtain a normalized random
      number. The value obtained from TestStatNorm is then multiplied
      by the range (i.e. the difference between ulUpper and ulLower).
      The computed value is then added to the lower limit and the
      resulting number is returned. Note that both ulLower and
      ulUpper are included in the range of returned random numbers.

   Return Value:

      A random number within the specified range.

   See also:

      TestStatShortRand, TestStatUniRand, TestStatNormDist.

9) TestStatNormDist:
   -----------------

   Description:

      With every call, returns a number that forms part of a set of
      points whose mean is approximately the input mean and whose
      standard deviation is nearly equal to the input standard
      deviation. A normally distributed set of points is generated.

      LONG FAR PASCAL TestStatNormDist (
                         ULONG  ulMean,
                         USHORT usSDev
                         );

   Remarks:

      This routine uses a formula discussed in 'Random Number
      Generation and Testing', IBM Data Processing Techniques,
      C20-8011, and in 'Tuning an Operating System for General
      Purpose Use', Russell P. Blake, Online Conferences (info. to be
      filled in).
      TestStatNormDist makes use of TestStatShortRand to get a set of
      uniformly distributed numbers. It generates a point around the
      input mean using the following formula:

                                  14
                                  __
         lRetVal <- Mean + ( -7 + >_ TestStatShortRand () ) * Std. Dev
                                  i=1

      The set of points generated by several calls to this routine
      will be normally distributed, with a mean of about the input
      mean and a standard deviation of approximately the input
      standard deviation. The returned value may be negative,
      depending upon the values returned by TestStatShortRand and the
      input standard deviation.

   Return Value:

      A long integer.

   See also:

      TestStatShortRand, TestStatUniRand, TestStatRand.
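
   Illustration (not part of the DLL):

      The generator and the normal-distribution formula described in
      the Remarks for the random number routines can be sketched in
      C. This is a minimal sketch under stated assumptions, not the
      DLL source. The names sketch_seed, sketch_uni_rand,
      sketch_short_rand and sketch_norm_dist are hypothetical. A
      64-bit seed is used so that the product X * A cannot overflow.
      Each draw is assumed to be scaled to the range 0 to 1 before
      the 14 values are summed, since a sum of 14 such values has a
      mean of 7, which is what the -7 term in the formula cancels.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MULTIPLIER 3125u       /* multiplier from the text    */
#define SKETCH_MODULUS    67108864u   /* modulus from the text       */

/* Starting seed from the text; 64 bits wide so seed * 3125 fits. */
static uint64_t sketch_seed = 32347753u;

/* Advance the seed: next = (X * A) mod M. */
static void sketch_step(void)
{
    sketch_seed = (sketch_seed * SKETCH_MULTIPLIER) % SKETCH_MODULUS;
}

/* Uniform value between 0 and 1: the new seed divided by the modulus. */
static double sketch_uni_rand(void)
{
    sketch_step();
    return (double)sketch_seed / (double)SKETCH_MODULUS;
}

/* Uniform value between 0 and 65535: seed * 65535 / modulus. */
static unsigned sketch_short_rand(void)
{
    sketch_step();
    return (unsigned)((sketch_seed * 65535u) / SKETCH_MODULUS);
}

/* Mean + (-7 + sum of 14 uniform draws) * SDev.
   Assumption: each draw is scaled to 0..1 before summing. */
static long sketch_norm_dist(unsigned long mean, unsigned sdev)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 14; i++)
        sum += (double)sketch_short_rand() / 65535.0;

    return (long)((double)mean + (sum - 7.0) * (double)sdev);
}
```

      Because the sum of the 14 scaled draws lies between 0 and 14,
      a value returned by sketch_norm_dist always lies within 7
      standard deviations of the requested mean, and it can be
      negative when the mean is small relative to the standard
      deviation.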