This directory and its subdirectory contain source and
binary files for the statitics support packages that can
be run across multiple platforms.
The directory is organized as follows:
1) .\ ->
a) stat.c (single c source)
b) makefile.rst (common for windows, OS2 16, and OS2386)
c) makefile (for Windows NT)
d) sources (sources file for Windows NT)
e) The header file, teststat.h, required for building
the dlls is under ..\inc\.
2) .\win (FOR WINDOWS)
a) .\src (contains the remaining .asm file, and the
module def file)
b) .\bin (the binary statwin.dll file)
3) .\WIN32 (FOR WIN32 APPS)
a) .\src (contains the .def file and an i386 sub-dir.
and a mips subdir, each containing an asm file)
b) .\bin (the binary file)
4) .\os2286 (FOR 16 bit OS/2 Cruiser and Sloop apps)
a) .\src (the module def file)
b) .\bin (the binary stat286.dll file)
5) .\os2386 (FOR 32 bit OS/2 Cruiser apps)
a) .\src (the module def file)
b) .\bin (the binary stat386.dll file)
**********************************************************************
To build an application that uses a statistics DLL:
--------------------------------------------------
To use one of the above binaries, please read the USAGE NOTES at the
end of this document. Please copy the teststat.h file from this
directory to the directory where you are building your application.
Copy the relevant .dll to your libpath.
It is essential that you define the type of system you are building
your application for, since the header file uses some special types
that are dependent on the system. While compiling your application,
add the following flag: -DXXX where XXX stands for one of:
WIN - for Windows applications
OS2286 - for 16 bit OS/2 applications
OS2386 - for 32 bit OS/2 applications
WIN32 - for Win32 applications.
**********************************************************************
To build one of the dlls:
------------------------
If building a Windows, OS2 16 or OS/2 32 bit dll:
--------------------------------------------------------
a) Copy the stat.c file found under this directory
and teststat.h from ..\inc\. to a local directory.
Copy the .asm file from win\src if building for WIN.
b) Also copy the "makefile.rst" from here to the same directory.
c) From ???\src copy the remaining files to the local directory,
where ??? represents win, os2286 or os2386.
d) Edit "makefile.rst" to define the system that you are making the
dll for. Eg. if you are making the dll for windows, remove
the comment sign (#) from the line "WIN=TRUE" in the makefile
and ensure that the other system defines (OS2286 and OS2386)
are commented out.
f) Type "nmake -f makefile.rst" and the dll will be created for
you. (Ensure that your development environment is set up for
the right system).
If building the Win32 dll:
-------------------------
a) Copy stat.c, makefile and sources files found under this
directory and teststat.h from ..\inc\. to a local directory.
b) tc the win32\src directory to your local directory. This
will create an i386 (or mips) sub-directory containing an asm
file on your local machine.
c) From the directory where you have your sources file, type
"build -xxx statw32" from the command line, where xxx represents
your target system. It is 386 by default.
d) A binary file "statw32.dll" will be created along with the
.obj file under .\xxx\obj where xxx is your target system.
It is i386 by default.
In case you have any questions, or if you run into any problems,
contact vaidy (936-7812).
*****************************************************************
USAGE NOTES
-----------
This is the user's guide to using TestStat.dll, the
statistical package. In case of questions contact vaidy (936-
7812).
This document describes the use of each of the functions
available through this module and then demonstrates the use
of these routines through an example.
This module provides basic statistical routines which can be
used to compute average, min, max, standard deviation, and
statistical convergence. Statistical convergence of the
average is determined by the number of test iterations required
for the average to converge to a "stable" value. The number of
iterations required is computed on the fly as the data is
collected, so that the caller is informed when enough data is
collected (the average is stable). Stable averages obtained in
this way can be compared to other stable averages obtained under
different experimental conditions with known confidence levels.
Notes in this document describe the meanings of stability and
confidence more formally.
In addition to the functionlity described above, this module
also provides routines that generate normally distributed
random numbers. Three routines are provided that return
random numbers within a specified boundary, a set of uniformly
distributed random numbers within the range 0 to 1 and a normally
distributed set of numbers around a mean, which satisfy a given
mean and standard deviation.
1) TestStatOpen:
-------------
Description: Allocates an instance data array for the data set
and other global data structures required by the high level
functions.
USHORT FAR PASCAL
TestStatOpen (
USHORT usMinIterations,
USHORT usMaxIterations
);
usMinIterations - The minimum number of iterations
that the calling application has to run before
the convergence algorithm may be used.
usMaxIterations - The maximum number of iterations
that the test program may run. The maximum
acceptable value is 64K. An internal data array
of usMaxIterations of ULONGs is allocated. The
caller should bear this in mind when setting this
parameter.
Remarks: This routine should be called before the first
call to TestStatInit. If usMinIterations is zero, an error
code is returned. If usMinIterations is greater than
usMaxIterations an error code is returned. If usMinIterations
is equal to usMaxIterations, TestStatConverge will return TRUE
after that many iterations. This function frees the caller
from the responsibility of allocating any data storage or
book-keeping.
Return Value: 0 if the call succeeded.
An error code indicating a failure. The
error code may be one of:
STAT_ERROR_ILLEGAL_MIN_ITER
STAT_ERROR_ILLEGAL_MAX_ITER
STAT_ERROR_ALLOC_FAILED
See also: TestStatInit, TestStatConverge, TestStatValues,
TestStatClose.
2) TestStatInit:
-------------
Description: Initializes variables required by the convergence
and statistics routines.
VOID FAR PASCAL
TestStatInit (
VOID
);
Remarks: This routine should be called before the first
call to TestStatConverge and after each call to
TestStatValues, if you want to converge on a new set of
data.
Return Value: None.
See also: TestStatOpen, TestStatClose, TestStatConverge,
TestStatValues.
3) TestStatConverge:
-----------------
Description: Automatically computes number of iterations
required for 95% confidence in data obtained.
BOOL FAR PASCAL
TestStatConverge (
ULONG ulNewDataPoint,
);
ulNewDataPoint - The data point obtained for the
current iteration.
Remarks: This routine should be called for each
iteration of the test. The first call to this routine
should be preceded by a call to TestStatInit. The test
program should check for the return value and should stop
the test as soon as a TRUE is returned.
In making tests of significance, sometimes errors will be
encountered in the results concerning an hypothesis tested.
The hypothesis is that the difference between the actual
mean in one experiment and the actual mean in a second
experiment is less than a specified value. This
difference is expressed as a percentage of the first
experiment's mean. We call this difference, the "precision"
of the comparison.
If the assumption is true and the results of the tests leads one
to believe that it is false, the condition is described as a
TYPE I error. If the assumption is false and the test results
show that the two means are within the prescribed difference, the
condition is described as a TYPE II error.
The probability of TYPE I error is set by the significance level
of the test. Choosing a small probability of one type of error,
increases the probability of the other type.
The routines in this module operate on the following set of
assumed parameters:
95% confidence that if the means differ by less than 5% they
are really the same, and 85% confidence that if the means
differ by more than 5% that they are really different.
The algorithm in this module uses these assumptions to determine
the number of iterations needed to achieve these levels of
confidence.
The reason for emphasizing TYPE II error is that a TYPE I error
indicates that the means differ, when in fact, they are the same.
If they differ, we will usually explore why, and in doing so, will
discover that they are not really different after all. If, on the
other hand, we get a TYPE II error, then this means that the
results show no difference, whereas the means really are
different. This is to be avoided since if means don't differ
from one run to the next, we are unlikely to look further into
the problem.
When additional iterations are forced by a high usMinIterations,
then the resulting precision will usually be less than 5%.
Conversely, when usMaxIterations are reached without converging,
then the precision will be greater than 5%. The precision returned
by TestStatValues will indicate how meaningful the comparisons of
two means will be.
Return Value: FALSE if further iterations are required for
the test to converge or usMinIterations has
not been reached.
TRUE if already converged or maximum limit
on iterations has been reached.
See also: TestStatOpen, TestStatInit, TestStatValues,
TestStatClose.
4) TestStatValues:
----------------
Description: Automatically computes a number of useful
statistical values for a given set of data.
VOID FAR PASCAL
TestStatValues (
PSZ pszOutputString,
USHORT usOutlierFactor,
PULONG * pulDataArray,
PUSHORT pcusElementsInArray,
PUSHORT pcusDiscardedValues,
);
pszOutputString - A pointer to a string buffer to which
output data may be returned. The minimum size of
the buffer should be 81 bytes. The string will
be a NULL terminated ascii string.
usOutlierFactor - Factor that defines the range of
acceptable data values. A value of zero will ignore
this factor and all data will be considered
valid.
pulDataArray - A pointer to the data array. If the outlier
factor has been chosen, this array has as many elements
as there were good data points in the data set. Else,
all the data points are contained in the data array.
pcusElementsInArray - The number of elements in the array
pointed to by puDataArray.
pcusDiscardedValues - pointer to the number of data
points discarded based upon the outlier factor.
Remarks: This routine should be called only once for
each test, normally after TestStatConverge has returned
TRUE. Any call to this should be followed by a call to
TestStatInit before the next call to TestStatConverge,
if you want to converge on a new set of data. The Outlier
factor decides the range of acceptable values in the data set.
The format of the returned string will be (as in C):
"%4u %10lu %10lu %10lu %6u %5u %10lu %4u %2u ".
These represent the mode number, mean, minimum, maximum,
the number of iterations completed, the precision, the standard
deviation, number of points discarded, and, the outlier factor
from the data set. The mode number will always be zero.
The precision will be 5% in case test results converged before
the limit on the maximum iterations is reached. Otherwise,
it returns the precision of the results gathered.
The precision value in this case assumes that the Type I error
and Type II error probabilities are 85% and 95% respectively.
The outlier factor determines along with the standard deviation
any abnormal data points. Any data point that does not satisfy:
[Mean - (SDev * OF)] < Data Point < [Mean + (SDev * OF)],
where SDev is the standard deviation computed with good data
points and OF is the outlier factor, is left out in the
statistics computation. The standard deviation is
recomputed and this process is repeated until there are no
abnormal data entries in the data set. The number of
outliers that were discarded is also returned to the calling
program. To ignore the outlier factor and this process of
elimination, the outlier factor may be set to zero.
Otherwise, the outlier factor should be at least 2 in order
for the results to be meaningful.
Return Value: None
See also: TestStatOpen, TestStatClose, TestStatInit,
TestStatConverge
5) TestStatClose:
-------------
Description: Deallocates instance data structures and
all memory allocated by TestStatOpen and TestStatInit.
VOID FAR PASCAL
TestStatClose (
VOID
);
Remarks: This routine should be called after the last
call to TestStatValues. A call to this must be followed by a
call to TestStatOpen and TestStatInit, in that order, before
the application calls TestStatConverge and TestStatValues.
Return Value: None
See also: TestStatOpen, TestStatInit, TestStatConverge,
TestStatValues.
------------------------------------------------------------
Usage of Statistical routines for convergence and values: TestApp
------------------------------------------------------------------
#define MIN_ITERATION 3
#define MAX_ITERATION 200
#define OUTLIER_FACTOR 4
Body of test application
{
USHORT usMinIteration = MIN_ITERATION;
USHORT usMaxIteration = MAX_ITERATION;
ULONG ulDataForCurrentIter;
ULONG far *pulDataArray; // make sure you have the "far" for 16 bit.
char chOutputBufferForString [81];
USHORT usOutlierFactor = OUTLIER_FACTOR;
USHORT cusDiscardedValues;
USHORT cusElementsInArray;
:
:
if (!TestStatOpen (usMinIteration,
usMaxIteration)) {
// Data Array could not be allocated.
// Cannot do convergence/statistics routines;
// Check parameters to call;
}
do { // for each test or if need to run convergence again
TestStatInit ()
// Initialize test variables;
:
do { // convergence loop; do until a
// TRUE is returned
// Start the timer;
// Test operation;
// Stop the timer;
ulDataForCurrentIter = // get the elapsed time for
// operation;
} while (!TestStatConverge (ulDataForCurrentIter));
// the data set has converged. Call the Statistics
// routine for the values and output data
TestStatValues (OutputBufferForString,
usOutlierFactor,
&pulDataArray,
&cusDiscardedValues,
&cusElementsInArray,
);
// the OutputBufferForString array has all the data.
// iDiscardedValues has the number of discarded values
//
} while (//more tests or need to converge on new data set )
:
:
TestStatClose();
:
}
-------------------------------------------------------------------
Random Number Generation Routines:
6) TestStatUniRand:
---------------
Description: Returns a number within the range of 0 to 1 based on
a starting seed.
double FAR PASCAL
TestStatUniRand (
VOID
);
Remarks: This routine returns a set of uniformly distributed numbers
between 0 and 1, on being, called repeatedly. TestStatUniRand makes
use of the multiplicative congruential algorithm discussed in Knuth,
Vol. II, Chapter 3. A starting seed is chosen along with a
multiplier and a modulus values. The seed for the next iteration is
computed from these values as follows:
Temp Value = X * A, where,
X is the current seed value and A is the multiplier. The remainder
of the division of this value by the modulus identifier is
determined. This will be the seed for the next iteration. This
value is divided by the modulus value to obtain a normalized value
(that lies between 0 and 1). This normalized value is returned to
the caller.
Through experiments, Sullivans, W. L has determined that a good set of
values is returned by selecting one of the 9 following values as
starting seeds:
32347753, 52142147, 52142123, 53214215, 23521425, 42321479,
20302541, 32524125, 42152159.
TestStatUniRand uses 32347753 as the starting seed. A good set of
values, mentioned above, implies that for the given seed, it takes
a very large number of iterations, before the set of returned values
is repeated. The following values have been chosen for the
multiplier and the modulus by M.C. Pike and I.D. Hill (reference):
Multiplier - 3125
Modulus id - 67108864
Return Value: A double float between 0 and 1.
See also: TestStatShortRand, TestStatRand, TestStatNormDist.
7) TestStatShortRand:
-----------------
Description: Returns a number within the range of 0 to 65535 based on
a starting seed.
USHORT FAR PASCAL
TestStatShortRand (
VOID
);
Remarks: This routine returns a set of uniformly distributed numbers
between 0 and 65535, on being, called repeatedly. TestStatShortRand makes
use of the multiplicative congruential algorithm discussed in Knuth,
Vol. II, Chapter 3. A starting seed is chosen along with a
multiplier and a modulus values. The seed for the next iteration is
computed from these values as follows:
Temp Value = X * A, where,
X is the current seed value and A is the multiplier. The remainder
of the division of this value by the modulus identifier is
determined. This will be the seed for the next iteration. This
value is multiplied by 65535 and divided by the modulus value
to obtain a value between 0 and 65535. This value is returned to
the caller.
Through experiments, Sullivans, W. L has determined that a good set of
values is returned by selecting one of the 9 following values as
starting seeds:
32347753, 52142147, 52142123, 53214215, 23521425, 42321479,
20302541, 32524125, 42152159.
TestStatShortRand uses 32347753 as the starting seed. A good set of
values, mentioned above, implies that for the given seed, it takes
a very large number of iterations, before the set of returned values
is repeated. The following values have been chosen for the
multiplier and the modulus by M.C. Pike and I.D. Hill (reference):
Multiplier - 3125
Modulus id - 67108864
Return Value: A USHORT between 0 and 65535.
See also: TestStatUniRand, TestStatRand, TestStatNormDist.
8) TestStatRand:
------------
Description: Returns a uniformly distributed random number within
a specified range.
ULONG FAR PASCAL
TestStatRand (
ULONG ulLower,
ULONG ulUpper
);
ulLower - Specifies the lower boundary of the desired random
number. Should be atleast 1 in value.
ulUpper - Specifies the upper boundary of the desired random
number. May not exceed 67108863.
Remarks: TestStatRand calls TestStatNorm for obtaining a normalized
random number. The value obtained from TestStatNorm is then
multiplied by the range (i.e. the difference between ulUpper and
ulLower). The computed value is then added to the lower limit
and the resulting number is returned. It should be noted that both
ulLower and ulUpper are included in the range of returned random
numbers.
Return Value: A random number within the specified range.
See Also: TestStatShortRand, TestStatUniRand, TestStatNormDist.
9) TestStatNormDist:
----------------
Description: With every call, returns a number that forms a set
of points whose mean is approximately the input mean and whose
standard deviation is nearly equal to the input standard deviation.
A normally distributed set of points is generated.
LONG FAR PASCAL
TestStatNormDist (
ULONG ulMean,
USHORT usSDev
);
Remarks: This routine uses a formula discussed in 'Random Number
Generation and Testing', IBM Data Processing Techniques, C20-8011
and 'Tuning an Operating System for General Purpose Use', Russell
P. Blake, Online Conferences (info. to be filled in).
TestStatNormDist makes use of TestStatShortRand to get a set of
uniformly distributed numbers. It generates a point around the
input mean using the following formula:
14
_
lRetVal <- Mean + ( -7 + >_ TestStatShortRand ()) * Std. Dev
i=1
The set of points generated with several calls to this routine
will be uniformly distributed with a mean of about the input mean
and a standard deviation of approximately the input standard
deviation. The returned value may be negative, too, depending
upon the values returned by TestStatShortRand and the input standard
deviation!
Return Value: A long integer.
See also: TestStatShortRand, TestStatUniRand, TestStatRand.