
## Function: is_gaussian()

int is_gaussian(short *array, int n, int min, int max, int print)
This is a quick and robust test to see if a collection of values has a probability distribution that is consistent with a Gaussian normal distribution ("normal IFO operation"), or if the collection of values contains "outlier" points, indicating that the set of values contains "pulses", "blips" and other "obvious" exceptional events that "stick out above the noise" (caused by bad cabling, alignment problems, or other short-lived transient events).

The arguments are:

array: Input. The values whose probability distribution is examined are array[0..n-1].
n: Input. The length of the previous array.
min: Input. The minimum value that the input values might assume. For example, if array[] contains the output of a 12-bit analog-to-digital converter, one might set min=-2048. Of course the minimum value in the input array might be considerably larger than this (i.e., closer to zero!) as it should be if the ADC is being operated well within its dynamic range limits. If you're not sure of the smallest value produced in array[], set min smaller (i.e., more negative) than needed; the only cost is storage, not computing time.
max: Input. The maximum value that the input values might assume. For example, if array[] contains the output of a 12-bit analog-to-digital converter, one might set max=2047. The previous comments apply here as well: set max larger than needed, if you are not sure about the largest value contained in array[].
print: Input. If this is non-zero, then the routine will print some statistical information about the distribution of the points.

The value returned by is_gaussian() is 1 if the distribution of points is consistent with a Gaussian normal distribution with no outliers, and 0 if the distribution contains outliers.

The way this is determined is as follows (we use $x_i$ to denote the array element array[i]):

• First, the mean value $\bar{x}$ of the distribution is determined using the standard estimator:
$$\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i \tag{16.5.327}$$

• Next, the points are binned into a histogram $N[v]$. Here $N[v]$ is the number of points in the array that have value $v$. The sum over the entire histogram is the total number of points: $\sum_{v=\mathrm{min}}^{\mathrm{max}} N[v] = n$.
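• Then the standard deviation $s$ is estimated in the following robust way. With $N[v]$ the number of points having value $v$ and $\bar{x}$ the mean, it is the smallest integer $s$ for which
$$\sum_{v=\bar{x}-s}^{\bar{x}+s} N[v] \ge \mathrm{erf}\!\left(\frac{1}{\sqrt{2}}\right) n \approx 0.683\, n \tag{16.5.328}$$

This value of $s$ is a robust estimator of the standard deviation; the range of $\pm s$ about the mean includes 68% of the samples. (Note that since the values of $x_i$ are integers, we replace $\bar{x}$ by the closest integer to it in the previous equation.)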
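• Next, the number of values lying between one and three standard deviations from the mean is found, and the number of values lying between three and five standard deviations is found. These are compared to the numbers expected for $n$ samples drawn from a Gaussian normal distribution:
$$n\left[\mathrm{erf}\!\left(\frac{3}{\sqrt{2}}\right) - \mathrm{erf}\!\left(\frac{1}{\sqrt{2}}\right)\right] \approx 0.3146\, n, \qquad n\left[\mathrm{erf}\!\left(\frac{5}{\sqrt{2}}\right) - \mathrm{erf}\!\left(\frac{3}{\sqrt{2}}\right)\right] \approx 0.0027\, n \tag{16.5.329}$$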

• If there are any points more than five standard deviations away from the mean, or significantly more points in the 3 to 5 standard deviation range than would be expected for a Gaussian normal distribution, then is_gaussian() returns 0. If the number of points in each range is consistent with a Gaussian normal distribution, then is_gaussian() returns 1.
Authors: Bruce Allen, ballen@dirac.phys.uwm.edu
Comments: This function should be generalized in the obvious way, to look at one sigma wide bins in a more systematic way. It can eventually be replaced by a more rigorously characterized test to see if the distribution of sample values is consistent with the normal IFO operation.

Bruce Allen 2000-11-19