Next: Optimization and computation-speed considerations Up: GRASP Routines: Template Bank Previous: Example: template program   Contents

## Example: multifilter program

0 This example implements optimal filtering by a bank of properly-spaced templates. One could do this with trivial modifications of the example optimal program given earlier. Here we have shown something slightly more ambitious. The multifilter program is an MPI-based parallel-processing code, designed to run on either a network of workstations or on a dedicated parallel machine. It is intended to illustrate a particularly simple division of labor among computing nodes. Each segment of data (of length NPOINT) is broadcast to the next available node. That node is responsible for filtering the data through a bank of templates, chosen to cover the mass range from MMIN to MMAX. The output of each one of these filters is a set of 11 signals, which measure the following quantities:
1. The largest signal-to-noise ratio (SNR) at the output of the filter, for the given segment of data,
2. The distance for an optimally-oriented source, in Mpc, at which the SNR would be unity.
3. The amplitude of the zero-degree phase chirp matching the observed signal.
4. The amplitude of the ninety-degree phase chirp matching the observed signal.
5. The offset of the best-fit chirp into the given segment of data
6. The offset of the impulse into the given segment of data, which would produce the observed output.
7. The time of that impulse, measured in seconds from the start of the data segment,
8. The time (in seconds, measured from the start of the data segment) at which an inspiral, best fitting the observed filter output, would have passed through the start frequency FLO.
9. The time (in seconds, measured from the start of the data segment) at which an inspiral, best fitting the observed filter output, would have passed through coalescence.
10. The observed average value of the output SNR (should be approximately unity).
11. The probability, using the splitup technique described earlier, that the observed filter output is consistent with a chirp plus stationary detector noise.

For completeness, we give this code in its entirety here. We also show some typical graphs produced by the MPE utility nupshot which illustrates the pattern of communication and computation for an analysis run. For these graphs, the analysis run lasted only about four minutes, and analyzed about three minutes of IFO data. We have performed an identical, but longer run, which analyzed about five hours of IFO ouput in just over three hours, running on a network of eight SUN workstations. The data is analyzed in 6.5 second segments, each of which is processed through a set of 66 filter templates completely covering the mass range from 1.2 to 1.6 solar masses. For the run that we have profiled here, STORE_TEMPLATES is set to 1. This means that each slave allocates memory internally for storing the Fourier-transformed chirp signals; the slaves only compute these once. However this does place demands on the internal storage space required - in the run illustrated here each individual process allocated about 34 Mbytes of internal memory. Another version of the code has also been tested; in this version the slave nodes compute the filters and Fourier transform them each time they are needed, for each new segment of data. This code has STORE_TEMPLATES set to 0. This is less efficient computationally, but requires only a small amount of internal storage. For a given hardware configuration, the optimal balance between these extremes, and between the amount of redundant broadcasting of data, depends upon the relative costs of communication and computation, and the amount of internal storage space available.

Based on these figures, it is possible to provide a rough table of computation times. These are given in tabular form in Table .

Table: Approximate computation times for different elements of the optimal-filtering process.
 Task Color Approximate time Processing done data slaves dark blue 350 msec transfer 384 kbytes data master yellow 1 msec transfer 3 kbytes correlate purple 500 msec 2 ffts of 64k floats, and search splitup (likelihood) light blue 330 msec several runs through 64k floats real FFT (one phase) green 150 msec 1 fft of 64k floats compute template gray 350 msec compute 2 arrays of 18k floats orthonormalize templates wheat 25 msec several runs through 64k floats

Author: Bruce Allen, ballen@dirac.phys.uwm.edu
Comments: There are many other ways in which this optimal filtering code could be parallelized. This program illustrates one of the possibilities. Other possibilities include: maintaining different templates on different processes, and broadcasting identical IFO data to these different processes, or parallelizing across both data and templates.

Includes/multifilter.tex

Next: Optimization and computation-speed considerations Up: GRASP Routines: Template Bank Previous: Example: template program   Contents
Bruce Allen 2000-11-19