next up previous contents
Next: Optimization and computation-speed considerations Up: GRASP Routines: Template Bank Previous: Example: template program   Contents

Example: multifilter program

This example implements optimal filtering by a bank of properly-spaced templates. One could do this with trivial modifications of the example optimal program given earlier. Here we have shown something slightly more ambitious. The multifilter program is an MPI-based parallel-processing code, designed to run on either a network of workstations or on a dedicated parallel machine. It is intended to illustrate a particularly simple division of labor among computing nodes. Each segment of data (of length NPOINT) is broadcast to the next available node. That node is responsible for filtering the data through a bank of templates, chosen to cover the mass range from MMIN to MMAX. The output of each one of these filters is a set of 11 signals, which measure the following quantities:
  1. The largest signal-to-noise ratio (SNR) at the output of the filter, for the given segment of data.
  2. The distance for an optimally-oriented source, in Mpc, at which the SNR would be unity.
  3. The amplitude $\alpha$ of the zero-degree phase chirp matching the observed signal.
  4. The amplitude $\beta$ of the ninety-degree phase chirp matching the observed signal.
  5. The offset of the best-fit chirp into the given segment of data.
  6. The offset of the impulse into the given segment of data, which would produce the observed output.
  7. The time of that impulse, measured in seconds from the start of the data segment.
  8. The time (in seconds, measured from the start of the data segment) at which an inspiral, best fitting the observed filter output, would have passed through the start frequency FLO.
  9. The time (in seconds, measured from the start of the data segment) at which an inspiral, best fitting the observed filter output, would have passed through coalescence.
  10. The observed average value of the output SNR (should be approximately unity).
  11. The probability, using the splitup technique described earlier, that the observed filter output is consistent with a chirp plus stationary detector noise.

For completeness, we give this code in its entirety here. We also show some typical graphs produced by the MPE utility nupshot, which illustrate the pattern of communication and computation for an analysis run. For these graphs, the analysis run lasted only about four minutes, and analyzed about three minutes of IFO data. We have performed an identical, but longer run, which analyzed about five hours of IFO output in just over three hours, running on a network of eight SUN workstations. The data is analyzed in 6.5 second segments, each of which is processed through a set of 66 filter templates completely covering the mass range from 1.2 to 1.6 solar masses. For the run that we have profiled here, STORE_TEMPLATES is set to 1. This means that each slave allocates memory internally for storing the Fourier-transformed chirp signals; the slaves only compute these once. However, this places demands on internal storage: in the run illustrated here, each individual process allocated about 34 Mbytes of internal memory. Another version of the code has also been tested; in this version the slave nodes compute the filters and Fourier transform them each time they are needed, for each new segment of data. This code has STORE_TEMPLATES set to 0. This is less efficient computationally, but requires only a small amount of internal storage. For a given hardware configuration, the optimal balance between these extremes, and between the amount of redundant broadcasting of data, depends upon the relative costs of communication and computation, and the amount of internal storage space available.

Figure: Output of the nupshot profiling tool, showing the behavior of the multifilter program running on a workstation network of 8 machines (the fastest of these are Sparc-20 class processors). This shows the first 8 seconds of operation (time on the horizontal axis). The gray segments show the slave processes receiving the template list. During the orange segments, the slave processes are waiting for data; the blue segments show the master transmitting data to each slave. During the light gray segments, the slaves are computing the templates; during the green segments they are computing the FFTs of those templates; and during the purple segments they are correlating the data against the templates. During the brown segment, the master is waiting to receive data back from the slaves.

Figure: This is a continuation of the previous figure. Slave number 1 has completed its computation of the templates, and during the orange segment, waits to make a connection with the master. This is followed by a (very small) yellow segment, during which the slave transmits data back to the master, and a blue segment during which the master transmits new data to slave number 1. Immediately after this, slave number 1 begins a new (purple) sequence of correlation calculations on the newly received block of data. Notice that because slave 1 has already computed the templates, the light gray and green operations are no longer needed.

Figure: This is a continuation of the previous figure, and represents the "long-term" or "steady-state" behavior of the multiprocessing system. In this state, the different processors are spending all of their time doing correlation measurements of the data, as indicated by the purple segments, and the master is waiting for the results of the analysis (brown segments).

Figure: This is a continuation of the previous figure, and shows the termination of some of the slave processes (all the data has been analyzed, and there is no new data remaining). The blue segments (data being sent to slaves) are actually termination messages being sent to the different processes 2, 3, 4, and 6. Processes 5 and 7 are still computing. In the case of process 7, the data being analyzed contains a non-stationary "spurion" which triggered most of the filters beyond a pre-set threshold level. As a result, process 7 is performing some additional computations (the split-up likelihood test, shown as light blue segments) on the data.

Based on these figures, it is possible to provide a rough breakdown of computation times; this is given in Table [*].

Table: Approximate computation times for different elements of the optimal-filtering process.

  Task                       Color        Approximate time   Processing done
  -------------------------  -----------  -----------------  ----------------------------------------
  data $\rightarrow$ slaves  dark blue    350 msec           transfer 384 kbytes
  data $\rightarrow$ master  yellow       1 msec             transfer 3 kbytes
  correlate                  purple       500 msec           2 FFTs of 64k floats, and search
  splitup (likelihood)       light blue   330 msec           several runs through 64k floats
  real FFT (one phase)       green        150 msec           1 FFT of 64k floats
  compute template           gray         350 msec           compute 2 arrays of $\approx$ 18k floats
  orthonormalize templates   wheat        25 msec            several runs through 64k floats

Author: Bruce Allen
Comments: There are many other ways in which this optimal filtering code could be parallelized; this program illustrates one of the possibilities. Others include maintaining different templates on different processes and broadcasting identical IFO data to each of them, or parallelizing across both data and templates.


Bruce Allen 2000-11-19