# LSC-Virgo Burst Analysis Working Group

## Use of the code

bash-3.00$ ./lalapps_StringSearch -h

All arguments are required except -n, -h, -w, -g, -o, -x, -y, -z, -b, -r, -a, -l, -p, -T and -i. One of -k or -c must be specified. They are:

--bw-flow (-f) FLOAT Low frequency cut-off.
--sample-rate (-s) FLOAT Desired sample rate (Hz).
--bank-lowest-hifreq-cutoff (-H) FLOAT Template bank lowest high frequency cut-off.
--bank-freq-start (-L) FLOAT Template bank low frequency cut-off.
--threshold (-t) FLOAT SNR threshold.
--frame-cache (-F) STRING Name of frame cache file.
--channel-name (-C) STRING Name of channel.
--injection-file (-i) STRING Name of xml injection file.
--outfile (-o) STRING Name of xml output file.
--gps-start-time (-S) INTEGER GPS start time.
--gps-end-time (-E) INTEGER GPS end time.
--settling-time (-T) INTEGER Number of seconds to truncate filter.
--trig-start-time (-g) INTEGER GPS start time of triggers to consider.
--pad (-p) INTEGER Pad the data with this many seconds at beginning and end.
--short-segment-duration (-d) INTEGER Duration of short segments. They will overlap by 50%.
--kink-search (-k) FLAG Specifies a search for string kinks.
--cusp-search (-c) FLAG Specifies a search for string cusps.
--test-gaussian-data (-n) FLAG Use unit variance fake gaussian noise.
--test-white-spectrum (-w) FLAG Use constant white noise (used only in combination with fake gaussian noise; otherwise ignored).
--cluster-events (-l) REAL4 Cluster events with input timescale.
--print-spectrum (-a) FLAG Prints the spectrum to Spectrum.txt.
--print-fd-filter (-b) FLAG Prints the frequency domain filter to Filter-m.txt.
--print-td-filter (-r) FLAG Prints the time domain filter to FIRFilter-m.txt.
--print-snr (-x) FLAG Prints the snr to stdout.
--print-data (-y) FLAG Prints the post-processed (HP filtered, downsampled, padding removed, with injections) data to data.txt.
--print-injection (-z) FLAG Prints the injection data to injection.txt.
--help (-h) FLAG Print this message.

Notes on command line options:

• bw-flow: the high-pass filtering frequency. The data is high-pass filtered twice, first before casting to REAL4 precision and then after, and if, an injection is made. Version 1.89 of the code uses a 4th order Butterworth filter with 90% of the amplitude at the frequency specified by the user.
• sample-rate: sets the sampling rate at which the analysis will be performed. Typically data is read in at 16384 Hz and downsampled to a more suitable frequency. The default value is 4096 Hz.
• bank-lowest-hifreq-cutoff: the lowest high frequency cutoff the code will consider. The largest high frequency cutoff is given by the Nyquist frequency of the downsampled time series. The use of this argument is described in the template placement section below.
• bank-freq-start: the low frequency cutoff for the templates used in the matched filter. Care should be taken not to start the bank at a frequency that is too close to the high-pass filtering frequency. A warning message is printed when this happens.
• threshold: the SNR threshold above which events will be considered.
• frame-cache: the name of the frame-cache file.
• channel-name: the name of the channel to be read off the frames.
• injection-file: the name of an xml file to be used in injections. This argument is optional.
• outfile: the name of the output xml file. If no name is specified the filename defaults to <IFO>-STRINGSEARCH-<GPSSTART>-<DURATION>.xml; all the information needed is read from the user's command line arguments.
• gps-start-time and gps-end-time: specify the GPS start and end times (in seconds) of the data to be read in. The values have to jibe with whatever data is in the frame cache file, of course.
• settling-time: half of the length of the filter produced by the InvSpecTrunc procedure (described below). This argument should be consistent with (i.e., longer than) the duration of the template.
• trig-start-time: the time after which triggers will be printed to the output xml file. Its main purpose is for use in the DAG.
• pad: the amount of data that will be thrown out at each end of the stretch. Padding is used to remove the ringing of the IIR filters used to high- and low-pass the data.
• short-segment-duration: the length of the short segments into which the long segment specified by the GPS start and end times will be divided. Because the Median-Mean method is used to estimate the PSD, the relation between the length of the long segment and the sub-segments it will be divided into is constrained. If T is the duration of the long segment and t the duration of the short segments, then T/t - 0.5 must be an odd integer. The reason is as follows. The median-mean method assumes the long segment T is divided into N short segments of length t which overlap by 50%. It then divides the short segments into even numbered (0, 2, 4, 6, ...) and odd numbered (1, 3, 5, 7, ...), takes the median of the even numbered and the median of the odd numbered, and averages the two. Because of this, there must be an odd number of even segments and also an odd number of odd segments (the median is not well defined otherwise). If there are a total of N overlapping segments, then N/2 must be odd. The length spanned by, say, the even segments is (N/2) t = T - t/2, because the missing t/2 is in the very last odd segment; hence N/2 = T/t - 0.5. (A small numerical check of this constraint is sketched in C right after this list.) The figure below shows an example of this segmentation with N=6.
• kink-search or cusp-search: specify the type of search to be run.
• test-gaussian-data: replaces the data read in with white gaussian noise of unit variance.
• test-white-spectrum: uses the analytic form for the PSD of white gaussian noise ($S_h=2\sigma^2\Delta t$). It should be used in combination with the test-gaussian-data flag; otherwise it is ignored.
• cluster-events: runs an extra clustering step among the triggers generated by different templates. The time-scale used is input by the user.
• print-data: remember that this flag's description is inconsistent with what the code actually does. It's no big deal, but we must remember to fix this.
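The segmentation constraint is easy to check numerically. A minimal sketch in plain C follows; T and t are example values of ours, not defaults taken from the search code:

```c
#include <stdio.h>

/* Median-mean segmentation check: a long segment of duration T is divided
 * into N short segments of duration t overlapping by 50%, so N = 2T/t - 1,
 * and N/2 = T/t - 0.5 must be an odd integer for the even and odd medians
 * to be well defined. */
int main(void)
{
    int T = 1792;   /* long segment duration in seconds (example value) */
    int t = 512;    /* short segment duration in seconds (example value) */

    /* T/t - 0.5 is an integer iff 2T/t is an odd integer */
    if ((2 * T) % t != 0 || (2 * T / t) % 2 == 0) {
        printf("T/t - 0.5 = %g is not an integer: invalid\n",
               (double)T / t - 0.5);
        return 1;
    }

    int N = 2 * T / t - 1;   /* total number of 50%-overlapping segments */
    int halfN = N / 2;       /* equals T/t - 0.5 */

    printf("N = %d segments, N/2 = %d (%s)\n", N, halfN,
           halfN % 2 == 1 ? "odd: valid segmentation" : "even: invalid");
    return 0;
}
```

For T = 1792 and t = 512 this prints N = 6 and N/2 = 3, i.e. the same N=6 segmentation mentioned above.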
## Function description

The following is a line-by-line description of version 1.101 of the string code (lalapps/src/string/StringSearch.c).

• ReadCommandLine function
The first function called by the code is ReadCommandLine (line 241), which, surprise, reads the command line. It also checks that the minimum number of variables needed to run the code have been set. The function also does some consistency checking: (1) it checks that T/t - 0.5 is an integer, as required by the median-mean spectrum estimation used; if it is not, it prints out an error message and exits. (2) It checks that the starting frequency of the template bank is sufficiently far from the high-pass filtering frequency; if it is not, it prints out a warning and continues. Finally, it stores the in and out times of the search summary table. The in times are specified by the user's command line arguments. The out times specify a duration which is shorter than that given by the in times by twice the length of the pad plus half the short segment length (a quarter of a short segment at each end); that is, out duration = in duration - 2 pad - t/2. Immediately after, on lines 243-247, the high-pass filter parameters are set.
• ReadData function
On line 249 the ReadData function is called. This function reads the data. It sets the global variables GV.duration [INT4] and GV.gpsepoch [LIGOTimeGPS]. It reads in the sampling time with a call to LALFrGetREAL8TimeSeries (line 943), and uses this to allocate the data, both the raw data, GV.ht [REAL8TimeSeries], and the data to be processed, GV.ht_proc [REAL4TimeSeries]. On line 952 it sets the global variable SAMPLERATE. On lines 954-958 the data (ht and ht_proc) is set to zero for good measure. On line 961 it checks that the fake Gaussian noise substitution flag has not been set; if not, it reads in the data and multiplies the strain data by SCALE (which is set to 10^20 on line 96). If the flag has been set, it creates a fake data set with unit variance Gaussian noise. Or we can think of it as creating a series of variance 10^-40, which is then re-scaled by a factor of 10^20 (the latter jibes with our conventions of injection amplitudes).

• AddInjections function
If an injection xml file has been specified, the function AddInjections is called on line 253. This function calls LALSimBurstTableFromLIGOLw (line 373), which reads the injection data from the xml file. Then XLALBurstInjectSignals is called (line 383), which injects the signals as described in Section II above. Lines 384-386 put the injection into GV.ht_proc (which is why it has been zeroed out). Back in main, the injection can be printed if the print-injection flag has been set (line 254).

• WindowData function
On line 257 the function WindowData is called. This function windows the data with a Tukey window. The reason to do this is to attenuate the ringing of the Butterworth filters by slowly ramping up to the full amplitude of the data. The Tukey window affects 0.1% of the data (that's what the variable r is). The Tukey window is generated on the spot (lines 333-346) and applied to the data (lines 348-351).
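A minimal sketch of the windowing step in plain C; the cosine ramp and the meaning of r follow the description above, but the implementation details are ours, not lifted from WindowData:

```c
#include <math.h>
#include <stdio.h>

/* Apply a Tukey window with ramp fraction r to data[0..N-1].  With
 * r = 0.001, only 0.1% of the data (0.05% at each end) is tapered, which
 * is the value described above. */
static void tukey_window(float *data, int N, double r)
{
    int taper = (int)(0.5 * r * N);   /* samples in each cosine ramp */
    for (int n = 0; n < taper; n++) {
        double w = 0.5 * (1.0 - cos(M_PI * n / taper));   /* 0 -> 1 ramp */
        data[n] *= (float)w;
        data[N - 1 - n] *= (float)w;
    }
}

int main(void)
{
    float data[1000];
    for (int i = 0; i < 1000; i++)
        data[i] = 1.0f;
    tukey_window(data, 1000, 0.1);   /* exaggerated r so the taper is visible */
    printf("data[0]=%g data[25]=%g data[49]=%g data[500]=%g\n",
           data[0], data[25], data[49], data[500]);
    return 0;
}
```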
• ProcessData function
On line 259 the function ProcessData is called. This function high-pass filters the data in GV.ht in double precision (line 903) and copies it onto GV.ht_proc, casting it into single precision. It then destroys the double precision data (line 912).

• DownSample function
The data is downsampled with a call to DownSample on line 270. This function calls LALResampleREAL4TimeSeries on line 891, and downsamples to the sampling rate specified by the user, modulo whatever constraints LALResampleREAL4TimeSeries imposes. It uses a Butterworth filter (line 889) to low-pass filter the data before downsampling.

• Pad removal
On lines 274-276 the function LALResizeREAL4TimeSeries is called to remove the pad. At this stage all the IIR filtering has been performed and it is safe to remove the pad. The global variable GV.duration is adjusted in line 281 to reflect this change.

• AvgSpectrum function
The power spectral estimation is performed with a call to AvgSpectrum on line 283. This function creates the global variable that will contain the spectrum, GV.Spec (line 848), as well as the forward and reverse plans that will be used in the FFTs. The spectrum is computed using the Median-Mean method, using segments of length t (CLA.ShortSegDuration in the code). Thus the number of points of the spectrum will be $N = t/(2\Delta t) + 1$, which is used in line 848. On line 857 there is a check for the fake Gaussian noise flag and the white spectrum flag; if both have been set, the spectrum is set to its analytic form ($S_h = 2\sigma^2\Delta t$) and the function is exited. Otherwise, the spectrum is computed using a call to XLALREAL4AverageSpectrumMedianMean (line 873). The window applied to each short segment is a Hann window (line 872). The Welch window doesn't work well at all. On line 285 the spectrum is printed to a file called Spectrum.txt if the print-spectrum flag is set.

• CreateTemplateBank function
On line 287 the CreateTemplateBank function is called. This function creates the template bank, which in our case consists of a set of high frequency cutoffs. The high frequency cutoff of the first template is the Nyquist frequency of the time series. The inner product of the template with itself, $t_1t_1 = (t_1|t_1)$ (as defined by equation 15 of the big write-up), is evaluated on lines 793-797. The first template's information is stored in strtemplate[0] in lines 800-802: the high frequency cutoff, the template normalisation, and the mismatch to the previous template (zero since it is the first template). The global strtemplate is an array that contains all the templates. The first template is the one with the largest un-normalised inner product, and the remaining templates will have a lower frequency cutoff (and thus a smaller inner product). The fitting factor of templates $t_1$ and $t_2$ is defined as $F_{12} = (t_1|t_2)/\sqrt{(t_1|t_1)(t_2|t_2)} = 1 - \epsilon$, where $\epsilon$ is the mismatch. In our case the only difference between the templates is the high frequency cutoff, and, if $t_2$ has the lower frequency cutoff, $(t_1|t_2) = (t_2|t_2)$. Thus the mismatch is $\epsilon = 1 - (t_2|t_2)/(t_1|t_1)$. Lines 812-835 find the remaining templates. We start at the second-to-highest frequency index and subtract off from $t_2t_2$ (which has been set to $t_1t_1$ on line 804). The process of subtracting contributions continues until the quantity $\epsilon$ is at or above 0.05. At that point a new template is defined and set as the starting point of the subtraction process (by setting $t_1t_1 = t_2t_2$ on line 822), and the template high frequency cutoff, normalisation, and mismatch are stored on lines 823-825. The process begins anew, with the mismatch $\epsilon$ now computed relative to the latest template. The global variable NTemplates is set to the number of templates on line 836.
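A sketch of the greedy placement just described, in plain C. The flat toy PSD and the f^{-4/3} cusp-like spectrum are stand-ins for the measured spectrum and the real templates, and the variable names echo the description above rather than the code:

```c
#include <math.h>
#include <stdio.h>

#define NBINS 2048
#define MAXTEMPLATES 128
#define MAXMISMATCH 0.05

int main(void)
{
    double df = 1.0;             /* frequency resolution (Hz), toy value */
    double flow = 50.0;          /* bank start frequency (Hz), toy value */
    double integrand[NBINS];     /* |ttilde(f)|^2 / S_h(f) in each bin */
    double fcut[MAXTEMPLATES];   /* template high frequency cutoffs */
    int ntemplates = 0;

    /* cusp-like template |ttilde(f)| ~ f^(-4/3) over a flat toy PSD */
    for (int k = 0; k < NBINS; k++) {
        double f = flow + k * df;
        integrand[k] = pow(f, -8.0 / 3.0);
    }

    /* first template: the highest cutoff, hence the largest (t|t) */
    double t1t1 = 0.0;
    for (int k = 0; k < NBINS; k++)
        t1t1 += 4.0 * integrand[k] * df;
    fcut[ntemplates++] = flow + (NBINS - 1) * df;

    /* walk down in frequency, subtracting bins from (t2|t2) until the
     * mismatch eps = 1 - (t2|t2)/(t1|t1) reaches 5%; place a template
     * there and restart the mismatch count from it */
    double t2t2 = t1t1;
    for (int k = NBINS - 2; k >= 0 && ntemplates < MAXTEMPLATES; k--) {
        t2t2 -= 4.0 * integrand[k + 1] * df;
        if (1.0 - t2t2 / t1t1 >= MAXMISMATCH) {
            fcut[ntemplates++] = flow + k * df;
            t1t1 = t2t2;
        }
    }

    printf("%d templates; lowest cutoff %g Hz\n",
           ntemplates, fcut[ntemplates - 1]);
    return 0;
}
```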
• CreateStringFilters function
On line 289 the CreateStringFilters function is called. This function creates a filter for each template, implementing a variant of the so-called InvSpecTrunc procedure. The difference between the InvSpecTrunc procedure implemented here and the usual one is that we have rolled the template in along with the noise. A REAL4 vector, vector, is created on line 664 and a COMPLEX8 vector, vtilde, is created on line 665. These are used as temporary place-holders for the time-domain and frequency-domain versions of the filters. A loop over templates starts on line 669; a separate filter is created for each template. Inside this loop the filter for template m is created (line 675) and its frequency resolution specified (line 674). The real part of the COMPLEX8 vector vtilde is populated with the quantity $(f^{-4/3}/S_h(f))^{1/2}$ between the low frequency cutoff of the template bank (f_low_cutoff_index) and the template-dependent high frequency cutoff (f_high_cutoff_index) on lines 679-685. The rest of the elements, including DC and Nyquist, are set to zero (lines 688-695). On line 699 the contents of vtilde are inverse fast Fourier transformed into vector, and the contents of vector are then multiplied by the frequency resolution to ensure unit sanity. The effect of this is to create a time-domain FIR filter which is symmetric around $t=0$. The truncation occurs on lines 706-713: only the first and last CLA.TruncSecs/2 seconds of vector are kept; the rest of the vector is set to zero. On line 716 a forward FFT takes the FIR filter in vector back into the frequency domain, where it is stored in vtilde. On lines 719-725 vtilde is multiplied by the time resolution to ensure unit sanity, and strtemplate[m].StringFilter is finally populated with the square of vtilde. Finally, the DC and Nyquist components of strtemplate[m].StringFilter, and the components below the template cutoff, are set to zero (lines 728-729). If the print-fd-filter flag is set (lines 733-739), strtemplate[m].StringFilter is printed out (this happens for every m) to a file called Filter-m.txt. If the print-td-filter flag is set (lines 741-771), the FIR filter is printed out (this also happens for every m) to a file called FIRFilter-m.txt. This procedure is longer because strtemplate[m].StringFilter needs to be inverse Fourier transformed; it is not good enough to print out vector, because it is (in fact) the square root of the FIR filter.

• FindStringBurst function
The function FindStringBurst is called on line 291. This function does the matched filtering of the data, given the filters created and the template bank found. A REAL4 vector, called vector, is created on line 598 and a COMPLEX8 vector, called vtilde, is created on line 601. These are used as temporary place-holders for the SNR (the matched-filter output) and the FFTd data. This function has a loop over templates, with a loop over short segments inside of it. The loop over templates goes from 0 to NTemplates, and the loop over short segments goes from 0 to N = 2T/t - 1, which follows from the segmentation equations described above. For every template and every short overlapping chunk an FFT is taken of the data on line 614. The FFTd data is multiplied by the appropriate filter (strtemplate[m].StringFilter) and also by the time resolution (GV.ht_proc.deltaT), which is not included in LALForwardRealFFT (lines 618-622). The output of this is then reverse FFTd (line 624), multiplied by the frequency resolution (GV.Spec.deltaF), divided by the template normalisation (which is not included in the filter), and multiplied by a factor of 2. The reason for the factor of 2, rather than 4, is explained in the long technical document where the FindStringBurst function is described. After computing the SNR, the function FindEvents is called (line 605); it is described below. Once the loops over templates and segments are done, the events are sorted (lines 641-643). We are aware of the fact that we take an FFT of the data for every template; this is inefficient, but as long as there aren't too many templates it is something we can live with.
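A sketch of the per-segment filtering in plain C, with FFTW standing in for the LAL FFT routines; the toy data, flat filter, and unit normalisation are ours, not what the code computes:

```c
#include <fftw3.h>
#include <math.h>
#include <stdio.h>

/* One template, one short segment: FFT the data, multiply by the filter
 * and by deltaT (not included in the forward FFT), inverse FFT, then
 * multiply by deltaF and 2/norm, as described above.  The flat filter
 * plays the role of strtemplate[m].StringFilter. */
int main(void)
{
    const int N = 4096;                /* samples in one short segment */
    const double dt = 1.0 / 4096.0;    /* sampling time */
    const double df = 1.0 / (N * dt);  /* frequency resolution */
    const double norm = 1.0;           /* toy template normalisation */

    float *seg = fftwf_malloc(sizeof(float) * N);
    float *out = fftwf_malloc(sizeof(float) * N);
    fftwf_complex *vtilde = fftwf_malloc(sizeof(fftwf_complex) * (N / 2 + 1));

    for (int i = 0; i < N; i++)        /* toy data: a 100 Hz sine */
        seg[i] = (float)sin(2 * M_PI * 100.0 * i * dt);

    fftwf_plan fwd = fftwf_plan_dft_r2c_1d(N, seg, vtilde, FFTW_ESTIMATE);
    fftwf_plan rev = fftwf_plan_dft_c2r_1d(N, vtilde, out, FFTW_ESTIMATE);

    fftwf_execute(fwd);
    for (int k = 0; k <= N / 2; k++) {
        double filter = 1.0;           /* toy flat StringFilter */
        vtilde[k][0] *= (float)(dt * filter);
        vtilde[k][1] *= (float)(dt * filter);
    }

    fftwf_execute(rev);
    float peak = 0.0f;
    for (int i = 0; i < N; i++) {
        out[i] *= (float)(2.0 * df / norm);
        if (out[i] > peak)
            peak = out[i];
    }
    printf("peak filter output = %g\n", peak);

    fftwf_destroy_plan(fwd);
    fftwf_destroy_plan(rev);
    fftwf_free(seg); fftwf_free(out); fftwf_free(vtilde);
    return 0;
}
```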
• FindEvents function
This function is called on line 636 for every short segment of the data and every template. It takes as arguments the CLA (command line arguments) structure, the REAL4Vector vector that contains the SNR, the short segment number i, the template number m, and a pointer to the single burst table where the events will be stored. Its purpose is to find clusters of SNR values above threshold in the SNR vector; these clusters will be converted into single burst events. On lines 498-504 the SNR is printed to stdout if the print-snr flag is set. Then a loop over the inner half of vector (the first quarter and last quarter are thrown out) starts on line 507. The loop is over the variable p, the index of the SNR vector. Inside this loop the temporary variable maximum (initialised to zero) will hold the maximum SNR for each cluster, and pmax the value of the index for that maximum; timeNS is the GPS time in nanoseconds. On line 515 there is a check for the start of a cluster: is the SNR above the user-input threshold (CLA.threshold), and is the time after the user-input trigger start time (CLA.trigstarttime)? If both conditions are met, the following variables are declared: pend and pstart, which contain the ending and starting indices of the cluster; peaktime and starttime, which will contain the time of the peak of the cluster and the start time of the cluster in nanoseconds; and the REAL8 duration, which will contain the duration of the cluster. On line 521 the variable timeNS is re-initialised to the starting time of the short segment being analysed. Lines 523-532 allocate the memory for the new event. Lines 540-555 are where the actual clustering takes place; this is the heart of the function. Basically it advances the index p (line 554) as long as (lines 540-541): (1) the SNR exceeds threshold, or (2) even though we are below threshold, we are still within the user-input clustering time (CLA.cluster). The variable pend is the latest point above threshold (lines 550-553), and it is used in the conditional to compute the distance from the last cluster when we are below threshold. Also (lines 544-548), the maximum value of the SNR is kept and updated in the variable maximum, along with the index corresponding to that maximum (pmax). Finally, when we are no longer above threshold and have moved forward a time longer than the clustering time, or have reached the edge of the last quarter of the segment, we exit the conditional on line 540. The event parameters are then defined: the peaktime is computed on line 557 by adding pmax times the sampling time in nanoseconds (×1e9) to timeNS (which was re-initialised on line 521 to the starting time of the short segment being analysed). The duration (line 558) is computed using pstart and pend, the starting index of the cluster and the last point above threshold, respectively. The starttime is similarly computed on line 560. Lines 563-581 fill in the rest of the single burst table elements. A few things of note: (1) a 1-sample fuzz is added to the start and end of each event. We added this at some point to make sure we could find low injections with only one point above threshold and whose peak had been shifted by the noise. We're not sure we should keep this, but it doesn't seem to do much harm. (2) The central frequency and bandwidth of the cluster are computed using the template high frequency cutoff (strtemplate[m].f) and the starting frequency of the bank (CLA.fbankstart) on lines 576-577. (3) The amplitude of the event, A, is computed on line 579; this is the single most interesting output of the filter. (4) The value stored in confidence is the negative of the SNR; this is a hack. (5) The clustering time-scale is also stored as part of the trigger (line 581); this will be used later to cluster triggers between different templates.
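A sketch of the clustering loop in plain C; the variable names p, pstart, pend, pmax, and maximum mirror the description above, but this is our re-implementation, not the code's:

```c
#include <stdio.h>

/* Scan the SNR vector, start a cluster when the threshold is crossed, and
 * keep extending it while we are above threshold or still within the
 * clustering time of the last point above threshold (pend). */
int main(void)
{
    float snr[] = {1, 2, 5, 7, 6, 2, 1, 6, 2, 1, 1, 1, 1, 8, 3, 1};
    int n = sizeof(snr) / sizeof(*snr);
    float threshold = 4.0f;     /* stand-in for CLA.threshold */
    int cluster_samples = 3;    /* clustering time in samples (toy value) */

    int p = 0;
    while (p < n) {
        if (snr[p] <= threshold) { p++; continue; }

        int pstart = p, pend = p, pmax = p;
        float maximum = snr[p];
        while (p < n && (snr[p] > threshold || p - pend < cluster_samples)) {
            if (snr[p] > threshold) pend = p;      /* last point above threshold */
            if (snr[p] > maximum) { maximum = snr[p]; pmax = p; }
            p++;
        }
        printf("cluster: start=%d end=%d peak=%d snr=%g\n",
               pstart, pend, pmax, maximum);
    }
    return 0;
}
```

Note how the two bumps at indices 2-4 and 7 merge into a single cluster because they are within the clustering time of each other, which is the intended behaviour.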
• Clustering among templates
On lines 293-299, the single burst events are clustered again, this time between templates, if the clustering time-scale has been set. The functions used for clustering can be found in lal/packages/tools/src/SnglBurstUtils.c and are very similar to (in some cases the same as) the functions used in the excess power search. The clustered event list is then sorted (line 298).

• OutputEvents function
Finally, on line 303, the function OutputEvents is called. This function writes the events' single burst table to an xml file. On line 425 it checks whether an output file name has been given; if not, the filename defaults to <IFO>-STRINGSEARCH-<GPSSTART>-<DURATION>.xml. The function writes all the necessary pieces of the xml (lines 440-466), and frees the memory associated with the events (lines 472-477) and the rest of the tables in the xml file (lines 479-487).

## Injections

The equations are in the long technical document. Will transfer them here or make a shorter tech document about them (because of the equations). But we still need to look at:
• LALSimBurstTableFromLIGOLw
• XLALBurstInjectSignals

## Other codes used by StringSearch

• lal/packages/tools/src/SnglBurstUtils.c

## Cosmic string model codes

The very last step of the analysis involves convolving the efficiency curves obtained for injections louder than the loudest event (and for injections that survive at all) with the cosmologically expected rate of events. This turns a limit on the number of events into limits on the fundamental string parameters. These are the codes that convolve the efficiency curves with cosmological rates to produce effective rates:
• lalapps/src/string/cs_gamma.c
• lalapps/src/string/cs_gammaLargeLoops.c
A paper describing how these are calculated is available here.

• How to use the codes and generate exclusion plots:
We start from the efficiency as a function of amplitude. There are two such files, string_efficiency_above_loudest.dat (for injections recovered with amplitudes above our loudest event) and string_efficiency.dat (for all injections recovered). Then run v1.10 of lalapps/src/string/cs_gamma as follows (it produces a file called gamma.dat):

lalapps/src/string/cs_gamma --frequency 75 --log-Gmustart -12 --log-Gmuend -4 --nGmu 50 --log-epsilonstart -12 --log-epsilonend 0 --nepsilon 50 --index 1.0 --log-pstart 0 --log-pend 1 --np 1 --efficiency-file string_efficiency.dat
mv gamma.dat gamma_S4sens.dat

and

lalapps/src/string/cs_gamma --frequency 75 --log-Gmustart -12 --log-Gmuend -4 --nGmu 50 --log-epsilonstart -12 --log-epsilonend 0 --nepsilon 50 --index 1.0 --log-pstart 0 --log-pend 1 --np 1 --efficiency-file string_efficiency_above_loudest.dat
mv gamma.dat gamma_S4UL.dat

The outputs are two files renamed to gamma_S4sens.dat and gamma_S4UL.dat. These files contain the string models and the values of the effective rate, given the efficiency data. We then run v1.5 of the matlab script lalapps/src/string/contour_plotter.m (suitably edited to find the gamma files).

• Let's start by looking at the draft to remind ourselves of what we're doing.
• Next we look at the code.
## Pipeline

• Xavi: The segments used are in a file called S4H1H2L1segments.txt in lalapps/src/string. DQ flag info is in that file (it was generated with segwizard). The flags used are:

H1:ADC_OVERFLOW H1:INJECTION_BURST H1:INJECTION_INSPIRAL H1:INJECTION_PULSTART H1:INJECTION_STOCHASTIC H1:JET H1:OUTSIDE_S4 H1:PRELOCKLOSS_30 H1:SEISMIC_0D9_1D1
H2:ADC_OVERFLOW H2:INJECTION_BURST H2:INJECTION_INSPIRAL H2:INJECTION_PULSTART H2:INJECTION_STOCHASTIC H2:JET H2:OUTSIDE_S4 H2:PRELOCKLOSS_30 H2:SEISMIC_0D9_1D1 H2:WIND_OVER_30MPH
L1:ADC_OVERFLOW L1:INJECTION_BURST L1:INJECTION_INSPIRAL L1:INJECTION_PULSTART L1:INJECTION_STOCHASTIC L1:NO_DATA L1:NO_RDS L1:OUT_OF_LOCK L1:OUTSIDE_S4 L1:PRELOCKLOSS_30

The pipeline parameters (segment lengths, SNR thresholds, etc.) are in a file called S4_StringDag.ini in lalapps/src/string. The triggers are generated via a DAG created by lalapps/src/string/lalapps_cosmicstring_pipe, which uses the parameters in lalapps/src/string/S4_StringDag.ini. The DAG generator sets up a DAG to do the datafinding, the creation of injection xml files (if asked to), and the running of the string search.

• Kipp: The post-processing tools used in the cosmic string search are part of the excess power search. Post-processing involves taking the triggers generated in the first step and running an amplitude cut between H1 and H2 triggers and triple coincidence with L1. The post-processing codes also do injection finding, efficiency computation, and background estimation. Technical documentation for the coincidence code and parameters, the amplitude cut code and parameters, injection finding and efficiency computation, as well as background estimation, is available in cvs in lalapps/src/power/doc. A version of the document from February 4th 2008 is available here. Kipp Cannon will lead this part of the review.

## Software Installation Howto

### Overview

On the CIT cluster (running FC4), the following software will need to be installed in your home directory:
• sqlite version 3.5.9
• pysqlite version 2.4.1
• numpy version 1.0.4
• scipy version 0.6.0
• glue cvs tag stringsearch-s4-2008-04-09-v2
• lal cvs tag stringsearch-s4-2008-04-09-v2
• lalapps cvs tag stringsearch-s4-2008-04-09-v2
• pylal cvs tag stringsearch-s4-2008-04-09-v2

The instructions below explain how to do this in detail. What we are going to do is download all the source packages into a directory, pick a location where they will be installed, and one-by-one compile and install all the software into that same location. We will need to set three environment variables to enable the software. This procedure will not interfere with any other software you have installed, and the software will be easy to remove when you are done.

The stringsearch-s4-2008-04-09-v1 and stringsearch-s4-2008-04-09-v2 tags differ by updated S4H1H2L1segments.txt and S4H1H2L1vetotimes.txt segment lists, and an updated S4_StringDag.ini configuration file with a corrected [injection] section and corrected LSCdataFind command line options. All three files reside in lalapps/src/string/, and the differences can be checked with the cvs web interface.

One final note: please do not improvise.

### Preparation

Remove the /opt/lscsoft software from your build environment:
1. touch ${HOME}/.nolscsoft

Create and source an environment setup script (this assumes you are using an sh-like command interpreter, like bash):

1. mkdir -p ${HOME}/s4string ; cd${HOME}/s4string
2. cat >environment.sh <<_EOF_
export PKG_CONFIG_PATH="${HOME}/s4string/local/lib/pkgconfig:/opt/lscsoft/libframe/lib64/pkgconfig:/opt/lscsoft/libmetaio/lib64/pkgconfig:${PKG_CONFIG_PATH}"
export PYTHONPATH="${HOME}/s4string/local/lib/python2.4/site-packages:${HOME}/s4string/local/lib64/python2.4/site-packages:${PYTHONPATH}"
export PATH="${HOME}/s4string/local/bin:${PATH}"
export TMPDIR="/usr1/${USER}"
_EOF_
3. source environment.sh

1. mkdir -p ${HOME}/s4string/src ; cd${HOME}/s4string/src
2. wget http://www.sqlite.org/sqlite-amalgamation-3.5.9.tar.gz
3. tar -xzf sqlite-amalgamation-3.5.9.tar.gz
4. wget http://oss.itsystementwicklung.de/download/pysqlite/2.4/2.4.1/pysqlite-2.4.1.tar.gz
5. tar -xzf pysqlite-2.4.1.tar.gz
6. wget http://downloads.sourceforge.net/numpy/numpy-1.0.4.tar.gz
7. tar -xzf numpy-1.0.4.tar.gz
8. wget http://prdownloads.sourceforge.net/scipy/scipy-0.6.0.tar.gz?download
9. tar -xzf scipy-0.6.0.tar.gz
10. cvs -d :pserver:anonymous@gravity.phys.uwm.edu:2402/usr/local/cvs/lscsoft co -r stringsearch-s4-2008-04-09-v2 glue
11. cvs -d :pserver:anonymous@gravity.phys.uwm.edu:2402/usr/local/cvs/lscsoft co -r stringsearch-s4-2008-04-09-v2 lal
12. cvs -d :pserver:anonymous@gravity.phys.uwm.edu:2402/usr/local/cvs/lscsoft co -r stringsearch-s4-2008-04-09-v2 lalapps
13. cvs -d :pserver:anonymous@gravity.phys.uwm.edu:2402/usr/local/cvs/lscsoft co -r stringsearch-s4-2008-04-09-v2 pylal

### Compile and Install

1. sqlite:
1. cd ${HOME}/s4string/src/sqlite-amalgamation-3.5.9/
2. ./configure --enable-readline --prefix=${HOME}/s4string/local
3. make install
2. pysqlite:
1. cd ${HOME}/s4string/src/pysqlite-2.4.1/
2. cat >setup.cfg <<_EOF_
[build_ext]
define=
include_dirs=${HOME}/s4string/local/include
library_dirs=${HOME}/s4string/local/lib
rpath=${HOME}/s4string/local/lib
libraries=sqlite3
_EOF_
3. python setup.py install --prefix=${HOME}/s4string/local
3. numpy:
1. cd ${HOME}/s4string/src/numpy-1.0.4/
2. cat >numpy/distutils/site.cfg <<_EOF_
[DEFAULT]
include_dirs = /usr/include
library_dirs = /usr/lib64

[lapack_opt]
libraries = lapack, f77blas, cblas, atlas
_EOF_
3. python setup.py install --prefix=${HOME}/s4string/local
4. scipy:
1. cd ${HOME}/s4string/src/scipy-0.6.0/
2. ATLAS=/usr/lib64/atlas python setup.py install --prefix=${HOME}/s4string/local
5. glue:
1. cd ${HOME}/s4string/src/glue
2. python setup.py install --prefix=${HOME}/s4string/local
6. lal:
1. cd ${HOME}/s4string/src/lal
2. ./00boot
3. ./configure --prefix=${HOME}/s4string/local --with-gcc-flags
4. make install
7. lalapps:
1. cd ${HOME}/s4string/src/lalapps
2. ./00boot
3. ./configure --prefix=${HOME}/s4string/local --with-gcc-flags --enable-condor
4. make install
8. pylal:
1. cd ${HOME}/s4string/src/pylal
2. python setup.py install --prefix=${HOME}/s4string/local

## Running the pipeline Howto

### Environment

Each time you log out and log back into the cluster, you must re-enable the software by re-sourcing the environment configuration script you created above:

1. source ${HOME}/s4string/environment.sh

### Pipeline Configuration

To build the analysis pipeline you require segment lists and a pipeline configuration file:
1. cd ${HOME}/s4string
2. ln -s src/lalapps/src/string/S4H1H2L1segments.txt .
3. ln -s src/lalapps/src/string/S4H1H2L1vetotimes.txt .
4. ln -s src/lalapps/src/string/S4_StringDag.ini .
Some paths need to be edited in the .ini file:
1. vi S4_StringDag.ini
2. Change the three lines near the top of the file containing
$ENV(HOME)/s4_string/local to contain $ENV(HOME)/s4string/local
3. Change the line in the [input] section from
segments = S4H1H2L1segments.txt
to
segments = ../S4H1H2L1segments.txt
because the DAGs will be constructed in sub-directories.
The post-processing steps are not included in the DAG itself, and will be run from the command line.  A script will be used to construct the DAG from the configuration files and execute the post-processing sequence:
1. cd ${HOME}/s4string
2. ln -s src/lalapps/src/string/Makefile.pipeline Makefile

### Pipeline Construction and Submission

Construct the injection and non-injection DAGs with:
1. cd ${HOME}/s4string
2. make dag
Submit the two DAGs with:
1. cd ${HOME}/s4string
2. unset X509_USER_PROXY
3. grid-proxy-init -valid $((24*3)):00
4. pushd injections && condor_submit_dag S4_StringDag.dag ; popd
5. pushd noninjections && condor_submit_dag S4_StringDag.dag ; popd

### Post Processing

When the two DAGs complete (they should run without error), run the post-processing sequence:

1. cd ${HOME}/s4string
2. make ligolw_tisi
3. make ligolw_cafe
4. make ligolw_add
5. make ligolw_burca
6. make ligolw_binjfind
7. make ligolw_sqlite
8. make final

### Notes on Running the Pipeline

#### metaio

The tagged version of lal does not compile against versions of metaio >= 8.0 (which is what you will find on a typical cluster). For these instructions to work you may have to get an old version of metaio. This is what you do:
1. Get metaio_7.2_SOURCE.tar.gz from http://www.ldas-sw.ligo.caltech.edu/ligotools/metaio/
2. gunzip metaio_7.2_SOURCE.tar.gz
3. tar -xvf metaio_7.2_SOURCE.tar
4. gunzip metaio-7.2.tar.gz
5. tar -xvf metaio-7.2.tar
6. cd metaio-7.2
7. ./configure --prefix=${HOME}/s4string/local
8. make
9. make install

Then change the configure line for lal to:
./configure --prefix=${HOME}/s4string/local --with-gcc-flags --with-cflags="-I${HOME}/s4string/local/include/"

#### Makefile and the 'make final' step

The magnitude of the loudest event in the Makefile is off by 3% (it should be 3.3683419e-20; instead it is 3.4744909e-20). This is due to the new segmentation that resulted from including the calibration DQ flags: the loudest event is the same, but its magnitude changed a little. Also, the calibration uncertainties in the Makefile are a little too small. The frequency domain uncertainties are 5% and the time domain uncertainties add another 10% in quadrature (this is conservative; see the segment list DQ flags in lalapps cvs), which makes the total calibration uncertainty sqrt(0.05^2 + 0.10^2) ≈ 0.11, i.e. conservatively 11%.

The calls to lalapps_stringfinal in the Makefile need to be changed as follows:

final :
lalapps_stringfinal --verbose --cal-uncertainty 0.11 noninjections/triggers/cafe*.sqlite >lalapps_stringfinal.log
lalapps_stringfinal --verbose --injections --injections-bin-size 16 --loudest-survivor H1=3.3683419e-20 --cal-uncertainty 0.11 injections/triggers/cafe*.sqlite >lalapps_stringfinal_injections.log

and then "make final" needs to be run again.

#### lalapps_stringfinal version update

The final review produced requests for changes to the figures generated by lalapps_stringfinal.  The plots included in the latest draft of the paper were generated by installing lalsuite carrying git ID 6b40a0ad9b0d9db77897f0bb86fe6332bc40006d.

### Running the cosmology codes to produce the cosmic string parameter space contour plots

If you've run the pipeline already, log in to the cluster where you ran it and do the following:

cd $HOME/s4string/
source environment.sh
cd $HOME/s4string/src/lalapps/src/string/

The cosmology functions don't build in lalapps by default (this is on purpose). Build them using:
gcc -O6 -W -Wall -static -o cs_gamma -I${HOME}/s4string/local/include/ cs_gamma.c cs_lambda_cosmo.c -lgsl -lgslcblas -lm -std=c99

The last "-std=c99" avoids a problem with gcc version 4.X that prints a warning about fminf and fmaxf. Go back to the directory where the efficiency files were created by the pipeline:

cd $HOME/s4string/

or use the files string_efficiency.dat and string_efficiency_above_loudest.dat available in cvs (lscdocs/bursts/projects/cosmic_strings/s4cosmic_strings/) for the next few steps.

With the efficiency files we are ready to run the cosmology code to get sensitivity rates. Run the following lines:

$HOME/s4string/src/lalapps/src/string/cs_gamma --frequency 75 --log-Gmustart -12 --log-Gmuend -4 --nGmu 50 --log-epsilonstart -12 --log-epsilonend 0 --nepsilon 50 --index 1.0 --log-pstart 0 --log-pend 1 --np 1 --efficiency-file string_efficiency.dat
mv gamma.dat gamma_S4sens.dat

to get the effective rate for all signals that survive the pipeline. Then run the cosmology code for our upper limits:

$HOME/s4string/src/lalapps/src/string/cs_gamma --frequency 75 --log-Gmustart -12 --log-Gmuend -4 --nGmu 50 --log-epsilonstart -12 --log-epsilonend 0 --nepsilon 50 --index 1.0 --log-pstart 0 --log-pend 1 --np 1 --efficiency-file string_efficiency_above_loudest.dat
mv gamma.dat gamma_S4UL.dat

to get the effective rate for signals found with amplitudes above our loudest event.

Now let's get an Initial LIGO rate estimate. In matlab, open and run the file

$HOME/s4string/src/lalapps/src/string/eff_file_writer.m

This creates a file called efficiency2.txt, which contains a step function at A50% = 1e-20, the Initial LIGO sensitivity estimate in the burst paper. We want to run the cosmology code on that file too:

$HOME/s4string/src/lalapps/src/string/cs_gamma --frequency 75 --log-Gmustart -12 --log-Gmuend -4 --nGmu 50 --log-epsilonstart -12 --log-epsilonend 0 --nepsilon 50 --index 1.0 --log-pstart 0 --log-pend 1 --np 1 --efficiency-file efficiency2.txt
mv gamma.dat gamma_LIGOsens.dat

If none of this has worked, all the files used and produced here (efficiency2.txt, gamma_LIGOsens.dat, gamma_S4sens.dat, gamma_S4UL.dat, string_efficiency_above_loudest.dat, string_efficiency.dat) are also in cvs (lscdocs/bursts/projects/cosmic_strings/s4cosmic_strings/) anyway.

Let's generate the contour plots. In matlab open (but do not run yet!) the file $HOME/s4string/src/lalapps/src/string/contour_plotter.m. Then do the following:
1. Fix the hardwired paths to the gamma_*.dat files on lines 4, 5 and 6 to point to wherever your gamma_*.dat files are, and
2. Fix the livetime on line 12 (again due to the new calibration DQ flags) from T=1363502.0 to T=1284596.0. This is the livetime reported by the pipeline (see lalapps_stringfinal.log).
Now run $HOME/s4string/src/lalapps/src/string/contour_plotter.m.
You should get the four contour plots in the paper, one for each order of magnitude of reconnection probability. These are in cvs: lscdocs/bursts/papers/s4cosmic_string (p1, p_1, p_2, p_3, with .fig, .eps, .pdf, and .jpg file extensions).

#### Notes on final cosmology fixes and run to make figures

The above directions work, but the plots that ended up in the paper, as requested by the burst uber review committee, were produced a little differently.

Firstly, the cosmology was fixed as requested by the cosmic string review committee. The pipeline used the tagged version stringsearch-s4-2008-04-09-v2 with the following modifications:

1. Version 1.5 of lalapps/src/string/cs_lambda_cosmo.h was used instead of 1.4.
bash-3.00$ cvs diff -r 1.4 -r 1.5 cs_lambda_cosmo.h
Index: cs_lambda_cosmo.h
===================================================================
RCS file: /usr/local/cvs/lscsoft/lalapps/src/string/cs_lambda_cosmo.h,v
retrieving revision 1.4
retrieving revision 1.5
diff -r1.4 -r1.5
28c28
< /* "$Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp $" "$Name:  $" */
---
> /* "$Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp $" "$Name:  $" */
33,35c33,35
< #define LAMBDA_H_0 2.4e-18 /* Hubble constant (s^-1) */
< #define LAMBDA_OMEGA_M 0.25
< #define LAMBDA_OMEGA_R 4.6e-5
---
> #define LAMBDA_H_0 2.27e-18 /* Hubble constant (s^-1) */
> #define LAMBDA_OMEGA_M 0.279
> #define LAMBDA_OMEGA_R 8.5e-5

2. Version 1.12 of lalapps/src/string/cs_gamma.c was used instead of 1.11. The change was to make sure that when we ask the code to span a range of epsilon values it actually gets all the way to the end. This was a cosmetic change.

bash-3.00$ cvs diff -r 1.11 -r 1.12 cs_gamma.c
Index: cs_gamma.c
===================================================================
RCS file: /usr/local/cvs/lscsoft/lalapps/src/string/cs_gamma.c,v
retrieving revision 1.11
retrieving revision 1.12
diff -r1.11 -r1.12
61,62c61,62
< NRCSID( CSGAMMAC, "cs_gamma $Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp$");
< RCSID( "cs_gamma $Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp$");
---
> NRCSID( CSGAMMAC, "cs_gamma $Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp$");
> RCSID( "cs_gamma $Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp$");
65c65
< #define CVS_REVISION "$Revision: 1.83$"
---
> #define CVS_REVISION "$Revision: 1.83$"
67c67
< #define CVS_DATE "$Date: 2009/05/29 16:45:01$"
---
> #define CVS_DATE "$Date: 2009/05/29 16:45:01$"
102c102
<           epsilon=pow(10.0,CLA.logepsilonstart+i*(CLA.logepsilonend-CLA.logepsilonstart)/(CLA.nepsilon));
---
>           epsilon=pow(10.0,CLA.logepsilonstart+i*(CLA.logepsilonend-CLA.logepsilonstart)/(CLA.nepsilon-1));



Finally, all the efficiency files and products of the cosmology code that were used in the paper are in lscdocs/bursts/projects/cosmic_strings/s4cosmic_strings. This includes a version of contour_plotter.m that makes the plots as requested by the burst uber review committee. Also committed to cvs is the file string_triggers_survivors.xml with the surviving triple coincident triggers.

### ligolw_tisi

• This program produces one or more (one in the case of the string search) XML files containing a list of the time slide vectors to use in the coincidence analysis.  The zero-lag vector is one of the vectors included in the list.
• The main program is ligolw_tisi.
• Its library backend is ligolw_tisi.py.
• An example output file is visible in time_slides.xml.
• A typical command line is
ligolw_tisi --verbose --instrument H1=0:0:0 --instrument H2=0:0:0 --instrument L1=-89.26990816987241548050:89.26990816987241548050:1.78539816339744830961 time_slides.xml.gz
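For illustration, the L1 offsets implied by the start:stop:step syntax of that command can be enumerated directly. A toy sketch in plain C (our reading of the syntax, not ligolw_tisi's implementation):

```c
#include <stdio.h>

/* The L1 range above runs from -89.2699... to +89.2699... in steps of
 * 1.7853981..., i.e. 50 steps on either side of zero, while H1 and H2 are
 * held at zero.  Since zero lands on the grid, the zero-lag vector is one
 * of the 101 slide vectors, as noted above. */
int main(void)
{
    const double step = 1.78539816339744830961;
    int count = 0;
    for (int k = -50; k <= 50; k++, count++)
        printf("slide %3d: H1=+0.000000 H2=+0.000000 L1=%+.6f\n",
               count, k * step);
    printf("%d time slide vectors in total (including zero-lag)\n", count);
    return 0;
}
```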

### lalapps_ll2cache & ligolw_cafe

• lalapps_ll2cache extracts instrument and segment information from the search summary tables of trigger files, and produces a LAL cache of the trigger files from that information.
• The main program is lalapps_ll2cache.
• A typical command line is
lalapps_ll2cache --verbose "HL-INJECTIONS*" "triggers/*"
which writes a LAL cache to stdout describing the injection and trigger files.

• ligolw_cafe uses segment information from the trigger files and the list of time slide vectors produced by ligolw_tisi to identify groups of trigger files that must be combined in order to correctly apply the coincidence test.  The point here is that combining all the triggers into a single file would produce a file too large to fit in memory, but we know that triggers from segments that are separated by too great a time interval cannot possibly be coincident so it is smarter to break the triggers into independent groups that (hopefully) are small enough to fit in memory.  ligolw_cafe performs this task.
• The main program is ligolw_cafe.
• Its library backend is ligolw_cafe.py.
• A typical command line is
ligolw_cafe --verbose --time-slides time_slides.xml.gz --base cache/cafe_
into which is piped the output of the lalapps_ll2cache command above.
• The output is a collection of LAL cache files, one for each group of trigger files that must be combined. An example is cafe_321.cache.

• This program merges the trigger files identified by ligolw_cafe.
• The main program is ligolw_add.
• The library backend is ligolw_add.py.
• A typical command line is
ligolw_add --verbose --input-cache cache/cafe_321.cache --output triggers/cafe_321.xml.gz time_slides.xml.gz
which merges time_slides.xml.gz and all the files listed in cafe_321.cache into a single XML file that gets written to cafe_321.xml.gz.
• An example of the sort of file resulting from this stage of the post processing is cafe_321_post_lladd.xml.

### ligolw_burca

• This program performs a string search coincidence analysis on a merged trigger file.  Most of the information this program requires to perform its task it retrieves automatically from the input file, including the list of instruments whose triggers are in the file, the list of time slide vectors to consider, and which instruments should be used to form the coincidences.  The command line provides the coincidence thresholds, which in this case are the inter-instrument Delta t's and the H1--H2 amplitude consistency parameters.
• The main program is ligolw_burca.
• The library backend is split between snglcoinc.py, which contains a generic trigger coincidence engine, and ligolw_burca.py, which contains the cosmic string and excess power specific customizations to the underlying coincidence engine.  Naively, for n instruments that produce m triggers each, the coincidence problem is m^n: compare every n-tuple to see if it is a mutually-coincident combination.  Obviously that implementation scales very poorly, so more sophisticated algorithms are required to reduce the number of comparisons.  The triggers are organized into binary trees for quickly identifying the triggers that fall within a time interval, and the coincidence tests are applied recursively to avoid repeated evaluation of the same comparisons.  Many more optimizations are possible, but the current implementation seems to run quickly enough.  (A simplified sketch of the interval-search idea appears after this list.)
• A typical command line is
ligolw_burca --verbose --coincidence-algorithm stringcusp --program StringSearch --thresholds H1,H2=0.002 --thresholds H1,L1=0.012 --thresholds H2,L1=0.012 --stringcusp-params 3.0,0.5 cafe_321.xml.gz
• An example of the output of this program is cafe_321_post_burca.xml.
• This is identical to the "post-lladd" file, but with coincidence information added in additional tables.  The coincidence information indicates which triggers are coincident with which other triggers, and the time shift vector that was applied in order to observe that coincidence.
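A toy illustration of the interval-search idea in plain C; a sorted array plus bisection stands in for the binary trees used by snglcoinc.py, and the trigger times and coincidence window are made up:

```c
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* index of the first element of t[0..n-1] that is >= x */
static int lower_bound(const double *t, int n, double x)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (t[mid] < x) lo = mid + 1; else hi = mid;
    }
    return lo;
}

int main(void)
{
    double h1[] = {100.0, 105.2, 230.7, 231.0};    /* toy H1 trigger times */
    double l1[] = {90.0, 105.205, 230.699, 400.0}; /* toy L1 trigger times */
    double dt = 0.012;                             /* H1-L1 window (s) */
    int nh = sizeof(h1) / sizeof(*h1);
    int nl = sizeof(l1) / sizeof(*l1);

    /* keep one instrument's triggers sorted by time, then find the
     * candidates within the window by bisection instead of scanning all
     * m triggers for each trigger in the other instrument */
    qsort(l1, nl, sizeof(*l1), cmp_double);
    for (int i = 0; i < nh; i++) {
        int j = lower_bound(l1, nl, h1[i] - dt);
        for (; j < nl && l1[j] <= h1[i] + dt; j++)
            printf("coincidence: H1 %.3f <-> L1 %.3f\n", h1[i], l1[j]);
    }
    return 0;
}
```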

### ligolw_binjfind

• This program performs injection identification on a trigger file.  In the string pipeline this program is run on the output of ligolw_burca.
• The main program is ligolw_binjfind.
• The library backend is ligolw_binjfind.py, with extra code in SimBurstUtils.py.
• A typical command line is
ligolw_binjfind --verbose --match-algorithm stringcusp cafe_321.xml.gz
Note that there are no thresholds or other options beyond setting the type of coincidence test to use.
• An example of the output of this program is cafe_322.xml.
• This is identical to the "post-burca" file, but with additional coincidence information added, now about coincidences between triggers and software injections.

### ligolw_sqlite

• This program converts the XML trigger files to SQLite database files.  This conversion is done to allow information to be retrieved from the files faster, more easily, and more reliably, using SQL queries instead of iterating over the contents of the tables with hand-coded for loops and if statements.
• The main program is ligolw_sqlite.
• The most significant library component is dbtables.py in the Glue package.
• A typical command line is
ligolw_sqlite --verbose --tmp-space /tmp --replace --preserve-ids --database cafe_322.sqlite cafe_322.xml.gz
Note that this step takes no analysis parameters; it is a straight conversion of the XML file into an SQLite database.
• An example of the output of this program is cafe_322.sqlite.  This file is not human-readable; it is in a binary format.

### lalapps_stringfinal

• This program distills the final upper limit information from the sqlite database files by performing queries to get the amplitude of the loudest triple-coincidence survivor in H1, and then using the amplitude of that trigger as the threshold when measuring the detection efficiency for software injections.
• The main program is lalapps_stringfinal, and library code can be found in SnglBurstUtils.py.
• The program is run in two stages.  A typical command line for the first stage, where the triple-coincidence survivors are found, is
lalapps_stringfinal --verbose --cal-uncertainty 0.08 noninjections/triggers/cafe*.sqlite >lalapps_stringfinal.log
and a typical command line for the second stage, where the efficiency is measured, is
lalapps_stringfinal --verbose --injections --injections-bin-size 16 --loudest-survivor H1=3.4744909e-20 --cal-uncertainty 0.08 injections/triggers/cafe*.sqlite >lalapps_stringfinal_injections.log
• Examples of the logs of this program's output are lalapps_stringfinal.log, and lalapps_stringfinal_injections.log.
• Examples of its summary plots are string_triggers_0.png and string_injections_0.png.
• And an example of the loudest-survivors output file is string_triggers_survivors.xml.
• A note describing the algorithm by which the efficiency curve is constructed can be found in an appendix of the review document.
• A dump of the raw (pre-smoothed, pre-anything'ed) bin counts used for the efficiency plot is available in efficiency_dump.txt.

• $Id: stringsearch.html,v 1.83 2009/05/29 16:45:01 siemens Exp$