A Proposal To Add Coincidence Tables
=====================================

Kipp Cannon
April 16, 2007

This is a proposal to add four new table definitions for the purpose of
"coincidence" record keeping in trigger-based gravitational wave searches.
These tables implement a means by which two or more events, not necessarily
of the same type, can be marked as coinciding with one another. For
example, one might wish to record the fact that a burst event corresponds
to a software injection, or that two burst events from different
instruments match one another, or that a burst event is coincident with the
end of an inspiral event, and so on.

Not included in this proposal is the means to specify the way in which
events are found to match one another, for example the specifics of the
tests that were used to decide on the matches. It is assumed that
individual applications can encode any such required information in
existing tables, such as the process_params and search_summary tables, or
otherwise solve that problem independently of this proposal.

Overview
========

The idea is that there are a number of tables which can be thought of as
contributing events to a coincidence. Examples are the sngl_burst,
sngl_inspiral, and sim_burst tables, and so on. The events within such
tables are (nominally) assigned unique IDs. Recording a coincidence is then
a matter of recording the fact that two or more event IDs "go together",
perhaps with some additional information as well, such as any time offsets
that were applied before the coincidence test was performed.

At the centre of this proposal are the coinc_event and coinc_event_map
tables. Each coincidence, or collection of events that are identified as
coinciding with one another, receives exactly one entry in the coinc_event
table. Each entry in this table receives a unique ID, stored in the
coinc_event_id column.
This table mostly plays the role of a nexus, providing links to other
places where the real information about the coincidence is recorded.

The coinc_event_map table is used to specify which events participate in
each coincidence. Principally, two columns are used for this task: the
event_id column and the coinc_event_id column. The table plays the role of
the edges in a graph, linking each entry in the coinc_event table to the
constituent events in their respective tables.

Two other tables are introduced to provide additional information about
coincidences. This book-keeping mechanism is principally targeted at
trigger-based searches in which each event has associated with it a "time".
A standard technique used in such searches to estimate background
coincidence rates is to measure the rate at which coincidences are observed
when the events from one instrument are shifted in time relative to those
in another. To facilitate time offset record keeping, the time_slide table
is introduced. This table uses three columns to encode time offset
information: time_slide_id, instrument, and offset. Each row provides a
single instrument-to-offset mapping, and several rows can share a single ID
to indicate that they are taken together to define a multi-instrument time
shift vector. The coinc_event table has a time_slide_id column used to
indicate which set of instrument/offset pairs applies to each coincidence.

Finally, in practice it proves necessary to quickly identify the
coincidences that are of the same "type", that is, burst<-->burst
coincidences, or burst<-->injection coincidences, and so on. This can be
done by checking the target events associated with each coincidence, but
that procedure is slow in large, complex documents. Therefore, the
coinc_definer table is introduced, and used to record this information so
that it can be queried more quickly.
The table provides a number of columns, the three principal columns being
the coinc_def_id, search, and search_coinc_type columns. The search column
is a string stating the name of the search that produced this kind of
coincidence, and the search_coinc_type is an integer identifying this
coincidence type among those the search can produce.

The pair (search, search_coinc_type) is intended to be a globally-unique
identifier. It should not be difficult to keep these pairs unique: each
search chooses a unique name like "inspiral bbh" or "excess power", and
then assigns integers arbitrarily to its own various types of coincidences.
It is left to the people writing code for each search to decide how their
tools will know which search_coinc_type is which, but a look-up table in a
C library or Python module would be a typical solution.

The coinc_def_id is then meant to be an identifier for the (search,
search_coinc_type) pair that is unique within the document. The coinc_event
table contains a coinc_def_id column used to link rows to the coinc_definer
table. Selecting rows in the coinc_event table by coinc_def_id is more
efficient in terms of document size and query speed than merging the two
tables into one.

Example
=======

The following is an example document. This is a portion of the final output
of a triple-coincidence software injection burst analysis performed with
excess power. For clarity, the search_summary and process_params tables
have been removed, and most of the events from the sngl_burst table have
been removed. For those familiar with such files, the process table can
give an impression of the sequence of jobs executed to assemble this
output.

process table:

"lalapps_power","1.241","/usr/local/cvs/lscsoft/lalapps/src/power/power.c\,v",859279297,"INJECTIONS_PLAYGROUND",0,"node239","kipp",1,860405179,860405267,0,"lalapps","L1","process:process_id:0",
"ligolw_bucluster","1.20","lscsoft",860279858,"INJECTIONS_PLAYGROUND",0,"node232.ldas-cit.ligo.caltech.edu","kipp",8791,860413658,860413684,0,"","","process:process_id:1",
"lalapps_power","1.241","/usr/local/cvs/lscsoft/lalapps/src/power/power.c\,v",859279297,"INJECTIONS_PLAYGROUND",0,"node223","kipp",1,860397151,860397338,0,"lalapps","H1","process:process_id:2",
"ligolw_bucluster","1.20","lscsoft",860279858,"INJECTIONS_PLAYGROUND",0,"node232.ldas-cit.ligo.caltech.edu","kipp",8791,860413685,860413696,0,"","","process:process_id:3",
"lalapps_power","1.241","/usr/local/cvs/lscsoft/lalapps/src/power/power.c\,v",859279297,"INJECTIONS_PLAYGROUND",0,"node295","kipp",1,860391176,860391346,0,"lalapps","H2","process:process_id:4",
"ligolw_bucluster","1.20","lscsoft",860279858,"INJECTIONS_PLAYGROUND",0,"node232.ldas-cit.ligo.caltech.edu","kipp",8791,860413698,860413714,0,"","","process:process_id:5",
"ligolw_bucluster","1.20","lscsoft",860279858,"INJECTIONS_PLAYGROUND_POSTLLADD",0,"node211.ldas-cit.ligo.caltech.edu","kipp",15728,860416644,860416644,0,"","","process:process_id:6",
"ligolw_tisi","1.14","lscsoft",860219662,"",0,"ldas-pcdev1.ligo.caltech.edu","kipp",21339,860357022,860357022,0,"","","process:process_id:7",
"lalapps_binj","1.48","/usr/local/cvs/lscsoft/lalapps/src/power/binj.c\,v",856389850,"INJECTIONS_PLAYGROUND",0,"node262","kipp",1,860364285,860364285,0,"lalapps","H1,H2,L1","process:process_id:8",
"ligolw_burca","1.19","lscsoft",858982144,"INJECTIONS_PLAYGROUND",0,"node75.ldas-cit.ligo.caltech.edu","kipp",22155,860423411,860423411,0,"","","process:process_id:9",
"ligolw_bucut","1.42","lscsoft",860292188,"INJECTIONS_PLAYGROUND",0,"node283.ldas-cit.ligo.caltech.edu","kipp",9997,860428242,860428251,0,"","","process:process_id:83",
"ligolw_binjfind","1.11","lscsoft",860355419,"INJECTIONS_PLAYGROUND",0,"node7.ldas-cit.ligo.caltech.edu","kipp",29835,860476321,860476321,0,"","","process:process_id:84"

sngl_burst table:

"process:process_id:2","H1","excesspower","H1:LSC-STRAIN plus response to Burst amplitudes plus response t",793668302,125000000,793668302,625879271,0.875,262,384,1.7529863e-19,9828.0385,43.4927,"sngl_burst:event_id:0",280.24313,793668302,625000000,0.25,198,256,3.48038e-21,200.818,43.4927,
"process:process_id:2","H1","excesspower","H1:LSC-STRAIN plus response to Burst amplitudes plus response t",793668302,125000000,793668302,545117868,0.875,1158,384,5.283999e-20,2129.8455,27.362,"sngl_burst:event_id:1",1218.4229,793668302,250000000,0.125,1254,64,2.19888e-21,75.5255,27.362,
"process:process_id:2","H1","excesspower","H1:LSC-STRAIN plus response to Burst amplitudes plus response t",793668302,500000000,793668302,509658105,0.015625,1606,128,8.13959e-21,172.5127,21.7208,"sngl_burst:event_id:2",1606,793668302,500000000,0.015625,1542,128,2.11335e-21,45.9561,21.7208,
"process:process_id:2","H1","excesspower","H1:LSC-STRAIN plus response to Burst amplitudes plus response t",793668302,687500000,793668302,750000000,0.125,1702,8,4.04235e-21,81.3662,21.3703,"sngl_burst:event_id:3",1702,793668302,687500000,0.125,1698,8,2.02219e-21,40.7405,21.3703,
"process:process_id:2","H1","excesspower","H1:LSC-STRAIN plus response to Burst amplitudes plus response t",793668302,500000000,793668302,625000000,0.25,1990,256,4.37335e-21,122.65,21.3037,"sngl_burst:event_id:4",1990,793668302,500000000,0.25,1862,256,4.37335e-21,122.65,21.3037,
...
"process:process_id:0","L1","excesspower","L1:LSC-STRAIN plus response to Burst amplitudes",793668361,0,793668362,548506854,2.5,1094,2048,7.841086e-18,122578.38,114.528,"sngl_burst:event_id:344",1215.3045,793668362,500000000,0.25,966,256,8.13479e-21,402.815,114.528,
"process:process_id:0","L1","excesspower","L1:LSC-STRAIN plus response to Burst amplitudes",793668364,125000000,793668364,250000000,0.25,346,4,4.74317e-21,34.1618,18.0809,"sngl_burst:event_id:345",346,793668364,125000000,0.25,344,4,4.74317e-21,34.1618,18.0809,
"process:process_id:0","L1","excesspower","L1:LSC-STRAIN plus response to Burst amplitudes",793668364,375000000,793668364,422040324,0.125,454,32,1.481951e-21,81.7617,19.493,"sngl_burst:event_id:346",449.95768,793668364,375000000,0.125,438,16,7.40419e-22,41.3134,19.493,
"process:process_id:0","L1","excesspower","L1:LSC-STRAIN plus response to Burst amplitudes",793668364,250000000,793668364,429893138,0.25,1286,384,1.2840092e-19,2023.9009,36.9143,"sngl_burst:event_id:347",1226.1126,793668364,406250000,0.0625,1094,256,5.85415e-21,115.496,36.9143,
"process:process_id:0","L1","excesspower","L1:LSC-STRAIN plus response to Burst amplitudes",793668364,453125000,793668364,460937500,0.015625,1222,256,3.6747e-21,46.1485,18.8575,"sngl_burst:event_id:348",1222,793668364,453125000,0.015625,1094,256,3.6747e-21,46.1485,18.8575

time_slide table:

"H2","time_slide:time_slide_id:0","process:process_id:7",0,
"H1","time_slide:time_slide_id:0","process:process_id:7",0,
"L1","time_slide:time_slide_id:0","process:process_id:7",0

sim_burst table:

"process:process_id:8","SineGaussian",793668308,698442656,793668308,699264985,793668308,690434084,45394.00711997141,0.108271,0.108271,1.392422,-0.6052713,"EQUATORIAL",4.48004,3.155143e-22,2.422579e-21,2953.414,73.9239,0.02706774,6,"sim_burst:simulation_id:14408",
"process:process_id:8","SineGaussian",793668344,349149908,793668344,332539600,793668344,336454097,45394.01705006073,0.05885703,0.05885703,0.05876163,0.2667309,"EQUATORIAL",5.468415,7.390011e-20,7.695927e-19,1576.869,135.9874,0.01471426,8,"sim_burst:simulation_id:14409"

coinc_definer table:

"coinc_definer:coinc_def_id:0","excesspower",0,"sngl_burst<-->sngl_burst coincidences",
"coinc_definer:coinc_def_id:1","excesspower",1,"sim_burst<-->sngl_burst coincidences",
"coinc_definer:coinc_def_id:2","excesspower",2,"sim_burst<-->coinc_event coincidences (exact)"

coinc_event table:

"coinc_event:coinc_event_id:0",3,"process:process_id:9","coinc_definer:coinc_def_id:0","time_slide:time_slide_id:0",1,
"coinc_event:coinc_event_id:1",3,"process:process_id:9","coinc_definer:coinc_def_id:0","time_slide:time_slide_id:0",1,
"coinc_event:coinc_event_id:2",3,"process:process_id:84","coinc_definer:coinc_def_id:1","time_slide:time_slide_id:0",1,
"coinc_event:coinc_event_id:3",1,"process:process_id:84","coinc_definer:coinc_def_id:2","time_slide:time_slide_id:0",1

coinc_event_map table:

"sngl_burst:event_id:203","sngl_burst","coinc_event:coinc_event_id:0",
"sngl_burst:event_id:49","sngl_burst","coinc_event:coinc_event_id:0",
"sngl_burst:event_id:309","sngl_burst","coinc_event:coinc_event_id:0",
"sngl_burst:event_id:226","sngl_burst","coinc_event:coinc_event_id:1",
"sngl_burst:event_id:59","sngl_burst","coinc_event:coinc_event_id:1",
"sngl_burst:event_id:325","sngl_burst","coinc_event:coinc_event_id:1",
"sim_burst:simulation_id:14409","sim_burst","coinc_event:coinc_event_id:2",
"sngl_burst:event_id:59","sngl_burst","coinc_event:coinc_event_id:2",
"sngl_burst:event_id:226","sngl_burst","coinc_event:coinc_event_id:2",
"sngl_burst:event_id:325","sngl_burst","coinc_event:coinc_event_id:2",
"sim_burst:simulation_id:14409","sim_burst","coinc_event:coinc_event_id:3",
"coinc_event:coinc_event_id:1","coinc_event","coinc_event:coinc_event_id:3"

Let's examine this example (please excuse the column orders; they are
machine-generated, and not in the most logical order for humans).

1. The document contains a sngl_burst table providing a list of burst
   events. Events from several instruments are listed in the one table.
   Each event is assigned a unique ID of the form "table:column:integer".

2. There is a time_slide table describing a single, three-instrument, time
   slide with all offsets at 0 s. It has ID "time_slide:time_slide_id:0",
   following the standard ID format.

3. There is a sim_burst table describing a number of software injections.
   Each injection is assigned a unique ID, in the same manner as the burst
   events.

4. There is a coinc_definer table, and it defines three types of
   coincidences. The human-readable description strings should make it
   mostly clear what each type of coincidence represents. The last one,
   "coinc_definer:coinc_def_id:2", represents coincidences between
   sim_burst injections and rows in the coinc_event table. The coinc_event
   table itself can be used as a source of events in a coincidence, and
   here we find this being used to identify special cases of recovered
   software injections: injections that match not just some events that
   survive a coincidence test, but all of the burst events participating in
   a single burst<-->burst coincidence.

5. Finally, there are the coinc_event and coinc_event_map tables. We see
   four coincidences. Numbers 0 and 1 are of type 0, so these are
   burst<-->burst coincidences. Examining the coinc_event_map table, we see
   that coinc_event number 0 involves burst events 203, 49, and 309, while
   coinc_event number 1 involves burst events 226, 59, and 325. There is
   then a single coincidence of type 1 --- an injection<-->burst
   coincidence. From the coinc_event_map table we see that coinc_event
   number 2 involves sim_burst injection number 14409, and sngl_bursts 59,
   226, and 325. Notice that those three burst events constitute one of the
   triple-coincidences. So the final coinc_event, number 3, is of type 2
   --- an injection<-->coinc coincidence. The coinc_event_map table tells
   us that coinc_event number 3 is a match between coinc_event number 1 (a
   burst<-->burst triple coincidence) and sim_burst injection number 14409.

Details
=======

The four tables, the columns, their types, and the column meanings are:

time_slide

  Columns:

    process_id (ilwd:char)
      The ILWD ID from the process table of the program that generated this
      entry.

    time_slide_id (ilwd:char)
      The ILWD ID of the collection of offsets to which this entry belongs.

    instrument (lstring)
      The name of an instrument.

    offset (real_8)
      The offset, in seconds, to be added to the time of each event from
      this instrument prior to testing for coincidence. Note the sign. Some
      searches, notably the inspiral, treat offsets in special ways, for
      example by assuming that events are wrapped cyclically within a
      single segment. How an offset is to be applied is, therefore,
      application-dependent, but the interpretation of the sign of the
      offset is not.

  Notes:

    The pair (time_slide_id, instrument) must be unique within the table.

coinc_definer

  Columns:

    coinc_def_id (ilwd:char)
      The ILWD ID of the collection of table names of which this entry is a
      part.

    search (lstring)
      The name of the search that produced this coincidence. Many of the
      event tables, like the sngl_burst and sngl_inspiral, also have search
      columns. It is recommended that the strings used by each search for
      those columns match the strings used here.

    search_coinc_type (int_4u)
      A search-specific integer identifying this coincidence type among the
      types of coincidence that the search can generate.

    description (lstring)
      A human-readable description string.

  Notes:

    The coinc_def_id must be unique within the table. To simplify software
    design, the pair (search, search_coinc_type) should also be unique.
    The description string is intended to be used for the benefit of
    humans, and not to store machine-coded data.

coinc_event

  Columns:

    process_id (ilwd:char)
      The ILWD ID from the process table of the program that generated this
      entry.

    coinc_def_id (ilwd:char)
      The ILWD ID from the coinc_definer table indicating which type of
      coincidence this is.

    time_slide_id (ilwd:char)
      The ILWD ID from the time_slide table of the list of offsets that
      were applied to the events prior to identifying this coincidence.

    nevents (int_4u)
      The number of events participating in the coincidence. By convention,
      an injection<-->anything coincidence counts only the non-injection
      events in this number (an injection found to match 3 burst events
      records nevents = 3, not nevents = 4). If a coinc_event is a
      participant, it counts as 1 event.

    likelihood (real_8)
      Application-defined number indicating the statistical significance of
      this coincidence. By convention, set to "nan" (without quotes) if not
      used. Most string-to-float conversion routines turn this string into
      a float NaN (not a number), and so this should be compatible with
      almost any parser.

    coinc_event_id (ilwd:char)
      The ILWD ID identifying this coincidence.

  Notes:

    The coinc_event_id must be unique within the table.

coinc_event_map

  Columns:

    coinc_event_id (ilwd:char)
      The ILWD ID from the coinc_event table to which this entry belongs.

    table_name (char_v)
      The name of the table from which this event was taken.

    event_id (ilwd:char)
      The ILWD ID from the table named in the table_name column of this
      entry.

Points of Discussion
====================

"Extra" Information
-------------------

It is practically guaranteed that a desire will arise to record extra
information about a coincidence. For example, it is conceivable that a
particular search might want to store some sort of precomputed parameter,
like an "effective SNR", with each coincidence.
The only sensible means to do this is for searches to introduce their own
tables where such information can be encoded, and link them to the
coinc_event table via the coinc_event_id. There is, for example, already a
multi_burst table. From time to time, a parameter so generic that it is
almost certain to have application in every search will be identified.
These can be added, like the "likelihood" column, to the coinc_event table
itself.

Note that this coincidence mechanism already requires a significant amount
of inter-table cross-referencing in order to perform follow-up type
analyses. The added complexity of cross-referencing to yet one more table
is insignificant.

LAL Compatibility
-----------------

A number of LAL-based codes, specifically those in use by the inspiral and
ringdown pipelines, are incompatible with this mechanism. Those codes have
implemented independent solutions to the problem of coincidence
book-keeping that are somewhat orthogonal to this implementation. The
information encoded by those pipelines' coincidence book-keeping mechanisms
is itself compatible with the mechanism in this proposal, meaning those
pipelines could be transitioned to this format without the loss of any
existing capabilities. However, the incompatibilities in encoding are
significant enough that a smooth transition is impossible.

The most significant incompatibility is that the IDs assigned to each event
are not of type ilwd:char, and so even if special sngl_inspiral or
sngl_ringdown tables were prepared in which the required 1-to-1 mapping of
ID to event was enforced, the IDs themselves could not be stored in the
coinc_event_map table. Changing the type of the event_id column in the
sngl_inspiral and sngl_ringdown tables from int_8s to ilwd:char would break
all existing LAL-based inspiral and ringdown programs (and all programs in
Glue and pyLAL).
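
To make the ID-type incompatibility concrete, the following is a minimal
Python sketch of the "table:column:integer" ID form seen in the example
document. The names IlwdId, parse_ilwd, and format_ilwd are hypothetical
and are not part of LAL, Glue, or pyLAL; the sketch assumes only that an ID
is three colon-separated fields ending in a non-negative integer.

```python
from typing import NamedTuple


class IlwdId(NamedTuple):
    """Hypothetical container for the three parts of an ID string."""
    table_name: str
    column_name: str
    index: int


def parse_ilwd(text: str) -> IlwdId:
    """Split an ID such as "sngl_burst:event_id:59" into its parts.

    Raises ValueError if the string is not of the assumed
    "table:column:integer" form.
    """
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected table:column:integer, got %r" % text)
    table_name, column_name, index = parts
    if not table_name or not column_name or not index.isdecimal():
        raise ValueError("expected table:column:integer, got %r" % text)
    return IlwdId(table_name, column_name, int(index))


def format_ilwd(ident: IlwdId) -> str:
    """Inverse of parse_ilwd()."""
    return "%s:%s:%d" % (ident.table_name, ident.column_name, ident.index)
```

Under these assumptions, a bare integer such as "12345" is rejected by
parse_ilwd, because it does not say which table and column it refers to,
whereas "sngl_burst:event_id:59" identifies both its source table and the
event within it.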

Implementation
==============

A fully-functional implementation of this book-keeping infrastructure has
been written, and has been available for evaluation in the Python
glue.ligolw library for a little over one year. The ligolw_sqlite
command-line application provided as part of the Glue package can transform
documents containing these tables into SQLite databases, and back again,
allowing for fast and convenient SQL-based queries of their contents. This
solves the problem of implementing efficient inter-table cross-reference
code. ligolw_sqlite includes a list of the "standard" indexes associated
with the tables above, which significantly speeds queries.

The S4 string cusp search has been carried through to completion using this
infrastructure, and the excess power burst search pipeline also relies
heavily on this infrastructure. This is therefore a mature proposal, not
one at the prototype stage.

Also available (in pylal) is 50% of a bi-directional converter to transform
existing inspiral pipeline coincidence data into this format. The 50% that
is missing is the inverse transform, from this format back to the existing
inspiral format. Neither direction of an equivalent sngl_ringdown
translator exists.
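
As an illustration of the kind of SQL-based cross-referencing described
above, here is a minimal sketch using Python's standard sqlite3 module. It
does not use ligolw_sqlite or reproduce its schema: the table and column
names follow the Details section, but the SQL column types, the reduced
column set, the ID templates, and the function name events_by_coinc are
assumptions made for this example. The rows are those of the example
document.

```python
import sqlite3

# ID templates matching the "table:column:integer" form of the example.
CD = "coinc_definer:coinc_def_id:%d"
CE = "coinc_event:coinc_event_id:%d"
SB = "sngl_burst:event_id:%d"
SIM = "sim_burst:simulation_id:%d"

# In-memory database with only the columns this sketch needs.  The SQL
# types are assumptions, not the schema produced by ligolw_sqlite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coinc_definer (coinc_def_id TEXT PRIMARY KEY, search TEXT,
                                search_coinc_type INTEGER, description TEXT);
    CREATE TABLE coinc_event (coinc_event_id TEXT PRIMARY KEY,
                              coinc_def_id TEXT);
    CREATE TABLE coinc_event_map (coinc_event_id TEXT, table_name TEXT,
                                  event_id TEXT);
""")
conn.executemany("INSERT INTO coinc_definer VALUES (?, ?, ?, ?)", [
    (CD % 0, "excesspower", 0, "sngl_burst<-->sngl_burst coincidences"),
    (CD % 1, "excesspower", 1, "sim_burst<-->sngl_burst coincidences"),
    (CD % 2, "excesspower", 2, "sim_burst<-->coinc_event coincidences (exact)"),
])
conn.executemany("INSERT INTO coinc_event VALUES (?, ?)", [
    (CE % 0, CD % 0), (CE % 1, CD % 0), (CE % 2, CD % 1), (CE % 3, CD % 2),
])
conn.executemany(
    "INSERT INTO coinc_event_map (event_id, table_name, coinc_event_id) "
    "VALUES (?, ?, ?)", [
        (SB % 203, "sngl_burst", CE % 0),
        (SB % 49, "sngl_burst", CE % 0),
        (SB % 309, "sngl_burst", CE % 0),
        (SB % 226, "sngl_burst", CE % 1),
        (SB % 59, "sngl_burst", CE % 1),
        (SB % 325, "sngl_burst", CE % 1),
        (SIM % 14409, "sim_burst", CE % 2),
        (SB % 59, "sngl_burst", CE % 2),
        (SB % 226, "sngl_burst", CE % 2),
        (SB % 325, "sngl_burst", CE % 2),
        (SIM % 14409, "sim_burst", CE % 3),
        (CE % 1, "coinc_event", CE % 3),
    ])


def events_by_coinc(connection, search, search_coinc_type):
    """Map each coinc_event_id of the requested (search, search_coinc_type)
    to the set of (table_name, event_id) pairs participating in it."""
    query = """
        SELECT m.coinc_event_id, m.table_name, m.event_id
        FROM coinc_event_map AS m
        JOIN coinc_event AS c ON c.coinc_event_id = m.coinc_event_id
        JOIN coinc_definer AS d ON d.coinc_def_id = c.coinc_def_id
        WHERE d.search = ? AND d.search_coinc_type = ?
    """
    result = {}
    for coinc_id, table_name, event_id in connection.execute(
            query, (search, search_coinc_type)):
        result.setdefault(coinc_id, set()).add((table_name, event_id))
    return result
```

Calling events_by_coinc(conn, "excesspower", 2) selects the single
injection<-->coinc coincidence, coinc_event number 3, and returns its two
participants: sim_burst injection 14409 and coinc_event number 1. Selecting
by the (search, search_coinc_type) pair rather than by coinc_def_id keeps
the query valid across documents, since the coinc_def_id is only unique
within one document.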