
The data format

Data is written onto the Exabyte tapes in blocks about 1/2 megabyte in size. The format of the data on the tapes is as shown in Table [*].

Table: Format of Exabyte data tapes (first row: content, second row: length in bytes).

content | mh 0's | 0's  | mh 0's | 0's  | mh gh 0's | data            | mh gh 0's | data            | $\cdots$
length  | 1024   | 1024 | 1024   | 1024 | 1024      | $1024 \times n$ | 1024      | $1024 \times n$ | $\cdots$


The tape begins with a main header (denoted ``mh" in the table). This is followed by a set of zeros, padding the header block to a length of 1024 bytes. There is then an empty block of 1024 bytes containing zeros. This pattern is repeated until the first block containing actual data. That block is signaled by the appearance of a main header, followed by a gravity header (denoted ``gh" in the table). These two headers are padded with zeros to a length of 1024 bytes. They are then followed by a set of data (the length of this set is a multiple of 1024 bytes). Information about the length of the data sets is contained in the headers. The data sets themselves consist of data from a total of 16 channels, each of which comes from a 12-bit A-to-D converter. Four of the 16 channels are fast (sample rates a bit slower than 10 kHz) and the remaining 12 channels are slow (sample rates a bit slower than 1 kHz). The ratio of sample rates is exactly $10:1$. Within the blocks labeled ``data", these samples are interleaved. The information content of the different channels is detailed on page 136 of Lyon's thesis [32], and is summarized in Table [*].

The program extract reads data off the tapes and writes it into files. One file is produced for each channel; typically these files are named channel.0 $\rightarrow$ channel.15. The complete set of these files for the November 1994 run fits onto two Exabyte tapes (in the 8500c compressed format). The information in these files begins only at the moment when the useful data (starting with the gravity header blocks) begins to arrive. The format of the data in these channel.* files is shown in Table [*].

Table: Format of a channel.0$\rightarrow$15 file (first row: block number, second row: content, third row: length in bytes).

block   |      block 0     |      block 1     |      block 2     |      block 3     | $\cdots$
content | mh bh 0's | data | mh bh 0's | data | mh bh 0's | data | mh bh 0's | data | $\cdots$
length  | 1024      | cs   | 1024      | cs   | 1024      | cs   | 1024      | cs   | $\cdots$


Here the main headers are the same as before; however, the headers that follow them are called binary headers (denoted ``bh" in the table). The length of the data stream (in bytes) is called the ``chunksize" and is denoted ``cs" in Table [*]. We frequently reference the data in these files by ``block number" and ``offset". The block number is an integer $\ge 0$, as shown in Table [*]. The offset is an integer which, within a given block, gives the position of a data element relative to the first data element in that block. In a block containing 5000 samples, the offsets run from 0 to 4999.
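As an illustration of this addressing scheme, the following fragment (a minimal sketch, not a GRASP routine, which assumes that every block of the file holds the same number of samples) converts a (block number, offset) pair into an absolute sample index within a channel.* file:

/* Minimal sketch: convert (block number, offset) into an absolute sample
   index, assuming each block holds the same number of samples (e.g. 5000
   for a typical slow channel).  For illustration only; not part of GRASP. */
long sample_index(int block, int offset, int samples_per_block)
{
    return (long)block * samples_per_block + offset;
}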

The structure of the binary headers is:
struct ld_binheader {
float elapsed_time: The total elapsed time in seconds from the beginning of the run, typically measured starting from the first valid block of data.
float datarate: The sample rate of the channel, in Hz.
};

The structure of the main headers is:
struct ld_mainheader {
int chunksize: The size of the data segment that follows, in bytes.
int filetype: Undocumented; often 1 or 2.
int epoch_time_sec: The number of seconds after January 1, 1970, Coordinated Universal Time (UTC), for the first sample. This is the quantity returned by the function time() in the standard C library.
int epoch_time_msec: The number of milliseconds which should be added to the previous quantity.
int tod_second: Seconds after the minute, 0-61 (the extra values allow for leap seconds), local California time.
int tod_minute: Minutes after the hour, 0-59, local California time.
int tod_hour: Hours since midnight, 0-23, local California time.
int date_day: Day of the month, 1-31, local California time.
int date_month: Month of the year, 0-11 for January-December, local California time.
int date_year: Years since 1900, local California time.
int date_dow: Days since Sunday, 0-6, local California time.
int sub_hdr_flag: Undocumented.
};
Note: in the original headers these fields were declared as long rather than int. They are in fact 4-byte objects, and on some modern machines, if they are declared as long they will be incorrectly interpreted as 8-byte objects. For this reason, we have changed the header definitions to what is shown above. Also please note that the time values ${\tt tod\_second} \cdots {\tt date\_dow}$ are local California time, not UTC.
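To make this layout concrete, the following is a minimal, self-contained sketch (it is not the GRASP read_block() routine described in the following section) which reads the first block of a channel.* file. It assumes that, within each 1024-byte header block, the main header is followed immediately by the binary header, that the data consist of 2-byte samples, and that the byte ordering and structure padding match those of the machine that wrote the file; if any of these assumptions fails on your system, use the GRASP reading routines instead.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ld_binheader {
    float elapsed_time;   /* elapsed time in seconds from the start of the run */
    float datarate;       /* sample rate of this channel, in Hz                */
};

struct ld_mainheader {
    int chunksize;        /* size of the data segment that follows, in bytes   */
    int filetype;
    int epoch_time_sec;   /* UTC seconds since 1 January 1970 for first sample */
    int epoch_time_msec;  /* milliseconds to add to epoch_time_sec             */
    int tod_second;       /* remaining fields: local California time           */
    int tod_minute;
    int tod_hour;
    int date_day;
    int date_month;
    int date_year;
    int date_dow;
    int sub_hdr_flag;
};

int main(int argc, char *argv[])
{
    char header[1024];
    struct ld_mainheader mh;
    struct ld_binheader bh;
    short *data;
    int nsamples;
    FILE *fp;

    if (argc != 2 || (fp = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "usage: %s channel.N\n", argv[0]);
        return 1;
    }

    /* read the 1024-byte header block: main header, then binary header,
       then zero padding (assumed layout) */
    if (fread(header, 1, 1024, fp) != 1024) {
        fprintf(stderr, "could not read header block\n");
        return 1;
    }
    memcpy(&mh, header, sizeof(mh));
    memcpy(&bh, header + sizeof(mh), sizeof(bh));

    /* read the chunksize bytes of data that follow (2 bytes per sample) */
    nsamples = mh.chunksize / (int)sizeof(short);
    data = (short *)malloc(mh.chunksize);
    if (data == NULL || fread(data, 1, mh.chunksize, fp) != (size_t)mh.chunksize) {
        fprintf(stderr, "could not read data segment\n");
        return 1;
    }

    printf("block 0: elapsed_time = %g s, datarate = %g Hz, %d samples\n",
           bh.elapsed_time, bh.datarate, nsamples);

    free(data);
    fclose(fp);
    return 0;
}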

For several years, the extract program contained a number of bugs. One of these caused the channel.* files to have no valid header information apart from the elapsed_time and datarate entries in the binary header, and the chunksize entry in the main header. All the remaining entries in the main header were either incorrect or nonsensical. This bug was corrected by Allen on 14 November 1996; data files produced from the tapes after that time should have valid header information.

There was also a more serious bug in the original versions of extract. The typical chunksize of most slow channels is 10,000 bytes (5,000 samples) and the chunksize of most fast channels is 100,000 bytes (50,000 samples). Until it was corrected by Allen on 14 November 1996, however, the extract program would, in an apparently unpredictable (though actually quite deterministic) fashion, ``skip" the last data point from the slow channels or the last ten data points from the fast channels, giving rise to sequences of 4,999 samples from the slow channels and, correspondingly, 49,990 samples from the fast channels. Not surprisingly, these missing data points gave rise to strange ``gremlins" in the early data analysis work; these are described on pages 150-151 of Lyon's thesis [32]. The missing points were simply cut out of the data stream, as shown in Figure [*] below; rather like cutting out 1 millisecond of a symphony orchestra every 5.1 seconds, this gives rise to ``clicks" which excited the optimal filters. Data taken off the tapes after 14 November 1996 should be free of these problems.
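If you must work with channel.* files produced by the old version of extract, one simple sanity check is to compare the sample count of each block against the typical values quoted above (5,000 samples for the slow channels, 50,000 for the fast ones). The fragment below is only a rough sketch of such a check; it is not a GRASP routine.

/* Sketch of a sanity check for files made with the pre-14 November 1996
   extract: returns nonzero if a block is shorter than the typical sample
   count quoted in the text (5,000 for slow channels, 50,000 for fast
   channels). */
int block_is_short(int chunksize, int is_fast_channel)
{
    int nsamples = chunksize / 2;    /* 2 bytes per sample */
    int expected = is_fast_channel ? 50000 : 5000;
    return nsamples < expected;
}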

There are a couple of caveats regarding the use of these ``raw data" files. First, in the channel.* files there can be, with no warning, large segments of missing data. In other words, a block of data with time stamp 13,000 sec, lasting 5 sec, can be followed by another data block with a time stamp of 14,000 sec (i.e., 995 sec of missing data). Second, the time stamps are stored in single-precision floats, so after about 10,000 sec they no longer have a resolution better than a single sample interval. When we read the data, we typically use the time stamp on the first data segment to establish the time at which the first sample was taken. Starting from that time, we then determine the time of a later data segment using elapsed_time, since the millisecond resolution of epoch_time_msec is not good enough. (See the comments in Section [*].)
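To make this convention concrete, the following minimal sketch (not a GRASP routine; the numbers in main() are invented purely for illustration) shows how the absolute time of a later data segment can be reconstructed from the first segment's epoch time stamp and the elapsed_time values.

/* Sketch of the timing convention described above: the epoch_time_sec and
   epoch_time_msec of the first valid block fix the absolute start time of
   the run, and the time of any later block is that start time plus the
   difference in elapsed_time.  Not a GRASP routine. */
#include <stdio.h>

double block_start_time(int first_epoch_sec, int first_epoch_msec,
                        float first_elapsed, float this_elapsed)
{
    double run_start = first_epoch_sec + 1.e-3 * first_epoch_msec;
    return run_start + (this_elapsed - first_elapsed);
}

int main(void)
{
    /* invented numbers, purely for illustration */
    double t = block_start_time(784900000, 250, 20.0f, 13000.0f);
    printf("this block starts at UTC second %.3f\n", t);
    return 0;
}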

For our purposes, the most useful channels are channel.0 and channel.10. Channel 0 contains the actual voltage output of the IFO. This is typically in the range of $\pm 100$. Later, we will discuss how to calibrate this signal. Channel 10 contains a TTL locked-level signal, indicating whether the interferometer was in lock. This is typically in the range from 1 to 10 when locked, and rises to several hundred when the interferometer is out of lock. Note: after the instrument comes into lock, you will notice that the IFO output is often zero (with a bit of DC offset) for periods ranging from a few seconds to a minute. This is because the instrument output amplifiers are typically overloaded (saturated) when the instrument is out of lock. Because they are AC-coupled, this leads to zero output. After the instrument comes into lock, the charge on these amplifiers gradually bleeds off (or one of the operators remembers to hit the reset button) and then the output ``comes alive". So don't be puzzled if the instrument comes into lock and the output remains zero for 40 seconds afterwards!
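For example, one simple way to flag locked stretches of data is to threshold channel.10. The fragment below is only a rough sketch; the threshold of 20 is our own assumption, chosen to lie between the locked range (1 to 10) and the out-of-lock values (several hundred) described above.

/* Rough sketch of a lock test based on channel.10 (TTL locked level):
   values of roughly 1-10 indicate lock, several hundred indicates loss of
   lock.  The threshold of 20 is an assumption, not a calibrated value. */
#define LOCK_THRESHOLD 20

int segment_in_lock(const short *ch10, int nsamples)
{
    int i;
    for (i = 0; i < nsamples; i++)
        if (ch10[i] > LOCK_THRESHOLD)
            return 0;    /* at least one sample indicates loss of lock */
    return 1;            /* every sample is consistent with lock       */
}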

Figure: This shows the appearance of channel.0 before and after the extract program was repaired (on 14 November 1996) to correctly extract data from the Exabyte data tapes. The old version of extract dropped the ten data points directly above the words ``missing data"; in effect these were interpolated by the diagonal line (but with ten times the slope shown since everything in between was missing).

The contents of the channel.* files were not the same for all of the runs. Lyon's thesis [32] gives a chart on page 136 with some ``typical" channel assignments. The channel assignments during the November 1994 data runs are listed in a log book; they were initially chosen on 14 November, then changed on 15 November and again on 18 November; these assignments are shown in Table [*]. (Note that the chart on page 136 of Lyon's thesis describes the channel assignments on 15 November 94, a day when no data was taken.)


Table: Channel assignments for the November 1994 data runs. Channels 0-3 are the ``fast" channels, sampled at about 10 kHz; the remaining twelve are the ``slow" channels, sampled at about 1 kHz.

Channel Number | Description $\le$ 14 November 94 | Description $\ge$ 18 November 94
 0             | IFO output                       | IFO output
 1             | unused                           | magnetometer
 2             | unused                           | microphone
 3             | microphone                       | unused
 4             | dc strain                        | dc strain
 5             | mode cleaner pzt                 | mode cleaner pzt
 6             | seismometer                      | seismometer
 7             | unused                           | slow pzt
 8             | unused                           | power stabilizer
 9             | unused                           | unused
10             | TTL locked                       | TTL locked
11             | arm 1 visibility                 | arm 1 visibility
12             | arm 2 visibility                 | arm 2 visibility
13             | mode cleaner visibility          | mode cleaner visibility
14             | slow pzt                         | unused
15             | arm 1 coil driver                | arm 1 coil driver


