Checking the status of the database and replication
Publication and Server Status
An overview of the publishing status is given by the LIGO Publishing Script Status page. This is available from LHO or LLO:
The pages mirror each other, so the information should be the same at either site. The status lights relevant to the segment database are:
- S5 realtime /frames: Published segment information from the raw
data written by the frame builder to the /frames file system. If this is in
the failed state, contact Ben.
- What others are relevant to segment publishing... Ben...?
- LDBD Server: Lightweight database dumper daemon status. This is the
process that listens for connections from the DMT or from authorized users to
insert data quality segments. If bad, the process should be restarted. To do
this:
  - Log in to the gateway machine as the user ldbd.
  - Check the file /export/ldbd/var/log/ldbdserver.log for error messages. If there are any messages, send them to the daswg-online mailing list.
  - Restart the server with the command
    ldbdd -d -c /export/ldbd/etc/ldbdserver.ini
- LSCSegFind Server: LSC segment server status. This is the process
that listens for connections from users requesting segment information.
If bad, the process should be restarted. To do this:
  - Log in to the gateway machine as the user ldbd.
  - Check the file /export/ldbd/var/log/lscsegfindserver.log for error messages. If there are any messages, send them to the daswg-online mailing list.
  - Restart the server with the command
    ldbdd -d -c /export/ldbd/etc/lscsegfindserver.ini
- Trigger Server: A secondary lightweight database dumper daemon that
connects to the trigger database, allowing search codes to dump trigger
information from online searches into a database for later querying. If bad, the process should be restarted. To do this:
  - Log in to the gateway machine as the user ldbd.
  - Check the file /export/ldbd/var/log/trgserver.log for error messages. If there are any messages, send them to the daswg-online mailing list.
  - Restart the server with the command
    ldbdd -d -c /export/ldbd/etc/trgserver.ini
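The log check that precedes each restart could be partly automated. The following is only a sketch: the format of the server logs is not described here, so the keyword list is a guess, and the names ERROR_KEYWORDS, find_error_lines and scan_log are hypothetical helpers, not part of the LDBD tools.

```python
from pathlib import Path

# Hypothetical keywords. The real log format of the LDBD servers is not
# documented on this page, so adjust these to match actual messages.
ERROR_KEYWORDS = ("error", "exception", "traceback", "failed")


def find_error_lines(log_text, keywords=ERROR_KEYWORDS):
    """Return the lines of log_text containing any keyword (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line)
    return hits


def scan_log(path, tail=200):
    """Scan the last `tail` lines of the log file at `path` for error lines."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return find_error_lines("\n".join(lines[-tail:]))
```

Calling scan_log("/export/ldbd/var/log/ldbdserver.log") as the user ldbd would return the candidate lines to forward to the daswg-online mailing list. An empty result only means that none of the assumed keywords matched, so the log should still be read by eye before restarting.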
DMT DQ segment publication status
John, please write something that explains how to check that the DMT is successfully publishing DQ segments and what to do if there is a failure.
Replication Status
Basic publication and replication checks
The easiest way to check the status of replication is to query the database at the site and see if real time frames are being replicated. To do this:
- Log in to gateway as the user ldbd.
- Connect to the segment database.
- Request the last ten segments published and their creator_db number.
The creator_db number is 1 for LHO, 2 for LLO and 3 for CIT. For each creator_db number that appears, check the end time of its most recent segment against the current time. If the latest segments are less than about 10 minutes old, then replication is working.
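The freshness rule above can be written as a short sketch. It assumes the query result has already been read into (creator_db, end_time) pairs with end_time in GPS seconds. The 600 second threshold stands in for "around 10 minutes", and the function names are hypothetical.

```python
# creator_db numbering as given in the text above.
SITE_NAMES = {1: "LHO", 2: "LLO", 3: "CIT"}

# "Around 10 minutes", expressed in seconds (an assumed exact value).
MAX_AGE_SECONDS = 600


def latest_segment_ages(rows, now_gps):
    """Map each site name to the age in seconds of its newest segment.

    rows is an iterable of (creator_db, end_time) pairs with end_time in
    GPS seconds; now_gps is the current time in GPS seconds.
    """
    newest = {}
    for creator_db, end_time in rows:
        if end_time > newest.get(creator_db, -1):
            newest[creator_db] = end_time
    return {SITE_NAMES.get(db, f"creator_db {db}"): now_gps - end
            for db, end in newest.items()}


def stale_sites(rows, now_gps, max_age=MAX_AGE_SECONDS):
    """Return the sorted names of sites whose newest segment is too old."""
    ages = latest_segment_ages(rows, now_gps)
    return sorted(site for site, age in ages.items() if age > max_age)
```

A site that contributes no rows at all to the last ten segments is not reported by stale_sites, so a creator_db number that is missing from the result should be treated as a warning sign in its own right.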
We illustrate the procedure with an example at LHO:
[ldbd@ldas.ldas-wa ~]$ db2 connect to seg_lho
Database Connection Information
Database server = DB2/SUN 8.2.3
SQL authorization ID = LDBD
Local database alias = SEG_LHO
[ldbd@ldas.ldas-wa ~]$ db2 "select creator_db,end_time from segment order by end_time desc fetch first 10 rows only for read only"
CREATOR_DB END_TIME
----------- -----------
1 825962752
1 825962752
1 825962720
1 825962720
1 825962688
1 825962688
2 825962656
1 825962656
1 825962656
2 825962644
10 record(s) selected.
[ldbd@ldas.ldas-wa ~]$ date
Thu Mar 9 10:06:38 PST 2006
[ldbd@ldas.ldas-wa ~]$ tconvert -l 825962752
Mar 09 2006 10:05:38 PST
[ldbd@ldas.ldas-wa ~]$ tconvert -l 825962656
Mar 09 2006 10:04:02 PST
In this case, the latest segment from LHO is 1 minute old, indicating that local publishing is working, and the latest segment from LLO is about 2 minutes old, indicating that publishing is working at LLO and replication from LLO to LHO is active.
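Where tconvert is not to hand, the GPS end times in the example can be converted with a few lines of Python. This is a sketch under one stated assumption: it uses a fixed GPS minus UTC offset of 14 seconds, which holds for dates from 2006-01-01 through 2008-12-31 and therefore covers the example above but not arbitrary times. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# GPS time starts at 1980-01-06 00:00:00 UTC.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: GPS was 14 s ahead of UTC from 2006-01-01 to 2008-12-31.
# Other dates need a different leap second offset.
GPS_MINUS_UTC_SECONDS = 14

# Fixed-offset Pacific Standard Time, as used at LHO in the example.
PST = timezone(timedelta(hours=-8), "PST")


def gps_to_utc(gps_seconds, leap_offset=GPS_MINUS_UTC_SECONDS):
    """Convert GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_offset)


def format_local(gps_seconds, tz=PST):
    """Format GPS seconds as a local time string in the given time zone."""
    local = gps_to_utc(gps_seconds).astimezone(tz)
    return local.strftime("%b %d %Y %H:%M:%S %Z")


print(format_local(825962752))  # Mar 09 2006 10:05:38 PST
print(format_local(825962656))  # Mar 09 2006 10:04:02 PST
```

Both values agree with the tconvert output shown in the example.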
Where to look next if segments are not appearing
If local publishing is functioning at both sites, but segments are not being replicated between sites, then check the pages created by the IBM Data Propagator Q-replication analyzer. This is run at midnight at each site and dumps the status of the replication engine at that site onto a web page:
Check these pages for error or warning messages and follow the instructions in the troubleshooting pages to diagnose and correct errors.