These instructions guide an LDR administrator through recovery
when a disk or set of disks storing data replicated by LDR has
failed, the data is no longer available, and it must be
replicated again.
These instructions assume that the LDR administrator is proficient in
the use of command shell tools like awk, sed, and xargs.
Since LDR depends on the mappings from files to URLs that are
held in the RLS catalog to determine which files a site does
or does not have, when files go missing on disk the
corresponding mappings must be removed from the RLS catalog.
Note that it is common for a file to be mapped to more than
one URL and that all mappings for a file must be
removed from RLS before LDR will recognize that a site no
longer has a file and it must be replicated again.
To determine which mappings must be removed, and then to remove
them, follow these steps (note that it is not necessary to shut
down LDR):
- Determine mappings to failed disk:
Use the globus-rls-cli command to find all the mappings
corresponding to the failed disk(s). The form of the command is
globus-rls-cli query wildcard lrc pfn <pattern> rls://localhost
Here <pattern> is the pattern to match against
the URL or physical file name (PFN). Patterns use the standard
Unix wildcard characters: an asterisk (*) matches 0 or more
characters, and a question mark (?) matches any single
character. Put double quotes ("") around your pattern so that
the shell (bash or csh) does not expand it.
We recommend redirecting the output to a file since you probably
have a lot of URLs corresponding to any single disk. Note that
if your RLS contains a "large" number of mappings it may take
a while for this command to return since the underlying
relational database must do a complete table scan through all
the URLs listed.
Here is an example command that one might use to find all files
that have the text "nfsdata13" in their saved URLs/PFNs/paths:
globus-rls-cli query wildcard lrc pfn "*nfsdata13*" rls://localhost > disk13mappings
Here are the first 10 lines of output from the above command:
[datarobot@nemo-dataserver var]$ head disk13mappings
GHLTV-GA2_S5_A-815273413-600.gwf: file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815273413-600.gwf
GHLTV-GA2_S5_A-815273413-600.gwf: file://nfsdata13.nemo.phys.uwm.edu/export1/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815273413-600.gwf
GHLTV-GA2_S5_A-815273413-600.gwf: gsiftp://nemo-dataserver.phys.uwm.edu:15000/data/nemo/storage/data/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815273413-600.gwf
GHLTV-GA2_S5_A-815274013-600.gwf: file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274013-600.gwf
GHLTV-GA2_S5_A-815274013-600.gwf: file://nfsdata13.nemo.phys.uwm.edu/export1/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274013-600.gwf
GHLTV-GA2_S5_A-815274013-600.gwf: gsiftp://nemo-dataserver.phys.uwm.edu:15000/data/nemo/storage/data/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274013-600.gwf
GHLTV-GA2_S5_A-815274613-600.gwf: file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274613-600.gwf
GHLTV-GA2_S5_A-815274613-600.gwf: file://nfsdata13.nemo.phys.uwm.edu/export1/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274613-600.gwf
GHLTV-GA2_S5_A-815274613-600.gwf: gsiftp://nemo-dataserver.phys.uwm.edu:15000/data/nemo/storage/data/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274613-600.gwf
GHLTV-GA2_S5_A-815275213-600.gwf: file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815275213-600.gwf
Repeat the query as necessary for each disk or partition that
has failed or for which you suspect data files have gone
missing.
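When several disks have failed, the query can be wrapped in a small loop. The following is a minimal sketch, assuming a POSIX shell such as bash. The function name find_disk_mappings and the output file naming (<disk>.mappings) are illustrative choices, not part of LDR or RLS; the query itself is the same globus-rls-cli command shown above.

```shell
# Sketch: run the wildcard query once per failed disk.
# find_disk_mappings is a hypothetical helper name.
find_disk_mappings() {
    for disk in "$@"; do
        # Quote the pattern so the shell does not expand the asterisks.
        globus-rls-cli query wildcard lrc pfn "*${disk}*" rls://localhost \
            > "${disk}.mappings"
        # Report how many mappings were found for this disk.
        echo "${disk}: $(wc -l < "${disk}.mappings") mappings"
    done
}

# Example (the disk names are placeholders):
# find_disk_mappings nfsdata13 nfsdata14
```

Each disk gets its own output file, so the later steps can be carried out one disk at a time.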
- Verify mappings are bad and files are missing:
If you are confident you lost an entire disk or partition, then
you should skip this step and go on to the next step.
If an entire disk or partition did not fail and you are not
sure if the data really is missing, you need to check and see
if the mappings you just found in RLS really are no longer
valid.
The easiest thing to do is to use the "ls" shell command for
each file and record a list of the failures. You will need to
filter the list of URLs appropriately so that you only test
those which make sense for the command "ls". For example:
grep 'file://localhost' disk13mappings | sed -e 's|file://localhost||' | awk '{print $2}' | xargs -I {} ls {} 1> goodFiles 2> badFiles
Because 'ls' reports each file it cannot list on stderr, the
file badFiles should contain one error message for every file
that really has gone missing and whose mappings must be removed
from RLS.
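As an alternative to reading the error messages from ls, a loop using the shell's -e test can write bare paths to the two lists. This is a minimal sketch, assuming the mapping file has the "LFN: URL" layout shown above and that paths contain no whitespace; the function name check_local_files is illustrative.

```shell
# Sketch: split the file://localhost mappings into files that exist and
# files that are missing. check_local_files is a hypothetical helper name.
# Usage: check_local_files <mappings file> <good list> <bad list>
check_local_files() {
    mappings=$1
    good=$2
    bad=$3
    # Start with empty output files.
    : > "$good"
    : > "$bad"
    # Keep only file://localhost URLs and strip the scheme and host,
    # leaving the local path in the second column.
    grep 'file://localhost/' "$mappings" \
        | sed -e 's|file://localhost||' \
        | awk '{print $2}' \
        | while read -r path; do
              if [ -e "$path" ]; then
                  echo "$path" >> "$good"
              else
                  echo "$path" >> "$bad"
              fi
          done
}

# Example:
# check_local_files disk13mappings goodFiles badFiles
```

With this variant both lists contain one plain path per line, which makes them easier to feed into later filtering.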
- Prepare list of mappings to delete from RLS:
Unfortunately the output from the globus-rls-cli command you
ran above cannot be used directly as input to RLS to remove the
mappings. Likewise, if you had to filter the output to verify
precisely which files are missing, then you need to create an
input file of the correct form containing the mappings to
remove.
The form of the input file is simple. Each line must contain the
file name (LFN) and a single URL (PFN), separated by
whitespace. For example:
GHLTV-GA2_S5_A-815273413-600.gwf file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815273413-600.gwf
GHLTV-GA2_S5_A-815273413-600.gwf file://nfsdata13.nemo.phys.uwm.edu/export1/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815273413-600.gwf
GHLTV-GA2_S5_A-815273413-600.gwf gsiftp://nemo-dataserver.phys.uwm.edu:15000/data/nemo/storage/data/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815273413-600.gwf
GHLTV-GA2_S5_A-815274013-600.gwf file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274013-600.gwf
GHLTV-GA2_S5_A-815274013-600.gwf file://nfsdata13.nemo.phys.uwm.edu/export1/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274013-600.gwf
GHLTV-GA2_S5_A-815274013-600.gwf gsiftp://nemo-dataserver.phys.uwm.edu:15000/data/nemo/storage/data/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274013-600.gwf
GHLTV-GA2_S5_A-815274613-600.gwf file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274613-600.gwf
GHLTV-GA2_S5_A-815274613-600.gwf file://nfsdata13.nemo.phys.uwm.edu/export1/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274613-600.gwf
GHLTV-GA2_S5_A-815274613-600.gwf gsiftp://nemo-dataserver.phys.uwm.edu:15000/data/nemo/storage/data/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815274613-600.gwf
GHLTV-GA2_S5_A-815275213-600.gwf file://localhost/nfsdata/nfsdata13/S5/GA2_S5_A/GHLTV/815273000-815282999/GHLTV-GA2_S5_A-815275213-600.gwf
The output of the previous globus-rls-cli command nearly has
this form; you simply need to remove the ":" between the LFN
and PFN. An easy way to do this is using sed:
sed -e 's/://' disk13mappings > disk13mappingRemovalInput
Remember that all mappings for a file need to be removed
from RLS before that file will be scheduled for replication.
Just removing the file:// URLs is not enough.
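If only some files were lost, the removal input should contain every mapping (file://, gsiftp://, and so on) for each missing file and nothing for files that are still present. The following is a minimal sketch. It assumes a list of missing local paths, one bare path per line (any surrounding error text from ls stripped), and it assumes that the LFN is the final component of the path, as in the example output above. The function name build_removal_input is illustrative.

```shell
# Sketch: select all mappings that belong to missing files.
# build_removal_input is a hypothetical helper name.
# Usage: build_removal_input <missing paths file> <mappings file>
# Prints "LFN PFN" lines for every mapping of every missing file.
build_removal_input() {
    # First file (NR == FNR): remember the last path component, which is
    # assumed to be the LFN. Second file: print every mapping whose LFN
    # was remembered, dropping the ":" that follows the LFN.
    awk '
        NR == FNR { n = split($0, parts, "/"); missing[parts[n]] = 1; next }
        {
            lfn = $1
            sub(/:$/, "", lfn)
            if (lfn in missing) print lfn, $2
        }
    ' "$1" "$2"
}

# Example (missingPaths is a placeholder file name):
# build_removal_input missingPaths disk13mappings > disk13mappingRemovalInput
```

Matching on the LFN rather than on the URL is what picks up the gsiftp:// and other non-local mappings for the same file.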
- Remove bad mappings from RLS:
With a file containing all of the bad mappings you can use the
globus-rls-cli command with the -i option to
easily remove all of the mappings.
If your globus-rls-cli command does not accept the -i flag
then you missed an update sent out by email. Send email to the
LDR list and ask for the update again.
The form of the command is
globus-rls-cli -i <file with mappings to remove> bulk delete rls://localhost
Note the use of the modifier bulk in the command
above. It is necessary if you use the -i option.
Without it you can only delete one mapping per invocation of
the command.
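Because a bulk delete acts on every line of the input file, it can be worth checking the file's form first. The following is a minimal sketch of such a check, assuming the two-column "LFN PFN" layout described above; the function name check_removal_input is illustrative and not part of RLS.

```shell
# Sketch: sanity-check the removal input before running the bulk delete.
# check_removal_input is a hypothetical helper name.
# Usage: check_removal_input <removal input file>
# Returns 0 if every line has exactly two fields and the LFN has no
# trailing ":"; otherwise reports the offending lines on stderr and
# returns 1.
check_removal_input() {
    awk '
        NF != 2 || $1 ~ /:$/ {
            printf "line %d is malformed: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example: only run the delete if the check passes.
# check_removal_input disk13mappingRemovalInput &&
#     globus-rls-cli -i disk13mappingRemovalInput bulk delete rls://localhost
```

The check catches the most likely slip, which is passing the raw query output with the ":" still attached to the LFN.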
With the RLS catalog cleared of mappings for files that no
longer exist your LDR is now ready to replicate those files.
Be sure to configure your LDR collection.ini file
appropriately so that the files you need to replicate again
are defined within a collection. See the LDR administrator's
manual for details.