The LIGO Data Replicator (LDR) is a tool for replicating data
sets to the member sites of a Virtual Organization or DataGrid.
The basic idea is simple. Your organization has data files produced
at one computing site, and you would like some infrastructure to
automatically, efficiently, and robustly replicate that data to
other sites in your organization, and then make it possible for
users to discover the files.

LDR is a collection of tools provided by the Globus project along with some
extra logic to pull the pieces together. The pieces include:
- Globus GridFTP for fast transport of files between sites
- Globus Replica Location Service (RLS) for cataloging the
locations of files within your organization
- A metadata service, developed by the LDR team but based on a
prototype Globus Metadata Catalog Service (MCS), for organizing
useful information about your data files, especially information
that determines when and where the data should be replicated
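
As a rough illustration of how these three pieces cooperate, here is a
minimal Python sketch. It is not LDR code and it does not call any Globus
API. The names ReplicaCatalog and plan_transfers, the host names, and the
metadata fields are all made up for this example. The metadata decides
which files a site wants, the catalog says where replicas already exist,
and the resulting plan is what would be handed to a transfer tool such as
GridFTP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ReplicaCatalog:
    """Toy stand-in for a replica catalog: logical file name -> physical URLs."""
    locations: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, lfn: str, url: str) -> None:
        # Record that a copy of the logical file exists at this URL.
        self.locations.setdefault(lfn, set()).add(url)

    def lookup(self, lfn: str) -> Set[str]:
        # Return a copy so callers cannot mutate the catalog by accident.
        return set(self.locations.get(lfn, set()))


def plan_transfers(
    metadata: Dict[str, dict],
    catalog: ReplicaCatalog,
    site_prefix: str,
    wanted: Callable[[dict], bool],
) -> List[Tuple[str, str]]:
    """Return (logical name, source URL) pairs for files the site wants but lacks."""
    plan = []
    for lfn in sorted(metadata):
        if not wanted(metadata[lfn]):
            continue  # metadata says this site does not need the file
        urls = catalog.lookup(lfn)
        if any(url.startswith(site_prefix) for url in urls):
            continue  # a local replica already exists
        if urls:
            plan.append((lfn, sorted(urls)[0]))  # pick any known source
    return plan


# Hypothetical data: three files, two sites.
metadata = {
    "run1-a.dat": {"run": 1},
    "run1-b.dat": {"run": 1},
    "run2-a.dat": {"run": 2},
}
catalog = ReplicaCatalog()
catalog.register("run1-a.dat", "gsiftp://site-a.example.org/data/run1-a.dat")
catalog.register("run1-b.dat", "gsiftp://site-a.example.org/data/run1-b.dat")
catalog.register("run1-b.dat", "gsiftp://site-b.example.org/data/run1-b.dat")
catalog.register("run2-a.dat", "gsiftp://site-a.example.org/data/run2-a.dat")

# Site B wants everything from run 1 that it does not already hold.
plan = plan_transfers(
    metadata, catalog, "gsiftp://site-b.example.org/", lambda m: m["run"] == 1
)
print(plan)
```

In this toy setup, site B already holds run1-b.dat and has not asked for
run 2, so only run1-a.dat ends up in the plan. After a real transfer
completed, the new copy would be registered in the catalog so that users
at site B can discover it.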

We like to say that LDR is the minimum
collection of components necessary for fast, efficient, robust, and
secure replication of data. We have tried to make a tool that is
straightforward to install, configure, and administer but at the same
time scales to handle tens of sites and hundreds of terabytes of data.

LDR is the right tool for your organization if...
- your administrators spend a lot of hours making sure data gets
copied from one site to another
- the scripts your administrators are using for moving the data
around have grown to become brittle and unmanageable
- you are never really sure what data is located at which site
- your people could get a lot more work done if everybody had a
replica of the data locally on site
- your organization has 50 or fewer sites and no more than a
petabyte of data

LDR is not the right tool for your organization if...
- you only need to move data from one site to a single other site,
or you have hundreds of sites to be concerned about
- you only have a few tens of gigabytes of data to be concerned
about, or you have tens of petabytes of data to be concerned about
- you require top-dollar robustness and security, i.e. government
and military types need not apply (though we expect many university
research groups could benefit from LDR)