LIGO Data Replicator

...A lightweight tool for replicating heavyweight data...

The LIGO Data Replicator (LDR) is a tool for replicating data sets to the member sites of a Virtual Organization or DataGrid.

The basic idea is simple. Your organization has data files produced at one computing site, and you would like some infrastructure that automatically, efficiently, and robustly replicates the data to other sites in your organization and then makes it possible for users to discover the files.

LDR is a collection of tools provided by the Globus project, along with some extra logic to pull the pieces together. The Globus pieces include

  • Globus GridFTP for fast transport of files between sites
  • Globus Replica Location Service (RLS) for cataloging the locations of files within your organization
  • A metadata service, developed by the LDR team but based on a prototype Globus Metadata Catalog Service (MCS), for organizing useful information about your data files, especially when and where the data should be replicated
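As a rough illustration of how these three pieces could fit together, here is a minimal in-memory sketch in Python. It is not LDR's actual code or API. The names `ReplicaCatalog`, `files_wanted`, `replicate`, and `fake_transfer`, the `replicate_to` attribute, and the example hostnames and file names are all made up for illustration. The metadata dictionary stands in for the metadata service, the catalog class stands in for RLS, and the `transfer` callback stands in for a GridFTP copy.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Toy stand-in for a replica catalog: logical file name -> site -> URL."""
    locations: dict = field(default_factory=dict)

    def register(self, lfn, site, url):
        self.locations.setdefault(lfn, {})[site] = url

    def sites_holding(self, lfn):
        return set(self.locations.get(lfn, {}))


def files_wanted(metadata, site):
    """Logical file names whose metadata says they belong at `site`."""
    return sorted(lfn for lfn, attrs in metadata.items()
                  if site in attrs["replicate_to"])


def replicate(site, metadata, catalog, transfer):
    """Copy every wanted file that `site` lacks and record the new replica."""
    copied = []
    for lfn in files_wanted(metadata, site):
        holders = catalog.sites_holding(lfn)
        if site in holders or not holders:
            continue  # already present here, or no known source to copy from
        source = sorted(holders)[0]  # pick a source site deterministically
        source_url = catalog.locations[lfn][source]
        # In a real deployment this step would be a GridFTP transfer.
        dest_url = transfer(source_url, site, lfn)
        catalog.register(lfn, site, dest_url)
        copied.append(lfn)
    return copied


def fake_transfer(source_url, site, lfn):
    """Pretend to move the file; return the URL of the new copy (hypothetical host)."""
    return f"gsiftp://{site}.example.org/data/{lfn}"


# Hypothetical metadata: which sites should hold which files.
metadata = {
    "run1.gwf": {"replicate_to": {"siteB", "siteC"}},
    "run2.gwf": {"replicate_to": {"siteC"}},
}
catalog = ReplicaCatalog()
catalog.register("run1.gwf", "siteA", "gsiftp://siteA.example.org/data/run1.gwf")
catalog.register("run2.gwf", "siteA", "gsiftp://siteA.example.org/data/run2.gwf")

print(replicate("siteB", metadata, catalog, fake_transfer))
```

In this sketch each site pulls what the metadata says it should have, and the catalog is updated only after a transfer returns, so users querying the catalog see only replicas that exist.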

We like to say that LDR is the minimum collection of components necessary for fast, efficient, robust, and secure replication of data. We have tried to make a tool that is straightforward to install, configure, and administer but that still scales to handle tens of sites and hundreds of terabytes of data.

LDR is the right tool for your organization if...

  • your administrators spend a lot of hours making sure data gets copied from one site to another
  • the scripts your administrators use for moving the data around have grown brittle and unmanageable
  • you are never really sure what data is located at which site
  • your people could get a lot more work done if everybody had a replica of the data locally on site
  • your organization has 50 or fewer sites and no more than a petabyte of data

LDR is not the right tool for your organization if...

  • you only need to move data from one site to another, or you have hundreds of sites to manage
  • you have only a few tens of gigabytes of data, or you have tens of petabytes of data
  • you require top-dollar robustness and security, i.e., no government or military types need apply (though we expect many university research groups could benefit from LDR)

