LDR Design

High-level design thoughts

A DataGrid is a set of geographically distributed sites, institutions, or groups with each site offering Grid Services, and where the set of sites has some collective need to share, distribute, and replicate large sets of data, files, or data files (if you prefer). LDR is designed to replicate data among the sites in a DataGrid.

The LDR installation at each site takes on any combination of three roles: publisher, provider, and subscriber. An LDR installation acting as a publisher introduces into the DataGrid information about available files and where they are located (via URLs), as well as information about the files themselves (so-called metadata, such as creation time or size). An LDR installation acting as a provider makes files available for replication to other sites. An LDR installation acting as a subscriber replicates data from a provider to itself.

A DataGrid must have at least one LDR installation acting as a publisher, at least one acting as a provider, and at least one acting as a subscriber. Again, any single LDR installation can be any combination of publisher, provider, and subscriber.
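The role requirement above can be sketched in a few lines of Python. This is only an illustration: the `Role` flag, the `datagrid_is_viable` function, and the site names are hypothetical and are not part of LDR itself.

```python
from enum import Flag, auto


class Role(Flag):
    """Roles an installation may combine (illustrative names, not LDR code)."""
    PUBLISHER = auto()
    PROVIDER = auto()
    SUBSCRIBER = auto()


ALL_ROLES = Role.PUBLISHER | Role.PROVIDER | Role.SUBSCRIBER


def datagrid_is_viable(site_roles):
    """Return True if each role is held by at least one site."""
    covered = Role(0)
    for roles in site_roles.values():
        covered |= roles
    return ALL_ROLES in covered


# Hypothetical two-site DataGrid: one site publishes and provides,
# the other only subscribes.
sites = {
    "site_a": Role.PUBLISHER | Role.PROVIDER,
    "site_b": Role.SUBSCRIBER,
}
```

Here `datagrid_is_viable(sites)` holds because the two sites together cover all three roles, while a DataGrid made up only of publishers would not qualify.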

LDR is designed to maximize the quantity of data replicated at the expense of the reliability of any single file being transferred. Put another way, LDR tries to replicate as much data as it can without regard to the order in which files are replicated or the success of any single file transfer. LDR is designed for bulk replication of data, as opposed to replicating smaller sets of files for just-in-time computing.

Mid-level design thoughts

Replicating data within a DataGrid requires...

  • ...a mechanism for keeping track of what data exists within the DataGrid.

    LDR uses a metadata database to store information about what files exist and information about the files such as size or creation time. Metadata information itself is replicated from LDR instance to LDR instance so that all sites in the DataGrid can be aware of what data might be replicated.

  • ...a mechanism for keeping track of where data is.

    LDR uses the Globus Replica Location Service (RLS) to store information about which files are located where. Each RLS has two parts. The first is a Local Replica Catalog (LRC), which stores information about what data exists locally. The second is a Replica Location Index (RLI), which stores information about which LRCs exist within the DataGrid and which files each LRC knows about locally.

  • ...a mechanism for determining what files need to be replicated from one location to another.

    An LDR administrator defines collections of desired files as SQL queries run against the LDR metadata database.

  • ...a mechanism for scheduling files to be replicated.

    LDR uses a simple priority queue for scheduling. The source location for a file is determined by querying the local RLI to find an LRC within the DataGrid that knows about the file, and then querying that remote LRC directly for the source URL. If the state of the queue is lost or becomes corrupt, it is not a problem, since LDR regularly regenerates the queue from the need lists and the current state of the LRCs and RLIs.

  • ...a mechanism for actually replicating files.

    LDR uses the GridFTP protocol to transfer files across a WAN. LDR attempts to transfer as many files as possible between two sites and can transfer data between multiple sites simultaneously. Failed transfers are not retried immediately; instead, LDR moves on to the next transfer. Files that do not transfer successfully are simply rescheduled and attempted again later.

  • ...a mechanism for storing replicated files.

    LDR assumes a very general model for "storage". The details of any particular storage system can be coded and made available to LDR via a very simple API. A "storage" model is only required to accept a file for ingestion and return a URL for the ingested file, which is then published to the local LRC. LDR does not need to know how the file is ingested or how the URL is generated.
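The scheduling, transfer, and storage mechanisms above can be sketched together in Python. Everything here is an assumption made for illustration: `DirectoryStorage`, `build_queue`, `run_pass`, the file names, and the example URL are hypothetical, plain dictionaries stand in for the metadata database and the LRC, and a callback stands in for a GridFTP transfer. It is not LDR's actual code or API.

```python
import heapq


class DirectoryStorage:
    """Hypothetical storage backend: accept a file, return a URL for it."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def ingest(self, name):
        # A real backend would place the file somewhere; this only builds the URL.
        return f"{self.base_url}/{name}"


def build_queue(needed, local_catalog):
    """Regenerate the priority queue from the need list and the local catalog.

    `needed` maps file name to priority (lower runs first). Files already
    present in `local_catalog` are left out.
    """
    queue = [(priority, name) for name, priority in needed.items()
             if name not in local_catalog]
    heapq.heapify(queue)
    return queue


def run_pass(queue, transfer, storage, local_catalog):
    """Attempt each queued file once; a failure is skipped, not retried."""
    failed = []
    while queue:
        _, name = heapq.heappop(queue)
        if transfer(name):
            # Publish the URL returned by storage to the (stand-in) local catalog.
            local_catalog[name] = storage.ingest(name)
        else:
            failed.append(name)
    return failed


# Stand-in transfer function: "b.dat" fails on its first attempt only.
attempts = {}

def flaky_transfer(name):
    attempts[name] = attempts.get(name, 0) + 1
    return not (name == "b.dat" and attempts[name] == 1)


needed = {"a.dat": 1, "b.dat": 2, "c.dat": 3}
catalog = {}
storage = DirectoryStorage("gsiftp://example.org/data")

first_failed = run_pass(build_queue(needed, catalog), flaky_transfer, storage, catalog)
second_failed = run_pass(build_queue(needed, catalog), flaky_transfer, storage, catalog)
```

Because the queue is rebuilt from the need list and the catalog on every pass, nothing has to be persisted between passes: the failed file simply shows up again in the next queue, which mirrors the "reschedule and try later" behavior described above.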

Low-level design thoughts

An LDRMaster daemon is responsible for launching the other necessary LDR daemons and watching over them. The available daemons (not all are necessary for every LDR installation) include:

  • LDRMetadataServer, a GSI-SOAP server that makes metadata published at a site available to other LDR instances.
  • LDRMetadataUpdate, a GSI-SOAP client that updates an LDR instance with new metadata as it is published at a remote LDR site.
  • LDRSchedule, which schedules (queues) transfers.
  • LDRTransfer, which spawns agents for each source site to replicate or transfer files from source locations.
  • LDRdataFindServer, which allows clients to query the metadata and replica location catalogs to discover data or files.

Each daemon is independent of any other (apart from LDRMaster), and each can be stopped and restarted without any need to coordinate.
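The launch-and-watch pattern described for LDRMaster can be sketched as a small process supervisor in Python. This is a minimal sketch under stated assumptions, not LDRMaster's implementation: the `Supervisor` class is hypothetical, and restarting a daemon that has exited is an assumed reading of "watching over them".

```python
import subprocess


class Supervisor:
    """Launch independent child processes and restart any that exit."""

    def __init__(self, commands):
        # Maps a daemon name to the argument list used to launch it.
        self.commands = dict(commands)
        self.children = {}

    def _start(self, name):
        self.children[name] = subprocess.Popen(self.commands[name])

    def start_all(self):
        for name in self.commands:
            self._start(name)

    def check(self):
        """Restart every child that has exited; return the names restarted."""
        restarted = []
        for name, proc in list(self.children.items()):
            if proc.poll() is not None:  # poll() returns None while running
                self._start(name)
                restarted.append(name)
        return restarted

    def stop_all(self):
        for proc in self.children.values():
            proc.terminate()
            proc.wait()
```

A supervisor loop would call `check()` periodically. Since each child is started and checked on its own, one daemon can stop and restart without coordinating with the others.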

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.