When code runs repeatedly against a single data set, the Nemo cluster can overload some of its file servers. To remedy this, we developed nemo_copydir, which distributes a directory structure to the local hard disks of each of Nemo's execute nodes. This is accomplished via a "tree copy": the data is first copied from the file server to one cluster node, A. A then copies the data to two more cluster nodes, B and C; B copies to D and E, and C copies to F and G. This process continues until the data is present on all online cluster nodes. In our tests, copying 10 GB of data took approximately two hours.
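The fan-out above can be sketched as follows. This is an illustrative sketch only, not part of nemo_copydir itself: the node numbering and names are placeholders, not real Nemo hostnames.

```shell
# Hypothetical sketch of the "tree copy" fan-out described above.
# Nodes are numbered 0..N-1; node i hands the data to nodes 2i+1 and
# 2i+2, so the copy completes in roughly log2(N) rounds instead of N
# sequential transfers from the file server.
tree_copy_pairs() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    for child in $((2*i + 1)) $((2*i + 2)); do
      # Only emit a pair if the child node actually exists.
      [ "$child" -lt "$n" ] && echo "node$i -> node$child"
    done
    i=$((i + 1))
  done
}

# With seven nodes (A..G in the text), three rounds suffice:
tree_copy_pairs 7
```

Run against seven nodes, this prints the same pairs as the A-through-G example in the text (node0 -> node1, node0 -> node2, node1 -> node3, and so on).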


nemo_copydir generates a Condor DAG that copies the data from a source directory, e.g. /path/to/data, to the local hard disks on the Nemo nodes under /data/localscratch, e.g. /data/localscratch/rosso.

    Copying Data

  1. Make a directory under your home directory where Condor can write files. In this example, I'll use /home/rosso/copydir_test
  2. In that directory, run
    nemo_copydir --source /path/to/data --target rosso --logfile /people/rosso/copydir.log --copy
    This generates a DAG, called mover.dag, to perform the copy. Remember, it is best to keep Condor log files on /people. Condor standard error and standard output redirection files will be written to the copydir_test directory.
  3. Submit the dag:
    condor_submit_dag mover.dag
    This will start the copy. To track progress, you may run
    tail -f mover.dag.dagman.out

    Removing Data

Simply replace

when running nemo_copydir and submit the new dag.

    Further Information

For additional information about nemo_copydir, run

nemo_copydir --help

If you have additional questions or are having difficulty with the script, send a detailed email to