When code runs repeatedly against a single data set, the Nemo cluster can overload some of its file servers. To remedy this, we developed nemo_copydir, which distributes a directory structure to the local hard disks of each of Nemo's execute nodes. This is accomplished via a "tree copy": the data is first copied from the file server to one cluster node, A. Node A then copies the data to two more cluster nodes, B and C; B then copies to D and E, and C copies to F and G. This process continues until the data has been copied to every online cluster node. In our tests, copying 10 GB of data took approximately two hours.
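The tree copy above doubles the number of data holders each round. A minimal sketch of how such a copy schedule can be derived (a hypothetical helper for illustration only, not part of nemo_copydir itself):

```python
from collections import deque

def tree_copy_schedule(nodes):
    """Return (source, destination) copy pairs forming a binary
    tree rooted at the first node in `nodes`."""
    if not nodes:
        return []
    pairs = []
    have_data = deque([nodes[0]])   # nodes that already hold a copy
    remaining = deque(nodes[1:])    # nodes still waiting for the data
    while remaining:
        src = have_data.popleft()
        for _ in range(2):          # each holder seeds two new nodes
            if not remaining:
                break
            dst = remaining.popleft()
            pairs.append((src, dst))
            have_data.append(dst)   # dst can now seed further copies
    return pairs

# Reproduces the example in the text: A→B, A→C, B→D, B→E, C→F, C→G
print(tree_copy_schedule(["A", "B", "C", "D", "E", "F", "G"]))
```

With N nodes this needs only about log2(N) sequential rounds, which is why the cluster-wide copy finishes in hours rather than days.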
nemo_copydir generates a Condor DAG that copies the data from a source directory, e.g. /path/to/data, to the local hard disks on the Nemo nodes under /data/localscratch, e.g. /data/localscratch/rosso:
nemo_copydir --source /path/to/data --target rosso --logfile /people/rosso/copydir.log --copy

This will generate a DAG, called mover.dag, to perform the copy. Remember, it is best to keep Condor log files under /people. Condor standard error and standard output redirection files will be stored in the copydir_test directory used in this example.
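To give a sense of what the generated mover.dag looks like, here is a hypothetical fragment in standard Condor DAGMan syntax; the actual node names and submit files produced by nemo_copydir may differ:

```
# Hypothetical excerpt of mover.dag (illustrative only)
JOB copy_A_B copy_A_B.sub
JOB copy_A_C copy_A_C.sub
JOB copy_B_D copy_B_D.sub
JOB copy_B_E copy_B_E.sub

# A node may not start copying until it has received the data itself
PARENT copy_A_B CHILD copy_B_D copy_B_E
```

The PARENT/CHILD lines enforce the tree ordering: each node's outgoing copies wait for its incoming copy to finish.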
condor_submit_dag mover.dag

This will start the copy. To track progress, you may run
tail -f mover.dag.dagman.out
To remove a previous copy before re-running, specify --remove when running nemo_copydir and submit the new DAG.
For additional information about nemo_copydir, run
nemo_copydir --help

If you have additional questions or are having difficulty with the script, send a detailed email to