Using Pegasus to Run LSC Code on the Grid
Requirements on the user's system
The user's system is the machine on which the user generates the abstract workflow, runs Pegasus to generate the concrete DAG, and then runs the concrete DAG.
The user's machine should have the following software installed and configured correctly:
- LSC approved operating system: currently Fedora Core 3.
- Installation of Condor version 6.7.8 or greater.
- Installation of LSC Data Grid Server version 3.0 or greater. The LSC
Data Grid server should be correctly configured so that the following services
are enabled:
- The Condor pool on the local machine should be up and running. In
particular:
- Condor should accept and run jobs in the scheduler universe
- Condor should accept and run jobs in the globus universe
- Incoming and outgoing gsissh should work correctly.
- Incoming and outgoing gsiftp should work correctly.
- The Globus GRAM job manager should work correctly to accept jobs.
In particular:
- The globus fork job manager should be correctly configured and accept incoming jobs.
- The lscsoft repository should be installed so that lal and lalapps can be built.
- The ligo.sh scripts should be installed so that environment variables for the installed software are correctly configured when the user logs in.
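A quick way to catch a missing installation before going further is to check that the client commands are on your PATH. The following is a minimal sketch: the function name check_tools and the list of commands in the example are assumptions, and finding a command on PATH does not show that the corresponding service is configured correctly.

```shell
# check_tools NAME...: report which of the named commands are missing
# from PATH. Prints one "missing: NAME" line per absent command and
# returns non-zero if any command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Example (the tool list is an assumption; adjust it to your installation):
# check_tools condor_submit condor_submit_dag gsissh globus-url-copy grid-proxy-init
```

This only checks that the commands exist. The items in the list above, such as incoming gsissh and the fork job manager, still have to be tested by actually using them.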
Stuart Anderson should be able to set up a properly configured machine at Caltech by following the configuration that was used to set up kitalpha as an FC3 machine; most of the necessary software is already installed in /ldcg
Other users starting from a blank Fedora Core 3 machine should:
- Follow the LSC Data Grid Server Install Instructions
- Follow the instructions to Install lscsoft from the yum repository
The following LSC Data Grid machines are known to be correctly configured (as of 7/21/2005):
ldas-grid.ligo.caltech.edu
You will also need a valid grid certificate and private key installed in ${HOME}/.globus
If you do not have a certificate, follow the user instructions to get a digital certificate. If you already have a certificate and key on a different machine, you can copy them to the machine you will be running Pegasus from. Make sure they are in the directory ${HOME}/.globus with the correct permissions: 400 for userkey.pem and 644 for usercert.pem
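The permissions described above can be applied with a small helper. This is a sketch only; the function name fix_cert_perms is hypothetical, and it assumes the two files are already in place.

```shell
# fix_cert_perms [DIR]: set the permissions described above on the
# certificate and key in DIR (default: ${HOME}/.globus), then list them.
fix_cert_perms() {
    dir=${1:-${HOME}/.globus}
    chmod 400 "${dir}/userkey.pem" || return 1    # private key: owner read-only
    chmod 644 "${dir}/usercert.pem" || return 1   # certificate: world readable
    ls -l "${dir}/userkey.pem" "${dir}/usercert.pem"
}

# Example:
# fix_cert_perms
```

After running it, the listing should show -r-------- for userkey.pem and -rw-r--r-- for usercert.pem.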
Access to the LIGO data and LDR
To generate a DAX using the inspiral pipeline, you will need access to an LDRdataFindServer (i.e. the ability to run LSCdataFind). If you do not already have this, fill in the LSC Data Grid account request form to request an account on the LSC Data Grid, making sure that you specify that you need to use LSCdataFind.
To run Pegasus you will need access to the LIGO RLI servers that know where the data is and access to the grid ftp servers that have the data. You will need to contact Scott Koranda and Stuart Anderson (at least) to be added to the UWM and CIT servers. You will need to send them your DN and request that they do the following:
- Edit the file $LDR_LOCATION/globus/etc/globus-rls-server.conf
- Look for the section that starts
# permission for Pegasus users to query LRC RLI
- Add a line of the form (replace my DN with the user's DN)
acl /DC=org/DC=doegrids/OU=People/CN=Duncan Brown 792417: lrc_read rli_read stats
- Send a kill -HUP to the globus-rls-server process. There is no need to stop and restart the rest of LDR.
- Edit the file $LDR_LOCATION/globus/etc/grid-mapfile.gridftp and add the user's DN to this file, mapped to the user who has access to the data (i.e. grid or datarobot). Note that when this goes into production, users should be mapped to a user who has read-only access to the data, to prevent them from deleting it using uberftp.
- Edit the file $LDR_LOCATION/globus/etc/grid-mapfile.gram and add the user's DN, mapped to a user who has access to the data.
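For administrators making these changes, the lines to add can be generated from the DN rather than typed by hand. The sketch below assumes the standard grid-mapfile layout of a quoted DN followed by a local user name; the function names rls_acl_line and gridmap_line are hypothetical.

```shell
# rls_acl_line DN: print the ACL entry for globus-rls-server.conf
# in the form shown above.
rls_acl_line() {
    printf 'acl %s: lrc_read rli_read stats\n' "$1"
}

# gridmap_line DN LOCAL_USER: print a grid-mapfile entry mapping DN to
# LOCAL_USER. The DN is quoted because it usually contains spaces.
gridmap_line() {
    printf '"%s" %s\n' "$1" "$2"
}

# Example:
# rls_acl_line "/DC=org/DC=doegrids/OU=People/CN=Duncan Brown 792417"
# gridmap_line "/DC=org/DC=doegrids/OU=People/CN=Duncan Brown 792417" grid
```

Users can print the DN to send by running grid-cert-info -subject on the machine that holds their certificate.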
Compile the inspiral code
Follow the LAL and LALApps install instructions to build and install the inspiral software.
Note that you should always use the --enable-condor option when configuring LALApps so that static, standard universe executables are built which can be easily run on the grid.
Download and install glue
The Grid LSC User Environment (Glue) is not yet included in the lscsoft repository so you will need to download it from CVS.
- Create an empty file called .nolscsoft-glue in your home directory by running
touch ${HOME}/.nolscsoft-glue
Log out and log back in. This will disable any system installed copies of Glue.
- Follow the Glue README file to install Glue.
Installation of VDS
The LSC Data Grid server is built on top of VDT, which includes the VDS (the package that contains Pegasus). However, the version of VDS installed in LDG 3.0 and LDG 3.5 does not have the features we wish to test.
A recent version of vds-binary should be installed for running Pegasus. Nightly builds are available from the Pegasus cvs build web page. The correct version for FC3 is linux-i686-glibc235. The version vds-binary-1.3.10-linux-i686-glibc235-20050901.tar.gz is known to have all the necessary bug fixes to create concrete DAGs for the inspiral pipeline.
Install the VDS as follows:
- Create a directory to contain VDS. We assume this will be created under ${HOME} here:
mkdir ${HOME}/vds
- Download the VDS binary tarball into this directory and uncompress it:
cd ${HOME}/vds
wget http://vds.isi.edu/cvs-nightly/vds-binary-1.3.10-linux-i686-glibc235-20050901.tar.gz
tar -zxvf vds-binary-1.3.10-linux-i686-glibc235-20050901.tar.gz
- This will create a directory with a version number in it containing the VDS binaries. Create a symbolic link called vds-current that points to this directory:
ln -s vds-1.3.10 vds-current
- Add the following lines to your .bashrc
unset CLASSPATH
VDS_HOME=${HOME}/projects/grid/vds/vds-current
export VDS_HOME
source ${VDS_HOME}/setup-user-env.sh
export PATH=${VDS_HOME}/bin:${PATH}
These should be added AFTER sourcing the LDG setup script. Set VDS_HOME to the location of your own vds-current link; if you created it under ${HOME}/vds as in the steps above, that is ${HOME}/vds/vds-current.
- Log out and log back in to update your environment variables.
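After logging back in, it is worth confirming that the environment was picked up. The following sketch checks only what the .bashrc lines above set; the function name check_vds_env is hypothetical.

```shell
# check_vds_env: verify that VDS_HOME is set, that it points at a
# directory, and that ${VDS_HOME}/bin is on PATH. Prints "ok" on success,
# or a diagnostic and a non-zero return value otherwise.
check_vds_env() {
    if [ -z "${VDS_HOME:-}" ]; then
        echo "VDS_HOME is not set"
        return 1
    fi
    if [ ! -d "${VDS_HOME}" ]; then
        echo "VDS_HOME=${VDS_HOME} is not a directory"
        return 1
    fi
    case ":${PATH}:" in
        *":${VDS_HOME}/bin:"*) echo "ok" ;;
        *) echo "${VDS_HOME}/bin is not on PATH"; return 1 ;;
    esac
}

# Example:
# check_vds_env
```

A "not a directory" result usually means VDS_HOME does not match the place where the vds-current link was actually created.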
Generate an inspiral DAX
Once the inspiral code is installed, you can generate inspiral workflows. The following shows how to create a simple DAX that can be used to test the LSC Data Grid and/or the OSG.
- Make sure that you can talk to an LSC data find server by running:
LSCdataFind --ping
It should respond with
LDRdataFindServer at ldas-cit.ligo.caltech.edu is alive
where ldas-cit.ligo.caltech.edu is replaced by your local LSC data find server.
- Make a directory to work in, for example:
mkdir ${HOME}/grid_inspiral
- Download the files into this directory.
- Uncompress the cache file directory (this contains the names of the
calibration frames)
tar -zxvf cache_files.tar.gz
- Make sure you have a valid grid proxy by running
grid-proxy-init
- Run the script lalapps_inspiral_pipe to generate a DAX by
executing it with the arguments
lalapps_inspiral_pipe --datafind --template-bank --inspiral --triggered-bank --triggered-inspiral --coincidence --config-file inspiral_pipe.ini --log-path . --dax
- This should create a file called inspiral_pipe.dax which can be given to Pegasus.
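Before handing the DAX to Pegasus, a simple sanity check is to confirm that the file is non-empty and contains jobs. This sketch assumes that each job in the DAX is an XML element named job and that your grep supports the -o option; the function name count_dax_jobs is hypothetical.

```shell
# count_dax_jobs FILE: print the number of <job ...> elements in a DAX.
# Fails with a message on stderr if FILE is missing or empty.
count_dax_jobs() {
    if [ ! -s "$1" ]; then
        echo "missing or empty file: $1" >&2
        return 1
    fi
    # -o prints each match on its own line, so wc -l counts matches
    # even when several elements share one line.
    grep -o '<job[[:space:]]' "$1" | wc -l | tr -d ' '
}

# Example:
# count_dax_jobs inspiral_pipe.dax
```

A count of zero suggests that the pipeline script ran but produced no work, for example because no data was found for the requested times.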
Use Pegasus to generate a concrete DAG from the DAX
The first step is to obtain a valid pool configuration file and transformation catalog for Pegasus.
If you want to run on the Open Science Grid
You can run the program vds-get-sites to obtain a site config and transformation catalog.
- Run
vds-get-sites --grid osg-itb
Replace osg-itb with osg to use the OSG proper instead of the testbed.
- Copy the resulting files to the directory containing the DAX:
cp ${VDS_HOME}/var/tc.data .
cp ${VDS_HOME}/etc/sites.xml .
If you want to run on the LSC Data Grid
There is currently no automated way of generating a pool config and transformation catalog for the LSC Data Grid. You can generate your own by downloading the files
You will need to edit these files to set the correct paths to the required directories, which must be ones where you have permission to write files. Once you have done this, run the command
genpoolconfig --poolconfig sites.txt --output sites.xml
to generate the XML pool config file required by Pegasus.
For both the OSG and the LSC Data Grid
Tell Pegasus where to find the inspiral executable you have built by doing the following:
- Edit the file tc.data and add the locations of the inspiral executables to be staged onto the grid. At the bottom of the file, add the lines
local ligo::lalapps_tmpltbank:1.0 GSIFTPPATH/bin/lalapps_tmpltbank STATIC_BINARY INTEL32::LINUX
local ligo::lalapps_inspiral:1.0 GSIFTPPATH/bin/lalapps_inspiral STATIC_BINARY INTEL32::LINUX
local ligo::lalapps_inca:1.0 GSIFTPPATH/bin/lalapps_inca STATIC_BINARY INTEL32::LINUX
NOTE: You should replace the string GSIFTPPATH with the gsiftp URL of the inspiral binaries on your machine. The command
echo gsiftp://`hostname -f`${LAL_PREFIX}
should give you the correct string to replace GSIFTPPATH. For me this command returns
gsiftp://ldas-grid.ligo.caltech.edu/archive/home/dbrown
You can check these URLs are correct before running Pegasus by using uberftp to copy them to your home directory.
- If you are using the LSC data grid tc.data, add the locations of the dirmanager and kickstart executables on your local pool to the tc.data file. NOTE: this is not needed if you created your tc.data with vds-get-sites. At the bottom of the file add the lines
local transfer VDS_HOME/bin/transfer INSTALLED INTEL32::LINUX vds::bundle_stagein=1
local dirmanager VDS_HOME/bin/dirmanager INSTALLED INTEL32::LINUX
NOTE: You should replace the string VDS_HOME with the value of the environment variable VDS_HOME that you defined previously. For example, for me this is set to
[dbrown@kitalpha.ligo pegasus]$ echo $VDS_HOME
/archive/home/dbrown/projects/grid/vds/vds
and so the correct path to the transfer executable is
/archive/home/dbrown/projects/grid/vds/vds/bin/transfer
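The GSIFTPPATH and VDS_HOME substitutions described above can be done with sed instead of by hand. This is a sketch under stated assumptions: the function name fill_placeholder and the file name tc_lines.txt are hypothetical, and the replacement value must not contain the characters | or &.

```shell
# fill_placeholder NAME VALUE: copy stdin to stdout, replacing every
# occurrence of the literal string NAME with VALUE. "|" is used as the
# sed delimiter because VALUE is a path or URL containing "/".
fill_placeholder() {
    sed "s|$1|$2|g"
}

# Example, with the template lines saved in a scratch file tc_lines.txt:
# fill_placeholder GSIFTPPATH "gsiftp://$(hostname -f)${LAL_PREFIX}" < tc_lines.txt >> tc.data
```

Writing the result to the terminal first, without the redirection to tc.data, makes it easy to inspect the expanded lines before appending them.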
Obtain a VDS properties file for Pegasus. Download the file into the directory containing the DAX.
Since the calibration frames are not yet retrieved via LDR, you will need a PFN cache that tells Pegasus where it can find the calibration data. Download the file to the directory containing the DAX. You will need gsiftp access to the machine ldas-grid.ligo.caltech.edu to obtain this data.
Now you should be able to run Pegasus to generate a concrete DAG.
- Make sure you have a valid grid proxy and you don't have any other proxies hanging around:
unset X509_USER_PROXY
grid-proxy-init
- Run gencdag to create the concrete DAG. This example creates a concrete DAG that runs on the LSC data grid pools at Caltech, Penn State, Hanford, UWM and LSU and returns the results to the local pool:
gencdag -Djava.net.preferIPv4Stack=true -Dvds.properties=./properties -vvvvv -a -r -p uwm,psu,cit,lho,supermike,helix -o local -d inspiral_pipe.dax --dir all_clusters --cache calibration_pfn_cache.txt
The option -p specifies a comma separated list of pools to use: for example -p UWMilwaukee,OSG_LIGO_PSU,BNL_ATLAS_1 will use three of the OSG production pools. The pool names can be obtained from GridCat for OSG or from the sites.xml file for the LSC data grid.
The option --dir controls the name of the directory to which the concrete DAG is written. For example you could specify --dir osg_test to create the DAG and submit files in the directory osg_test.
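If you keep the pool names in a script, the comma separated value for -p can be built from a list of arguments. A minimal sketch; the function name join_sites is hypothetical.

```shell
# join_sites NAME...: print the arguments joined with commas, in the
# form expected by the -p option described above.
join_sites() {
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

# Example:
# gencdag ... -p "$(join_sites uwm psu cit)" ...
```

The names passed in still have to match the pool names in sites.xml or GridCat exactly.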
The final stage is to run the concrete DAG:
- Change into the working directory containing the concrete DAG. For the
above example
cd UWMilwaukee
- Submit the concrete DAG by running
condor_submit_dag inspiral-0.dag
- You can watch the progress of the concrete DAG with
tail -f inspiral-0.dag.dagman.out
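To check from a script whether the DAG has finished, you can look for DAGMan's final log line instead of watching the file. This sketch assumes that DAGMan ends the .dagman.out file with a line containing "EXITING WITH STATUS" followed by its exit code; check this against the output of your Condor version. The function name dag_exit_status is hypothetical.

```shell
# dag_exit_status FILE: print the exit status recorded on the last
# "EXITING WITH STATUS n" line of a dagman.out file. Prints nothing if
# no such line is present, i.e. the DAG is presumably still running.
dag_exit_status() {
    sed -n 's/.*EXITING WITH STATUS \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Example:
# dag_exit_status inspiral-0.dag.dagman.out
```

An empty result means no exit line has been written yet; a non-zero value means some part of the workflow failed and the earlier lines of the file should be inspected.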
Where to go for help
If everything is configured correctly, it should work. If you have problems, mail the griphynligo mailing list.
$Id: pegasus_lsc.html,v 1.37 2006/10/26 18:17:23 bmoe Exp $