Lab Exercise: Constructing Pipelines with Glue
In this lab you will become familiar with the pipeline.py module in Glue.
We will download a sample pipeline script and configuration file and walk through them.
$ cd ~
$ mkdir simple_pipe_demo
$ cd simple_pipe_demo
$ wget http://www.lsc-group.phys.uwm.edu/lscdatagrid/LSCGridCamp/simple_pipe.ini
$ wget http://www.lsc-group.phys.uwm.edu/lscdatagrid/LSCGridCamp/simple_pipe.py
$ chmod +x simple_pipe.py
The configuration file, simple_pipe.ini:

# This is an "ini" or configuration file read by a python script to set
# up various parameters. Here we set up some parameters used by the
# simple DAG generation program.

# Square brackets denote a "section" in the ini file. The section below is
# called input. Sections contain parameters used in the DAG we are
# constructing. The input section contains the name of the IFO, the name
# of the channel and the name of the file containing the segments.
[input]
ifo = L1
channel = LSC-DARM_ERR
segments = segments.txt

# The condor section below tells condor what universe to run the executables
# in and where to find them. You should change the path to LSCdataFind below
# to point to the one you installed with Glue.
[condor]
universe = vanilla
datafind = /data2/dbrown/bin/LSCdataFind

# The datafind section contains the parameters for the datafind jobs we are
# running in the DAG. Here we request level 1 RDS data whose LFNs match the
# string localhost/archive. We request the results of datafind in the LAL
# cache format.
[datafind]
type = RDS_R_L1
match = localhost/archive
lal-cache =
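simple_pipe.py reads these values with cp.get(section, option). The minimal sketch below does the same lookups with Python 3's configparser on a copy of the settings held in a string, so it runs on its own (the lab's script uses the Python 2 module name ConfigParser). The helper options_to_args is hypothetical: it assumes, without confirmation from Glue's documentation, that each [datafind] option becomes a "--name value" argument, with an empty value such as lal-cache becoming a bare "--lal-cache" flag.

```python
# Sketch: read simple_pipe.ini-style settings with Python 3's configparser.
# (The lab's script uses the Python 2 module name ConfigParser.)
import configparser

# A copy of the settings from simple_pipe.ini, comments omitted
INI_TEXT = """
[input]
ifo = L1
channel = LSC-DARM_ERR
segments = segments.txt

[condor]
universe = vanilla
datafind = /data2/dbrown/bin/LSCdataFind

[datafind]
type = RDS_R_L1
match = localhost/archive
lal-cache =
"""


def options_to_args(cp, section):
    """Hypothetical helper: turn each option in a section into '--name value',
    or a bare '--name' flag when the value is empty. This is an assumption
    about how a job's command line could be built, not Glue's own code."""
    args = []
    for name in cp.options(section):
        args.append("--" + name)
        value = cp.get(section, name).strip()
        if value:
            args.append(value)
    return args


cp = configparser.ConfigParser()
cp.read_string(INI_TEXT)

# The same lookups simple_pipe.py makes
segments_file = cp.get("input", "segments")  # 'segments.txt'
observatory = cp.get("input", "ifo")[0]      # 'L1'[0] -> 'L'

print(segments_file, observatory)
print(options_to_args(cp, "datafind"))
```

An option written as "lal-cache =" is read back as an empty string, which is why the helper can treat it as a flag rather than a name/value pair.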
The pipeline script, simple_pipe.py:

#!/usr/bin/env python

# import the python modules that we need in this program
import sys
import os
import ConfigParser  # Python 2 name; the module is called configparser in Python 3
from glue import pipeline

# The "ini" file containing the configuration arguments for DAG generation
config_file = "simple_pipe.ini"

# The path to the log file for condor log messages. DAGMan reads this
# file to find the state of the condor jobs that it is watching. It
# must be on a local file system (not in your home directory) as file
# locking does not work on a network file system. You will need to change
# this path to a local directory that you can write to.
log_file = "/usr1/lcldsk/dbrown/condor_jobs.log"

# Make directories to store the frame cache files and error messages
# from the nodes in the DAG (ignore the error if they already exist)
try:
    os.mkdir('cache')
except OSError:
    pass
try:
    os.mkdir('logs')
except OSError:
    pass

# Read in the configuration from the config file
cp = ConfigParser.ConfigParser()
cp.read(config_file)

# Create an object that describes the data we want to analyze
data = pipeline.ScienceData()

# Now load the segment file containing the science segments into this object,
# throwing away science segments shorter than 1000 seconds. We get the name
# of the segment file from the segments option in the input section of the
# ini file.
data.read(cp.get('input', 'segments'), 1000)

# Create a DAG to which we can add jobs
dag = pipeline.CondorDAG(log_file)

# Set the name of the file that will contain the DAG
dag.set_dag_file('simple_pipe.dag')

# Create a datafind job. This describes parameters common to all the
# datafind nodes in the DAG.
df_job = pipeline.LSCDataFindJob('cache', 'logs', cp)

# Create a variable to store the previous datafind that we ran
prev_df = None

# Now loop over all the science segments in the data object
for seg in data:
    # Create a datafind node. This is a particular instance of datafind
    # running in the DAG
    df = pipeline.LSCDataFindNode(df_job)

    # Set the start and end times of the datafind query, getting the
    # times by querying the current segment in the list we are looping
    # over
    df.set_start(seg.start())
    df.set_end(seg.end())

    # Set the observatory for the datafind command by pulling the option
    # 'ifo' from the 'input' section of the configuration file. The [0]
    # index after we obtain the ifo name selects only the first character,
    # so L1 becomes L, H2 becomes H, etc.
    df.set_observatory(cp.get('input', 'ifo')[0])

    # If there was a previous datafind node, make it the parent of this one
    if prev_df is not None:
        df.add_parent(prev_df)

    # Store this datafind in the previous datafind variable
    prev_df = df

    # Now add the node we have generated to the DAG
    dag.add_node(df)

# Write out the submit files needed by condor
dag.write_sub_files()

# and write out the DAG itself
dag.write_dag()

# exit cleanly
sys.exit(0)
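The loop in the script makes each datafind node the parent of the next, so the nodes form a single chain. The plain-Python sketch below, which does not use Glue, reproduces that bookkeeping and renders it with DAGMan's JOB, VARS and PARENT ... CHILD statements. The node names (df_0, df_1, ...), the VARS macro names and the three segments are hypothetical; the DAG file that Glue writes uses its own naming.

```python
# Sketch of the dependency chain built by the loop in simple_pipe.py, written
# without Glue. Node names and VARS macro names are hypothetical.


def build_chain(segments):
    """Return (nodes, edges); each node after the first has the previous
    node as its parent, like df.add_parent(prev_df) in the lab script."""
    nodes = []
    edges = []
    prev = None
    for i, (start, end) in enumerate(segments):
        name = "df_%d" % i
        nodes.append((name, start, end))
        if prev is not None:
            edges.append((prev, name))  # (parent, child)
        prev = name
    return nodes, edges


def to_dag_text(nodes, edges, submit_file="datafind.sub"):
    """Render nodes and edges as DAGMan JOB, VARS and PARENT ... CHILD lines."""
    lines = []
    for name, start, end in nodes:
        lines.append("JOB %s %s" % (name, submit_file))
        lines.append('VARS %s start="%d" end="%d"' % (name, start, end))
    for parent, child in edges:
        lines.append("PARENT %s CHILD %s" % (parent, child))
    return "\n".join(lines)


# Three hypothetical segments
nodes, edges = build_chain([(100, 200), (300, 400), (500, 600)])
print(to_dag_text(nodes, edges))
```

Because every node after the first has the previous one as its parent, DAGMan runs the datafind jobs one after another rather than in parallel.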
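Next, use LSCsegFind to create segments.txt, the segment file named in the [input] section of the ini file:
$ LSCsegFind --server=ldas.ligo-la.caltech.edu --interferometer L1 --type Science --gps-start-time 793756813 --gps-end-time 794102413 --output-format segwizard > segments.txt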
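The two GPS times in the LSCsegFind command bound a window of 794102413 - 793756813 = 345600 seconds, or exactly four days. As a rough check of which calendar days that covers, the sketch below converts GPS seconds to UTC using a fixed GPS-UTC offset of 13 seconds. That offset is an assumption that holds only between 1999-01-01 and 2005-12-31; a general converter needs a full leap-second table.

```python
# Sketch: convert the GPS times used in the LSCsegFind query to UTC.
# Assumption: GPS - UTC = 13 s, which is only true from 1999-01-01 through
# 2005-12-31; this is not a general GPS-to-UTC converter.
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # GPS time zero: 1980-01-06 00:00:00 UTC
GPS_MINUS_UTC = 13                # leap-second offset assumed for 1999-2005


def gps_to_utc_1999_2005(gps_seconds):
    """UTC datetime for a GPS time, valid only for 1999-2005."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_MINUS_UTC)


start, end = 793756813, 794102413
print(gps_to_utc_1999_2005(start))      # 2005-03-02 00:00:00
print(gps_to_utc_1999_2005(end))        # 2005-03-06 00:00:00
print((end - start) / 86400.0, "days")  # 4.0 days
```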
You can open segments.txt with a text editor to check that it contains segments.
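The segwizard output is a plain-text table. The sketch below assumes a format of '#' comment lines followed by rows of segment number, start, stop and duration, and keeps only segments lasting at least min_length seconds, which is roughly what the script's data.read(..., 1000) call is described as doing. It is an illustration with hypothetical sample rows, not Glue's ScienceData implementation.

```python
# Sketch: read segwizard-style text and drop segments shorter than
# min_length seconds. Illustration only; not Glue's ScienceData.read().
# Assumed format: '#' comment lines, then "number start stop duration" rows.


def read_segwizard(lines, min_length):
    """Return (start, stop) pairs for segments lasting >= min_length s."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        _seg_id, start, stop, duration = (int(x) for x in line.split())
        if duration >= min_length:
            kept.append((start, stop))
    return kept


# Hypothetical sample rows, not real LSCsegFind output
SAMPLE = """\
# seg  start  stop   duration
0      1000   6072   5072
1      7000   7500   500
2      8000   12024  4024
"""

print(read_segwizard(SAMPLE.splitlines(), 1000))  # [(1000, 6072), (8000, 12024)]
```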
$ ./simple_pipe.py
$ ls
cache logs simple_pipe.dag simple_pipe.py datafind.sub segments.txt simple_pipe.ini
The DAG file is called simple_pipe.dag; datafind.sub is the Condor submit file for the datafind jobs, written by dag.write_sub_files().
$ unset X509_USER_PROXY
$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Duncan Brown 792417
Enter GRID pass phrase for this identity:
Creating proxy ..................................................... Done
Your proxy is valid until: Sat Mar 26 23:27:55 2005
$ condor_submit_dag simple_pipe.dag
Checking all your submit files for log file names.
This might take a while...
Done.
-----------------------------------------------------------------------
File for submitting this DAG to Condor : simple_pipe.dag.condor.sub
Log of DAGMan debugging messages : simple_pipe.dag.dagman.out
Log of Condor library debug messages : simple_pipe.dag.lib.out
Log of the life of condor_dagman itself : simple_pipe.dag.dagman.log
Condor Log file for all jobs of this DAG : /usr1/lcldsk/dbrown/condor_jobs.log
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 138526.
-----------------------------------------------------------------------
You will see LAL cache files appearing in the cache directory.
$ cd cache
$ ls
L-793756813-793761885.cache  L-793898250-793908888.cache
L-793763207-793767231.cache  L-793930051-793950156.cache
L-793767680-793784113.cache  L-793951442-793958125.cache
L-793784732-793787382.cache  L-793958931-793975740.cache
L-793788056-793796486.cache  L-793976166-793985941.cache
L-793808248-793831885.cache  L-793986885-794001768.cache
L-793832579-793873836.cache  L-794004919-794011738.cache
L-793877308-793879457.cache  L-794018389-794027186.cache
L-793879839-793881136.cache  L-794028147-794088934.cache
L-793881409-793895968.cache  L-794096014-794102413.cache
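The cache file names appear to follow the pattern OBSERVATORY-START-END.cache, where START and END are the GPS times of the segment given to each datafind node. The sketch below parses those names and checks that the spans are non-overlapping and inside the LSCsegFind query window; the name pattern is inferred from the listing rather than from Glue documentation. It also splits one line of a LAL cache file, assuming the usual five columns (observatory, description, GPS start, duration, URL) with integer times; the sample cache line is hypothetical.

```python
# Sketch: parse the cache file names from the listing and check their spans.
# The OBSERVATORY-START-END.cache pattern is inferred from the listing.
import re

CACHE_NAME = re.compile(r"^([A-Z])-(\d+)-(\d+)\.cache$")


def parse_cache_name(name):
    """Split 'L-793756813-793761885.cache' into ('L', 793756813, 793761885)."""
    match = CACHE_NAME.match(name)
    if match is None:
        raise ValueError("unexpected cache file name: %r" % name)
    return match.group(1), int(match.group(2)), int(match.group(3))


def check_spans(names, query_start, query_end):
    """True if the spans are non-overlapping and lie inside the query window."""
    spans = sorted(parse_cache_name(n)[1:] for n in names)
    for (_, end0), (start1, _) in zip(spans, spans[1:]):
        if start1 < end0:
            return False  # two spans overlap
    return all(query_start <= s < e <= query_end for s, e in spans)


def parse_lal_cache_line(line):
    """Split one LAL cache line into its five columns (integer times assumed)."""
    obs, description, start, duration, url = line.split()
    return obs, description, int(start), int(duration), url


# The twenty cache file names from the listing
LISTING = """
L-793756813-793761885.cache L-793898250-793908888.cache
L-793763207-793767231.cache L-793930051-793950156.cache
L-793767680-793784113.cache L-793951442-793958125.cache
L-793784732-793787382.cache L-793958931-793975740.cache
L-793788056-793796486.cache L-793976166-793985941.cache
L-793808248-793831885.cache L-793986885-794001768.cache
L-793832579-793873836.cache L-794004919-794011738.cache
L-793877308-793879457.cache L-794018389-794027186.cache
L-793879839-793881136.cache L-794028147-794088934.cache
L-793881409-793895968.cache L-794096014-794102413.cache
""".split()

print(check_spans(LISTING, 793756813, 794102413))
print(min(e - s for _, s, e in map(parse_cache_name, LISTING)))

# A hypothetical LAL cache line (not taken from a real cache file)
print(parse_lal_cache_line(
    "L RDS_R_L1 793756813 16 file://localhost/archive/example.gwf"))
```

Applied to the twenty names in the listing, check_spans returns True and the shortest span is 1297 seconds, consistent with the 1000-second minimum applied in simple_pipe.py.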