Open Science Grid - LIGO
Overview of projects
1. Binary Inspiral Search: IHOPE on LDG/OSG
- AIM: To enable submission of LIGO workflows to LDG/OSG sites via the Pegasus workflow planner
- Project lead: Britta Daudert
- General Information http://www.ligo.caltech.edu/~bdaudert/INSPIRAL
2. Einstein at OSG (E@OSG)
- AIM: to run the Einstein at Home (E@H) application as a Grid job on OSG
- Project lead: Robert Engel
- For information on the Einstein at Home project: http://einstein.phys.uwm.edu/
- Robert Engel and Thomas Radke (AEI, Golm, Germany) successfully developed a method to run E@H jobs on D-Grid
- For more documentation on this method: http://www.ligo.caltech.edu/~bdaudert/E@OSG/E@OSG_doc/
- In a collaborative effort (Britta Daudert, Robert Engel, Thomas Radke), this method was ported to OSG
- Overall BOINC statistics for Caltech Open Science Grid Team
- Einstein at Home statistics for Caltech Open Science Grid Team
3. Pulsar Powerflux
- AIM: to run the Pulsar Powerflux application on OSG via the glideinWMS workflow manager
- Project lead: Britta Daudert
- Pulsar Powerflux Page
- Current week's total usage: 4 users utilized 12 sites
- 82839 jobs total (44810 succeeded / 38029 failed; 54.1% success)
- 89259.7 wall clock hours total (76126.2 successful / 13133.4 failed; 85.3% success)
- Previous week's total usage: 4 users utilized 12 sites
- 40043 jobs total (16260 succeeded / 23783 failed; 40.6% success)
- 43641.2 wall clock hours total (35387.1 successful / 8254.1 failed; 81.1% success)
- Wall Clock Time for this Week
- CPU for this Week
- Job Count for this Week
- Detailed LIGO statistics
- Running 50,000-job DAGs over 13 OSG sites via a local glidein factory
- Completed runs
- frequency band 1800-1825 Hz
- Current run
- frequency band 1775-1800 Hz
- Documentation page
- Statistics page
- Test runs on S6D week data set
- Glidein scale tests at FF
- Comparison: FF versus LDG runs
- Other Glidein tests
- S6C ihope pages
- S6D ihope pages
- OSG-LIGO storage task force
- Troubleshooting
- USCMS-FNAL-WC1-CE
- noexec in $DATA, $APP --> try FF initial_dir fix: <profile namespace="pegasus" key="change.dir">true</profile>
- pegasus-plan_ID000011 failed with status 1: playground/inspiral_hipe_playground_cat2_veto.PLAYGROUND_CAT_2_VETO
- Unable to select any location from the list passed for lfn H1-CATEGORY_2_VETO_SEGS-949449543-86400.txt
- the cwd in the .out file of the failed job shows a local directory (?)
- This is a Pegasus bug and will be fixed in the 3.0 release
- UMissHEP: glideins do not start up
- Near Real Time Analysis
- ./NRT-file-tranfer.sh "Feb 26 2010 23:00" osg-ce.ligo.caltech.edu /mnt/hadoop/ligo/s6-test /home/ligo/s6-test
- This script does the following (see the sketch below)
- queries the LDG for detector data generated from Feb 26 2010 23:00 until now
- transfers the files into the Caltech ITB Storage Element
- First test run, start to finish: 18 minutes; 671 files were transferred.
- Wall Clock Hours
- CPU Hours
- Job Count
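The actual NRT-file-tranfer.sh is not reproduced here; the following is a minimal sketch of the same idea, assuming lalapps_tconvert and ligo_data_find are available on the LDG side and the Storage Element is reachable over GridFTP. The observatory, frame type, interpretation of the arguments, and destination URL are illustrative assumptions, not details taken from this page.

    #!/bin/bash
    # Hypothetical sketch of a near-real-time transfer script (not the actual NRT-file-tranfer.sh).
    # Usage: ./nrt-transfer-sketch.sh "Feb 26 2010 23:00" osg-ce.ligo.caltech.edu /mnt/hadoop/ligo/s6-test /tmp/nrt-staging
    START_TIME="$1"   # start of the time window to query
    DEST_HOST="$2"    # GridFTP host of the OSG Storage Element
    DEST_DIR="$3"     # destination directory on the Storage Element
    STAGE_DIR="$4"    # local working directory on the LDG side

    # Convert the human-readable start time to GPS time (lalapps_tconvert ships with lalapps).
    GPS_START=$(lalapps_tconvert "$START_TIME")
    GPS_END=$(lalapps_tconvert now)

    # Ask the LDG datafind server for frame files in the time window.
    # The observatory (H) and frame type (H1_LDAS_C02_L2) are placeholders.
    ligo_data_find --observatory H --type H1_LDAS_C02_L2 \
        --gps-start-time "$GPS_START" --gps-end-time "$GPS_END" \
        --url-type file --lal-cache > "$STAGE_DIR/frames.cache"

    # Copy each frame file listed in the LAL cache to the Storage Element over GridFTP.
    while read -r obs type start dur url; do
        file=${url##*/}
        globus-url-copy "$url" "gsiftp://$DEST_HOST$DEST_DIR/$file"
    done < "$STAGE_DIR/frames.cache"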
- Host certificate (this requires a fixed HOSTNAME)
- Full VDT installation (comes with LDG)
- Globus gatekeeper???
- Condor-G, condor_dagman, and condor_schedd (included in LDG, and available elsewhere on LSC resources)
- an up-to-date Pegasus. The version included in VDT is too old. Karan Vahi at ISI will be tagging a stable version of their SVN for LIGO science usage over the next 1-2 years. They will apply only bug fixes to that version; some of us will continue to test the new technologies regardless, but we won't rely on them for day-to-day use.
- RLS permissions?
- GridFTP
- Phil recently set up BOSCO as an OSG submit host. Here is a link to his instructions:
BOSCO as submit host
- GT2 GRAM and a supported job manager, OR WS GRAM
- RLS permissions, GridFTP
- LIGO software: the desired installation of lal, lalapps, glue, and pylal
- Pegasus software
- Pegasus Downloads
- For general information on Pegasus: Pegasus Home Page
- Pegasus in a nutshell
- If you intend to run large workflows, you may additionally need to set export JAVA_HEAPMAX=4000 in your .bashrc to avoid running out of memory (see the sketch below)
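A minimal sketch of the corresponding submit-host environment setup in ~/.bashrc, assuming Pegasus has been unpacked under /opt/pegasus; the install path and the PEGASUS_HOME variable are illustrative assumptions, not instructions from this page.

    # Illustrative ~/.bashrc fragment for a Pegasus submit host.
    # /opt/pegasus is a placeholder install location; adjust to your site.
    export PEGASUS_HOME=/opt/pegasus
    export PATH=$PEGASUS_HOME/bin:$PATH

    # Give the Java-based Pegasus planner a larger heap, as recommended above,
    # so that planning large workflows does not run out of memory.
    export JAVA_HEAPMAX=4000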
Recent activity reports 02/27/12 - 03/05/12
Weekly Gratia Reports
Pulsar Powerflux
Pegasus and Glideins
Older activity
Gratia Reports for D-Grid + OSG from the start of the project until the separation of D-Grid and OSG: 12/01/2007 -- 11/05/2008
Extra documentation
Essentials for running LIGO code (IHOPE) on the grid (LDG/OSG)
lalapps_ihope is an inspiral pipeline script that generates a workflow; so far this workflow has only been used to create a Condor DAG and run on a local Condor pool. pipeline.py in glue has the ability to create a DAX, which is a more abstract version of the workflow. Pegasus converts the DAX into a DAG that includes the appropriate Condor/Globus submission information, so that jobs can be submitted to remote clusters (see the pegasus-plan sketch after this outline). Here is a brief outline of what is needed on 1) the submit host, 2) the remote host, and 3) from the user. I am sure this is not a complete list, and some items may be redundant.
1) Submit host
2) Remote host
3) USER
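A minimal sketch of how the planning step might look on the submit host, assuming a site catalog and properties file are already configured; the file names and the site handle (pegasus.properties, ihope.dax, osg-site) are illustrative placeholders, not values from this page.

    # Plan the abstract workflow (DAX) into an executable Condor DAG and submit it.
    # pegasus.properties, ihope.dax, and the site handle "osg-site" are placeholders.
    pegasus-plan \
        --conf pegasus.properties \
        --dax ihope.dax \
        --dir dags \
        --sites osg-site \
        -o local \
        --submit
    # Progress can then be followed with pegasus-status (or condor_q -dag).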
Long Term Plans
Milestones achieved in the past
- 2007
- HIPE test workflow with a peak of 1000 VO slots on OSG (measured in MonALISA)
- HIPE test workflow running at 25 OSG sites
- 2008
- Porting E@H to OSG
- 2009
- Running ihope on ITB with SRM technology
- 2010
- Running ihope on OSG with SRM technology
- Running ihope on OSG with the Pegasus glidein service Corral
- Automated file transfer from LDG to OSG
- 2011
- Ported Pulsar Powerflux application to OSG
- Successfully scaled up to 32+ OSG sites