Results of OSG Demo
Summary of Problems
Authentication
- Several pools failed the gsiftp test with the error "Timeout experienced while reading from ip stream".
- Most pools rejected my DN when I tried to authenticate with their job manager, leaving 13 working pools.
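The two failure types above can be separated mechanically. The following is a minimal sketch that assumes the auth test emits one message per pool, in the two shapes quoted in the numbered notes on this page. The labels `gsiftp_timeout` and `jobmanager_auth` and the function name `classify` are invented here for illustration.

```python
import re
from collections import Counter

# Patterns follow the two message shapes quoted in the numbered notes on this page.
TIMEOUT_RE = re.compile(
    r"Timeout experienced while reading from ip stream of (\S+):(\d+)"
)
AUTH_RE = re.compile(
    r"Could not authenticate against jobmanager (\S+)/jobmanager-(\S+) because"
)


def classify(message):
    """Return (kind, host) for one auth-test message; the kind labels are hypothetical."""
    m = TIMEOUT_RE.search(message)
    if m:
        return ("gsiftp_timeout", m.group(1))
    m = AUTH_RE.search(message)
    if m:
        return ("jobmanager_auth", m.group(1))
    return ("other", None)


messages = [
    "Timeout experienced while reading from ip stream of ufloridapg.phys.ufl.edu:2811",
    "Could not authenticate against jobmanager osg.rcac.purdue.edu/jobmanager-condor "
    "because Authentication with the remote server failed",
]

# Tally how many messages fall into each category.
counts = Counter(classify(msg)[0] for msg in messages)
print(counts)
```

Keeping the host in the return value makes it easy to list which gatekeepers to contact about each kind of failure.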
Pegasus/VDS
- The VDS program exitcode treats an empty file as a success. This is bad: several jobs that failed created empty .out files, and exitcode incorrectly told DAGMan that those jobs had succeeded.
- The job inspiral_0_UTA_DPCC_cdir failed Globus submission. Condor retried it several times and then aborted it, creating an empty .out file.
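One possible guard against the empty-file problem is a stricter check that runs before exitcode in the POST script. This is only a sketch, not how VDS exitcode itself works: `strict_status` is a hypothetical helper, and it assumes that a missing or zero-length .out file should always count as a failure.

```python
import os
import tempfile


def strict_status(out_path):
    """Return 0 only if the job's .out file exists and is non-empty, else 1."""
    try:
        size = os.path.getsize(out_path)
    except OSError:
        return 1  # missing or unreadable file: treat as failure
    return 0 if size > 0 else 1


# Small demonstration with a zero-length file, like the ones left by aborted jobs.
with tempfile.TemporaryDirectory() as tmp:
    empty_out = os.path.join(tmp, "job.out")
    open(empty_out, "w").close()
    print(strict_status(empty_out))  # the empty file is reported as a failure
```

A non-empty file does not prove that the job succeeded, so a check like this would complement the parsing that exitcode already does, not replace it.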
Data transfer
- Several sites failed to transfer data from UWM and CIT to the local pools. Firewall issues?
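A quick way to test the firewall hypothesis is to check whether the GridFTP control port is reachable at all. The sketch below assumes a plain TCP connection attempt is a fair first probe. `can_connect` is a hypothetical helper, and the host and port to probe are left to the caller (2811 is the port that appears in the timeout messages on this page).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("ufloridapg.phys.ufl.edu", 2811)` would probe one of the servers that timed out in the auth test. A successful connection only shows that the control port is open. The data channels may use other ports, so a transfer can still be blocked.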
Running Job
- The ATLAS sites accepted the jobs, but they disappeared into the void (possibly into a long queue?).
- Several sites (including UWM) gave the error "No such file or directory" when they tried to execute the tmpltbank job.
| Pool Name | gencdag auth test | make workdir | stage data | run tmpltbank |
|---|---|---|---|---|
| PROD_SLAC | (7) | ![]() | ![]() | |
| BNL_ATLAS_1 | ![]() | ![]() | ![]() | (27) |
| BNL_ATLAS_2 | ![]() | ![]() | ![]() | (27) |
| Purdue_ITaP | (2) | ![]() | ![]() | |
| GRASE_CCR_U2 | (13) | ![]() | ![]() | |
| NERSC_PDSF | (9) | ![]() | ![]() | |
| USCMS_FNAL_WC1_CE | (10) | ![]() | ![]() | |
| UCSandiegoOSG_Prod | (11) | ![]() | ![]() | |
| IU_ATLAS_Tier2 | (12) | ![]() | ![]() | |
| OSG_LIGO_PSU | ![]() | ![]() | (22) | |
| UWMilwaukee | ![]() | ![]() | ![]() | (24) |
| UTA_DPCC | ![]() | (14) | ![]() | |
| CIT_CMS_PG | (4) | ![]() | ![]() | |
| UFlorida_PG | (1) | ![]() | ![]() | |
| GRASE_CCR_ACDC | ![]() | ![]() | (23) | |
| Purdue_Physics | (17) | ![]() | ![]() | |
| Nebraska | ![]() | ![]() | ![]() | (25) |
| agt_bu_edu | ![]() | ![]() | ![]() | (21) |
| TTU_ANTAEUS | (6) | ![]() | ![]() | |
| GRASE_BINGHAMTON | ![]() | ![]() | (18) | |
| FNAL_GPFARM | (5) | ![]() | ![]() | |
| OUHEP_OSG | ![]() | ![]() | ![]() | (26) |
| ASCC_OSG | (8) | ![]() | ![]() | |
| GRASE_CCR_MAMA | ![]() | ![]() | ![]() | (14) |
| GRASE_ALBANY | (3) | ![]() | ![]() | |
| UIOWA_OSG_PROD | ![]() | ![]() | (19) | |
| FNAL_FERMIGRID | (15) | ![]() | ![]() | |
| UC_ATLAS_Tier2 | ![]() | ![]() | ![]() | (27) |
| FNAL_DDS2 | (16) | ![]() | ![]() | |
- (1) Timeout experienced while reading from ip stream of ufloridapg.phys.ufl.edu:2811
- (2) Could not authenticate against jobmanager osg.rcac.purdue.edu/jobmanager-condor because Authentication with the remote server failed
- (3) Could not authenticate against jobmanager grid.rit.albany.edu/jobmanager-condor because Authentication with the remote server failed
- (4) Could not authenticate against jobmanager tier2b.cacr.caltech.edu/jobmanager-condor because Authentication with the remote server failed
- (5) Could not authenticate against jobmanager fngp-osg.fnal.gov/jobmanager-condor because Authentication with the remote server failed
- (6) Timeout experienced while reading from ip stream of antaeus.hpcc.ttu.edu:2811
- (7) Could not authenticate against jobmanager osgserv01.slac.stanford.edu/jobmanager-lsf because Authentication with the remote server failed
- (8) Could not authenticate against jobmanager osgc01.grid.sinica.edu.tw/jobmanager-condor because Authentication with the remote server failed
- (9) Could not authenticate against jobmanager pdsfgrid2.nersc.gov/jobmanager-sge because Authentication with the remote server failed
- (10) Could not authenticate against jobmanager cmsosgce.fnal.gov/jobmanager-condor because Authentication with the remote server failed
- (11) Could not authenticate against jobmanager t2cms02.sdsc.edu/jobmanager-condor because Authentication with the remote server failed
- (12) Timeout experienced while reading from ip stream of atlas.iu.edu:2811
- (13) Could not authenticate against jobmanager u2-grid.ccr.buffalo.edu/jobmanager-fork because Authentication with the remote server failed
- (14) Event: ULOG_GLOBUS_SUBMIT_FAILED for Condor Job inspiral_0_UTA_DPCC_cdir (1326.0)
Event: ULOG_JOB_ABORTED for Condor Job inspiral_0_UTA_DPCC_cdir (1326.0)
Running POST script of Job inspiral_0_UTA_DPCC_cdir...
POST Script of Job inspiral_0_UTA_DPCC_cdir completed successfully.
2005.07.21 10:20:48.238 CDT: [app] will use /Users/dbrown/projects/grid/vds/vds-1.3.6/etc/iv-1.4.xsd
2005.07.21 10:20:48.254 CDT: [app] file has zero length inspiral_0_UTA_DPCC_cdir.out, assuming success
2005.07.21 10:20:48.263 CDT: [app] exit status = 0
- (15) Could not authenticate against jobmanager fermigrid1.fnal.gov/jobmanager-mis because Authentication with the remote server failed
- (16) Could not authenticate against jobmanager cmsp4.fnal.gov/jobmanager-condor because Authentication with the remote server failed
- (17) Could not authenticate against jobmanager grid.physics.purdue.edu/jobmanager-mis because Authentication with the remote server failed
- (18) POST Script of Job rc_tx_GRASE_BINGHAMTON_0 failed with status 1
- (19) POST Script of Job rc_tx_UIOWA_OSG_PROD_0 failed with status 1
- (20) Running POST script of Job lalapps_tmpltbank_ID000113...
Event: ULOG_POST_SCRIPT_TERMINATED for Condor Job lalapps_tmpltbank_ID000113 (1356.0)
POST Script of Job lalapps_tmpltbank_ID000113 completed successfully.
2005.07.21 08:46:46.584 PDT: [app] will use /archive/home/dbrown/projects/grid/vds/vds/etc/iv-1.4.xsd
2005.07.21 08:46:46.587 PDT: [app] file has zero length lalapps_tmpltbank_ID000113.out, assuming success
2005.07.21 08:46:46.587 PDT: [app] exit status = 0
- (21) POST Script of Job lalapps_tmpltbank_ID000153 failed with status 2
- (22) POST Script of Job rc_tx_OSG_LIGO_PSU_0 failed with status 1
- (23) POST Script of Job rc_tx_GRASE_CCR_ACDC_0 failed with status 1
- (24) POST Script of Job lalapps_tmpltbank_ID000221 failed with status 2
- (25) Event: ULOG_GLOBUS_SUBMIT_FAILED for Condor Job lalapps_tmpltbank_ID000057 (1433.0)
- (26) Event: ULOG_JOB_TERMINATED for Condor Job lalapps_tmpltbank_ID000037 (1423.0)
Job lalapps_tmpltbank_ID000037 completed successfully.
Running POST script of Job lalapps_tmpltbank_ID000037...
ULOG_POST_SCRIPT_TERMINATED for Condor Job lalapps_tmpltbank_ID000037 (1423.0)
POST Script of Job lalapps_tmpltbank_ID000037 completed successfully.
2005.07.21 09:38:16.318 PDT: [app] will use /archive/home/dbrown/projects/grid/vds/vds/etc/iv-1.4.xsd
2005.07.21 09:38:16.321 PDT: [app] file has zero length lalapps_tmpltbank_ID000037.out, assuming success
2005.07.21 09:38:16.322 PDT: [app] exit status = 0
- (27) No apparent job failure, but no successful return. The jobs may have been queued at the sites, and I killed them before they got CPU time.
000 (1468.000.000) 07/21 09:04:05 Job submitted from host: <131.215.115.58:53691>
pool:BNL_ATLAS_1
...
017 (1468.000.000) 07/21 09:04:15 Job submitted to Globus
RM-Contact: gridgk01.racf.bnl.gov/jobmanager-condor
JM-Contact: https://gridgk01.racf.bnl.gov:20001/26629/1121961849/
Can-Restart-JM: 1
...
$Id: osg_demo.html,v 1.34 2006/10/26 18:17:23 bmoe Exp $