Next Meeting: Thursday Sep 14th, 2000 at 11AM.
Purpose of the meeting: to review the wrapperAPI design requirements with a view to giving LSC members who write search code more flexibility in their choice of algorithm. The meeting began at 8:30am with a critical review of the requirements as outlined in the wrapperAPI baseline requirements document, LIGO-T990097-12-E. Some minor changes and enhancements to Sec. 1 of the document were proposed and accepted:
| | mpich 1.2.0 | mpich 1.2.1 | mpich 2.0 |
| Exception handling in C++ | E | W? | P |
| Node dies, adjust comm at system level | - | - | P |
| Node dies, adjust comm by user | E | W? | P |
Here P=planned, E=exists, W=working.
LALWrapperInterface: This layer of code exists for two reasons: (1) Since LDAS is written in C++ and LAL is written in C, it is desirable that all code in the wrapperAPI be either C++ or fundamental C constructs. (2) It is necessary to translate between the LALStatus pointer and the error reporting mechanism outlined in the current implementation of the wrapperAPI. Blackburn emphasized that the wrapperAPI acts as the socket into which LAL search codes get plugged. Because the wrapperAPI is part of LDAS, he does not want it to depend on code which is not included and maintained within LDAS. For this reason, he is not willing to compile against LAL while building the wrapperAPI in order to include the LALStatus pointer. A compromise was reached:
indexFilters(): The use of indexing to measure the progress of the search code has been a great cause for concern among those writing the search code. The reason: indexing is not always a natural way to measure the progress of all search codes. In real searches, some pieces of code are executed only if certain criteria are met. Consequently, it is not known in advance how many times such routines will be executed. It was decided that:
| Master | Slave |
| Parse cmd line args | Parse cmd line args |
| Load shared object | Load shared object |
| Get ILWD | |
| Create MPI data structures | Create MPI data structures |
| if () Broadcast data | if () receive data |
| Create and fill generic data types | Create generic data types |
| initFilters() | initFilters() |
| create LBComm | create LBComm |
| if (not) ConditionData() | ConditionData() |
| while (fraction < 1) { | while (notfinished) { |
| for (i=0; i<numnodes; i++) { | |
| | time applyFilters |
| | notfinished = applyFilters() |
| | finish timing |
| receive status, results | report to Master |
| calculates timing information | |
| converts to ILWD | freeOutput() |
| } | |
| execute load balance | execute load balance |
| } | } |
| sends results to API | |
| freeFilters() | freeFilters() |
| cleanup and stop MPI | cleanup and stop MPI |
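The timed applyFilters() loop in the table above could look roughly like the following on a slave node. This is only a sketch under stated assumptions: the SlaveReport struct, the ApplyFilters signature and runSlaveLoop() are hypothetical names, not part of the wrapperAPI, and the MPI send to the Master, freeOutput() and the load-balance step are replaced here by appending to a vector.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical record of what a slave would report to the Master per pass.
struct SlaveReport {
    double fractionDone;  // progress in [0, 1], as set by the search code
    double seconds;       // wall-clock time spent in this applyFilters call
};

// Hypothetical signature: the search step writes its progress fraction and
// returns true while more work remains ("notfinished" in the table).
using ApplyFilters = std::function<bool(double& fractionDone)>;

// Slave-side loop: time each applyFilters call and record one report per pass.
// Sending over MPI, freeOutput() and load balancing are omitted in this sketch.
std::vector<SlaveReport> runSlaveLoop(const ApplyFilters& applyFilters) {
    assert(static_cast<bool>(applyFilters));
    std::vector<SlaveReport> reports;
    bool notFinished = true;
    while (notFinished) {
        double fraction = 0.0;
        const auto start = std::chrono::steady_clock::now();
        notFinished = applyFilters(fraction);
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        reports.push_back({fraction, elapsed.count()});
    }
    return reports;
}
```

The per-pass timing is what the Master would need in order to calculate timing information before executing the load balance.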
Exception handling and informing applyFilters(): Blackburn pointed out that the C++ implementation of MPI should allow exception handling, so that the wrapperAPI will "sense" if a node fails for some reason. At present, the code simply calls mpiFinalize() if an exception is caught. A method of informing applyFilters() should be found so that the code can either (i) recover or (ii) perform a graceful exit. It is difficult for C code to learn this sort of information without the use of signals, which conflict with current implementations of MPICH; these conflicts are expected to be resolved in mpich 2.0.
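One possible shape for informing the search code, without signals, is for the C++ layer to catch the exception and pass the failed rank to a plain C-style callback that chooses between recovery and a graceful exit. The sketch below is an illustration only: NodeFailure, FailureHandler, StepResult and guardedStep() are hypothetical names, and no real MPI calls are made.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical outcome of one guarded step (not part of the wrapperAPI).
enum StepResult { STEP_OK, STEP_RECOVERED, STEP_ABORT };

// Hypothetical exception thrown by a C++ MPI layer when a node is lost.
struct NodeFailure : std::runtime_error {
    explicit NodeFailure(int failedRank)
        : std::runtime_error("node failed"), rank(failedRank) {}
    int rank;
};

// C-compatible callback: returns nonzero if the search can carry on
// without the failed rank, zero to request a graceful exit.
typedef int (*FailureHandler)(int failedRank);

// Runs one step and translates exceptions into a result code, so that
// C search code never has to see a C++ exception.
template <typename Step>
StepResult guardedStep(Step step, FailureHandler onFailure) {
    try {
        step();
        return STEP_OK;
    } catch (const NodeFailure& failure) {
        assert(failure.rank >= 0);
        if (onFailure != nullptr && onFailure(failure.rank) != 0) {
            return STEP_RECOVERED;  // option (i): recover
        }
        return STEP_ABORT;          // option (ii): graceful exit
    } catch (...) {
        return STEP_ABORT;          // unknown failure: exit gracefully
    }
}
```

A caller that receives STEP_ABORT would run its cleanup before MPI is shut down, rather than being cut off by an immediate finalize.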
applyFilters(): This routine can, and most times will, run as a sub-MPI process. That is, the lowest-numbered slave will be identified as the "search master" node and the rest as search slaves. The only requirement on this code will be timely reporting of progress in the form of the percentage of the search completed; this reporting should occur between 10 and 100 times per instantiation of the wrapperAPI. A placeholder function, applyFiltersMaster(), will be added to the wrapperAPI and will be executed only on the Master. Eventually, when all search codes have been validated, it might be possible to migrate the "search master" code into this function to run on the master node. This will occur only if such code does not perform computations significant enough to interfere with communication to the mpiAPI.
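A search code that computes its completed fraction very often could keep its reporting below the upper limit with a small throttle like the one sketched here. ProgressThrottle is a hypothetical helper, not part of the wrapperAPI; it assumes the fraction never decreases and splits the range [0, 1] into equal slices, reporting once per slice.

```cpp
#include <cassert>

// Hypothetical helper: decides when a fraction-complete value is worth
// reporting, so that a full run yields at most maxReports reports.
class ProgressThrottle {
public:
    explicit ProgressThrottle(int maxReports)
        : maxReports_(maxReports), lastBucket_(0) {
        assert(maxReports > 0);
    }

    // fraction is the completed share of the search, in [0, 1].
    // Returns true the first time the fraction enters a new
    // 1/maxReports slice of the range.
    bool shouldReport(double fraction) {
        assert(fraction >= 0.0 && fraction <= 1.0);
        const int bucket = static_cast<int>(fraction * maxReports_);
        if (bucket > lastBucket_) {
            lastBucket_ = bucket;
            return true;
        }
        return false;
    }

private:
    int maxReports_;  // upper bound on reports for one full run
    int lastBucket_;  // index of the last slice that was reported
};
```

With maxReports set to 50, a search that updates its fraction 1000 times in equal steps would report 50 times. The throttle only caps the upper end: a search that advances in a few coarse jumps would still produce fewer than 10 reports.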
while (notfinished) loop: Some mechanism for reporting information to be logged from each of the slaves must be instituted in this loop.
Other open issues:
Conclusions: Significant progress was made during this marathon 10-hour meeting. At the end, all parties appeared happy with the compromise, and it was agreed that these issues would not be revisited (at least until completion of the mpi MDC).