Minutes of the MPI working group

Saturday September 9, 2000 at 8:30am - 6:30pm PST

Attending: Kent Blackburn, Patrick Brady, Albert Lazzerini and Alan Wiseman.

Next Meeting: Thursday Sep 14th, 2000 at 11AM.

Purpose of the meeting: to review the wrapperAPI design requirements with a view to allowing LSC members who write search code more flexibility in their choice of algorithm. The meeting began at 8:30am with a critical review of the requirements as outlined in the wrapperAPI baseline requirements document LIGO-T990097-12-E. Some minor changes and enhancements to Sec. 1 of the document were proposed and accepted:

Item I.C.13 was discussed in great detail and led directly to a discussion of the implementation of the wrapperAPI. Brady and Wiseman sketched the current implementation as extracted from the code, and this sketch served as a talking point for the remainder of the meeting. It was noted that the current implementation implicitly assumes a particular class of algorithms, and that every search code must belong to that class. The class is defined by the way indexing is used to organize the search. As an example, the hierarchical binary inspiral search code was not designed around this model. Blackburn and Lazzerini outlined a way to fit this particular search into the indexed model. At this point, the discussion shifted to the parallel versus distributed nature of the search codes. The wrapperAPI implementation (constrained by the MPI implementations currently available) requires all search codes to reach a Barrier before dynamic load balancing can be performed.
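To make the "indexed model" concrete, here is a minimal C sketch, assuming the search loops over a fixed number of index values (for example, filter templates) that is known before the search starts. The function `index_block` is hypothetical and not part of the wrapperAPI; it splits the index range into contiguous, nearly equal blocks, one per slave. A hierarchical search, in which the amount of later work depends on earlier results, does not naturally provide such a fixed range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the "indexed" search model: the total number of
 * index values is known in advance, so the work can be split into
 * contiguous blocks, one per slave (rank 0 .. nslaves-1). */
void index_block(size_t total, size_t nslaves, size_t rank,
                 size_t *first, size_t *count)
{
    assert(nslaves > 0 && rank < nslaves);
    size_t base = total / nslaves;   /* every slave gets at least this many */
    size_t extra = total % nslaves;  /* the first `extra` slaves get one more */
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```

For example, 10 index values on 3 slaves give blocks starting at 0, 4 and 7 with sizes 4, 3 and 3.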
 
  mpich 1.2.0  mpich 1.2.1  mpich 2.0 
Exception handling in C++  W? 
Node dies, adjust comm at system level 
Node dies, adjust comm by user  W? 

Here P=planned, E=exists, W=working.

LALWrapperInterface: This layer of code exists for two reasons: (1) since LDAS is written in C++ and LAL is written in C, it is desirable that all code in the wrapperAPI be either C++ or fundamental C constructs; (2) it is necessary to translate between the LALStatus pointer and the error-reporting mechanism used in the current implementation of the wrapperAPI. Blackburn emphasized that the wrapperAPI acts as the socket into which LAL search codes are plugged. Because it is part of LDAS, he does not want it to depend on code that is not included in and maintained within LDAS. For this reason, he is not willing to compile the wrapperAPI against LAL merely to include the LALStatus pointer. A compromise was reached:

This solution was acceptable to all present. The time scale for importation of the code will be set by the time scale for maturation of LAL.
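The kind of translation the LALWrapperInterface performs can be sketched as follows, assuming the interface layer extracts the status code and description from the LALStatus structure and passes them on as plain C values, so that the wrapperAPI itself never includes LAL headers. The `WrapperError` type and `translate_status` function are hypothetical; the actual error-reporting mechanism of the wrapperAPI is not described in these minutes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error record on the wrapperAPI side. It uses only
 * fundamental C types, so the wrapperAPI needs no LAL headers. */
typedef struct {
    int code;           /* 0 means success */
    char message[128];  /* human-readable description, possibly truncated */
} WrapperError;

/* Hypothetical translation step inside the interface layer: the caller,
 * which is compiled against LAL, passes the status code and description
 * string as plain C values. */
void translate_status(int status_code, const char *description,
                      WrapperError *out)
{
    assert(out != NULL);
    out->code = status_code;
    if (status_code == 0 || description == NULL) {
        out->message[0] = '\0';  /* success: no message to carry */
    } else {
        snprintf(out->message, sizeof out->message, "%s", description);
    }
}
```

Only the interface layer would see both sides; the wrapperAPI sees nothing but `WrapperError`.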

indexFilters(): The use of indexing to measure the progress of the search code has been a great cause for concern among those writing search codes. The reason is that indexing is not always a natural way to measure the progress of every search code: in real searches, some pieces of code are executed only if certain criteria are met, so it is not known in advance how many times such routines will be executed. It was decided that:

Master                                  | Slave
Parse cmd line args                     | Parse cmd line args
Load shared object                      | Load shared object
Get ILWD                                |
Create MPI data structures              | Create MPI data structures
if () Broadcast data                    | if () receive data
Create and fill generic data types      | Create generic data types
initFilters()                           | initFilters()
create LBComm                           | create LBComm
if (not) ConditionData()                | ConditionData()
while (fraction < 1) {                  | while (notfinished) {
  for (i = 0; i < numnodes; i++) {      |
                                        |   time applyFilters
                                        |   notfinished = applyFilters()
                                        |   finish timing
    receive status, results             |   report to Master
    calculate timing information        |
    convert to ILWD                     |   freeOutput()
  }                                     |
  execute load balance                  |   execute load balance
}                                       | }
send results to API                     |
freeFilters()                           | freeFilters()
cleanup and stop MPI                    | cleanup and stop MPI
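The "execute load balance" step in the flow above can be illustrated with a minimal sketch, assuming the master has timed one applyFilters() pass on each slave and expresses the result as seconds per unit of work. The function `rebalance` and the notion of a "work unit" are hypothetical; the minutes do not specify the load-balancing algorithm. Here the remaining work is split in proportion to each slave's measured speed, and units lost to rounding are handed out one at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical load-balancing step: split `remaining` work units among
 * `nnodes` slaves in proportion to their measured speed, so faster slaves
 * (fewer seconds per unit) receive more units. Results go into `share`. */
void rebalance(const double *secs_per_unit, size_t nnodes,
               size_t remaining, size_t *share)
{
    assert(nnodes > 0);
    double total_speed = 0.0;
    for (size_t i = 0; i < nnodes; ++i) {
        assert(secs_per_unit[i] > 0.0);
        total_speed += 1.0 / secs_per_unit[i];  /* units per second */
    }
    size_t assigned = 0;
    for (size_t i = 0; i < nnodes; ++i) {
        double frac = (1.0 / secs_per_unit[i]) / total_speed;
        share[i] = (size_t)(frac * (double)remaining);  /* round down */
        assigned += share[i];
    }
    /* Hand out the units lost to rounding, one per slave in turn. */
    for (size_t i = 0; assigned < remaining; i = (i + 1) % nnodes) {
        ++share[i];
        ++assigned;
    }
}
```

With 10 remaining units and slaves taking 0.5, 1.0 and 1.0 seconds per unit, the shares come out as 6, 2 and 2.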

Exception handling and informing applyFilters(): Blackburn pointed out that the C++ implementation of MPI should allow exception handling, so that the wrapperAPI can "sense" when a node fails. At present, the code simply calls mpiFinalize() if an exception is caught. A method of informing applyFilters() should be found so that the code can either (i) recover or (ii) exit gracefully. It is difficult for C code to learn this sort of information without using signals, which conflict with current implementations of MPICH; these conflicts are expected to be resolved in MPICH 2.0.
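One way to inform applyFilters() without signals is for the wrapperAPI to set a flag after catching an exception, and for the search code to poll that flag between units of work. The sketch below assumes this polling design; `WrapperContext`, `FilterState` and `apply_filters_step` are hypothetical names, and only option (ii), the graceful exit, is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status the wrapperAPI could expose to the search code
 * instead of delivering a signal. */
typedef struct {
    int node_failed;  /* set by the wrapper after catching an exception */
} WrapperContext;

typedef enum { FILTER_CONTINUE, FILTER_GRACEFUL_EXIT, FILTER_DONE } FilterState;

/* Hypothetical applyFilters-style step: checks the wrapper's flag before
 * doing one unit of work, and stops cleanly if a failure was reported. */
FilterState apply_filters_step(const WrapperContext *ctx, size_t *units_done,
                               size_t units_total)
{
    assert(ctx != NULL && units_done != NULL);
    if (ctx->node_failed) {
        return FILTER_GRACEFUL_EXIT;  /* option (ii): exit gracefully */
    }
    if (*units_done >= units_total) {
        return FILTER_DONE;
    }
    ++*units_done;  /* stand-in for filtering one unit of data */
    return (*units_done == units_total) ? FILTER_DONE : FILTER_CONTINUE;
}
```

Polling keeps the check inside ordinary C control flow. The trade-off is that a failure is noticed only at the next check, not immediately.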

applyFilters(): This routine can, and in most cases will, run as an MPI sub-process. That is, the lowest-numbered slave will be identified as the "search master" node and the rest as search slaves. The only requirement on this code will be timely reporting of progress in the form of the percentage of the search completed; this report should be made between 10 and 100 times per instantiation of the wrapperAPI. A placeholder function applyFiltersMaster(), which will be executed only on the Master, will be added to the wrapperAPI. Eventually, when all search codes have been validated, it might be possible to migrate the "search master" code into this function to run on the master node. This will happen only if such code does not perform enough computation to interfere with communication to the mpiAPI.
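Reporting between 10 and 100 times per instantiation could be done by sending a report whenever the completed fraction crosses the next of N evenly spaced thresholds. This is a minimal sketch under that assumption; `ProgressReporter` and its functions are hypothetical, and N = 20 (a report every 5%) in the usage below is an arbitrary choice within the agreed range. A single large jump in progress produces only one report, so fewer than N reports may be sent.

```c
#include <assert.h>

/* Hypothetical progress reporter: signals a report each time the completed
 * fraction crosses the next multiple of 1/nreports. */
typedef struct {
    int nreports;  /* number of evenly spaced thresholds, 10..100 */
    int next;      /* index of the next threshold to cross, 1..nreports */
} ProgressReporter;

void reporter_init(ProgressReporter *r, int nreports)
{
    assert(nreports >= 10 && nreports <= 100);
    r->nreports = nreports;
    r->next = 1;
}

/* Returns 1 if a progress report should be sent for `fraction` (0..1). */
int reporter_should_report(ProgressReporter *r, double fraction)
{
    if (r->next > r->nreports || fraction * r->nreports < (double)r->next) {
        return 0;  /* no new threshold crossed */
    }
    /* Skip every threshold already passed, so one jump yields one report. */
    while (r->next <= r->nreports && fraction * r->nreports >= (double)r->next) {
        ++r->next;
    }
    return 1;
}
```

For example, after `reporter_init(&r, 20)`, the fractions 0.01, 0.05 and 0.06 produce no report, a report, and no report, respectively.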

while (notfinished) loop: A mechanism must be added to this loop so that each slave can report information to be logged.
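As one possible mechanism, each slave could accumulate log lines in a fixed-size buffer during each pass of the loop and send the buffer to the master along with its status report. The `SlaveLog` type and its functions in this sketch are hypothetical; the minutes leave the mechanism open.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-slave log buffer, filled during one pass of the
 * while (notfinished) loop and cleared after it is sent to the master. */
typedef struct {
    char text[1024];
    size_t used;  /* bytes in text, excluding the terminating NUL */
} SlaveLog;

void slave_log_clear(SlaveLog *sl)
{
    sl->used = 0;
    sl->text[0] = '\0';
}

/* Appends one printf-style line plus a newline.
 * Returns 0 on success, -1 if the line did not fit and was truncated. */
int slave_log_append(SlaveLog *sl, const char *fmt, ...)
{
    assert(sl->used < sizeof sl->text);
    size_t room = sizeof sl->text - sl->used;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(sl->text + sl->used, room, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n + 1 >= room) {  /* need n chars + '\n' + NUL */
        sl->used = sizeof sl->text - 1;    /* mark the buffer as full */
        sl->text[sl->used] = '\0';
        return -1;
    }
    sl->used += (size_t)n;
    sl->text[sl->used++] = '\n';
    sl->text[sl->used] = '\0';
    return 0;
}
```

A fixed-size buffer keeps the message sent to the master bounded. The cost is that verbose slaves lose output once the buffer fills.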

Other open issues:

Conclusions: Significant progress was made during this marathon 10-hour meeting. At the end, all parties appeared happy with the compromise, and it was agreed that these issues would not be revisited (at least until completion of the MPI MDC).


Patrick Brady

Last modified: Sun Sep 10 23:54:37 CDT 2000