
High Level Overview of Condor Configuration on Nemo

Highlights of the current Condor configuration on the Nemo cluster include:

  • Higher memory jobs: For jobs needing more than 1004 MB of memory, we have configured nodes 2-60 so that only a single Condor slot (previously known as a virtual machine) is available on the CPU. This is accomplished by setting
    NUM_CPUS = 1
    
    in the local Condor configuration file for each of those individual nodes. This single slot advertises in its ClassAd 2008 MB of memory.

    Additionally, in the /etc/init.d/condor file, which is used to start the condor_master process and all of its children (including user jobs), we include for nodes 2-60 the command

    ulimit -v 2056192
    
    This limits jobs started by Condor to 2056192 KB of virtual memory. (Note that 2056192 = 2008 * 1024.)

    Note that because only a single Condor slot is configured on these nodes, one of the two cores on each CPU is not being utilized efficiently.

    Users that want to run exclusively on these nodes must put the following line in their Condor submit file:

    requirements = Memory > 1004
    
  • Standard jobs: Jobs not requiring more than 1004 MB of memory can use any of nodes 2-780. In particular, nodes 61-780 are configured to have two identical Condor slots for each CPU (one per core), each advertising 1004 MB of memory. We override the default explicitly so that it can be changed more easily later if desired. This is accomplished by setting
    VIRTUAL_MACHINE_TYPE_1 = cpu=1, mem=1004
    VIRTUAL_MACHINE_TYPE_2 = cpu=1, mem=1004
    
    NUM_VIRTUAL_MACHINES_TYPE_1 = 1
    NUM_VIRTUAL_MACHINES_TYPE_2 = 1
    

    Additionally, in the /etc/init.d/condor file, which is used to start the condor_master process and all of its children (including user jobs), we include for nodes 61-780 the command

    ulimit -v 1048576
    
    This limits jobs started by Condor to 1048576 KB of virtual memory. (Note that 1048576 = 1024 * 1024, which is greater than 1004 * 1024 = 1028096. This leaves about 20 MB of extra space for jobs that go over the 1004 MB limit, but not by too much.)
  • Non-Condor node for heavy interactive use: Node s0001 is not part of the Condor pool. It is available to users who need to perform intensive serial tasks (such as intensive compiling) that would otherwise degrade the performance of the head node. We make this node available so that users do not have to burden the head nodes and make it hard for other users to do simple tasks like logging in and running condor_submit.
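
The two memory limits described above can be summarized in a small shell sketch. The function name nemo_vlimit_kb and the idea of passing the node number as an argument are assumptions made purely for illustration; the actual /etc/init.d/condor script on Nemo is not reproduced on this page and may select the limit in a different way.

```shell
#!/bin/sh
# Hypothetical helper (not taken from Nemo's init script): print the
# "ulimit -v" value, in KB, that this page describes for a given node number.
nemo_vlimit_kb() {
    node=$1
    if [ "$node" -ge 2 ] && [ "$node" -le 60 ]; then
        # Single-slot, higher-memory nodes: 2008 MB expressed in KB.
        echo $((2008 * 1024))
    elif [ "$node" -ge 61 ] && [ "$node" -le 780 ]; then
        # Two-slot nodes: 1024 MB in KB, slightly above the advertised 1004 MB.
        echo $((1024 * 1024))
    else
        echo "node $node is outside the Condor pool ranges described here" >&2
        return 1
    fi
}

nemo_vlimit_kb 30     # a higher-memory node
nemo_vlimit_kb 500    # a standard node
```

The two calls print 2056192 and 1048576, matching the ulimit -v values quoted for nodes 2-60 and nodes 61-780 respectively.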
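
For users targeting the higher-memory nodes, the requirements line sits in an otherwise ordinary submit file. The sketch below writes one from the shell. The file name highmem.sub, the executable name my_analysis, and the output file names are placeholders rather than names used on Nemo, and the universe and other settings are assumptions that should be adapted to the job at hand.

```shell
#!/bin/sh
# Write a minimal Condor submit file for a job that needs more than 1004 MB.
# highmem.sub and my_analysis are hypothetical names used for illustration.
cat > highmem.sub <<'EOF'
universe     = vanilla
executable   = my_analysis
output       = my_analysis.out
error        = my_analysis.err
log          = my_analysis.log
requirements = Memory > 1004
queue
EOF

# Show the line that restricts the job to the single-slot nodes.
grep '^requirements' highmem.sub

# Submission is left commented out so the sketch runs without a Condor pool:
# condor_submit highmem.sub
```

Before submitting, running condor_status -constraint 'Memory > 1004' can be used to list the slots that currently match, assuming the pool advertises memory as described on this page.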