Highlights of the current Condor configuration on the Nemo cluster include:
Nodes 2-60 each run a single Condor slot. We set
NUM_CPUS = 1
in the local Condor configuration file for each of those individual nodes. This single slot advertises 2008 MB of memory in its ClassAd.
Additionally, in the /etc/init.d/condor file used to start the condor_master process and all of its children (including user jobs), we include for nodes 2-60 the command
ulimit -v 2056192
This limits each process started by Condor to 2056192 KB of virtual memory. (Note that 2056192 = 2008 * 1024.)
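The conversion from the slot's advertised memory to the ulimit value can be sketched in shell as follows. This is only an illustration of the arithmetic: the function name mb_to_kb and the variable names are hypothetical and are not taken from the actual /etc/init.d/condor file.

```shell
#!/bin/sh
# Convert a slot's advertised memory (MB) to the KB value that `ulimit -v` expects.
# mb_to_kb, SLOT_MEM_MB and LIMIT_KB are hypothetical names used for illustration.
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

SLOT_MEM_MB=2008                       # memory advertised by the single slot
LIMIT_KB=$(mb_to_kb "$SLOT_MEM_MB")    # 2008 * 1024 = 2056192
echo "ulimit -v $LIMIT_KB"

# In an init script the limit would be applied before condor_master starts,
# so that the master and all of its children inherit it:
# ulimit -v "$LIMIT_KB"
```

The ulimit call is left commented out so that running the sketch does not change the limits of the current shell.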
Note that because we have configured only a single Condor slot on these nodes, there is one core (each CPU has two cores) that is not being utilized efficiently.
Users who want to run exclusively on these nodes must put the following line in their Condor submit file:
requirements = Memory > 1004
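For context, a minimal submit file containing that requirements line might be generated as in the sketch below. The executable name my_job, the file name example.sub, and the choice of the vanilla universe are assumptions made for illustration; only the requirements line comes from the configuration described here.

```shell
#!/bin/sh
# Write a minimal Condor submit file.
# my_job and example.sub are hypothetical names; the vanilla universe is assumed.
cat > example.sub <<'EOF'
universe     = vanilla
executable   = my_job
requirements = Memory > 1004
queue
EOF

# Show the generated file; it would then be submitted with:
#   condor_submit example.sub
cat example.sub
```

Because only the single-slot nodes advertise more than 1004 MB, the expression Memory > 1004 matches those nodes and no others.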
Nodes 61-780 are configured with two Condor slots of 1004 MB each:
VIRTUAL_MACHINE_TYPE_1 = cpu=1, mem=1004
VIRTUAL_MACHINE_TYPE_2 = cpu=1, mem=1004
NUM_VIRTUAL_MACHINES_TYPE_1 = 1
NUM_VIRTUAL_MACHINES_TYPE_2 = 1
Additionally, in the /etc/init.d/condor file used to start the condor_master process and all of its children (including user jobs), we include for nodes 61-780 the command
ulimit -v 1048576
This limits each process started by Condor to 1048576 KB of virtual memory. (Note that 1048576 > 1004 * 1024 = 1028096. The extra 20480 KB, or 20 MB, leaves a little room for jobs that go slightly over the 1004 MB limit.)
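The two limits can be summarized with a small shell sketch that maps a node number to its ulimit value and computes the headroom on the two-slot nodes. The function limit_for_node is hypothetical, and passing the node number as an argument is an assumption; how the real init script distinguishes nodes is not described here.

```shell
#!/bin/sh
# Return the `ulimit -v` value (KB) for a given node number.
# limit_for_node is a hypothetical helper, not part of /etc/init.d/condor.
limit_for_node() {
    node=$1
    if [ "$node" -ge 2 ] && [ "$node" -le 60 ]; then
        echo $(( 2008 * 1024 ))     # single slot: exactly the advertised 2008 MB
    elif [ "$node" -ge 61 ] && [ "$node" -le 780 ]; then
        echo 1048576                # two slots: 1024 MB cap per process
    else
        echo "no limit defined for node $node" >&2
        return 1
    fi
}

limit_for_node 30                   # prints 2056192
limit_for_node 61                   # prints 1048576

# Headroom on the two-slot nodes: cap minus the advertised 1004 MB, in KB.
echo $(( 1048576 - 1004 * 1024 ))   # prints 20480
```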