Configuring and Deploying Condor
When the LSC DataGrid Server (or, more properly, the VDT Server on which the LSC DataGrid Server is based) is installed, Condor is set up to run only on that single machine.
Primarily this is because it would be very difficult for the installation software (Pacman) to detect the details of your cluster configuration, and for the cache authors to build caches to handle all possible variations.
Most likely the default installation and configuration is not what you want.
The instructions below deploy Condor onto your cluster using one particular method and configuration style, under some basic assumptions. Condor is very flexible, so you may choose to install, configure, and deploy it in a variety of ways. If the instructions below do not suit your needs, please see the Condor manual.
- If you have not already, source the setup.sh file:
source /opt/ldg/setup.sh
- Run condor_configure to slightly modify your current installation:
/opt/ldg/condor/condor_configure --local-dir=/opt/ldg/condor/home
- Create a condor_config.local file:
touch /opt/ldg/condor/home/condor_config.local
- Edit /opt/ldg/condor/etc/condor_config:
- Set CONDOR_HOST to the FQDN of the machine on which you installed the server package. If this machine has a separate network interface just for access to the cluster nodes, use that FQDN or the equivalent IP address.
- Set RELEASE_DIR to be /opt/ldg/condor or equivalent for your system.
- Set LOCAL_DIR to be $(RELEASE_DIR)/home
- Set LOCAL_CONFIG_FILE to be $(LOCAL_DIR)/condor_config.local
- Set CONDOR_ADMIN to an appropriate email address
- Set UID_DOMAIN to the subnet domain for your cluster. For example at UWM a typical node has FQDN medusa-slave001.medusa.phys.uwm.edu so the subnet domain is medusa.phys.uwm.edu.
- Set FILESYSTEM_DOMAIN to be $(FULL_HOSTNAME) if your cluster does NOT have a shared filesystem for users, or set it to the subnet domain if it does have a shared filesystem for users.
- Set USE_NFS to be True if your cluster has a shared filesystem for users.
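As an illustration of the settings above, the following shell sketch prints the corresponding condor_config lines for one cluster. The function names, the head-node hostname, and the email address are hypothetical, and the sketch assumes the server machine sits in the same subnet domain as the nodes, so the subnet domain can be taken by dropping the first label of its FQDN. It only prints the lines; it does not edit any file.

```shell
#!/bin/sh
# Sketch only: print the condor_config settings described in the steps above.
# Hypothetical inputs: head-node FQDN, admin email, and "yes"/"no" for whether
# the cluster has a shared filesystem for users.

subnet_domain() {
    # Drop the first label of an FQDN:
    # medusa-slave001.medusa.phys.uwm.edu -> medusa.phys.uwm.edu
    echo "${1#*.}"
}

print_settings() {
    head_fqdn="$1"
    admin_email="$2"
    shared_fs="$3"
    # Assumption: the head node is in the same subnet domain as the nodes.
    domain="$(subnet_domain "$head_fqdn")"

    echo "CONDOR_HOST = $head_fqdn"
    echo "RELEASE_DIR = /opt/ldg/condor"
    # Single quotes keep the Condor macros $(...) literal.
    echo 'LOCAL_DIR = $(RELEASE_DIR)/home'
    echo 'LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local'
    echo "CONDOR_ADMIN = $admin_email"
    echo "UID_DOMAIN = $domain"
    if [ "$shared_fs" = "yes" ]; then
        echo "FILESYSTEM_DOMAIN = $domain"
        echo "USE_NFS = True"
    else
        echo 'FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)'
    fi
}
```

For example, `print_settings head.medusa.phys.uwm.edu admin@example.org yes` prints the lines for a cluster with a shared filesystem; compare them with the entries in /opt/ldg/condor/etc/condor_config before editing that file by hand.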
- Edit /opt/ldg/condor/etc/examples/condor.boot and set MASTER=/opt/ldg/condor/sbin/condor_master
- Create a tar file that you can deploy onto each node of your cluster:
tar -cf condor.tar /opt/ldg/condor
- Create a condor user and group on each node of your cluster. Neither a home directory nor a login shell is necessary. If you use NIS, that is fine too.
- Deploy the tar file onto each node of your cluster, creating /opt/ldg/condor on each node.
- On each node copy /opt/ldg/condor/etc/examples/condor.boot to /etc/init.d/condor.
- On each node execute chkconfig --add condor.
- On each node create the symlink /etc/condor/condor_config -> /opt/ldg/condor/etc/condor_config
- Back on the machine on which you installed the server package, create the file /opt/ldg/condor/home/condor_config.local if necessary and edit it to add the following:
COLLECTOR_NAME = FQDN
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
COLLECTOR = $(SBIN)/condor_collector
NEGOTIATOR = $(SBIN)/condor_negotiator
If this machine has a separate network interface for the cluster nodes, also add the line
NETWORK_INTERFACE = your ip address
where the right-hand side is the IP address of the separate network interface.
- Start condor on the machine on which you installed the server package:
/etc/init.d/condor start
- Start condor on the nodes by doing the same on each node.
- On the server or Condor Central Manager machine do /opt/ldg/condor/bin/condor_status to see the status of your Condor pool.
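The per-node steps above (deploying the tar file, installing the boot script, registering it with chkconfig, and creating the symlink) could be scripted along the following lines. This is a sketch under assumptions that are not part of the instructions: a hypothetical nodes file with one hostname per line, root ssh access to every node, condor.tar in the current directory, and the condor user and group already created on each node. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: push condor.tar to each node and perform the per-node steps.
# Assumptions (hypothetical): root ssh access to each node, condor.tar in the
# current directory, and a nodes file listing one hostname per line.
# With DRY_RUN=1 (the default) commands are printed instead of executed.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        # Detach stdin so scp/ssh cannot swallow the node list being read.
        "$@" < /dev/null
    fi
}

deploy_node() {
    node="$1"
    run scp condor.tar "root@${node}:/tmp/condor.tar"
    # GNU tar drops the leading "/" when creating the archive, so extracting
    # relative to / recreates /opt/ldg/condor.
    run ssh "root@${node}" "tar -xf /tmp/condor.tar -C / && cp /opt/ldg/condor/etc/examples/condor.boot /etc/init.d/condor && chkconfig --add condor && mkdir -p /etc/condor && ln -sf /opt/ldg/condor/etc/condor_config /etc/condor/condor_config"
}

deploy_all() {
    # $1 is the nodes file; blank lines are skipped.
    while read -r node; do
        if [ -n "$node" ]; then
            deploy_node "$node"
        fi
    done < "$1"
}
```

After sourcing the file, `deploy_all nodes.txt` prints the scp and ssh commands for every node; setting `DRY_RUN=0` first executes them instead. Starting Condor on the nodes is left as a separate step, as in the instructions.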
This completes a basic deployment and configuration of Condor. You are strongly encouraged to read the Condor Manual and learn how to configure Condor in the best way for your particular cluster.
$Id: condordeploy.html,v 1.4 2007/11/06 03:41:04 patrick Exp $