
Medusa Slave Cloning

You will have to construct a new slave root filesystem whenever you change the software or system configuration on the slaves. The bootloader, boot kernel, or initial ramdisk filesystems only need to be changed when the slave hardware changes.

This document describes:

  1. Kickstarting a Medusa slave
  2. Creating a cloning CD
  3. Rebuilding a slave from the new CD
  4. How the cloning process works
  5. Reading/writing partition tables
  6. Note on node.cpio.gz
  7. Cloning a slave over the network
  8. Slave Testing

Kickstarting a new machine

  1. Locate a spare slave to kickstart and ensure that it is connected to an ethernet port on the external network. The hardware requirements for the current kickstart and cloning scripts are:
         Architecture:   Intel PIII
         Hard drive:     Maxtor 98196H8 or Maxtor 4K080H4, on ide0 as master (/dev/hda)
         CD-ROM drive:   any IDE drive, on ide1 as slave (/dev/hdd)
         MAC address:    should be listed in the dhcp information on the master (file /etc/dhcpd.conf)

  2. Locate a Red Hat 6.2 CD. There are usually copies stored in a folder of CDs in the Medusa Beowulf Room. Otherwise, you can download an ISO image from Red Hat's website and burn it onto a CD.
  3. Ensure that the latest version of the ramdisk is in the slaveconfig CVS archive. To do this, you must log into a machine that can access the CVS archives (like antares.phys.uwm.edu). You may need to be root to make the ramdisk image. Do the following to make the ramdisk image:
    1. cvs login username
    2. cvs co ramdisk
    3. cd ramdisk
    4. make rdfs ; make rd (you may need to become root)
    This will compile a new ramdisk image (ramdisk/isolinux/initrd.img). Now, commit changes to CVS by typing: cvs commit
    If the newly compiled version of the ramdisk is different from the one in CVS, the new one will be committed and you will be prompted to make a note of what has changed. Now, we will need to put this version in the slaveconfig CVS archive:
    1. cd ..
    2. cvs co slaveconfig
    3. cp ramdisk/isolinux/initrd.img slaveconfig/nonstandard/initrd-clone.img
    4. cd slaveconfig
    5. cvs commit
    Now symbolically tag the slaveconfig CVS archive for the current version of the cloning image (where xx below is the current version number):
    1. cvs tag CLONE_1_xx
    Make sure the configuration files are accessible and up to date:
    1. cd /home/htdocs/uwmlsc/root/beowulf/slaveconfig
    2. make
  4. Copy the latest version of the ks.cfg (kickstart config script) onto a floppy disk. The floppy disk will likely need to be formatted as vfat:
    1. fdformat /dev/fd0 (this will do a low-level format, if the floppy is new)
    2. mkfs.msdos /dev/fd0

    It will also need to contain the eepro100 (e100_2.2.14-5.0BOOT.o) driver. If you have followed the instructions above, you can download the latest versions of the scripts from here:
    slave ks.cfg
    For more information about kickstarting, see the Red Hat kickstart HOWTO.
  5. Copy the ks.cfg file to a floppy with the following command:
    mcopy ks.cfg a:
  6. Place the Red Hat 6.2 CD in the cdrom drive of the node to be cloned and boot the machine. When the boot menu appears, insert the floppy into the floppy disk drive and type the following:
    linux ks=floppy

    The node will now undergo the kickstart process. Take special note that the node will need to be connected to the external network for the initial build as it needs to download rpms and configuration files.
  7. The kickstart installation will be handled by three different sections of ks.cfg. The sections are:
    1. System info, e.g. disk partitioning and network setup
    2. RPM package installation
    3. Post-installation scripts
  8. Once complete, you will be prompted to remove the CD and floppy. At this point, switch the machine's network cable to connect to the internal network (rather than the external). Ensure that the machine is referenced correctly in the master's /etc/dhcpd.conf file. Remove the CD and floppy, and reboot the machine.
  9. It's also important to note two log files in the /tmp directory (install.log and postinstall.log), created during the installation process. If you received an error during the kickstart process, check these files for more information.
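A quick way to scan those logs for problems is a simple grep. This is a self-contained sketch: the log line written here is a stand-in so it runs anywhere, whereas on a real node /tmp/install.log is written by the installer.

```shell
# Stand-in log file (on a freshly kickstarted slave this already exists):
printf 'Installing setup-2.1.8-1\nerror: package foo-1.0 install failed\n' > /tmp/install.log

# Scan the kickstart log for reported errors, with line numbers:
grep -in 'error' /tmp/install.log || echo "no errors logged"
```

Run the same command against /tmp/postinstall.log to check the post-installation scripts.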
  10. There are a series of slave tests and a template file for testing slave functionality below.

Creating the cloning CD

  1. Log into the newly configured machine as root. A compressed cpio archive of the node filesystem is located in the /data directory. The file is called node.cpio.gz. This file will have to be moved to both Medusa and hypatia, which has a cd burner.
    While still logged in as root on the newly configured slave, copy the archive to medusa:  cp /data/node.cpio.gz /net/m001/root/clone_image/node.cpio.gz
  2. Also contained in the /root/clone_image directory on Medusa is a file called CLONE_CD_VERSION. This file contains the current version number of the clone image (e.g. 1.19). You will have to update this file by editing it and changing the version number. Both CLONE_CD_VERSION and node.cpio.gz will have to be moved to hypatia for the creation of a CD.

  3. Logon as root to hypatia and download the clone development tree from the medusa CVS archive with:
           cvs -d :pserver:anonymous@gravity.phys.uwm.edu:/usr/local/cvs/medusa login
           cvs -d :pserver:anonymous@gravity.phys.uwm.edu:/usr/local/cvs/medusa co clone
    The password for user anonymous is medusa.
  4. Enter the directory thus created ( cd clone), and download or move the node.cpio.gz root filesystem archive and CLONE_CD_VERSION into this directory.
  5. For new hardware only: First, extract the initial ramdisk filesystem image to a directory tree initrd/ by typing make rdfs. You may then have to replace or edit certain files: For major changes in system architecture, you may have to replace executables and libraries in initrd/sbin and initrd/lib, and touch or modify the C source code in src as well.
  6. When everything is set, just type make to create the bootable ISO9660 image clonecd.iso.
  7. Place a blank (re)writable CD in the CD writer. To burn clonecd.iso onto a CD, consult the documentation for your CD writer. If it is a SCSI device, or an ATAPI-compliant IDE device and your kernel has ide-scsi emulation turned on, you can use the cdrecord utility:
     cdrecord -v dev=Id,Lun speed=speed clonecd.iso
     where the SCSI device Id and Lun can be obtained from cat /proc/scsi/scsi, and speed is the writing speed of your CD writer (e.g. 4 for 4x audio speed). You can also record over a rewritable CD by adding the flag blank=fast to the cdrecord command.
For creating a new clone CD on hypatia, use the following:
cdrecord -v dev=0,0 speed=4 clonecd.iso

Rebuilding a node from a cloning CD

  1. Make sure your BIOS is set up properly; see here for details.
  2. If the BIOS is suitably configured, building a node from an existing cloning CD is easy, and doesn't even require a keyboard and monitor (unless you are debugging the installation process). Simply insert the CD in the drive and reboot the node; everything else is automatic. When the build is complete, the CD tray should eject; just remove the CD. The cloning program will check the state of the cdrom every 30 seconds and power down when the CD is removed, or you can close the tray and cycle the power yourself.

    If something goes wrong during the cloning process, the cloning script will beep at you, print an error message, and drop you into a shell, from which you can try to debug the problem. Obviously this will require a keyboard and monitor. During the cloning process, data is being written to a file called /tmp/clone.log. This is a good place to start the debugging process.

How it works

The cloning CD has two vital components to it: a compressed cpio archive node.cpio.gz of the node filesystem to be cloned, and a bootloader directory isolinux. (It also has a src directory containing sources for the custom cloning scripts, but these are not used directly.) ISOlinux is a syslinux-based bootloader that is designed to boot off of an ISO9660 filesystem. Its directory contains the bootloader executable isolinux.bin, several configuration files, a linux kernel vmlinuz compiled with initial ramdisk (initrd) support, and an image of the root filesystem initrd.img that will be loaded onto the initial ramdisk. When the machine boots, the following takes place:

  • The BIOS recognizes the CD as being bootable, and executes the bootloader isolinux.bin.
  • isolinux.bin starts the kernel vmlinuz, pointing it to the image initrd.img to load as its root filesystem.
  • The kernel starts /sbin/init, which performs some generic system setup. It:
    1. mounts the proc and cdrom filesystems, and
    2. calls the script /sbin/clone.sh.
  • The /sbin/clone.sh script does most of the work of cloning. It:
    1. calls hdparm to improve transfer speeds with the hard drive and cdrom drive,
    2. runs a series of checks against the hard drive to test partition table and file system integrity. If the tests are successful, the /data partition is preserved,
    3. makes swap space or ext2 filesystems on each partition, as appropriate,
    4. mounts the hard drive's root partition and unpacks node.cpio.gz from the cdrom drive into it, and
    5. runs lilo on the hard drive.
  • Control then returns to /sbin/init, which unmounts all filesystems, and ejects the CD.

Once this is done, the init program will periodically check whether the CD has been removed from the drive, and will shut down if it has. Of course, since all important filesystems have been unmounted, there's nothing to stop you from closing the drive and performing a hard powerdown.
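The glue between the bootloader, kernel, and ramdisk image is the isolinux configuration. A minimal isolinux.cfg along these lines would produce the behavior described above; the contents here are a hypothetical sketch, and the real file ships in the clone tree's isolinux directory:

```
default clone
label clone
  kernel vmlinuz
  append initrd=initrd.img root=/dev/ram0
```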

Reading/writing disk partition tables

During the cloning process, reading and writing the partition tables becomes important to test drive and file system integrity. sfdisk is used for this purpose. The following command will dump the partitions of a device to a file:
sfdisk -d /dev/hda > /tmp/hda.out

This information (/tmp/hda.out) can then be used to compare a known model of the drive with the actual state of the drive during the cloning process. If the drive passes a series of tests, then the /data partition can be preserved.
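A sketch of that comparison, using stand-in dump files so it is self-contained: on a real slave both files would come from sfdisk -d against /dev/hda, and the model-file path and partition layout shown here are hypothetical.

```shell
# Stand-in for a known-good dump saved from a reference drive:
cat > /tmp/hda.model <<'EOF'
/dev/hda1 : start=       63, size=  1048576, Id=83
/dev/hda2 : start=  1048639, size=  2097152, Id=82
EOF

# On a real node this line would be: sfdisk -d /dev/hda > /tmp/hda.out
cp /tmp/hda.model /tmp/hda.out

# Compare the live partition table against the model:
if diff -q /tmp/hda.model /tmp/hda.out >/dev/null; then
    echo "partition table matches model: /data can be preserved"
else
    echo "partition table differs: full rebuild required"
fi
```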

Note on node.cpio.gz

The cpio archive was created during the kickstart process by the following commands:

  cd /
  /usr/bin/find . -xdev | cpio -o --format=crc > /data/node.cpio
  /bin/gzip /data/node.cpio
This clones only the contents of the root partition, which should be all that you want. There should be no need to do this by hand if you kickstarted the node following the instructions above.
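The inverse operation, unpacking the archive onto the freshly formatted root partition, is what the cloning script effectively does. Here is a self-contained round-trip sketch using a temporary directory in place of the real root partition (the file contents are illustrative):

```shell
# Build a tiny stand-in filesystem tree:
mkdir -p /tmp/clone_demo/src/etc /tmp/clone_demo/dest
echo "HOSTNAME=s001" > /tmp/clone_demo/src/etc/sysconfig

# Pack it exactly as the kickstart post-install does:
cd /tmp/clone_demo/src
find . -xdev | cpio -o --format=crc > /tmp/clone_demo/node.cpio
gzip -f /tmp/clone_demo/node.cpio

# Unpack onto the "new root" (on a real slave this would be the
# mounted root partition of /dev/hda, not a temp directory):
cd /tmp/clone_demo/dest
gzip -dc /tmp/clone_demo/node.cpio.gz | cpio -idm
cat etc/sysconfig
```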

Cloning a slave over the network

Making a slave reclone itself over the network is a simple process. There should be two files on medusa at /root/clone_image/node.cpio.gz and /root/clone_image/CLONE_CD_VERSION. The file /root/clone_image/CLONE_CD_VERSION will tell you the version of the latest filesystem image, assuming that it was properly updated by the last person making the current filesystem image.

To reclone a specific slave, log into the master and issue the following commands (where XXX is the number of the slave you wish to rebuild):

rsh sXXX "/sbin/lilo -R ramclone"
rsh sXXX "shutdown -r now"
This will reboot the slave, and upon reboot, select the "ramclone" lilo image. This lilo image will start the network cloning process. During this time, it will be transferring a large file to itself from medusa. As a result, it will take a few minutes. In a pinch, you can loop these commands. HOWEVER, be warned that if you reboot too many machines at once, communication on medusa's internal network will slow down tremendously. If you want to reclone many machines over the network at once, please use the reclone_all.bash script described below, which has the right delays built into it to avoid any problems.
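For this to work, the slave's /etc/lilo.conf must already carry a ramclone entry that boots the cloning kernel and ramdisk. A hypothetical sketch of such an entry follows; the real entry is part of the slave image, and the file names here are illustrative:

```
image=/boot/vmlinuz-clone
    label=ramclone
    initrd=/boot/initrd-clone.img
    root=/dev/ram0
    read-only
```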

In the CVS repository, under the installation-tools/ directory, there is a script called reclone_all.bash. This script can be checked out and used to reclone a range of nodes. The syntax for its usage is: reclone_all.bash lowest-slave highest-slave
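What reclone_all.bash presumably does can be sketched as follows. This is a guess at its structure, not its actual contents: echo stands in for the real rsh calls so the sketch runs anywhere, and the 60-second delay is an illustrative value.

```shell
# Reclone slaves sLOW through sHIGH one at a time, with a delay
# between reboots so medusa's internal network isn't saturated.
reclone_range() {
    i=$1
    while [ "$i" -le "$2" ]; do
        num=$(printf "s%03d" "$i")
        echo rsh "$num" "/sbin/lilo -R ramclone"   # real script: rsh, not echo
        echo rsh "$num" "shutdown -r now"
        # sleep 60   # stagger the reboots (delay value illustrative)
        i=$((i + 1))
    done
}

reclone_range 1 3
```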

How it works

The cloning process has two vital components to it: a compressed cpio archive node.cpio.gz of the slave filesystem to be cloned, and a ramdisk bootloader. When the machine boots, the following takes place:

  • The kernel loads, and sets up a root filesystem on a ramdisk.
  • The kernel starts /sbin/init, which performs some generic system setup. It:
    1. mounts the proc filesystem, and
    2. calls the script /sbin/clone.sh.
  • The /sbin/clone.sh script does most of the work of cloning. It:
    1. calls hdparm to improve transfer speeds with the hard drive and cdrom drive,
    2. runs a series of checks against the hard drive to test partition table and file system integrity. If the tests are successful, the /data partition is preserved,
    3. makes swap space or ext2 filesystems on each partition, as appropriate,
    4. mounts the /root/clone_image directory from medusa and transfers node.cpio.gz to the local machine,
    5. unpacks node.cpio.gz from the local machine onto the hard drive, and
    6. runs lilo on the hard drive.
  • Control then returns to /sbin/init, which unmounts all filesystems and reboots the machine.

Slave Testing

Slave Testing Template should be used as a base when doing rigorous testing of clone images.

Previous slave tests for 6.2 system:

Slave tests for 7.3 system:
Slave tests for 9 system:
Slave tests for Fedora Core 3 systems:

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.