How to automatically clone a node (no "human" intervention required)

This document describes a simple trick by which an Alpha-based Linux system that uses AlphaBIOS can be set up to boot with different arguments each time, just by giving a "shutdown -r now" command.  You might ask, "Why not just issue a shutdown, then type in the different arguments when you get back to the MILO> prompt?"  This would be OK if you only had a single system, but if you have a cluster of machines (in our case 48, none of which even have monitors or usable keyboards!) and you want to reboot ALL of them in this way, it's nice to have a "hands off" approach.  We use this hands-off approach specifically when re-cloning systems.

The "old" node cloning process for a system that uses AlphaBIOS to load MILO to load the kernel is cumbersome: put a special boot floppy into the system, then do a soft-reset, choose the correct boot options for cloning, sit around while the kernel loads from the disk, boots from the net, copies files, etc.  Then when it is all over, do another soft-reset, and watch the machine re-boot in its new state.   Thanks to Jay Estabrook of Digital for his help in figuring out how to do this.

What I describe here is a simple way to automate this process so that no human intervention is required.  I am assuming that the slave is running off a kernel that has been loaded in the normal way by MILO, from /boot/vmlinux.gz.

For this process to work, you need to build a special "hacked" kernel which is designed to ignore the command line arguments that are normally passed to the kernel by MILO.  When you are ready to clone a node, copy this specially hacked kernel (which we will call vmlinux.gz.clone) over /boot/vmlinux.gz, then give the "shutdown -r now" command.  The system will reboot itself and load the hacked kernel.  The hacked kernel is hardwired to ignore the boot arguments passed to it by MILO, and instead to instruct the machine to use an NFS-mounted disk as its root.  When we clone, the file /sbin/init on that disk contains the necessary cloning scripts.

To hack a special version of the kernel, modify the file /usr/src/linux/arch/alpha/kernel/setup.c by adding the line:

#define CLONE_NODE 1

at the beginning of the file (change the "1" to "0" when you need a regular, unhacked kernel), then find the lines that read:

        } else {
                strcpy(command_line, COMMAND_LINE);
                strcpy(saved_command_line, COMMAND_LINE);
        }
        printk("Command line: %s\n", command_line);
 
and modify them to read:

        } else {
                strcpy(command_line, COMMAND_LINE);
                strcpy(saved_command_line, COMMAND_LINE);
                if (CLONE_NODE==1) {
                        strcpy(command_line,
"bootdevice= bootfile= nfsroot=192.168.2.1:/nfsr nfsaddrs=:192.168.2.1::255.255.255.0::eth0:bootp");
                }
        }
        printk("Command line: %s\n", command_line);

Note that there should be NO carriage returns in the line starting "bootdevice= ".  [Widen your web browser window if you don't believe me.] Unless you are working on the UWM beowulf system, you will probably want to change the boot options of your hacked kernel from those that I give above.
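
For readers adapting the hardwired string to their own cluster, here is a minimal sketch of how it is put together.  It assumes the nfsaddrs field order described in the kernel's NFS-root documentation (client IP, server IP, gateway, netmask, hostname, device, autoconfiguration method), with the client IP, gateway, and hostname left empty as in the string above.  The function name clone_cmdline and its parameters are hypothetical and are not part of the original setup.

```sh
#!/bin/sh
# Sketch only: compose the boot string that gets pasted into setup.c.
# clone_cmdline SERVER_IP EXPORT_DIR NETMASK DEVICE
# Assumed nfsaddrs layout: client:server:gateway:netmask:hostname:device:autoconf
clone_cmdline() {
    server=$1
    export_dir=$2
    netmask=$3
    dev=$4
    # Client IP, gateway and hostname are deliberately left empty.
    printf 'bootdevice= bootfile= nfsroot=%s:%s nfsaddrs=:%s::%s::%s:bootp\n' \
        "$server" "$export_dir" "$server" "$netmask" "$dev"
}
```

Calling "clone_cmdline 192.168.2.1 /nfsr 255.255.255.0 eth0" prints the same single-line string used above, which can then be pasted into setup.c without line breaks.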

Now rebuild the kernel in the usual way; the result will be installed as /boot/vmlinux.gz.clone.  To build a kernel, go to /usr/src/linux, become the superuser, and do the following:
    make xconfig (choose save and exit) [Warning: see note below]
    make dep
    make clean (skip this step and the previous two if all you have changed since the last kernel build is setup.c)
    make boot
 
After the make has finished, copy the new kernel image from /usr/src/linux-2.0.30/arch/alpha/boot to /boot/vmlinux.gz.clone.
When you want to re-clone a system, simply type:
    cp /boot/vmlinux.gz.clone /boot/vmlinux.gz
    shutdown -r now
This will cause the system to boot, ignoring the command line arguments used in the previous boot, and using instead the command line arguments hardwired into the clone kernel.  In our case, the cloning process that this initiates eventually replaces the kernel with a normal one, and terminates with another
     shutdown -rn now
The system then reboots normally with a properly-cloned kernel.
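
Since the nodes have no monitor or keyboard, overwriting /boot/vmlinux.gz with a missing or empty file would be awkward to recover from.  The following is a minimal sketch of a more defensive version of the copy step, not the procedure we actually run: it refuses to proceed unless the clone kernel exists and is non-empty, and it keeps the normal kernel as vmlinux.gz.normal first.  The function name stage_clone_kernel and its optional directory argument are hypothetical.

```sh
#!/bin/sh
# Sketch only: swap in the clone kernel, keeping a copy of the normal one.
# stage_clone_kernel [BOOT_DIR]   (BOOT_DIR defaults to /boot)
stage_clone_kernel() {
    boot_dir=${1:-/boot}
    # Refuse to touch the running kernel if the clone kernel is absent or empty.
    if [ ! -s "$boot_dir/vmlinux.gz.clone" ]; then
        echo "no usable clone kernel in $boot_dir" >&2
        return 1
    fi
    # Keep the normal kernel so it can be restored by hand if needed.
    cp -f "$boot_dir/vmlinux.gz" "$boot_dir/vmlinux.gz.normal" || return 1
    cp -f "$boot_dir/vmlinux.gz.clone" "$boot_dir/vmlinux.gz"
}
```

A typical use would be "stage_clone_kernel && shutdown -r now", so that the reboot only happens if the kernel swap succeeded.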

 



Here is the script that we use for cloning (normally kept in /home/install/auto-clone on the master node):
 
#!/bin/tcsh
echo "Preparing to re-clone node" $1
rsh $1 'cp -f /boot/vmlinux.gz /boot/vmlinux.gz.normal'
rsh $1 'cp -f /home/vmlinux.gz.clone /boot/vmlinux.gz.clone'
rsh $1 'cp -f /boot/vmlinux.gz.clone /boot/vmlinux.gz'
rsh $1 '/sbin/shutdown -r now &'
echo "Node cloning has been initiated..."
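
To re-clone a whole cluster rather than one node, the script above can be driven from a loop.  This is a minimal sketch under stated assumptions: the node names (a prefix followed by a two-digit number) are hypothetical, as are the function names node_names and reclone_all; substitute whatever naming your cluster uses.

```sh
#!/bin/sh
# Sketch only: run a per-node command for every node in the cluster.

# node_names PREFIX COUNT: print PREFIX01 .. PREFIXnn, one per line.
node_names() {
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s%02d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# reclone_all PREFIX COUNT COMMAND: run COMMAND once per node name.
reclone_all() {
    for node in $(node_names "$1" "$2"); do
        "$3" "$node"
    done
}
```

With 48 nodes, and assuming they were named node01 through node48, the call would be "reclone_all node 48 /home/install/auto-clone".  Passing "echo" as the command first is a cheap way to check the list of names before anything is rebooted.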

 

Jay Estabrook comments about "make xconfig":
The only quibble with the write-up is that "make xconfig" does not always work, especially for patched kernels. Same happens with "menuconfig", from a number of reports, so I only recommend the straight "make config" whenever I'm asked. YMMV, of course...