   Slave tests CD 1.3


  • Does sensors report the memory and the voltages/temperature correctly? No alarms shown?

  • PASS
  • Are there any error or warning messages on bootup? Any errors or warnings in /var/log/messages?

  • In /var/log/messages:
    Jun 19 01:38:09 localhost xfs: ignoring font path element /usr/share/fonts/default/TrueType (unreadable)

    In /var/log/dmesg:
    CPU serial number disabled.
    This is enabled in the BIOS but is probably disabled in the kernel config. It should be turned on in the kernel config: that will make it easier to keep track of CPU serial numbers, etc.

  • Does the /data directory have a reduced number of inodes and reduced reserved space for root?

  • PASS
  • Do /boot, /lib/modules contain vestiges of old kernels, etc?
    In /boot, can't we eliminate the bootblocks boot.b, chain.b, and os2_d.b?
  • Is /var/lib/games full of stuff?
  • Does X run properly using startx as root?

  • I would prefer to see windowmaker rather than Gnome as the default window system for root.
    The current default for the first root login is Gnome. How about running wmaker.inst in the post-install script?
  • Do gcc and ddd work correctly?

  • PASS
  • Does networking run properly using dhcp?  Is it running full duplex 100baseT?

  • PASS
  • Does the system shut down and power off with shutdown -h now?

  • PASS
  • Does the system reboot with shutdown -r now?

  • PASS
  • After power-down, if AC power is cycled, does system remain off?

  • PASS
  • If a running system is unplugged, then plugged in, does it remain off?

  • PASS
  • Can that system now be powered on from another machine using etherwake?  Does it fsck and boot up correctly?

  • I can't test this.  It should be tested for the final cloning CD.
  • When plugged into the UPS with a serial line, does the UPS properly shut down the machine when its battery gets low?

  • If power to the UPS is then restored, does the node remain turned off?
    David, when the final cloning CD is ready, you should test this again.
  • Are there any files with dates in the future?

  • Untested
  • Is NTP running correctly?
  • Is the hardware clock synchronized with the software clock after the software clock has had its time synched with an ntp server?

  • PASS
    I think we should have a crontab entry that periodically synchronizes the hardware clock from the software clock. If a node has been up and running for 90 days and then goes down, the system time is obtained from the hardware clock when it reboots. If the hardware clock has drifted ten minutes ahead of the software clock, this can cause strange behavior, because some files and processes will have times ten minutes in the future.
  • Is there a running script that calls vga_screenoff 10 minutes after keyboard input stops, and then calls vga_screenon when input starts again?
  • Does running drag 1.2 show > 290 Mflops provided the screen is blanked by the above?

  • PASS
  • Does /hdparm -tT /dev/hda report good speeds (~120 and ~28 MB/s)?

  • PASS
  • Does /hdparm /dev/hdd report jazzed up parameters for CDROM?

  • FAIL -- be sure to set parameters that work on the ACE machine cdrom.
  • Is automount on the slave configured so that cd /mnt/floppy and cd /mnt/cdrom work correctly if a floppy or CD is present?

  • PASS. Rename floppy_ext2 to floppy in /etc/auto.mnt
    Does this work with both ext2 and msdos floppies?
  • Should we install fftw, lam, and mpich on each machine (just to have the libraries local on each machine to cut down network use)?

  • PASS
  • Does man -k work?
  • Is there a decent /root/.netscape file with some sensible bookmarks?
  • Does /proc/fans/ exist and contain sensible information?
    FAIL.  The fan.o module needs to be loaded in rc.local or elsewhere using insmod fan.o
  • Are big files properly supported? Does a sequence such as cp /etc/termcap a; cat a a a a a > b; cat b b b b b > c; etc. allow creation of files > 2 GB?

  • PASS using bash2
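
The big-file check above (copy /etc/termcap, then repeatedly concatenate five copies) can be scripted so that it stops by itself once a size threshold is crossed. The following is only a sketch, not the procedure actually used for this test: the function name grow_file, its arguments, and the scratch file names cur and next are made up for illustration, and it assumes a shell whose integer arithmetic can represent values above 2 GB.

```shell
#!/bin/sh
# grow_file SEED TARGET_BYTES WORKDIR   (hypothetical helper, not from the test CD)
# Copies SEED into WORKDIR, then replaces the copy with five concatenated
# copies of itself until it is larger than TARGET_BYTES.
# Prints the final size in bytes; returns non-zero if any step fails.
grow_file() {
    seed=$1; target=$2; dir=$3
    cp "$seed" "$dir/cur" || return 1
    size=$(( $(wc -c < "$dir/cur") ))
    [ "$size" -gt 0 ] || return 1          # an empty seed would never grow
    while [ "$size" -le "$target" ]; do
        cat "$dir/cur" "$dir/cur" "$dir/cur" "$dir/cur" "$dir/cur" \
            > "$dir/next" || return 1      # a write error here suggests a size limit
        mv "$dir/next" "$dir/cur" || return 1
        size=$(( $(wc -c < "$dir/cur") ))
    done
    echo "$size"
}
```

For the 2 GB question one might call it as grow_file /etc/termcap 2147483648 /data/tmp, where /data/tmp is a placeholder for any scratch directory with enough free space (note that both cur and next exist briefly at the same time).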

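The "files with dates in the future" item is listed as untested above, and the clock-drift note describes how such files can appear. One possible way to look for them is to compare every file against a freshly created reference file. This is a sketch under assumptions: the function name future_files is invented here, and files that are legitimately modified while the scan is running (for example, active logs) will also be reported.

```shell
#!/bin/sh
# future_files DIR   (hypothetical helper, not from the test CD)
# Prints every regular file under DIR whose modification time is later
# than that of a reference file created at the start of the scan.
future_files() {
    ref=$(mktemp) || return 1     # just created, so its mtime is "now"
    # -xdev keeps find on one filesystem, so /proc and network mounts are skipped
    find "$1" -xdev -type f -newer "$ref" -print
    rm -f "$ref"
}
```

Running future_files / as root on a freshly cloned node should ideally print nothing; separate filesystems such as /data would need their own invocation because of -xdev.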
Here is a set of tests that the cloned node should pass when plugged into the master via a switch:

  • Does the node correctly get its identity (e.g., n012) using dhcp from the master on bootup?
  • Does automount work?

  • On the master, does cd /net/n012 work properly (substituting whatever the correct node number is)?
    Does cd /net/n001 on the slave properly mount the master?
    Does cp file /net/n012/ work correctly?
  • When connected to the master via a switch, does an mpi job run properly?
  • Can root on the master log into a node with just rsh n012 (no password needed)?
  • Does rsh work correctly, e.g., does rsh n012 uptime correctly return the uptime on n012 from n001?
  • Do the above automount and rsh tests work correctly between two slave nodes?
  • Are warning messages from the logging daemon correctly logged to the master?
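
The passwordless rsh checks in the list above lend themselves to a small loop that can be run from the master or from any slave. The sketch below is illustrative only: check_nodes and the RSH variable are names invented here, and the node names are the examples used in the list. RSH defaults to rsh but can be overridden to try a different remote shell.

```shell
#!/bin/sh
# check_nodes NODE...   (hypothetical helper, not from the test CD)
# Runs "uptime" on each node through $RSH and prints "<node> ok" or
# "<node> FAIL". Returns non-zero if any node failed.
RSH=${RSH:-rsh}
check_nodes() {
    failed=0
    for node in "$@"; do
        # stdin comes from /dev/null so the remote shell cannot block on input
        if $RSH "$node" uptime < /dev/null > /dev/null 2>&1; then
            echo "$node ok"
        else
            echo "$node FAIL"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_nodes n001 n012 run as root on a slave covers both the slave-to-master and the slave-to-slave case. A prompt for a password will show up as FAIL, since stdin is not a terminal.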
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.