Slave tests CD v1.1

  1. Does sensors report the memory and the voltages/temperature correctly? No alarms shown?
    PASS
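The alarm part of this check could be scripted rather than read by eye. The sketch below assumes that lm_sensors flags an out-of-range reading by printing the word ALARM on that line; the function name check_sensor_alarms is made up for illustration.

```shell
# Hypothetical helper: reads `sensors` output on stdin, prints any
# flagged lines, and reports PASS or FAIL.
# Assumption: flagged readings contain the word "ALARM".
check_sensor_alarms() {
    if grep ALARM; then
        echo "FAIL: alarms shown"
        return 1
    fi
    echo "PASS"
}

# Usage on a node (not run here):
#   sensors | check_sensor_alarms
```

This only looks for alarms; whether the reported memory, voltages and temperatures are plausible still needs a human to look at them.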
  2. Are there any error or warning messages on boot up?  Any errors or warnings in /var/log/messages?
    On boot up:
    1. could not init font path
      we do not see this error on either of our nodes
    2. starting NFS file locking services gives no report (no [OK] or [FAIL]).  But this may be normal & OK - I'm not sure.
      This is normal. It is just an echo. Look at /etc/rc.d/init.d/nfslock
    In /var/log/messages:
    1. I have a kernel: cdrom: open failed but this may be because I have two
      kernel: end_request I/O error on floppy
      This is the kernel probing the floppy and cd drives. The error is returned because there is no disk in either drive. It is normal and can be ignored. Now that we have removed the cdrom and floppy from the fstab, the error is gone.
    2. Two sysctl errors on unknown keys
      Fixed. Removed the offending lines in /etc/sysctl.conf
    3. Automount fails to mount /mnt/floppy and /mnt/cdrom; you may want to remove these from "mounted at boot" of /etc/fstab (but they don't seem to be mounted at boot.  Is this a stale mtab?)
      Automount works. Neither the CD nor the floppy is mounted at boot. Are you using automount correctly? To automount a floppy, type cd /mnt/floppy. To unmount the floppy, type umount /mnt/floppy or wait 5 minutes. To automount a CD, type cd /mnt/cdrom. To unmount the CD, type umount /mnt/cdrom or wait 5 minutes. Make sure your CD is in the right drive.
      You never need to use the mount command.
      The entries in fstab for cdrom and floppy are correct, but redundant. They are now removed.
    4. Do we really want to start xfs?
      Yes, if you want to use X windows. Turn it off and you can't start X.
    5. Modprobe complains about missing char-major-81 in /dev
      Kevin tracked this down to the gnome window manager. Never run gnome and the problem never appears. As soon as you run gnome it appears.
    6. Lots of errors from floppy.  Suspect it is because we keep trying to mount it.
      Probably trying to mount a disk when there is no floppy in the drive.
    7. Lots of error messages from Gnome nameserver
      Gnome is brain dead. Don't use it. I will change the default window manager to something more sensible.
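The search of /var/log/messages in item 2 can be repeated on each node with a small filter. This is a minimal sketch: the function name scan_log and the keyword list are assumptions, and the list is a rough guess rather than a complete filter.

```shell
# Hypothetical helper: print lines of a syslog-style file that look
# like errors or warnings. The keywords are a guess, not exhaustive.
scan_log() {
    grep -i -E 'error|warn|fail' "$1"
}

# Usage on a node (not run here):
#   scan_log /var/log/messages
```

Harmless messages, such as the floppy and cdrom probes described above, will still show up and have to be judged by hand.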
  3. Does the /data directory have reduced # of inodes and reserved space for root?
    PASS.
  4. Do /boot, /lib/modules contain vestiges of old kernels, etc?
      There are both vmlinux-2.4.5-1medusa and vmlinuz... in /boot, but /etc/lilo.conf references only the latter, not the former.
      This is correct. They are both installed by the kernel rpm. vmlinux is an uncompressed copy of vmlinuz which is needed if you want to look at the kernel object code (for example to run nm on the kernel). This should not be removed as it is not possible to gzcat the compressed kernel due to the executable code at the start.
    1. Also, /etc/fstab has both /dev/cdrom and /dev/cdrom1
      These are created by the redhat installer as it found two cdrom drives in your machine. These should now be removed by the post install script.
    2. Is it right to have swap on /dev/hda5? Is this faster than on /dev/hda1?
      Yes it is. It should make no difference.
  5. Does X run properly using startx as root? This should work on a generic monitor such as a Viewtronics 771.
    1. This failed for me. startx as root brought up a version of GNOME with empty windows. I fixed this by first getting a decent XF86Config from the slaveconfig_old directory, then doing wmaker.inst.
      Please READ the notes section of the web page. This is documented there. You do not need to get a new XF86Config, you just need to stop autofs, as documented in the instructions.
    2. When the flat-panel monitors that I have ordered arrive, it should also work on them.
      The flat panel monitors have a problem with the video card in the micro ATX case. We need to discuss this.
  6. Do gcc and ddd work correctly?
    1. When I tried these, they did work, but starting ddd and gcc took a minute -- almost like a system hang. It was solved for me by turning off autofs. But I don't understand why this worked.
      Again, this is in the notes section of the web page. You are trying to automount directories (typically /usr/local) to which you don't have access from the AEI. It works fine in Milwaukee on our network. This should not be changed.
  7. Does networking run properly using dhcp?  Is it running full duplex 100baseT?
    1. I don't know how to verify that it is full not half duplex
      PASS.
      Plug it into a switch that has a full/half duplex indicator and it shows the connection is at full duplex. Also, if the port is running at half duplex the driver reports half duplex (but not the other way round, for some reason). I found that the ethernet driver was compiled into the kernel. It should be a module so that it can be loaded/unloaded, which makes it easy to have multiple ethernet cards.
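Where no switch with a duplex indicator is at hand, the negotiated mode can probably be read on the node itself. The sketch below is not part of the original test procedure: it assumes the mii-tool utility from net-tools is installed, that the interface is eth0, and that its output looks like "eth0: negotiated 100baseTx-FD, link ok". The function name duplex_of is hypothetical.

```shell
# Hypothetical helper: classify one line of `mii-tool` output.
# Assumption: full duplex shows up as "-FD" or "full duplex",
# half duplex as "-HD" or "half duplex".
duplex_of() {
    case "$1" in
        *-FD*|*"full duplex"*) echo full ;;
        *-HD*|*"half duplex"*) echo half ;;
        *) echo unknown ;;
    esac
}

# Usage on a node (not run here; mii-tool normally needs root):
#   duplex_of "$(mii-tool eth0)"
```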
  8. Does the system shut down and power off with shutdown -h now?
    PASS.
  9. Does the system reboot with shutdown -r now
    PASS.
  10. After power-down, if AC power is cycled, does system remain off?
    PASS.
  11. If a running system is unplugged, then plugged in, does it remain off?
    PASS
  12. Can that system now be powered on from another machine using etherwake? Does it fsck and boot up correctly?
    1. I can't test this -- could someone please test.
      PASS.
      Kevin has tested this.
  13. When plugged into the UPS with a serial line, does the UPS properly shut down the machine when its battery gets low? If power to the UPS is then restored, does the node remain turned off?
    1. I can't test this -- could someone please test.
      In progress
  14. Are there any files with dates in the future?
    PASS.
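One way to run this check is to compare every file against a reference file created at the moment of the test. This is a sketch under that assumption; the function name find_future_files is made up, and it trusts the system clock to be correct at the time it runs.

```shell
# Hypothetical helper: list files under a directory whose modification
# time is later than "now". A freshly created temp file serves as the
# reference timestamp; -xdev keeps find on one filesystem.
find_future_files() {
    ref=$(mktemp) || return 1
    find "$1" -xdev -newer "$ref" -print
    rm -f "$ref"
}

# Usage on a node (not run here):
#   find_future_files /
```

An empty listing corresponds to PASS. Files that are being written while the scan runs can appear as false positives.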
  15. Is NTP running correctly? Is the hardware clock synchronized with the software clock after the software clock has had its time synched with an ntp server?
    1. I didn't know how to check this -- could someone please tell me?
      PASS.
      Look in /var/log/messages for messages from xntpd. Look for a non-zero entry in the /etc/ntp/drift file. Set the hardware clock in the BIOS to something incorrect (like last week). Boot up. Check the BIOS clock again. It has been set correctly to GMT. This might not work if you are off the UWM subnet. It depends on how tightly they secure the ntp servers. It definitely works here.
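The drift-file part of the NTP check above can be scripted. The sketch below assumes the drift file holds a single number on its first line; the function name drift_is_nonzero is hypothetical. The BIOS clock part of the test still has to be done by hand.

```shell
# Hypothetical helper: succeed if the drift file is readable and the
# first field of its first line is a non-zero number.
drift_is_nonzero() {
    [ -r "$1" ] || return 1
    awk 'NR == 1 { found = ($1 + 0 != 0) } END { exit !found }' "$1"
}

# Usage on a node (not run here):
#   grep xntpd /var/log/messages | tail
#   drift_is_nonzero /etc/ntp/drift && echo "drift recorded"
```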
  16. Is there a running script that calls vga_screenoff 10 minutes after keyboard input stops, and then calls vga_screenon when input starts again?
    1. Missing
      I believe you are working on this
  17. Does running drag 1.2 show > 290 Mflops provided the screen is blanked by the above?
    1. No -- running drag 1.2 gives xxx because screen blanking is missing.
      I believe you are working on this
  18. Does hdparm -tT /dev/hda report good speeds (~120 and ~28 MB/s)?
    PASS.
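To compare many nodes against the expected ~120 and ~28 MB/s, the two rates can be pulled out of the hdparm output. This sketch assumes each timing line contains the word "Timing" and ends in the form "... seconds = 28.07 MB/sec"; the function name hdparm_rates is made up.

```shell
# Hypothetical helper: read `hdparm -tT` output on stdin and print the
# MB/sec figure from each "Timing" line.
# Assumption: the rate is the number that follows the "=" sign.
hdparm_rates() {
    awk -F= '/Timing/ { print $2 + 0 }'
}

# Usage on a node (not run here; hdparm needs root):
#   hdparm -tT /dev/hda | hdparm_rates
```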
  19. Does hdparm -tT /dev/hdd show good results?
    1. This should be set once we see what device we will get from ACE computer.
      In progress.
  20. Is automount on the slave configured so that cd /mnt/floppy and cd /mnt/cdrom work correctly if a floppy or cd are present?
    1. I think that the distributed /etc/mtab might show that the floppy is already mounted -- this worked for me the second time but not the first time. Automounting the cdrom failed -- though I don't see what's wrong in auto.mnt, if anything.
      PASS.
      Automount is working fine. It mounts /dev/cdrom which is a symlink to the correct device set up by the installer (so it works no matter where the cdrom is). If you have two drives in the machine, make sure the disk is in the right drive.
  21. Should we install fftw, lam, and mpich on each machine (just to have the libraries local on each machine to cut down network use)?
    I think this makes sense, but the files aren't there -- at least not where I could see them. If so, are they there?
    We should not install these libraries on the nodes, for the following reasons:
    • The network overhead of the runtime linker is not that great.
    • Programs can also be linked -static, so there is no network overhead.
    • LAM has been updated a lot recently. One recompile and install on the server is simpler than on all the nodes, although I could make an rpm.
    • LAM and mpich cannot co-exist on the same system in the same directory structure (e.g. /usr/local). It is not worth doing separate installs on the nodes. They can be installed simultaneously with stow. This is what should be done.
  22. Are big files properly supported? Does the sequence cp /etc/termcap a; cat a a a a a > b; cat b b b b b > c; etc. allow creation of files > 2 GB?
    1. FAILED. But system has both bash and bash2. Shouldn't we eliminate bash or rename it to bash1 and use bash2 as the default bash?
      No. We should leave things as they are. bash2 is not completely compatible with bash. Users who want to manipulate files > 2 GB should use chsh or passwd to change their shell to bash2 and start their scripts with #!/bin/bash2. /bin/sh and /bin/bash should be left as they are to ensure that things don't break.
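The cat sequence from item 22 can be written as a loop so that it is easy to repeat under either shell. This is a minimal sketch: the function name grow_file and its three arguments (seed file, target size in bytes, scratch directory) are assumptions. On the nodes it would go in a script starting with #!/bin/bash2, as described above.

```shell
# Hypothetical helper: grow a copy of the seed file by repeated
# five-fold concatenation until it reaches at least the target size,
# then print the final size in bytes. Returns non-zero if a write
# fails (for example at a 2 GB limit) or if the seed is empty.
grow_file() {
    seed=$1 target=$2 work=$3
    cp "$seed" "$work/cur" || return 1
    size=$(( $(wc -c < "$work/cur") ))
    [ "$size" -gt 0 ] || return 1
    while [ "$size" -lt "$target" ]; do
        cat "$work/cur" "$work/cur" "$work/cur" "$work/cur" "$work/cur" \
            > "$work/next" || return 1
        mv "$work/next" "$work/cur"
        size=$(( $(wc -c < "$work/cur") ))
    done
    echo "$size"
}

# Usage on a node (not run here; the scratch directory is an example):
#   grow_file /etc/termcap 2200000000 /data
```

The scratch directory briefly holds both the current and the next file, so it needs room for somewhat more than the final size.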

$Id: slavetests-1.1.html,v 1.1 2002/09/19 15:40:43 kflasch Exp $