Does sensors report the memory and the voltages/temperature correctly?
No alarms shown? PASS
Are there any error or warning messages on boot up? Any errors or
warnings in /var/log/messages?
On boot up:
could not init font path
We do not see this error on either of our nodes.
starting NFS file locking services gives no report (no [OK] or [FAIL]).
But this may be normal & OK - I'm not sure.
This is normal. It is just an echo. Look at /etc/rc.d/init.d/nfslock
In /var/log/messages:
I have a kernel: cdrom: open failed but this may be because I have two
kernel: end_request I/O error on floppy
This is the kernel probing the floppy and cd drives. The error is returned
because there is no disk in either drive. It is normal and can be ignored.
Now that we have removed the cdrom and floppy from the fstab, the error is
gone.
Two sysctl errors on unknown keys
Fixed. Removed the offending lines in /etc/sysctl.conf
Automount fails to mount /mnt/floppy and /mnt/cdrom; you may want to remove
these from "mounted at boot" of /etc/fstab (but they don't seem to be mounted
at boot. Is this a stale mtab?)
Automount works. Neither the CD nor the floppy is mounted at boot.
Are you using automount correctly? To automount a floppy, type cd
/mnt/floppy. To unmount the floppy, run umount /mnt/floppy or wait 5
minutes. To automount a CD, type cd /mnt/cdrom. To unmount the
CD, run umount /mnt/cdrom or wait 5 minutes. Make sure your CD is in
the right drive.
You never need to use the mount command.
The entries in fstab for cdrom and floppy are correct, but redundant. They are
now removed.
Do we really want to start xfs?
Yes, if you want to use X windows. Turn it off and you can't start X.
Modprobe complains about missing char-major-81 in /dev
Kevin tracked this down to the gnome window manager. Never run gnome and the
problem never appears. As soon as you run gnome it appears.
Lots of errors from floppy. Suspect it is because we keep trying
to mount it.
Probably trying to mount a disk when there is no floppy in the drive.
Lots of error messages from Gnome nameserver
Gnome is brain dead. Don't use it. I will change the default window manager to
something more sensible.
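The check of /var/log/messages asked for in this item can be partly automated. What follows is a minimal sketch and not part of the original test procedure: the function name find_suspect_lines and the keyword list are assumptions made for illustration, and the keywords will certainly need tuning for a given site.

```python
import re

# Hypothetical keyword list; extend or narrow it to suit the site.
SUSPECT = re.compile(r"error|warn|fail", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return the log lines that contain any of the suspect keywords."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
```

To use it, open /var/log/messages (usually as root) and pass the file object to find_suspect_lines. Harmless messages such as the cdrom and floppy probes discussed in this item will still be reported, so the output needs reading by a person.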
Does the /data directory have reduced # of inodes and reserved space for
root? PASS.
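One way to look at the inode count and the space reserved for root is through statvfs. This is only a sketch under assumptions: the function name filesystem_summary is made up for illustration, and on ext2 the authoritative numbers are the ones printed by tune2fs -l for the device.

```python
import os


def filesystem_summary(path):
    """Summarise inode count and root-reserved space for the filesystem at path."""
    st = os.statvfs(path)
    # f_bfree counts blocks free for root; f_bavail counts blocks free for
    # ordinary users. The difference is the space held back for root.
    reserved_blocks = st.f_bfree - st.f_bavail
    reserved_fraction = reserved_blocks / st.f_blocks if st.f_blocks else 0.0
    return {
        "total_inodes": st.f_files,
        "free_inodes": st.f_ffree,
        "reserved_bytes": reserved_blocks * st.f_frsize,
        "reserved_fraction": reserved_fraction,
    }
```

Calling filesystem_summary("/data") and comparing the result with the summary for another partition of similar size should show whether the inode count and the reserve were really reduced.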
Do /boot, /lib/modules contain vestiges of old kernels, etc?
There are both vmlinux-2.4.5-1medusa and vmlinuz...
in /boot but /etc/lilo.conf only references the latter not the former.
This is correct. They are both installed by the kernel rpm. vmlinux is an
uncompressed copy of vmlinuz which is needed if you want to look at the kernel
object code (for example to run nm on the kernel). This should not be removed
as it is not possible to gzcat the compressed kernel due to the executable
code at the start.
Also, /etc/fstab has both /dev/cdrom and /dev/cdrom1
These are created by the redhat installer as it found two cdrom drives in your
machine. These should now be removed by the post install script.
Is it right to have swap on /dev/hda5? Is this faster than on /dev/hda1?
Yes, it is fine to have swap on /dev/hda5. It should make no difference to
speed compared with /dev/hda1.
Does X run properly using startx as root? This should work on a generic
monitor such as a Viewtronics 771.
This failed for me. startx as root brought
up a version of GNOME with empty windows. I fixed this by first
getting a decent XF86Config from the slaveconfig_old directory and then running
wmaker.inst.
Please READ the notes section of the web page. This is documented there.
You do not need to get a new XF86Config, you just need to stop autofs, as
documented in the instructions.
When the flat-panel monitors that I have ordered arrive, it should
also work on them.
Do gcc and ddd work correctly?
When I tried these, they did work, but starting ddd and gcc took a minute --
almost like a system hang. It was solved for me by turning off autofs. But I
don't understand why this worked.
Again, this is in the notes section of the web page. You are trying to
automount directories (typically /usr/local) to which you don't have access
from the AEI. It works fine in Milwaukee on our network. This should not be
changed.
Does networking run properly using dhcp? Is it running full duplex
100baseT?
I don't know how to verify that it is full not half duplex
PASS.
Plug it into a switch that has a full/half duplex indicator and it shows
that the connection is at full duplex. Also, if the port is running at half
duplex the driver reports half duplex (but not the other way round, for some
reason). I found that the ethernet driver was compiled into the kernel. It
should be a module so that it can be loaded/unloaded, which makes it easy to
have multiple ethernet cards.
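For nodes where no switch with a duplex indicator is at hand, the link state can also be read in software. The sketch below assumes the mii-tool utility is installed and that its one-line report contains a media name ending in -FD or -HD, or the words "full duplex" or "half duplex". Both the function name duplex_from_mii_tool and that output format are assumptions and should be checked against the real output on the node.

```python
import re


def duplex_from_mii_tool(output):
    """Return 'full', 'half', or None from mii-tool style output."""
    text = output.lower()
    if re.search(r"-fd\b|full[ -]duplex", text):
        return "full"
    if re.search(r"-hd\b|half[ -]duplex", text):
        return "half"
    return None  # no link, or a format this sketch does not recognise
```

The text to pass in would be whatever running mii-tool on the interface (for example eth0) prints as root. A result of None means the sketch could not decide, not that the link is bad.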
Does the system shut down and power off with shutdown -h now? PASS.
Does the system reboot with shutdown -r now? PASS.
After power-down, if AC power is cycled, does system remain off? PASS.
If a running system is unplugged, then plugged in, does it remain off? PASS
Can that system now be powered on from another machine using etherwake?
Does it fsck and boot up correctly?
I can't test this -- could someone please test. PASS.
Kevin has tested this.
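For anyone who wants to repeat the wake-up test without etherwake, the Wake-on-LAN "magic packet" is simple to build by hand. This is a sketch under assumptions: it swaps in a UDP broadcast, which is a different transport from the raw Ethernet frame that etherwake is generally understood to send, and the function names, broadcast address and port 9 are illustrative choices, not taken from the node configuration.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC such as 00:11:22:33:44:55."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must have 6 octets")
    # Six 0xFF bytes followed by sixteen copies of the MAC address.
    return b"\xff" * 6 + raw * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (hypothetical address and port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

wake() has to be run from a machine on the same subnet as the sleeping node, with the MAC address of that node's ethernet card. Whether the card responds to a UDP-carried packet as well as to etherwake is something to verify, not assume.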
When plugged into the UPS with a serial line, does the UPS properly shut down
the machine when its battery gets low? If power to the UPS is then restored,
does the node remain turned off?
I can't test this -- could someone please test.
In progress
Are there any files with dates in the future? PASS.
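A check for future-dated files can be scripted by comparing each modification time with the current time. The following is a minimal sketch. The function name files_dated_in_future and the slack_seconds parameter are assumptions for illustration, and the result is only meaningful if the system clock itself is correct when the scan runs.

```python
import os
import time


def files_dated_in_future(root, now=None, slack_seconds=0):
    """Yield paths under root whose modification time is later than now."""
    if now is None:
        now = time.time()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # vanished or unreadable; skip it
            if mtime > now + slack_seconds:
                yield path
```

Running list(files_dated_in_future("/")) as root scans the whole system, including /proc and any automounted directories, so starting from a narrower root is often more practical.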
Is NTP running correctly? Is the hardware clock synchronized with the
software clock after the software clock has had its time synched with an ntp
server?
I didn't know how to check this -- could someone please tell me? PASS.
Look in /var/log/messages for messages from xntpd. Look for a non-zero
entry in the /etc/ntp/drift file. Set the hardware clock in the BIOS to
something incorrect (like last week). Boot up. Check the BIOS clock again. It
should now be set correctly to GMT. This might not work if you are off the UWM
subnet. It depends on how tightly they secure the ntp servers. It
definitely works here.
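The drift-file part of this check is easy to script. The sketch below assumes the file starts with a single number, the frequency offset in parts per million, and ignores anything after it. The function names read_drift and drift_looks_settled are made up for illustration.

```python
def read_drift(path="/etc/ntp/drift"):
    """Return the frequency offset stored in the drift file, in PPM."""
    with open(path) as handle:
        fields = handle.read().split()
    if not fields:
        raise ValueError("drift file is empty")
    # Only the first field is used; any further fields are ignored.
    return float(fields[0])


def drift_looks_settled(path="/etc/ntp/drift"):
    """True if the drift file exists and holds a non-zero value."""
    try:
        return read_drift(path) != 0.0
    except (OSError, ValueError):
        return False
```

A non-zero value only shows that the daemon has run long enough to write an estimate. It does not replace the BIOS clock test described in this item.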
Is there a running script that calls vga_screenoff 10 minutes after keyboard
input stops, and then calls vga_screenon when input starts again?
Missing
I believe you are working on this
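Until the real script exists, its decision logic can be sketched as a small function. Everything here is an assumption except the 10-minute timeout and the names vga_screenoff and vga_screenon, which come from the checklist. How keyboard idle time is measured, and how the two commands are invoked, is left open.

```python
def blanking_action(idle_seconds, screen_is_off, timeout_seconds=600):
    """Decide what the watcher should do: 'off', 'on', or None for nothing."""
    if not screen_is_off and idle_seconds >= timeout_seconds:
        return "off"  # the watcher would call vga_screenoff here
    if screen_is_off and idle_seconds < timeout_seconds:
        return "on"   # the watcher would call vga_screenon here
    return None
```

A watcher loop would poll the idle time every few seconds, call this function, and run the corresponding command only when the result is not None, so that neither command is issued repeatedly.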
Does running drag 1.2 show > 290 Mflops provided the screen is blanked
by the above?
No -- running drag 1.2 gives xxx because screen blanking is
missing.
I believe you are working on this
Does hdparm -tT /dev/hda report good speeds (~120 and ~28 MB/s)? PASS.
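To compare many nodes against the ~120 and ~28 MB/s figures, the hdparm report can be parsed instead of read by eye. This sketch assumes each result line starts with "Timing", has a label ending in a colon, and ends with a figure followed by MB/sec. The function name parse_hdparm is hypothetical, and the pattern should be checked against the output of the hdparm version actually installed.

```python
import re

# One match per "Timing <label>: ... = <rate> MB/sec" line.
RATE = re.compile(r"^[ \t]*Timing\s+(.*?):.*=\s*([\d.]+)\s*MB/sec", re.MULTILINE)


def parse_hdparm(output):
    """Return a list of (label, MB/s) pairs from hdparm -tT style output."""
    return [(label.strip(), float(rate)) for label, rate in RATE.findall(output)]
```

The first pair is expected to be the cache figure and the second the disk figure, in the order hdparm prints them. Since the targets are approximate, any pass/fail threshold should allow some margin.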
Does hdparm -tT /dev/hdd show good results?
This should be set once we see what device we will get from
ACE computer.
In progress.
Is automount on the slave configured so that cd /mnt/floppy and cd /mnt/cdrom
work correctly if a floppy or CD is present?
I think that the distributed /etc/mtab might show that the floppy is already
mounted -- this worked for me the second time but not the first time.
Automounting the cdrom failed -- though I don't see what's wrong in auto.mnt,
if anything. PASS.
Automount is working fine. It mounts /dev/cdrom which is a symlink to the
correct device set up by the installer (so it works no matter where the cdrom
is). If you have two drives in the machine, make sure the disk is in the right
drive.
Should we install fftw, lam, and mpich on each machine (just to have the
libraries local on each machine to cut down network use)?
I think this makes sense, but the files aren't
there -- at least not where I could see them. If so, are they
there?
We should not install these libraries on the nodes, for the following reasons:
The network overhead of the runtime linker is not that great.
Programs can also be linked -static, so there is no network overhead.
LAM has been updated a lot recently. One recompile and install on the
server is simpler than one on all the nodes, although I could make an rpm.
LAM and mpich cannot co-exist on the same system in the same directory
structure (e.g. /usr/local). It is not worth doing separate installs on
the nodes. They can be installed simultaneously with stow. This is what
should be done.
Are big files properly supported? Does the sequence cp /etc/termcap a; cat a a a
a a > b; cat b b b b b > c; etc. allow creation of files > 2 GB?
FAILED. But system has both bash and bash2. Shouldn't
we eliminate bash or rename it to bash1 and use bash2 as the default
bash?
No. We should leave things as they are. bash2 is not completely compatible
with bash. Users who want to manipulate files > 2 GB should use chsh or
passwd to change their shell to bash2 and start their scripts with
#!/bin/bash2. /bin/sh and /bin/bash should be left as they are to ensure that
things don't break.
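To separate shell limitations from filesystem limitations, the same grow-by-concatenation test can be run without any shell redirection. This is a sketch under assumptions: the function name grow_past and its parameters are made up for illustration, and it exercises the kernel, the filesystem and the Python runtime, not bash or bash2.

```python
import os
import shutil


def grow_past(path, seed, limit_bytes, copies=5):
    """Grow a file by repeated concatenation until it is larger than limit_bytes.

    Returns the final size. An OSError during writing means something refused
    to go past a certain size, which is the failure being tested for.
    """
    if not seed or copies < 2:
        raise ValueError("need a non-empty seed and at least 2 copies")
    with open(path, "wb") as out:
        out.write(seed)
    while os.path.getsize(path) <= limit_bytes:
        bigger = path + ".next"
        with open(bigger, "wb") as out:
            # Equivalent of "cat a a a a a > b" for the current file.
            for _ in range(copies):
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        os.replace(bigger, path)
    return os.path.getsize(path)
```

Called with the contents of /etc/termcap as the seed and a limit of 2 * 1024**3 bytes on /data, this mirrors the checklist test. The final file can be several times larger than the limit, so it needs plenty of free space and should be deleted afterwards.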
Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.