| etc/passwd |
This file was modified by adding our users and the other standard password
information. Note that we ensure that this file is identical on the master
and the other nodes. |
| etc/group |
This file was modified by adding our standard groups. As with
the password file, it is identical on the master and the other nodes. |
| etc/hosts |
This file lists the hostnames and IP addresses of all the machines
in our private network. Because our subnet is a private one, there
are NO nameservers anywhere that can translate symbolic hostnames like
n030 into IP addresses. To keep our systems simple, we choose not to run
the nameserver daemon (named). Notice that the entry for the master
node n001 differs slightly from those of the other nodes: it has
an additional name (beowulf.phys.uwm.edu), which is the name by which it
is known to the outside world. There is also an entry for our network
switch (switch1), which has a web/telnet interface and can be accessed
over the web. |
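As a rough illustration of what "no nameservers" means in practice, here is a minimal C sketch (not code that runs on our cluster) of a lookup that consults only text in /etc/hosts format: each line holds an address followed by a canonical name and any aliases, so n001's extra outside name is just one more alias on its line. The function name hosts_lookup and the 192.168.2.* addresses in the example are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Look up NAME in TABLE, which uses the /etc/hosts format:
 * "address canonical-name [aliases...]", with '#' starting a comment.
 * On a match, copy the address into OUT and return 1; return 0 if NAME
 * is not listed. No DNS server is consulted at any point. */
int hosts_lookup(const char *table, const char *name, char *out, size_t outlen)
{
    char line[256];
    const char *p = table;

    while (*p != '\0') {
        /* Copy one line into a local buffer (truncating very long lines). */
        size_t n = strcspn(p, "\n");
        size_t len = n < sizeof line - 1 ? n : sizeof line - 1;
        memcpy(line, p, len);
        line[len] = '\0';
        p += n;
        if (*p == '\n')
            p++;

        char *hash = strchr(line, '#');     /* strip comments */
        if (hash != NULL)
            *hash = '\0';

        char *tok = strtok(line, " \t");    /* first field: the address */
        if (tok == NULL)
            continue;                       /* blank or comment-only line */
        const char *addr = tok;

        /* Remaining fields: canonical name, then aliases. */
        while ((tok = strtok(NULL, " \t")) != NULL) {
            if (strcmp(tok, name) == 0) {
                snprintf(out, outlen, "%s", addr);
                return 1;
            }
        }
    }
    return 0;
}
```

For example, with the hypothetical table "192.168.2.1 n001 beowulf.phys.uwm.edu\n192.168.2.30 n030\n", looking up either n001 or beowulf.phys.uwm.edu yields 192.168.2.1, and any unlisted name simply fails.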
| etc/hosts.allow |
This file gives our "software development environment" machines (chandra
.. weyl) access to all of the internet services on the master, such as FTP
and rlogin, and anything else started by the inet daemon (inetd).
It ALSO gives access to these services to ANY machine on our private network,
i.e., any machine that matches network address 192.168.2.0 / netmask 255.255.255.0.
The bootpd 0.0.0.0 line appears to be necessary so that the bootp server
running on this machine (explained below) can be accessed by any other node.
Note: this entry may be wrong or unnecessary (we're not sure!). |
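A network/netmask pattern like the one above admits a client when the client's address, ANDed with the netmask, equals the network address. A minimal C sketch of that test; in_network is a hypothetical helper, not a tcp_wrappers function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return 1 if ADDR lies in the network NET/MASK (all dotted-quad strings),
 * 0 if it does not, and -1 if any string fails to parse. */
int in_network(const char *addr, const char *net, const char *mask)
{
    struct in_addr a, n, m;

    if (inet_pton(AF_INET, addr, &a) != 1 ||
        inet_pton(AF_INET, net, &n) != 1 ||
        inet_pton(AF_INET, mask, &m) != 1)
        return -1;

    /* All three values are in network byte order, so the bitwise AND
     * compares the same bits whatever the host's endianness. */
    return (a.s_addr & m.s_addr) == (n.s_addr & m.s_addr);
}
```

With network 192.168.2.0 and netmask 255.255.255.0, a node such as 192.168.2.30 is admitted, while an address on the outside 129.89.57.* network is not.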
| etc/hosts.deny |
This file works in conjunction with etc/hosts.allow to define which
machines have access to our internet services. In this case, we are
denying access to all machines except those listed in the hosts.allow table
(i.e., we are taking the conservative, xenophobic approach!). |
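The two files are consulted in a fixed order, documented in hosts_access(5): access is granted if the client matches hosts.allow, otherwise denied if it matches hosts.deny, and otherwise granted. The C sketch below models only that decision order. The matcher functions are hypothetical stand-ins, assuming hosts.allow admits the 192.168.2. prefix and hosts.deny holds a catch-all ALL: ALL entry.

```c
#include <stdbool.h>
#include <string.h>

/* A predicate telling whether CLIENT matches the entries of one file. */
typedef bool (*matcher)(const char *client);

/* Decision order described in hosts_access(5). */
bool access_granted(const char *client, matcher in_allow, matcher in_deny)
{
    if (in_allow(client))   /* 1. a match in hosts.allow grants access */
        return true;
    if (in_deny(client))    /* 2. otherwise a match in hosts.deny denies it */
        return false;
    return true;            /* 3. otherwise access is granted */
}

/* Hypothetical stand-in for hosts.allow: admit the private subnet. */
static bool allow_private(const char *client)
{
    return strncmp(client, "192.168.2.", 10) == 0;
}

/* Hypothetical stand-in for a catch-all "ALL: ALL" line in hosts.deny. */
static bool deny_all(const char *client)
{
    (void)client;
    return true;
}
```

Because the catch-all deny rule matches everyone, only clients that hosts.allow admits ever get through, which is the xenophobic policy described above.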
| etc/host.conf |
This is the standard file produced by a RedHat installation; it is
different on the non-master nodes. It tells the master to resolve
hostnames to IP addresses first from the /etc/hosts table if possible, and
otherwise through a nameserver. In particular, the master node has access
to Domain Name Service through its gateway to the outside world, so the
bind option allows it to use DNS to resolve hostnames
that are not listed in the etc/hosts file. |
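The lookup order can be sketched in C as follows, assuming the master's host.conf contains an order hosts,bind line. The resolver callback type and the demo_hosts/demo_dns stand-ins (with made-up addresses) are hypothetical; passing NULL for the DNS resolver models a node that has no DNS fallback at all.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A resolver fills OUT and returns 1 on success, 0 on failure. */
typedef int (*resolver)(const char *name, char *out, size_t outlen);

/* "order hosts,bind": consult the local table first, then DNS if present. */
int resolve(const char *name, resolver hosts, resolver dns,
            char *out, size_t outlen)
{
    if (hosts(name, out, outlen))
        return 1;
    if (dns != NULL && dns(name, out, outlen))
        return 1;
    return 0;   /* unknown name: fails immediately, no waiting on DNS */
}

/* Stand-ins for illustration only; the addresses are made up. */
static int demo_hosts(const char *name, char *out, size_t outlen)
{
    if (strcmp(name, "n030") != 0)
        return 0;
    snprintf(out, outlen, "192.168.2.30");
    return 1;
}

static int demo_dns(const char *name, char *out, size_t outlen)
{
    if (strcmp(name, "www.example.org") != 0)
        return 0;
    snprintf(out, outlen, "192.0.2.80");
    return 1;
}
```

On the master both steps are available; on a node, resolve("www.example.org", demo_hosts, NULL, ...) just returns 0.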
| etc/hosts.equiv |
This file permits users from any of the listed machines, including
all the nodes, to access the master instantly through rlogin, rsh,
and other commands without having to give a password. This
is particularly important for running MPI code (e.g., with MPICH), which uses rsh
to start processes on different machines. |
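A much-simplified C sketch of the check that rshd and rlogind make against this file, assuming it contains one bare hostname per line. The real daemons also accept +/- entries and optional user names, which this hypothetical host_trusted function ignores.

```c
#include <string.h>

/* Return 1 if REMOTE appears as the first field of some line of EQUIV
 * (a newline-separated list of hostnames), else 0. */
int host_trusted(const char *equiv, const char *remote)
{
    size_t rlen = strlen(remote);
    const char *p = equiv;

    if (rlen == 0)
        return 0;
    while (*p != '\0') {
        size_t n = strcspn(p, " \t\n");       /* length of first field */
        if (n == rlen && strncmp(p, remote, rlen) == 0)
            return 1;                         /* exact hostname match */
        p += strcspn(p, "\n");                /* skip rest of the line */
        if (*p == '\n')
            p++;
    }
    return 0;
}
```

Matching is on whole names only, so a list containing n030 does not trust a host called n03.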
| etc/resolv.conf |
This is the standard file produced by a RedHat installation.
It lists the two nameservers that are available to the master through its
gateway connection to the outside world. Its "search" line also specifies an
alternate domain in which to look up hostnames. In particular,
if the DNS server fails to return an IP address for a given name, the phys.uwm.edu
domain is appended to that name, and the DNS server is asked
to resolve the new, longer name. Note that this file should
be removed from all the other nodes, to prevent them from even THINKING
about using Domain Name Service to resolve unknown names (i.e., those
not listed in the etc/hosts file of these other nodes). If they do
attempt such resolution, they will hang for minutes at a time. |
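The "search" fallback described above can be sketched in C: query the name as given, then retry with the search domain appended. The dns_query callback type and the demo_query stand-in (with a made-up address) are hypothetical. The real resolver's ordering also depends on the ndots option in resolv.conf: with the default ndots:1, a name containing no dots (such as n030) is tried with the search domain first.

```c
#include <stdio.h>
#include <string.h>

/* A DNS query fills OUT and returns 1 on success, 0 on failure. */
typedef int (*dns_query)(const char *fqdn, char *out, size_t outlen);

/* Try NAME as given; on failure, append SEARCH and ask again. */
int resolve_with_search(const char *name, const char *search, dns_query q,
                        char *out, size_t outlen)
{
    char longer[256];

    if (q(name, out, outlen))
        return 1;
    if (snprintf(longer, sizeof longer, "%s.%s", name, search)
            >= (int)sizeof longer)
        return 0;                       /* extended name would not fit */
    return q(longer, out, outlen);
}

/* Stand-in DNS server for illustration; the address is made up. */
static int demo_query(const char *fqdn, char *out, size_t outlen)
{
    if (strcmp(fqdn, "beowulf.phys.uwm.edu") != 0)
        return 0;
    snprintf(out, outlen, "192.0.2.1");
    return 1;
}
```

Here a lookup of "beowulf" fails on the first try and succeeds once phys.uwm.edu is appended.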
| etc/sysconfig/network |
This is the standard file produced by a RedHat installation. It is
what would be produced if there were only a single ethernet card
connected to the 129.89.57.* network. In particular, it specifies
that the eth0 device functions as a gateway device. |
| etc/sysconfig/network-scripts/ifcfg-eth0 |
This is a standard file produced by a normal Linux installation. The
network startup scripts read it and use ifconfig to configure the eth0
100Base-T network interface. It assigns an IP address, netmask, and network
to the interface, and specifies that the interface should be turned on at
bootup. It also tells the master where broadcast packets to the "outside
world" should go. It does not say how to broadcast packets to the private
network, since this ethernet card is not connected to that network.
WARNING: when experimenting with different ethernet card configurations,
DO NOT create backup files such as ifcfg-eth0.SAVE, or anything else of the
form ifcfg-*. Such files will be read by the startup scripts and used
to configure your interfaces! Name backups SAVE.* instead. |
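To see why a name like ifcfg-eth0.SAVE is dangerous, assume the startup scripts select interface files with the shell pattern ifcfg-* (the exact filtering varies between releases). The standard POSIX fnmatch() call shows which names such a pattern catches; picked_up_by_startup is a hypothetical helper.

```c
#include <fnmatch.h>

/* Return 1 if FILENAME matches the (assumed) startup-script pattern
 * "ifcfg-*", i.e. it would be used to configure an interface. */
int picked_up_by_startup(const char *filename)
{
    return fnmatch("ifcfg-*", filename, 0) == 0;
}
```

A trailing suffix does not protect a backup, because the * absorbs it; only changing the prefix (SAVE.ifcfg-eth0) takes the file out of the pattern's reach.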
| etc/sysconfig/network-scripts/ifcfg-eth1 |
This is a file that we added to "turn on" the other network card and
assign it an IP address. This other network card is on the "private"
subnet and does not act as a gateway. |
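Alongside IPADDR and NETMASK, ifcfg-* files on RedHat systems typically also carry NETWORK and BROADCAST values, which follow from the first two by bit masking. A minimal C sketch of that arithmetic; the function name and the 192.168.2.1 address used below are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Derive NETWORK and BROADCAST from IP and MASK (dotted quads).
 * NET and BCAST must each hold at least INET_ADDRSTRLEN bytes.
 * Returns 0 on success, -1 on malformed input. */
int derive_net_bcast(const char *ip, const char *mask, char *net, char *bcast)
{
    struct in_addr a, m, n, b;

    if (inet_pton(AF_INET, ip, &a) != 1 || inet_pton(AF_INET, mask, &m) != 1)
        return -1;
    n.s_addr = a.s_addr & m.s_addr;    /* network: host bits cleared */
    b.s_addr = n.s_addr | ~m.s_addr;   /* broadcast: host bits all set */
    if (inet_ntop(AF_INET, &n, net, INET_ADDRSTRLEN) == NULL ||
        inet_ntop(AF_INET, &b, bcast, INET_ADDRSTRLEN) == NULL)
        return -1;
    return 0;
}
```

For IPADDR 192.168.2.1 and NETMASK 255.255.255.0 it produces NETWORK 192.168.2.0 and BROADCAST 192.168.2.255.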
| etc/sysconfig/network-scripts/ifcfg-lo |
Standard loopback device file produced by a normal RedHat installation. |
| etc/fstab |
Local and NFS-mounted disks and partitions. The local file systems
include:
- /dos is a DOS disk partition (the only type that the Windows-NT
  firmware can recognize, which it needs in order to run linload.exe)
- /home is a large disk for user home directories that is exported to all
  the other nodes
- /data is a large disk for data that is exported to the other nodes
- /nfsr stands for NFS-root. It stores a complete copy of the
  file systems for any of the "slave" nodes, which boot from it as part
  of the automated cloning process.
- /usr/local is used for any locally installed software. It is
  also exported to all the nodes.
|
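Each fstab line has six fields (device, mount point, type, options, dump flag, fsck pass). As a sketch of reading them, the C function below uses the glibc setmntent()/getmntent() interface to report the type of the entry mounted at a given directory; the function name fstab_type and the sample device names in the usage note are hypothetical.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Copy the file-system type of the entry mounted at DIR in the
 * fstab-format file PATH into OUT. Returns 1 if found, 0 if not,
 * and -1 if PATH cannot be opened. */
int fstab_type(const char *path, const char *dir, char *out, size_t outlen)
{
    FILE *fp = setmntent(path, "r");
    struct mntent *m;
    int found = 0;

    if (fp == NULL)
        return -1;
    while (!found && (m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0) {
            snprintf(out, outlen, "%s", m->mnt_type);
            found = 1;
        }
    }
    endmntent(fp);
    return found;
}
```

For instance, given the hypothetical line "/dev/sda5 /home ext2 defaults 1 2", fstab_type(path, "/home", buf, sizeof buf) returns 1 and leaves "ext2" in buf.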
| etc/exports |
This is a list of the file systems that are exported from the master
node to all of the other nodes. In particular:
- /nfsr is the NFS-root file system. It is exported so that whenever we want
  to clone a node, the node can mount it as its root file system and then work
  comfortably on formatting and doing other things to its own disk.
- /home is the directory for users' files.
- /data is the directory for data files.
- /usr/local is the directory for locally installed software. To simplify
  maintenance, this lives on a single partition on the master.
All of these file systems are exported with the no_root_squash option, which
gives root on any node all the usual disk read/write privileges
associated with being root. |
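The effect of no_root_squash can be summarized as a UID mapping performed by the NFS server. A C sketch, assuming the common default anonymous UID of 65534 (exports(5) allows it to be changed with anonuid=); effective_uid is a hypothetical name, not an NFS server function.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Assumed anonymous UID; configurable per export with anonuid=. */
#define ANON_UID ((uid_t)65534)

/* UID the server acts as for a request arriving with CLIENT_UID.
 * By default root (UID 0) is "squashed" to the anonymous UID;
 * with no_root_squash it keeps full root privileges on the export. */
uid_t effective_uid(uid_t client_uid, bool no_root_squash)
{
    if (client_uid == 0 && !no_root_squash)
        return ANON_UID;
    return client_uid;   /* ordinary users are never remapped here */
}
```

Without no_root_squash, root on a cloning node would be treated as an unprivileged user on /nfsr and could not read every file it needs to copy.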
| etc/rc.d/rc3.d/S99rdate |
This script has been added. It uses the rdate command to set
the time on the master at bootup, obtaining the correct time from a trusted
host on the net. |
| etc/rc.d/rc.local |
Lines have been added which (1) start the bootp server (needed to clone
nodes) and which (2) start the time daemon to synchronize times on all
the nodes with the time on the master node. |
| etc/bootptab |
This lists the hardware ethernet address of every machine, so that
during the cloning process a node can discover its own name. |
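A simplified C sketch of the lookup bootpd performs, assuming one entry per line in the bootptab name:tag=value:... form with the hardware address in the ha= tag. Real bootptab files may also use backslash continuation lines and tc= templates, which this hypothetical helper ignores; the MAC and IP addresses in the test data are made up.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Compare N characters of two hex strings, ignoring case. */
static int same_hex(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* If some non-comment line of TAB has ha=MAC, copy that line's name
 * (the text before its first ':') into OUT and return 1; else return 0. */
int bootp_name_for_mac(const char *tab, const char *mac,
                       char *out, size_t outlen)
{
    size_t maclen = strlen(mac);
    const char *line = tab;

    if (maclen == 0)
        return 0;
    while (*line != '\0') {
        const char *end = line + strcspn(line, "\n");
        if (*line != '#') {
            for (const char *p = line; p < end; p++) {
                if (end - p > 4 && strncmp(p, ":ha=", 4) == 0) {
                    const char *v = p + 4;
                    size_t vlen = strcspn(v, ":\n");
                    if (vlen == maclen && same_hex(v, mac, vlen)) {
                        int nlen = (int)strcspn(line, ":");
                        snprintf(out, outlen, "%.*s", nlen, line);
                        return 1;
                    }
                }
            }
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    return 0;
}
```

The hardware address is the one thing a freshly booted node knows about itself, which is why it serves as the key for finding its name.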
| etc/ethers |
This file is almost certainly not needed! |
| etc/inetd.conf |
I think the only modification here was to enable the bootp server,
but this didn't work properly and we decided to run it in the background
from an init script. |
| etc/pam.d/login |
Modified so that root can log in from non-console locations. |
| etc/pam.d/rlogin |
Ditto for rlogin. |
| etc/pam.d/rexec |
Ditto for rexec |
| nfsr/boot/vmlinux.gz |
We've created a directory called /nfsr (short for "Network File System
Root") on the master. It is a copy of the entire file system of
a single node, and is used for cloning, as the root file system for network
booting. |
| nfsr/boot/ptable |
This is a copy of the partition table of the disk that we intend to
clone. It's created by putting the first 512 bytes of the raw disk
device into a file, with the command
    dd ibs=512 count=1 if=/dev/sda of=ptable
|
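Assuming the disk uses a PC-style (MBR) partition table, the saved 512-byte ptable file has a fixed layout: four 16-byte partition entries starting at byte 446, and the signature bytes 0x55 0xAA at the end. A minimal C sketch for sanity-checking such a file before it is used; mbr_valid and mbr_entry are hypothetical helpers.

```c
#include <stdint.h>

/* MBR layout: four 16-byte partition entries start at byte 446, and
 * bytes 510-511 hold the boot signature 0x55 0xAA. */
#define PT_OFFSET 446
#define PT_ENTRY  16

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return 1 if the sector carries the MBR boot signature, else 0. */
int mbr_valid(const unsigned char sec[512])
{
    return sec[510] == 0x55 && sec[511] == 0xAA;
}

/* Decode partition entry I (0..3): byte 4 is the partition type,
 * bytes 8-11 the starting sector (LBA), bytes 12-15 the sector count.
 * Returns 0 if the slot is empty (type 0), 1 otherwise. */
int mbr_entry(const unsigned char sec[512], int i,
              unsigned *type, uint32_t *start, uint32_t *count)
{
    const unsigned char *e = sec + PT_OFFSET + i * PT_ENTRY;

    *type = e[4];
    *start = le32(e + 8);
    *count = le32(e + 12);
    return e[4] != 0;
}
```

In use, one would fread() exactly 512 bytes from ptable into a buffer, confirm mbr_valid(), and print mbr_entry() for slots 0 to 3 to check that the layout is the intended one.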
| nfsr/sbin/init.normal |
This is a copy of /sbin/init, which is the normal init program that
runs on bootup. |
| nfsr/sbin/init.cloning |
This is a special script used in cloning. It contains the
sequence of commands executed by the clone to copy its files from the /nfsr
partition of the master. |
| nfsr/sbin/init |
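A copy of one of the two previous files. Since the kernel runs /sbin/init
at bootup, this choice determines whether a node booting from /nfsr starts
up normally or clones itself. |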
| nfsr/mnt/root |
Create this directory (mount point). It will be used in cloning
the nodes, as the mount point for the machine's hard disk. |
| root/.rhosts |
A list of all the beowulf nodes, placed in root's home directory, allows
root login without a password from any node onto the master. |
| var/spool/cron/root |
This uses the S99rdate script described above to update the
time on the master node every 12 hours, using a local machine as the trusted
time source. |
| sbin/reread.c |
This is a simple program that uses an ioctl() call to make the kernel
re-read the partition table at the start of the disk, so that a table just
written there takes effect (rather than the old copy simply staying cached
in memory). This program is used in the cloning script sbin/init.cloning above. |
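What such a program might look like, as a sketch assuming it uses Linux's BLKRRPART ioctl (from linux/fs.h), which asks the kernel to re-read a disk's partition table. The function name is hypothetical and this is not the original reread.c. The call fails with EBUSY if any partition on the disk is in use.

```c
#include <fcntl.h>
#include <linux/fs.h>      /* BLKRRPART */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel to re-read the partition table of DEVICE (e.g. /dev/sda).
 * Returns 0 on success, -1 on failure (the reason is printed via perror). */
int reread_partitions(const char *device)
{
    int fd = open(device, O_RDONLY);
    if (fd < 0) {
        perror(device);
        return -1;
    }
    sync();                            /* flush pending writes first */
    int rc = ioctl(fd, BLKRRPART, NULL);
    if (rc < 0)
        perror("BLKRRPART");
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

For example, reread_partitions("/dev/sda") would typically be run right after a new partition table has been written to the disk, so that the kernel creates the device entries for the new partitions before they are formatted.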
| sbin/reread |
Executable for previous program. |