   Note: this is not an official bid specification. The official bid specification may be obtained from the URL given in this notice:

Agency Bid Number: E6-016-O -- Bid Title: Scientific Microcomputer
Cluster Running The Linux Operating System
Dear Vendor:
The above Official Sealed Bid is being let in your commodity area. To obtain additional information, log in to VendorNet (http://vendornet.state.wi.us/) and click on SEARCH. Select "Official Sealed Bids". Next select "By agency bid number", enter the agency bid number in the text box as the keyword, and click on "Search Vendornet". The system will locate the bid announcement for your review.

The official bid specification may also be obtained from the UWM Computer Equipment Purchasing Officer, Mr. Ed Seeberg, email: ELS@bfs.uwm.edu, tel: 414-229-4077.

NEMO Node and Rack Specifications


This is a bid specification for a scientific computing cluster running
the Linux operating system, to consist of:

[1] rack-mounted dual-processor cluster nodes
[2] equipment racks

The total amount paid for all the equipment listed above will be

Bidders must respond by indicating the TOTAL NUMBER OF COMPUTE NODES
which will be provided.  Additional technical information must also be
provided as indicated below.

[Note: The entire system will be placed in an existing dedicated
1400-square-foot cluster room at UW-Milwaukee.  This room contains
one dedicated Powerware 9315 500kVA/400kW central UPS system and two
dedicated Liebert 225 kVA power distribution units.  The room is
equipped with four dedicated 26-ton Data-Aire down-flow AC units and
an 18-inch raised floor air distribution system. A floor plan is
included.  A Cisco 6500 or Force10 E600/E1200 based ethernet switch
and suitable cabling will be provided by the purchaser.]


[1] Compute node specifications

Each node will be a dual-CPU AMD Opteron machine with two single-core
AMD Opteron 248 processors (versions with DDR400 support, latest E
stepping, 1 GHz HyperTransport clock).  Each node must:

- have a motherboard that accommodates dual-core CPUs for a potential
  future upgrade.

- have a motherboard which is certified by AMD for use with AMD
  Opteron processors up to and including model 252 and dual-core
  processors up to and including model 275.

- include an on-board graphics controller

- include 4 GB of registered ECC memory, PC3200/DDR400 or faster.
  Memory must be from a first tier memory manufacturer such as
  Corsair, Kingston, Mushkin, Infineon, Dataram, Samsung, Viking,
  Micron, etc.  Memory must be certified as compatible by motherboard
  manufacturer.  Specify memory manufacturer.

- have maximum memory interleave, so memory must occupy at least two
  memory slots per CPU.

- be in a 1U, 2U, or 3U rack mount case with slide rails for pull-out
  access.  Case should include a 120 VAC power-supply cord.  Case size
  should be chosen by vendor to minimize cost while providing good
  cooling airflow and easy access for maintenance. If slide rails are
  "full extension" and allow 100% access even in a fully
  stuffed rack, this is an advantage and should be noted.

- have a power supply with sufficient capacity to power all components
  when both CPUs are at 100% and all disk drives and other components
  are fully active.  Power supply must also have sufficient capacity
  to handle dual-core CPUs. In all situations the power supply should
  not be running at more than 80% of its rated power output. Power
  supply must have a power factor greater than or equal to 0.85 at
  nominal and maximum load. Vendor should specify Mean Time Between
  Failure (MTBF) of power supply at nominal load and 75°F operating
  temperature.

- have one or more gigabit Ethernet ports supporting jumbo frames (at
  least 6kB in length, though 9kB is preferable) and compatible with
  the purchaser-provided Cisco 6500 or Force10 E600/E1200 series
  network switch.  At least one Ethernet port must be capable of
  full-duplex wire-speed operation.  Systems must be capable of PXE
  booting and kickstart cloning using standard Linux tools.  Ethernet
  port must support Wake-On-LAN.

- have sufficient internal cooling fans to maintain all component
  temperatures well below manufacturers' operating limits when in a
  75°F room.  Cooling fans and airflow must not be blocked or
  impeded by internal cabling.  Vendor should specify MTBF of system
  internal cooling fans.

- support the following monitoring from within the Linux 2.6 kernel
  (e.g., using lm_sensors):
  - Rotation speeds of all internal cooling fans
  - System and CPU temperatures
  - Power supply and CPU voltages

- have an integrated or add-on remote management card (also called a
  Baseboard Management Controller or BMC) that supports (at least) the
  following remote management functions via an Ethernet LAN
  interface:
  (a) remote power off
  (b) remote power on
  (c) remote system (re)boot
  (d) remote motherboard BIOS setting
  (e) remote motherboard BIOS upgrade/update/flashing
  (f) viewing serial console boot and runtime input/output from a
      remote management location

  The above functions must be supported even in the absence of any
  operating system on the nodes. Functions (a)-(d) must be supported
  using command-line scripts from a Linux management environment.
  Vendor must be prepared to demonstrate correct operation of these
  functions.

  Function (e) may be unsupported, as long as the vendor provides an
  alternative simple AUTOMATED OR SCRIPTED HANDS-OFF procedure for
  upgrading/flashing/setting BIOS.

  Please specify if the management card/hardware is also capable of
  hardware monitoring, including system temperature, CPU temperature,
  fan rotation speeds, power supply voltages (in the absence of an
  Operating System).

  Vendor should specify if management card/hardware requires a
  separate LAN or can piggy-back off the same LAN and Ethernet port
  used for data traffic WITHOUT impacting data throughput rate from
  the shared port. Piggy-backing is preferable.  If piggy-backing is
  NOT possible without a performance loss then the system price must
  also include a low-cost low-performance oversubscribed management
  network.  For example this might consist of one 24-port ethernet
  switch per rack, and a central 24-port switch concentrating these
  together, and network cables to tie these together.

- be delivered with BIOS settings (motherboard, RAID controller)
  and Baseboard Management Controller/IPMI settings as specified by

- have four front-panel accessible SATA 250GB disks in hot-swap
  carriers.  When mounted in carriers, these disks must be
  hot-swappable without opening the case or using any tools.  SATA
  disks must be (at least) 7200 RPM disks with (at least) 8MB cache
  memory.  They must be certified (drive model and firmware) for use
  with the hardware RAID controller.  Preference may be given to 'RAID
  Edition' type drives whose firmware is designed to carry out timely
  and aggressive sector reallocation for UNC (Uncorrectable) sector

- have a hardware RAID controller.  This must offer the following
  features:

  (a) Fully supported by manufacturer in the Linux 2.6 kernel tree

  (b) RAID-5 on the four system disks, to yield approximately 750 GB
      of usable storage space.

  (c) Linux command-line tools for management and monitoring of the
      RAID array.  These must allow RAID arrays to be configured and
      rebuilt, and provide automatic notification of failed disks via
      email or a similar mechanism.

  (d) Capable of at least 30 MB/sec writes and 60 MB/sec reads
      (sequential block access on 8GB files) using a Reiser file
      system, as measured by Bonnie++ with a 64kB or smaller stripe
      size (details below).

  (e) System must be able to boot from the RAID-5 array, and carry on
      normal OS operation during any single disk failure. System must
      be able to automatically rebuild redundant RAID array during
      normal OS operation.

  (f) If system fails to read a block of data at some LBA (UNC error)
      from disk A, the system will read the corresponding block of
      data from one of the redundant disks (B, C, or D) and then WRITE
      the corresponding data to the failing LBA of disk A to force
      sector reallocation on disk A.

   Acceptable RAID controllers include the 3ware 9500S-4LP and Areca
   ARC-1110.  3ware 9550 controllers are also acceptable (but note
   that they are often a poor fit in typical 1U server chassis).

   Speed benchmarks will be obtained using Bonnie++ running under
   Linux using a recent 2.6 kernel.  Preference will be given to
   systems with faster read/write performance and to systems with a
   lower overall cost.  Benchmark should be run in RAID-5 mode with
   write-through caching enabled and a 64kB or smaller stripe size.
   Vendors who wish to provide this data should give output from
   Bonnie++ v 1.03 with the command line:
         bonnie++ -s 8192:64K -x 5 -d /REISERFS/ -fu root

   Ability to monitor disk SMART data and run disk self-tests with
   Linux tools (e.g., smartmontools) is desirable and should be noted.

   Please indicate if the controller/backplane/hot-swap-bay combo
   provides functioning disk power/activity/failure indicator lights.

- be clearly labeled on both the front and back with large legible
  consecutive labels of the form S001, S002, S012, S321, and so on.

- be clearly labeled on the back with the MAC addresses of all
  ethernet ports (motherboard and BMC).

- have hot swap carriers clearly labeled on both the front and back
  with large legible consecutive labels of the form S001/P0, S001/P1,
  S001/P2, S001/P3 and so on.  Here P0-3 indicate the RAID controller
  port, and should correspond to the OS identifier for the appropriate
  controller port.  On each node, RAID controller port numbers must be
  identically numbered.  The numbers must increase either from left to
  right (preferred) or from right to left.  Thus, looking at node 123,
  the drive carriers and node should be labeled like this:

   | S123/P0 | S123/P1 | S123/P2 | S123/P3 |    S123   |  

- have all labeling done in a permanent way.  Labels should not peel
  off, fall off, or degrade after a few years of normal use.

- shut down cleanly (with 'shutdown -hf' or similar) to standby power

- power up using Wake-On-LAN.

- contain identical parts, including identical firmware
  revisions/versions and board-level hardware revision numbers.

- contain manufacturer's latest versions of BIOS and/or firmware.

- contain all components cleanly within the case, without interference
  or crowding. For example the RAID controller card and/or cabling
  MUST NOT 'bump into' the CPU heatsinks.

  NOTE: our cluster room (details above) includes generous amounts of 
  space, as well as clean UPS backed power and plenty of cooling.
  We are not interested in getting the highest possible density of nodes.
  Vendor must respect the maximum recommended rack density for the
  systems and should not fully load the racks unless the systems are
  fully qualified in that configuration.  For example the Vendor may
  choose to provide only 20 nodes per 42U rack.
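
The node and hot-swap carrier labeling scheme described above can be
sketched as follows.  This is only an illustration: the function names
are hypothetical, the three-digit zero padding is inferred from the
example labels (S001, S012, S321), and the left-to-right port order is
the preferred ordering stated above.

```python
from typing import List


def node_label(n: int) -> str:
    """Return the node label, e.g. 123 -> 'S123' (three digits assumed)."""
    if not 1 <= n <= 999:
        raise ValueError("node number must be between 1 and 999")
    return f"S{n:03d}"


def carrier_labels(n: int, ports: int = 4) -> List[str]:
    """Return carrier labels in RAID port order P0..P(ports-1)."""
    return [f"{node_label(n)}/P{p}" for p in range(ports)]


def front_panel(n: int) -> str:
    """Return one text row: carriers left to right, then the node label."""
    cells = carrier_labels(n) + [node_label(n)]
    return "| " + " | ".join(cells) + " |"
```

For node 123, front_panel(123) gives a row with the four carrier labels
S123/P0 through S123/P3 followed by the node label S123, matching the
example layout above.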


[2] Equipment racks

Nodes must be delivered to UW-Milwaukee in 42U equipment racks with a
total weight not exceeding 1250 pounds per rack.  Unused rack openings
should be covered with blanking plates to maintain proper
front-to-back cooling airflow.  Racks should not have doors,
side panels, fans, or other unnecessary items which add expense.

Rack should include internal clips, guides, tracks or similar means to
neatly bundle and support network and power cables.

Inexpensive power distribution must be provided within the racks using
(for example) Wiremold, Belkin, APC, Tripp Lite, or similar power
strips.  Power available is 120 VAC with standard 20-amp outlets. 
There are approximately 250 such outlets in the room.

Each rack should have a suitable number of 120 VAC 20-amp flexible
power cords.  The power cords should be at least 12 feet long measured
from the point where they exit the bottom of the racks.  The nominal
operating current of any power cord should not exceed 80% of 20A (16A).

Example: if each rack contains 20 nodes, and each node requires
nominal 3.0 A @ 120 VAC, then each rack should contain a total of four
20A power strips.  Each power strip would power five nodes.  The racks
would then have four 12-foot 20A power cords.
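
The arithmetic in the example above can be checked with a short
sketch.  The function name and defaults are hypothetical; it assumes
one power cord per power strip (as in the example) and applies the 80%
limit on a 20A circuit.

```python
import math
from typing import Tuple


def rack_power_plan(nodes_per_rack: int, amps_per_node: float,
                    circuit_amps: float = 20.0,
                    derate: float = 0.80) -> Tuple[int, int, float]:
    """Return (strips needed, nodes per strip, nominal amps per strip)."""
    usable_amps = circuit_amps * derate          # 16 A on a 20 A circuit
    # Small epsilon guards against floating-point round-off in the ratio.
    nodes_per_strip = math.floor(usable_amps / amps_per_node + 1e-9)
    if nodes_per_strip < 1:
        raise ValueError("a single node exceeds the usable circuit current")
    strips = math.ceil(nodes_per_rack / nodes_per_strip)
    return strips, nodes_per_strip, nodes_per_strip * amps_per_node


print(rack_power_plan(20, 3.0))
```

With 20 nodes at a nominal 3.0 A each, this yields four strips of five
nodes, each drawing a nominal 15 A, which is below the 16 A limit.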




Vendor must be prepared to demonstrate the following functionality on
the test node:

[1] PXE booting of machine with virgin disks (no OS or RAID array)
    and ability to run a remote kickstart script, ending with the
    system configured to boot from a Linux kernel from a RAID-5 disk
[2] Ability to monitor cooling fans, system and CPU temperatures,
    power supply and CPU voltages.
[3] Remote management functionality (with NO OS on disk):
    (a) remote power off
    (b) remote power on
    (c) remote system (re)boot
    (d) remote motherboard BIOS setting
    (e) remote motherboard BIOS flashing
    (f) redirection of serial console boot input/output
        to a remote location
[4] 'shutdown -hf' or equivalent must power the system down to standby
    power only (no fans spinning).
[5] Ability of system (with the OS installed) to boot from and run
    from the hardware RAID controller array.
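
As an illustration of item [3], the sketch below builds command lines
for remote power off, power on, reboot, serial console redirection and
sensor readout.  It rests on assumptions not stated in this
specification: that the management card speaks IPMI 2.0 over LAN, and
that the open-source ipmitool utility is installed on the Linux
management host.  The host name "s001-bmc" and user name "admin" are
hypothetical.  BIOS setting and flashing (items (d) and (e)) are
vendor-specific and are not covered here.

```python
from typing import Dict, List

# Map each demonstration item to an ipmitool subcommand.
ACTIONS: Dict[str, List[str]] = {
    "power_off": ["chassis", "power", "off"],    # item (a)
    "power_on":  ["chassis", "power", "on"],     # item (b)
    "reboot":    ["chassis", "power", "reset"],  # item (c)
    "console":   ["sol", "activate"],            # item (f), serial over LAN
    "sensors":   ["sdr", "list"],                # temperatures, fans, voltages
}


def ipmi_command(host: str, action: str, user: str = "admin") -> List[str]:
    """Build (but do not run) an ipmitool command line for one BMC.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, so it does not appear in the process list.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
    return base + ACTIONS[action]


print(" ".join(ipmi_command("s001-bmc", "power_off")))
```

The resulting argument list could be passed to subprocess.run() from a
management script, looping over all node BMC addresses.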

Before a purchase order is issued, the Vendor must attend a
half-day meeting with project principals at UWM or an agreed-upon
alternative location. The purpose of this meeting is to review the
detailed specifications above, the delivery schedule, and other
expectations.  Attending this meeting must be at least one member of
the Vendor's Management Team with fiduciary authority for this bid,
and at least one member of Vendor's Technical Team with overall
management responsibility for the acquisition, integration, testing
and delivery of the system.

The Vendor will designate a single point of contact for all technical,
configuration, management and delivery questions and issues.

Systems must have a three-year warranty on all parts and components.
Warranty will be both from the Vendor and from original manufacturer.
Note: for node repairs that cannot be carried out by project
personnel on-site, systems will be returned by mail to Vendor for

Vendor will guarantee proper operation of all components in a Linux
environment using a modern Linux 2.6 kernel.  Baseline OS distribution
is Fedora Core 3 or 4.  Vendor will provide any patches needed to the
stock Fedora Core 3 or 4 distribution to provide the functionality
described in this document.  Test system will be delivered with Fedora
Core 3 or 4 and any necessary patches installed.

Vendor will repair or replace any subsystems or components which do
not operate in a reliable fashion with equivalent or better items at
Vendor expense.

Note: disk drive failures are specifically EXCLUDED from this last
requirement, PROVIDED that in conjunction with the hardware RAID
controllers, normal system operation is maintained through single-disk
failures.  A hardware RAID controller that does not provide reliable
operation after an initial burn-in period of one month will be
replaced with a more reliable hardware RAID system at Vendor expense.
Reliable operation is defined as no more than four RAID controller
card failures with data loss in one year of operation.

Vendor agrees to pass on to purchaser any price drops in CPU
components which occur after bidding but before a purchase order is
issued, by increasing the number of nodes delivered.

Vendor agrees to maintain a stock of spare parts on site at UWM, in
sufficient number to cover expected hardware failures.  At a minimum
this should include:
- ten sticks of memory
- two CPUs, including any heat-sinks and/or fans
- two motherboards (including any management or daughter cards)
- four of each type of fan in the system
- three power supplies
- ten disk drives
- one complete case assembly including hot-swap carriers
- two RAID cards
- internal cabling sufficient for one system

This stock of spare parts will be maintained via replacement if and as
parts fail.  Failed parts will be periodically returned to the Vendor.

Vendors may wish to benchmark the completed system for inclusion in a
Top 500 (or other) list. The system will be made available to the
Vendor for this purpose during an initial one-month burn-in period,
and the Vendor may circulate and/or publicize the results as they
wish. Please note that the University cannot appear to endorse or
promote any product nor can a vendor use the University's logo.


Vendor response must include:

[1]  Total number of nodes delivered?  Note: the number of CPUs should
     be twice this.
[2]  Motherboard manufacturer and model? 
[3]  Memory type and manufacturer?
[4]  Node case size (1U, 2U or 3U)?
[5]  MTBF of system power supply?
[6]  Additional remote management hardware monitoring capability
     (temperatures, fan speeds, voltages)?
[7]  Remote Management requires separate LAN (yes/no)?
[8]  Disk model
[9]  MTBF of system cooling fans
[10] Does hardware RAID controller support smartmontools?
[11] The number of nodes per rack and the total number of racks
[12] Are there working drive power/activity/failure indicator front
     panel lights?

Vendor must also specify the number of nodes delivered and per-node
cost if the following changes are made to the specifications above (to
apply to ALL systems): 

- Use AMD Opteron 250 processors (latest stepping, 1GHz HT clock)
- Use AMD Opteron 252 processors (latest stepping, 1GHz HT clock)
- Use ONE AMD Opteron dual-core 265 processor (latest stepping, 1GHz
  HT clock) per motherboard. Processor must be in CPU1 socket, with
  CPU2 socket empty and available for future expansion. All memory
  must be in the CPU1 memory slots.
- Use of alternative RAID controllers in bidder's responses.
    (a) If Areca 1110 controller is specified, give price for systems
        delivered with 3ware 9500S-4LP instead.
    (b) If 3ware 9500S-4LP is specified, give price for systems
        delivered with Areca 1110 instead.
    (c) If a different RAID controller is specified, give price for
        the Areca alternative.
    (d) If a different RAID controller is specified, give price for
        the 3ware alternative.
- Include a CD-ROM drive in each system
- Supply systems WITH NO DISK DRIVES (these may be provided by a disk
  manufacturer as part of a collaborative research project).

Vendor must provide Mean Time Between Failure (MTBF) data for power
supplies and fans.  Additional confidential MTBF data may also be
requested under a non-disclosure agreement before a purchase order is
issued.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.