NEMO Node and Rack Specifications

LINUX COMPUTING AND STORAGE CLUSTER
-----------------------------------

This is a bid specification for a scientific computing cluster running
the Linux operating system.  The system will consist of rack-mounted
single-processor, dual-core AMD Opteron 175 cluster nodes (approximate
quantity 600 to 800).

Vendor should provide a price per node, including equipment racks.

[Note: The entire system will be placed in an existing dedicated
1400-square-foot cluster room at UW-Milwaukee.  This room contains
one dedicated Powerware 9315 500kVA/400kW central UPS system and two
dedicated Liebert 225 kVA power distribution units.  The room is
equipped with four dedicated 26-ton Data-Aire down-flow AC units and
an 18-inch raised-floor air distribution system.  A floor plan is
included.  A Cisco 6500, Force10 E600/E1200, or Foundry
Networks-based Ethernet switch and suitable cabling will be provided
by the purchaser and is NOT part of this bid solicitation.]

System will be delivered to the loading dock, UWM Physics Department,
1900 East Kenwood Blvd, Milwaukee WI 53211.  Vendor/Shipper must
deliver racks to the loading dock.  UWM personnel will handle all
inside delivery using a forklift/pallet jack.

------------------------------------------------------------------------

[1] Compute node specifications

Each node will be a single-CPU AMD Opteron machine with one dual-core
AMD Opteron 175 processor (versions with DDR400 support, latest E
stepping, 1 GHz HyperTransport clock).  Each node must:

- have a motherboard which is certified by AMD for use with AMD
  dual-core processors model 165 and greater.

- include an on-board graphics controller

- include 2 GB of non-registered ECC memory, PC3200/DDR400 or faster.
  Memory must be from a first tier memory manufacturer such as
  Corsair, Kingston, Mushkin, Infineon, Dataram, Samsung, Viking,
  Micron, etc.  Memory must be certified as compatible by motherboard
  manufacturer.  Specify memory manufacturer. Memory must be 2 x 1GB
  so that it is upgradeable to 4GB in the future.
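
  As an illustrative acceptance check (not a bid requirement), the
  installed modules' manufacturer, size, and speed can be read from
  the SMBIOS tables on a delivered node with the standard dmidecode
  tool:

    dmidecode -t memory   # lists size, type, speed, and manufacturer
                          # for each DIMM slot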

- be in a 1U rack mount case with slide rails for pull-out access.
  Case should include a 120 VAC power-supply cord.

- have a power supply with sufficient capacity to power all components
  when both CPU cores are at 100% and disk drive and all other
  components are fully active. In all situations the power supply
  should not be running at more than 80% of its rated power
  output. Power supply must have a power factor greater than or equal
  to 0.85 at nominal and maximum load. Vendor should specify Mean Time
  Between Failure (MTBF) of power supply at nominal load and 75F
  operating temperature.

- have one or more gigabit Ethernet ports supporting jumbo frames (at
  least 6 kB in length, though 9 kB is preferable) and compatible with
  the purchaser-provided Cisco 6500 or Force10 E600/1200 series or
  Foundry Networks network switch.  At least one Ethernet port must be
  capable of full-duplex wire-speed operation.  Systems must be capable
  of PXE booting and kickstart cloning using standard Linux tools.  The
  Ethernet port must support Wake-on-LAN.
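
  As an illustrative check (not a bid requirement), jumbo-frame support
  and link speed can be verified from a stock Linux environment; the
  interface name eth0 and the peer address are placeholders:

    ip link set eth0 mtu 9000   # raise MTU; fails if NIC lacks support
    ethtool eth0                # confirm gigabit full-duplex link
    ping -M do -s 8972 <peer>   # 8972 data + 28 header bytes = 9000,
                                # with fragmentation disallowed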

- have sufficient internal cooling fans to maintain all component
  temperatures well below the manufacturers' operating limits, when in
  a 75F room.  Cooling fans and airflow must not be blocked or impeded
  by internal cabling.  Vendor should specify MTBF of system internal
  cooling fans.

- support the following monitoring from within the Linux 2.6 kernel
  (e.g., using lm_sensors):
  - Rotation speeds of all internal cooling fans
  - System and CPU temperatures
  - Power supply and CPU voltages
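
  For example, with the lm_sensors package configured, a single command
  should report all of the above:

    sensors-detect   # one-time probe for supported monitoring chips
    sensors          # print fan speeds, temperatures, and voltages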

- have an integrated or add-on remote management card (also called a 
  Baseboard Management Controller or BMC) that supports
  (at least) the following remote management functions via an Ethernet
  LAN interface:
  (a) remote power off
  (b) remote power on
  (c) remote system (re)boot
  (d) remote motherboard BIOS setting
  (e) remote motherboard BIOS upgrade/update/flashing
  (f) viewing serial console boot and runtime input/output from a
      remote management location

  The above functions must be supported even in the absence of any
  operating system on the nodes.  Functions (a)-(d) must be supported using
  command-line scripts from a Linux management environment.  Vendor
  must be prepared to demonstrate correct operation of these
  functions.
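
  As a non-binding sketch, functions (a)-(c) and (f) map naturally onto
  ipmitool commands when the BMC supports IPMI over LAN (the host,
  user, and password below are placeholders):

    ipmitool -I lanplus -H <bmc-host> -U <user> -P <pass> chassis power off
    ipmitool -I lanplus -H <bmc-host> -U <user> -P <pass> chassis power on
    ipmitool -I lanplus -H <bmc-host> -U <user> -P <pass> chassis power reset
    ipmitool -I lanplus -H <bmc-host> -U <user> -P <pass> sol activate

  (The last command gives the serial-console redirection of function
  (f); it requires the IPMI 2.0 "lanplus" interface, while the power
  commands also work over the older "lan" interface.)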

  Function (e) may be unsupported, as long as the vendor provides an
  alternative simple AUTOMATED OR SCRIPTED HANDS-OFF procedure for
  upgrading/flashing/setting BIOS.

  Please specify if the management card/hardware is also capable of
  hardware monitoring, including system temperature, CPU temperature,
  fan rotation speeds, and power supply voltages (in the absence of an
  Operating System).

  A BMC compatible with IPMI 2.0 is preferred over earlier versions of
  the IPMI specification, but is not required.

  Vendor should specify if management card/hardware requires a
  separate LAN or can piggy-back off the same LAN and Ethernet port
  used for data traffic WITHOUT impacting data throughput rate from
  the shared port. Piggy-backing is preferable.  If piggy-backing is
  NOT possible without a performance loss then the system price must
  also include a low-cost low-performance oversubscribed management
  network.  For example, this might consist of two inexpensive 16-port
  Ethernet switches per rack, and a central 24-port switch
  concentrating these together, and network cables to tie these
  together.  If these additional components are required, their cost
  should be included in the per-node cost stated by the vendor.

- be delivered with BIOS settings (motherboard, ethernet, PXE) and
  Baseboard Management Controller/IPMI settings as specified by UWM.

- have one 80 GB SATA disk in a front-panel accessible hot-swap
  carrier.  When mounted in the carrier, this disk must be
  hot-swappable without opening the case or using any tools.  The SATA
  disk must be at least 7200 RPM with at least 8 MB of cache memory.
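
  As an illustrative check, the delivered drive's model, rotation rate,
  and cache size can be confirmed with standard Linux tools (the device
  name /dev/sda is a placeholder):

    smartctl -i /dev/sda   # model, serial number, capacity, firmware
    hdparm -I /dev/sda     # identify data, including buffer (cache) size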

- have a second (empty) SATA hot-swap carrier for an additional
  front-panel accessible disk which may be added in a future
  expansion.

  Please indicate if the on-board controller/hot-swap-bay combo
  provides functioning disk power/activity lights.

- be clearly labeled on both the front and back with large legible
  consecutive labels of the form S001, S002, S003, ....

- be clearly labeled on the back with the MAC addresses of all
  ethernet ports (motherboard and BMC).

- have hot swap carriers clearly labeled on both the front and back
  with large legible consecutive labels of the form S001/P0, S001/P1
  and so on.  Here P0-3 indicate the controller port, and should
  correspond to the OS identifier for the appropriate controller port.
  Thus, looking at node 123, the drive carriers and node should be
  labeled like this:
 
  -----------------------------------------------------
  | S123/P0 | S123/P1 |                         S123  |  
  -----------------------------------------------------

- have all labeling done in a permanent way.  Labels should not peel
  off, fall off, or degrade after a few years of normal use.

- shut down cleanly (with 'shutdown -hf' or similar) to standby power
  only.

- power up via Wake-on-LAN.
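
  As a sketch of how these last two requirements might be exercised
  together (the interface name and MAC address are placeholders):

    ethtool -s eth0 wol g          # arm magic-packet wake on the NIC
    shutdown -h now                # halt to standby power
    # ...then, from a management host on the same LAN:
    ether-wake 00:11:22:33:44:55   # send the Wake-on-LAN magic packet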

- contain identical parts, including identical firmware
  revisions/versions and board-level hardware revision numbers.

- contain the manufacturer's latest versions of BIOS and/or firmware.

- contain all components cleanly within the case, without interference
  or crowding.

  NOTE: our cluster room (details above) includes generous amounts of
  space, as well as clean UPS-backed power and plenty of cooling.  We
  are not interested in getting the highest possible density of nodes.
  Vendor must respect the maximum recommended rack density for the
  systems and should not fully load the racks unless the systems are
  fully qualified in that configuration.  For example the Vendor may
  choose to provide only 32 nodes per 42U rack.


Nodes must be delivered to UW-Milwaukee in 42U equipment racks with a
total weight not exceeding 1250 pounds per rack.  Unused rack openings
should be covered with blanking plates to maintain proper
front-to-back cooling airflow.  Racks should not have doors, side
panels, fans, or other unnecessary items which add expense.  Racks
should fit through a standard-height wide-opening 7' door.

Racks should include internal clips, guides, tracks, or similar means to
neatly bundle and support network and power cables.

Inexpensive power distribution must be provided within the racks using
(for example) Wiremold, Belkin, APC, Tripp Lite, or similar power
strips.  Power available is 120 VAC with standard 20-amp outlets. 
There are approximately 250 such outlets in the room.

Each rack should have a suitable number of 120 VAC 20-amp flexible
power cords.  The power cords should be at least 12 feet long measured
from the point where they exit the bottom of the racks.  The nominal
operating current of any power cord should not exceed 80% of 20A
(16A).

Example: if each rack contains 32 nodes, and each node draws a nominal
2.0 A @ 120 VAC, then the total rack load is 32 x 2.0 A = 64 A.  At no
more than 16 A per 20A strip, each rack therefore needs four 20A power
strips, each powering eight nodes (8 x 2.0 A = 16 A).  The racks would
then have four 12-foot 20A power cords.

The cost of racks should be included in the per-node cost quoted by
the Bidder.

------------------------------------------------------------------------

GENERAL

VENDOR MUST BE PREPARED TO FURNISH ONE OPERATIONAL NODE FULLY
CONFIGURED AS DESCRIBED ABOVE FOR EVALUATION PURPOSES. THIS NODE MUST
BE DELIVERED TO UW-MILWAUKEE WITHIN TEN DAYS OF REQUEST.

Preference will be given to systems whose components and chipsets are
documented to the Linux/Open Source community and thus are fully
supported in the publicly available Linux kernel source tree.

Vendor must be prepared to demonstrate the following functionality on
the test node:

[1] PXE booting of machine with virgin disks (no OS) and ability to
    run a remote kickstart script, ending with the system configured
    to boot from a Linux kernel.
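
    As a rough illustration of the expected workflow (file names and
    the server address are examples only, not requirements), the PXE
    server would hand the node a pxelinux configuration pointing at a
    kickstart file:

      # /tftpboot/pxelinux.cfg/default -- example only
      default linux
      label linux
        kernel vmlinuz
        append initrd=initrd.img ks=http://<server>/ks.cfg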

[2] Ability to monitor cooling fans, system and CPU temperatures,
    power supply and CPU voltages.

[3] Remote management functionality (with NO OS on disk):
    (a) remote power off
    (b) remote power on
    (c) remote system (re)boot
    (d) remote motherboard BIOS setting
    (e) remote motherboard BIOS flashing
    (f) redirection of serial console boot input/output
        to a remote location

[4] 'shutdown -hf' or equivalent must power the system down to standby
    power only (no fans spinning).

Before a purchase order is issued, the Vendor must attend a
half-day meeting with project principals at UWM or an agreed-upon
alternative location. The purpose of this meeting is to review the
detailed specifications above, the delivery schedule, and other
expectations.  The meeting must be attended by at least one member of
the Vendor's Management Team with fiduciary authority for this bid,
and at least one member of the Vendor's Technical Team with overall
management responsibility for the acquisition, integration, testing,
and delivery of the system.

The Vendor will designate a single point of contact for all technical,
configuration, management and delivery questions and issues.

Systems must have a three-year warranty on all parts and components.
Warranty will be both from the Vendor and from the original
manufacturer.  Note: for node repairs that cannot be carried out by
project personnel on-site, systems will be returned by mail to the
Vendor for repair.

Vendor will guarantee proper operation of all components in a Linux
environment using a modern Linux 2.6 kernel.  Baseline OS distribution
is Fedora Core 3 or 4.  Vendor will provide any patches needed to the
stock Fedora Core 3 or 4 distribution to provide the functionality
described in this document.  Test system will be delivered with Fedora
Core 3 or 4 and any necessary patches installed.

Vendor will repair or replace any subsystems or components which do
not operate in a reliable fashion with equivalent or better items at
Vendor expense.

Vendor agrees to pass on to purchaser any price drops in CPU
components which occur after bidding but before a purchase order is
issued, by increasing the number of nodes delivered.

Vendor agrees to maintain a stock of spare parts on site at UWM, in
sufficient number to cover expected hardware failures.  At a minimum
this should include:
- ten sticks of memory
- two CPUs, including any heat-sinks and/or fans
- two motherboards (including any management or daughter cards)
- four of each type of fan in the system
- three power supplies
- ten disk drives
- one complete case assembly including hot-swap carriers
- internal cabling sufficient for one system

This stock of spare parts will be maintained via replacement if and as
parts fail.  Failed parts will be periodically returned to the Vendor.

Vendors may wish to benchmark the completed system for inclusion in a
Top 500 (or other) list. The system will be made available to the
Vendor for this purpose during an initial one-month burn-in period,
and the Vendor may circulate and/or publicize the results as they
wish. Please note that the University cannot appear to endorse or
promote any product nor can a vendor use the University's logo.

-------------------------------------------------------------------------


Vendor response must include:

[1]  Cost per node?  (Each node has one CPU and two CPU cores).
[2]  Motherboard manufacturer and model? 
[3]  Memory type and manufacturer?
[4]  MTBF of system power supply?
[5]  Additional remote management hardware monitoring capability
     (temperatures, fan speeds, voltages)?
[6]  Remote Management requires separate LAN (yes/no)?
[7]  Disk model?
[8]  MTBF of system cooling fans?
[9]  The number of nodes per rack and the total number of racks?
[10] Are there working drive power/activity indicator front panel
     lights?

Vendor must also specify the per node cost if the following changes
are made to the specifications above:

- Use AMD Opteron 165 processor (latest stepping, 1GHz HT clock)

- Use AMD Opteron 170 processor (latest stepping, 1GHz HT clock)

- Use AMD Opteron 180 processor (latest stepping, 1GHz HT clock) if
  the motherboard will accommodate it.

- Include a CD-ROM drive in each system

Vendor must provide Mean Time Between Failure (MTBF) data for power
supplies and fans.  Additional confidential MTBF data may also be
requested under a non-disclosure agreement before a purchase order is
issued.