NEMO Node and Rack Specifications
LINUX COMPUTING AND STORAGE CLUSTER
-----------------------------------
This is a bid specification for a scientific computing cluster running
the Linux operating system. System will consist of rack-mounted
single processor dual core Opteron 175 cluster nodes (approximate
quantity 600 to 800).
Vendor should provide a price per node, including equipment racks.
[Note: The entire system will be placed in an existing dedicated
1400-square foot cluster room at UW - Milwaukee. This room contains
one dedicated Powerware 9315 500kVA/400kW central UPS system and two
dedicated Liebert 225 kVA power distribution units. The room is
equipped with four dedicated 26 ton Data-Aire down-flow AC units and
an 18-inch raised floor air distribution system. A floor plan is
included. A Cisco 6500 or Force10 E600/E1200 or Foundry Networks
based ethernet switch and suitable cabling will be provided by the
purchaser and is NOT part of this bid solicitation.]
System will be delivered to the loading dock, UWM Physics Department,
1900 East Kenwood Blvd, Milwaukee WI 53211. Vendor/Shipper must
deliver racks to the loading dock. UWM personnel will handle all
inside delivery using forklift/pallet jack.
------------------------------------------------------------------------
[1] Compute node specifications
Each node will be a single-CPU AMD Opteron machine with one dual-core
AMD Opteron 175 processor (versions with DDR400 support, latest E
stepping, 1 GHz HyperTransport clock). Each node must:
- have a motherboard which is certified by AMD for use with AMD
dual-core processors model 165 and greater.
- include an on-board graphics controller
- include 2 GB of non-registered ECC memory, PC3200/DDR400 or faster.
Memory must be from a first tier memory manufacturer such as
Corsair, Kingston, Mushkin, Infineon, Dataram, Samsung, Viking,
Micron, etc. Memory must be certified as compatible by motherboard
manufacturer. Specify memory manufacturer. Memory must be 2 x 1GB
so that it is upgradeable to 4GB in the future.
- be in a 1U rack mount case with slide rails for pull-out access.
Case should include a 120 VAC power-supply cord.
- have a power supply with sufficient capacity to power all components
when both CPU cores are at 100% and disk drive and all other
components are fully active. In all situations the power supply
should not be running at more than 80% of its rated power
output. Power supply must have a power factor greater than or equal
to 0.85 at nominal and maximum load. Vendor should specify Mean Time
Between Failure (MTBF) of power supply at nominal load and 75F
operating temperature.
- have one or more gigabit Ethernet ports supporting jumbo frames (at
least 6kB in length, though 9kB is preferable) and compatible with
the purchaser-provided Cisco 6500 or Force10 E600/E1200 series or
Foundry Networks network switch. At least one Ethernet port must be
capable of full-duplex wire-speed operation. Systems must be capable
of PXE booting and kickstart cloning using standard Linux tools. At
least one Ethernet port must support Wake-On-LAN.
- have sufficient internal cooling fans to maintain all component
temperatures well below manufacturers' operating limits, when in a
75F room. Cooling fans and airflow must not be blocked or impeded by
internal cabling. Vendor should specify MTBF of system internal
cooling fans.
- support the following monitoring from within the Linux 2.6 kernel
(e.g., using lm_sensors):
- Rotation speeds of all internal cooling fans
- System and CPU temperatures
- Power supply and CPU voltages
- have an integrated or add-on remote management card (also called a
Baseboard Management Controller or BMC) that supports
(at least) the following remote management functions via an Ethernet
LAN interface:
(a) remote power off
(b) remote power on
(c) remote system (re)boot
(d) remote motherboard bios setting
(e) remote motherboard bios upgrade/update/flashing
(f) viewing serial console boot and runtime input/output from a
remote management location
The above functions must be supported even in the absence of any
operating system on the nodes. Functions a-d must be supported using
command-line scripts from a Linux management environment. Vendor
must be prepared to demonstrate correct operation of these
functions.
Function (e) may be unsupported, as long as the vendor provides an
alternative simple AUTOMATED OR SCRIPTED HANDS-OFF procedure for
upgrading/flashing/setting BIOS.
Please specify if the management card/hardware is also capable of
hardware monitoring, including system temperature, CPU temperature,
fan rotation speeds, power supply voltages (in the absence of an
Operating System).
A BMC compatible with IPMI 2.0 is preferred over one supporting only
earlier versions of the IPMI specification, but is not required.
Vendor should specify if management card/hardware requires a
separate LAN or can piggy-back off the same LAN and Ethernet port
used for data traffic WITHOUT impacting data throughput rate from
the shared port. Piggy-backing is preferable. If piggy-backing is
NOT possible without a performance loss then the system price must
also include a low-cost low-performance oversubscribed management
network. For example this might consist of two inexpensive 16-port
ethernet switches per rack, and a central 24-port switch
concentrating these together, and network cables to tie these
together. If these additional components are required, their cost
should be included in the per-node cost stated by the vendor.
- be delivered with BIOS settings (motherboard, ethernet, PXE) and
Baseboard Management Controller/IPMI settings as specified by UWM.
- have one SATA 80GB disk in a front-panel accessible hot-swap
carrier. When mounted in carrier, this disk must be hot-swappable
without opening the case or using any tools. SATA disk must be (at
least) 7200 RPM disks with (at least) 8MB cache memory.
- have a second (empty) SATA hot-swap carrier for an additional
front-panel accessible disk which may be added in a future
expansion.
Please indicate if the on-board controller/hot-swap-bay combo
provides functioning disk power/activity lights.
- be clearly labeled on both the front and back with large legible
consecutive labels of the form S001, S002, S012, S321, and so on.
- be clearly labeled on the back with the MAC addresses of all
ethernet ports (motherboard and BMC).
- have hot swap carriers clearly labeled on both the front and back
with large legible consecutive labels of the form S001/P0, S001/P1
and so on. Here P0-3 indicate the controller port, and should
correspond to the OS identifier for the appropriate controller port.
Thus, looking at node 123, the drive carriers and node should be
labeled like this:
-----------------------------------------------------
| S123/P0 | S123/P1 | S123 |
-----------------------------------------------------
- have all labeling done in a permanent way. Labels should not peel
off, fall off, or degrade after a few years of normal use.
- shut down cleanly (with 'shutdown -hf' or similar) to standby power
only.
- power up using Wake-On-LAN.
- contain identical parts, including identical firmware
revisions/versions and board-level hardware revision numbers.
- contain manufacturer's latest versions of BIOS and/or firmware.
- contain all components cleanly within the case, without interference
or crowding.
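
The node and drive-carrier labeling scheme described above can be
sketched in a few lines of Python. This is only an illustration: the
function names are hypothetical, and the default of two carriers per
node is an assumption taken from the two hot-swap bays specified above.

```python
def node_label(n):
    """Label for node number n, zero-padded to three digits (S001, S123)."""
    return "S%03d" % n


def carrier_labels(n, ports=2):
    """Labels for the hot-swap carriers of node n, one per controller port.

    ports=2 is an assumption based on the two carriers per node in the
    specification; P0, P1, ... should match the OS controller port numbers.
    """
    return ["%s/P%d" % (node_label(n), p) for p in range(ports)]


def front_panel(n):
    """One text row showing carrier labels followed by the node label."""
    cells = carrier_labels(n) + [node_label(n)]
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    print(front_panel(123))
```

For node 123 this prints the carrier labels S123/P0 and S123/P1 followed
by the node label S123, matching the diagram above.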
NOTE: our cluster room (details above) includes generous amounts of
space, as well as clean UPS backed power and plenty of cooling. We
are not interested in getting the highest possible density of nodes.
Vendor must respect the maximum recommended rack density for the
systems and should not fully load the racks unless the systems are
fully qualified in that configuration. For example the Vendor may
choose to provide only 32 nodes per 42U rack.
Nodes must be delivered to UW-Milwaukee in 42U equipment racks with a
total weight not exceeding 1250 pounds per rack. Unused rack openings
should be covered with blanking plates to maintain proper
front-to-back cooling airflow. Racks should not have doors, side
panels, fans, or other unnecessary items which add expense. Racks
should fit through a standard-height wide-opening 7' door.
Rack should include internal clips, guides, tracks or similar means to
neatly bundle and support network and power cables.
Inexpensive power distribution must be provided within the racks using
(for example) Wiremold, Belkin, APC, Tripp Lite, or similar power
strips. Power available is 120 VAC with standard 20-amp outlets.
There are approximately 250 such outlets in the room.
Each rack should have a suitable number of 120 VAC 20-amp flexible
power cords. The power cords should be at least 12 feet long measured
from the point where they exit the bottom of the racks. The nominal
operating current of any power cord should not exceed 80% of 20A
(16A).
Example: if each rack contains 32 nodes, and each node requires
nominal 2.0 A @ 120 VAC, then each rack should contain a total of four
20A power strips. Each power strip would power eight nodes. The racks
would then have four 12-foot 20A power cords.
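
The arithmetic in this example can be checked with a short Python
sketch. The function names are hypothetical and the per-node current
must come from the Vendor's own measurements; the 20A circuit rating
and the 80% limit are taken from the text above.

```python
import math

CIRCUIT_AMPS = 20.0   # rating of each outlet and power cord
DERATING = 0.80       # nominal load must not exceed 80% of the rating


def strips_per_rack(nodes_per_rack, amps_per_node):
    """Return (number of 20A strips, max nodes per strip) for one rack."""
    limit = CIRCUIT_AMPS * DERATING              # 16 A usable per cord
    # Small epsilon guards against floating-point rounding in the division.
    nodes_per_strip = math.floor(limit / amps_per_node + 1e-9)
    if nodes_per_strip < 1:
        raise ValueError("a single node exceeds the derated circuit limit")
    strips = math.ceil(nodes_per_rack / nodes_per_strip)
    return strips, nodes_per_strip


def total_cords(total_nodes, nodes_per_rack, amps_per_node):
    """Total power cords (one per strip) needed for the whole system."""
    racks = math.ceil(total_nodes / nodes_per_rack)
    strips, _ = strips_per_rack(nodes_per_rack, amps_per_node)
    return racks * strips


if __name__ == "__main__":
    print(strips_per_rack(32, 2.0))   # (4, 8)
    print(total_cords(800, 32, 2.0))  # 100
```

With 32 nodes per rack at 2.0 A each, this gives four strips of eight
nodes. For 800 nodes it gives 100 power cords in total, well within the
approximately 250 outlets available in the room.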
The cost of racks should be included in the per-node cost quoted by
the Bidder.
------------------------------------------------------------------------
GENERAL
VENDOR MUST BE PREPARED TO FURNISH ONE OPERATIONAL NODE FULLY
CONFIGURED AS DESCRIBED ABOVE FOR EVALUATION PURPOSES. THIS NODE MUST
BE DELIVERED TO UW-MILWAUKEE WITHIN TEN DAYS OF REQUEST.
Preference will be given to systems whose components and chipsets are
documented to the Linux/Open Source community and thus are fully
supported in the publicly available Linux kernel source tree.
Vendor must be prepared to demonstrate the following functionality on
the test node:
[1] PXE booting of machine with virgin disks (no OS) and ability to
run a remote kickstart script, ending with the system configured
to boot from a Linux kernel.
[2] Ability to monitor cooling fans, system and CPU temperatures,
power supply and CPU voltages.
[3] Remote management functionality (with NO OS on disk):
(a) remote power off
(b) remote power on
(c) remote system (re)boot
(d) remote motherboard bios setting
(e) remote motherboard bios flashing
(f) redirection of serial console boot input/output
to a remote location
[4] 'shutdown -hf' or equivalent must power the system down to standby
power only (no fans spinning).
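
As an illustration of what command-line scripting of items [3](a)-(c)
and (f) might look like, the Python sketch below builds command lines
for the open-source ipmitool utility. The choice of ipmitool, the BMC
host name, and the user name are assumptions and are not part of this
specification. The sketch only constructs the commands and does not run
them. BIOS setting and flashing, items (d) and (e), are not covered by
the standard IPMI chassis commands and would need a vendor-specific
procedure.

```python
# Assumed mapping from the functions listed above to ipmitool subcommands.
ACTIONS = {
    "power_off": ["chassis", "power", "off"],    # (a)
    "power_on":  ["chassis", "power", "on"],     # (b)
    "reboot":    ["chassis", "power", "cycle"],  # (c)
    "console":   ["sol", "activate"],            # (f) Serial-over-LAN
}


def ipmi_command(host, user, action):
    """Build an ipmitool argument list for one remote-management action.

    -I lanplus selects the IPMI 2.0 LAN interface; -E makes ipmitool read
    the password from the IPMI_PASSWORD environment variable.
    """
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
    return base + ACTIONS[action]


if __name__ == "__main__":
    # "s001-bmc" and "admin" are hypothetical placeholders.
    for action in ("power_off", "power_on", "reboot", "console"):
        print(" ".join(ipmi_command("s001-bmc", "admin", action)))
```

Each resulting list could be passed to subprocess.run() from a Linux
management host. A BMC that supports only IPMI 1.5 would use the "lan"
interface instead of "lanplus", and console redirection would then
depend on vendor-specific support.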
Before a purchase order is issued, the Vendor must attend a
half-day meeting with project principals at UWM or an agreed-upon
alternative location. The purpose of this meeting is to review the
detailed specifications above, the delivery schedule, and other
expectations. Attendees must include at least one member of the
Vendor's Management Team with fiduciary authority for this bid, and
at least one member of the Vendor's Technical Team with overall
management responsibility for the acquisition, integration, testing
and delivery of the system.
The Vendor will designate a single point of contact for all technical,
configuration, management and delivery questions and issues.
Systems must have a three-year warranty on all parts and components.
Warranty will be both from the Vendor and from original manufacturer.
Note: for node repairs that cannot be carried out by project
personnel on-site, systems will be returned by mail to Vendor for
repair.
Vendor will guarantee proper operation of all components in a Linux
environment using a modern Linux 2.6 kernel. Baseline OS distribution
is Fedora Core 3 or 4. Vendor will provide any patches needed to the
stock Fedora Core 3 or 4 distribution to provide the functionality
described in this document. Test system will be delivered with Fedora
Core 3 or 4 and any necessary patches installed.
Vendor will repair or replace any subsystems or components which do
not operate in a reliable fashion with equivalent or better items at
Vendor expense.
Vendor agrees to pass on to purchaser any price drops in CPU
components which occur after bidding but before a purchase order is
issued, by increasing the number of nodes delivered.
Vendor agrees to maintain a stock of spare parts on site at UWM, in
sufficient number to cover expected hardware failures. At a minimum
this should include:
- ten sticks of memory
- two CPUs, including any heat-sinks and/or fans
- two motherboards (including any management or daughter cards)
- four of each type of fan in the system
- three power supplies
- ten disk drives
- one complete case assembly including hot-swap carriers
- internal cabling sufficient for one system
This stock of spare parts will be maintained via replacement if and as
parts fail. Failed parts will be periodically returned to the Vendor.
Vendors may wish to benchmark the completed system for inclusion in a
Top 500 (or other) list. The system will be made available to the
Vendor for this purpose during an initial one-month burn in period,
and the Vendor may circulate and/or publicize the results as they
wish. Please note that the University cannot appear to endorse or
promote any product nor can a vendor use the University's logo.
-------------------------------------------------------------------------
Vendor response must include:
[1] Cost per node? (Each node has one CPU and two CPU cores).
[2] Motherboard manufacturer and model?
[3] Memory type and manufacturer?
[4] MTBF of system power supply?
[5] Additional remote management hardware monitoring capability
(temperatures, fan speeds, voltages)?
[6] Remote Management requires separate LAN (yes/no)?
[7] Disk model?
[8] MTBF of system cooling fans?
[9] The number of nodes per rack and the total number of racks?
[10] Are there working drive power/activity indicator front panel
lights?
Vendor must also specify the per node cost if the following changes
are made to the specifications above:
- Use AMD Opteron 165 processor (latest stepping, 1GHz HT clock)
- Use AMD Opteron 170 processor (latest stepping, 1GHz HT clock)
- Use AMD Opteron 180 processor (latest stepping, 1GHz HT clock) if
MB will accommodate it.
- Include a CD-ROM drive in each system
Vendor must provide Mean Time Between Failure (MTBF) data for power
supplies and fans. Additional confidential MTBF data may also be
requested under a non-disclosure agreement before a purchase order is
issued.