Agency Bid Number: E6-016-O -- Bid Title: Scientific Microcomputer
Cluster Running The Linux Operating System
NOTICE OF BID OPPORTUNITY
Dear Vendor:
The above Official Sealed Bid is being let in your commodity area. To
obtain additional information, log in to VendorNet
(http://vendornet.state.wi.us/) and click on SEARCH. Select "Official
Sealed Bids". Next select "By agency bid number", enter the agency bid
number in the text box as the keyword, and click on "Search Vendornet".
The system will locate the bid announcement for your review.
The official bid specification may also be obtained from the UWM Computer Equipment Purchasing Officer, Mr. Ed Seeberg, email: ELS@bfs.uwm.edu, tel: 414-229-4077.
LINUX COMPUTING AND STORAGE CLUSTER
-----------------------------------
This is a bid specification for a scientific computing cluster running
the Linux operating system, to consist of:
[1] rack-mounted dual-processor cluster nodes
[2] equipment racks
The total amount paid for all the equipment listed above will be
$1,150,000.00.
Bidders must respond by indicating the TOTAL NUMBER OF COMPUTE NODES
which will be provided. Additional technical information must also be
provided as indicated below.
[Note: The entire system will be placed in an existing dedicated
1400-square foot cluster room at UW - Milwaukee. This room contains
one dedicated Powerware 9315 500kVA/400kW central UPS system and two
dedicated Liebert 225 kVA power distribution units. The room is
equipped with four dedicated 26 ton Data-Aire down-flow AC units and
an 18-inch raised floor air distribution system. A floor plan is
included. A Cisco 6500 or Force10 E600/E1200 based ethernet switch
and suitable cabling will be provided by the purchaser.]
------------------------------------------------------------------------
[1] Compute node specifications
Each node will be a dual-CPU AMD Opteron machine with two (single
core) AMD Opteron 248 processors (versions with DDR400 support,
latest E stepping, 1 GHz HyperTransport clock). Each
node must:
- have a motherboard that accommodates dual-core CPUs for a potential
future upgrade.
- have a motherboard which is certified by AMD for use with AMD
Opteron processors up to and including model 252 and dual-core
processors up to and including model 275.
- include an on-board graphics controller
- include 4 GB of registered ECC memory, PC3200/DDR400 or faster.
Memory must be from a first tier memory manufacturer such as
Corsair, Kingston, Mushkin, Infineon, Dataram, Samsung, Viking,
Micron, etc. Memory must be certified as compatible by motherboard
manufacturer. Specify memory manufacturer.
- have maximum memory interleave, so memory must occupy at least two
memory slots per CPU.
- be in a 1U or 2U or 3U rack mount case with slide rails for pull-out
access. Case should include a 120 VAC power-supply cord. Case size
should be chosen by vendor to minimize cost while providing good
cooling airflow and easy access for maintenance. If slide rails are
"full extension" and allow 100% access even in a fully
stuffed rack, this is an advantage and should be noted.
- have a power supply with sufficient capacity to power all components
when both CPUs are at 100% and all disk drives and other components
are fully active. Power supply must also have sufficient capacity
to handle dual-core CPUs. In all situations the power supply should
not be running at more than 80% of its rated power output. Power
supply must have a power factor greater than or equal to 0.85 at
nominal and maximum load. Vendor should specify Mean Time Between
Failure (MTBF) of power supply at nominal load and 75°F operating
temperature.
- have one or more gigabit Ethernet ports supporting jumbo frames (at
least 6kB in length, though 9kB is preferable) and compatible with
the purchaser-provided Cisco 6500 or Force10 E600/E1200 series
network switch. At least one Ethernet port must be capable of
full-duplex
wire-speed operation. Systems must be capable of PXE booting and
kickstart cloning using standard Linux tools. Ethernet port must
support Wake-On-LAN.
- have sufficient internal cooling fans to maintain all component
temperatures well below manufacturers' operating limits, when in a
75°F room. Cooling fans and airflow must not be blocked or
impeded by internal cabling. Vendor should specify MTBF of system
internal cooling fans.
- support the following monitoring from within the Linux 2.6 kernel
(e.g., using lm_sensors):
- Rotation speeds of all internal cooling fans
- System and CPU temperatures
- Power supply and CPU voltages
- have an integrated or add-on remote management card (also called a
Baseboard Management Controller or BMC) that supports
(at least) the following remote management functions via an Ethernet
LAN interface:
(a) remote power off
(b) remote power on
(c) remote system (re)boot
(d) remote motherboard bios setting
(e) remote motherboard bios upgrade/update/flashing
(f) viewing serial console boot and runtime input/output from a
remote management location
The above functions must be supported even in the absence of any
operating system on the nodes. Functions a-d must be supported using
command-line scripts from a Linux management environment. Vendor
must be prepared to demonstrate correct operation of these
functions.
Function (e) may be unsupported, as long as the vendor provides an
alternative simple AUTOMATED OR SCRIPTED HANDS-OFF procedure for
upgrading/flashing/setting BIOS.
Please specify if the management card/hardware is also capable of
hardware monitoring, including system temperature, CPU temperature,
fan rotation speeds, power supply voltages (in the absence of an
Operating System).
Vendor should specify if management card/hardware requires a
separate LAN or can piggy-back off the same LAN and Ethernet port
used for data traffic WITHOUT impacting data throughput rate from
the shared port. Piggy-backing is preferable. If piggy-backing is
NOT possible without a performance loss then the system price must
also include a low-cost low-performance oversubscribed management
network. For example this might consist of one 24-port ethernet
switch per rack, and a central 24-port switch concentrating these
together, and network cables to tie these together.
- be delivered with BIOS settings (motherboard, RAID controller)
and Baseboard Management Controller/IPMI settings as specified by
UWM.
- have four front-panel accessible SATA 250GB disks in hot-swap
carriers. When mounted in carriers, these disks must be
hot-swappable without opening the case or using any tools. SATA
disks must be (at least) 7200 RPM disks with (at least) 8MB cache
memory. They must be certified (drive model and firmware) for use
with the hardware RAID controller. Preference may be given to 'RAID
Edition' type drives whose firmware is designed to carry out timely
and aggressive sector reallocation for UNC (Uncorrectable) sector
reads.
- have a hardware RAID controller. This must offer the following
capabilities:
(a) Fully supported by manufacturer in the Linux 2.6 kernel tree
(b) RAID-5 on the four system disks, to yield approximately 750 GB
of usable storage space.
(c) Linux command-line tools for management and monitoring of the
RAID array. These must allow RAID arrays to be configured and
rebuilt, and provide automatic notification of failed disks via
email or a similar mechanism.
(d) Capable of at least 30 MB/sec writes and 60 MB/sec reads
(sequential block access on 8GB files) using a Reiser file
system, as measured by Bonnie++ with a 64kB or smaller stripe
size (details below).
(e) System must be able to boot from the RAID-5 array, and carry on
normal OS operation during any single disk failure. System must
be able to automatically rebuild redundant RAID array during
normal OS operation.
(f) If system fails to read a block of data at some LBA (UNC error)
from disk A, the system will reconstruct the corresponding block
of data from the remaining disks (B, C, and D) and then WRITE
the reconstructed data to the failing LBA of disk A to force
sector reallocation on disk A.
Acceptable RAID controllers include the 3ware 9500S-4LP and Areca
ARC-1110. 3ware 9550 controllers are also acceptable (but note
that they are often a poor fit in typical 1U server chassis).
Speed benchmarks will be obtained using Bonnie++ running under
Linux using a recent 2.6 kernel. Preference will be given to
systems with faster read/write performance and to systems with a
lower overall cost. Benchmark should be run in RAID-5 mode with
write-through caching enabled and a 64kB or smaller stripe size.
Vendors who wish to provide this data should give output from
Bonnie++ v 1.03 with the command line:
bonnie++ -s 8192:64K -x 5 -d /REISERFS/ -fu root
Ability to monitor disk SMART data and run disk self-tests with
Linux tools (e.g., smartmontools) is desirable and should be noted.
Please indicate if the controller/backplane/hot-swap-bay combo
provides functioning disk power/activity/failure indicator lights.
- be clearly labeled on both the front and back with large legible
consecutive labels of the form S001, S002, ..., S012, ..., S321.
- be clearly labeled on the back with the MAC addresses of all
ethernet ports (motherboard and BMC).
- have hot swap carriers clearly labeled on both the front and back
with large legible consecutive labels of the form S001/P0, S001/P1,
S001/P2, S001/P3 and so on. Here P0-3 indicate the RAID controller
port, and should correspond to the OS identifier for the appropriate
controller port. RAID controller ports must be numbered identically
on every node. The numbers must increase either from left to
right (preferred) or from right to left. Thus, looking at node 123,
the drive carriers and node should be labeled like this:
-----------------------------------------------------
| S123/P0 | S123/P1 | S123/P2 | S123/P3 | S123 |
-----------------------------------------------------
- have all labeling done in a permanent way. Labels should not peel
off, fall off, or degrade after a few years of normal use.
- shut down cleanly (with 'shutdown -hf' or similar) to standby power
only.
- power up using Wake-On-LAN.
- contain identical parts, including identical firmware
revisions/versions and board-level hardware revision numbers.
- contain manufacturer's latest versions of BIOS and/or firmware.
- contain all components cleanly within the case, without interference
or crowding. For example the RAID controller card and/or cabling
MUST NOT 'bump into' the CPU heatsinks.
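The labeling scheme required above for nodes and hot-swap carriers can
be sketched in a few lines of Python. This is only an illustration of
the naming pattern; the function names are hypothetical, and the
three-digit node number and four ports per node are assumptions taken
from the examples in this specification.

```python
from __future__ import annotations


def node_label(node: int) -> str:
    """Return the chassis label for a node, e.g. 123 -> 'S123'."""
    # Assumption: node numbers fit in three digits, as in S001 ... S321.
    if not 1 <= node <= 999:
        raise ValueError("node number must be between 1 and 999")
    return f"S{node:03d}"


def carrier_labels(node: int, ports: int = 4) -> list[str]:
    """Return carrier labels in RAID controller port order (P0 first)."""
    return [f"{node_label(node)}/P{port}" for port in range(ports)]


def front_panel(node: int) -> str:
    """Render one row of labels, left to right, carriers first."""
    cells = carrier_labels(node) + [node_label(node)]
    return "| " + " | ".join(cells) + " |"


print(front_panel(123))
# prints: | S123/P0 | S123/P1 | S123/P2 | S123/P3 | S123 |
```

The left-to-right ordering shown here corresponds to the preferred
numbering direction; a right-to-left layout would reverse the carrier
cells.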
NOTE: our cluster room (details above) includes generous amounts of
space, as well as clean UPS backed power and plenty of cooling.
We are not interested in getting the highest possible density of nodes.
Vendor must respect the maximum recommended rack density for the
systems and should not fully load the racks unless the systems are
fully qualified in that configuration. For example the Vendor may
choose to provide only 20 nodes per 42U rack.
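The Wake-On-LAN power-up requirement in the node list above relies on
the standard "magic packet": six 0xFF bytes followed by the target MAC
address repeated sixteen times. The following Python sketch shows one
way a management host might build and send such a packet. The function
names, the use of UDP port 9, and the limited-broadcast address are
assumptions for illustration and are not part of this specification.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # bytes.fromhex raises ValueError if the string is not valid hex.
    return b"\xff" * 6 + bytes.fromhex(digits) * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# Example call (hypothetical MAC address, as read from a node's label):
# wake("00:11:22:33:44:55")
```

The MAC address labels required on the back of each node supply the
input for such a tool.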
------------------------------------------------------------------------
[2] Equipment racks
Nodes must be delivered to UW-Milwaukee in 42U equipment racks with a
total weight not exceeding 1250 pounds per rack. Unused rack openings
should be covered with blanking plates to maintain proper
front-to-back cooling airflow. Racks should not have doors,
side panels, fans, or other unnecessary items which add expense.
Rack should include internal clips, guides, tracks or similar means to
neatly bundle and support network and power cables.
Inexpensive power distribution must be provided within the racks using
(for example) Wiremold, Belkin, APC, Tripp Lite, or similar power
strips. Power available is 120 VAC with standard 20-amp outlets.
There are approximately 250 such outlets in the room.
Each rack should have a suitable number of 120 VAC 20-amp flexible
power cords. The power cords should be at least 12 feet long measured
from the point where they exit the bottom of the racks. The nominal
operating current of any power cord should not exceed 80% of 20A
(16A).
Example: if each rack contains 20 nodes, and each node requires
nominal 3.0 A @ 120 VAC, then each rack should contain a total of four
20A power strips. Each power strip would power five nodes. The racks
would then have four 12-foot 20A power cords.
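The arithmetic behind this example can be written out as a short
Python sketch. The 20 A rating and 80% limit come from the text above;
the function name and the small rounding tolerance are assumptions
made for illustration.

```python
import math

CIRCUIT_AMPS = 20.0   # outlet and power cord rating from the specification
DERATING = 0.80       # nominal current must not exceed 80% of the rating


def strips_per_rack(nodes: int, amps_per_node: float) -> tuple[int, int]:
    """Return (nodes per power strip, power strips needed) for one rack."""
    limit = CIRCUIT_AMPS * DERATING              # 16 A usable per cord
    # The tiny tolerance guards against floating-point rounding when the
    # limit is an exact multiple of the per-node current.
    per_strip = math.floor(limit / amps_per_node + 1e-9)
    if per_strip < 1:
        raise ValueError("a single node exceeds the per-cord current limit")
    return per_strip, math.ceil(nodes / per_strip)


print(strips_per_rack(20, 3.0))
# prints: (5, 4)
```

With five nodes per strip, each cord carries 15 A nominal, which is
below the 16 A limit.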
------------------------------------------------------------------------
GENERAL
VENDOR MUST BE PREPARED TO FURNISH ONE OPERATIONAL NODE FULLY
CONFIGURED AS DESCRIBED ABOVE FOR EVALUATION PURPOSES. THIS NODE MUST
BE DELIVERED TO UW-MILWAUKEE WITHIN TEN DAYS OF REQUEST.
Vendor must be prepared to demonstrate the following functionality on
the test node:
[1] PXE booting of machine with virgin disks (no OS or RAID array)
and ability to run a remote kickstart script, ending with the
system configured to boot from a Linux kernel from a RAID-5 disk
array.
[2] Ability to monitor cooling fans, system and CPU temperatures,
power supply and CPU voltages.
[3] Remote management functionality (with NO OS on disk):
(a) remote power off
(b) remote power on
(c) remote system (re)boot
(d) remote motherboard bios setting
(e) remote motherboard bios flashing
(f) redirection of serial console boot input/output
to a remote location
[4] 'shutdown -hf' or equivalent must power the system down to standby
power only (no fans spinning).
[5] Ability of system (with the OS installed) to boot from and run
from the hardware RAID controller array.
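As an illustration of scripted remote management from a Linux host,
the sketch below builds command lines for the open-source ipmitool
utility. It assumes the management card speaks IPMI 2.0 over LAN and
that ipmitool is installed; neither is mandated by this specification.
The host name and user name in the example are hypothetical. BIOS
setting and flashing (functions d and e) are vendor-specific and are
not covered here.

```python
from __future__ import annotations

import subprocess

# Remote management functions mapped to ipmitool subcommands.
ACTIONS = {
    "power_off": ["chassis", "power", "off"],
    "power_on": ["chassis", "power", "on"],
    "reboot": ["chassis", "power", "reset"],
    "console": ["sol", "activate"],   # serial console redirection
}


def ipmi_command(bmc_host: str, user: str, action: str) -> list[str]:
    """Build the ipmitool argument list for one management action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # -E tells ipmitool to read the password from the IPMI_PASSWORD
    # environment variable, keeping it off the command line.
    base = ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E"]
    return base + ACTIONS[action]


def run(bmc_host: str, user: str, action: str) -> int:
    """Run the action and return ipmitool's exit status."""
    return subprocess.run(ipmi_command(bmc_host, user, action)).returncode


# Example (hypothetical BMC host name and user):
# run("s001-bmc", "admin", "power_on")
```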
Before a purchase order is issued, the Vendor must attend a
half-day meeting with project principals at UWM or an agreed-upon
alternative location. The purpose of this meeting is to review the
detailed specifications above, the delivery schedule, and other
expectations. This meeting must be attended by at least one member
of the Vendor's Management Team with fiduciary authority for this
bid, and at least one member of the Vendor's Technical Team with
overall management responsibility for the acquisition, integration,
testing and delivery of the system.
The Vendor will designate a single point of contact for all technical,
configuration, management and delivery questions and issues.
Systems must have a three-year warranty on all parts and components.
Warranty will be both from the Vendor and from the original manufacturer.
Note: for node repairs that cannot be carried out by project
personnel on-site, systems will be returned by mail to Vendor for
repair.
Vendor will guarantee proper operation of all components in a Linux
environment using a modern Linux 2.6 kernel. Baseline OS distribution
is Fedora Core 3 or 4. Vendor will provide any patches needed to the
stock Fedora Core 3 or 4 distribution to provide the functionality
described in this document. Test system will be delivered with Fedora
Core 3 or 4 and any necessary patches installed.
Vendor will repair or replace any subsystems or components which do
not operate in a reliable fashion with equivalent or better items at
Vendor expense.
Note: disk drive failures are specifically EXCLUDED from this last
requirement, PROVIDED that in conjunction with the hardware RAID
controllers, normal system operation is maintained through single-disk
failures. A hardware RAID controller that does not provide reliable
operation after an initial burn-in period of one month will be
replaced with a more reliable hardware RAID system at Vendor expense.
Reliable operation is defined as no more than four RAID controller
card failures with data loss in one year of operation.
Vendor agrees to pass on to purchaser any price drops in CPU
components which occur after bidding but before a purchase order is
issued, by increasing the number of nodes delivered.
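Because the total payment is fixed, a lower per-node cost translates
directly into more nodes. The Python sketch below shows the
calculation under simplifying assumptions: the function name is
hypothetical, the optional fixed_cost parameter stands in for racks
and other non-node items, and the per-node prices in the example are
invented purely for illustration.

```python
TOTAL_BUDGET = 1_150_000.00   # fixed total from this specification, in dollars


def nodes_for_budget(per_node_cost: float, fixed_cost: float = 0.0) -> int:
    """Return the number of whole nodes that fit within the budget."""
    if per_node_cost <= 0:
        raise ValueError("per-node cost must be positive")
    return int((TOTAL_BUDGET - fixed_cost) // per_node_cost)


# Hypothetical prices: a $100 drop per node raises the node count.
print(nodes_for_budget(4000.0))   # prints: 287
print(nodes_for_budget(3900.0))   # prints: 294
```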
Vendor agrees to maintain a stock of spare parts on site at UWM, in
sufficient number to cover expected hardware failures. At a minimum
this should include:
- ten sticks of memory
- two CPUs, including any heat-sinks and/or fans
- two motherboards (including any management or daughter cards)
- four of each type of fan in the system
- three power supplies
- ten disk drives
- one complete case assembly including hot-swap carriers
- two RAID cards
- internal cabling sufficient for one system
This stock of spare parts will be maintained via replacement if and as
parts fail. Failed parts will be periodically returned to the Vendor.
Vendors may wish to benchmark the completed system for inclusion in a
Top 500 (or other) list. The system will be made available to the
Vendor for this purpose during an initial one-month burn-in period,
and the Vendor may circulate and/or publicize the results as they
wish. Please note that the University cannot appear to endorse or
promote any product nor can a vendor use the University's logo.
-------------------------------------------------------------------------
Vendor response must include:
[1] Total number of nodes delivered? Note: the number of CPUs should
be twice this.
[2] Motherboard manufacturer and model?
[3] Memory type and manufacturer?
[4] Node case size (1U, 2U or 3U)?
[5] MTBF of system power supply?
[6] Additional remote management hardware monitoring capability
(temperatures, fan speeds, voltages)?
[7] Remote Management requires separate LAN (yes/no)?
[8] Disk model?
[9] MTBF of system cooling fans?
[10] Does hardware RAID controller support smartmontools?
[11] The number of nodes per rack and the total number of
racks?
[12] Are there working drive power/activity/failure indicator front
panel lights?
Vendor must also specify the number of nodes delivered and per-node
cost if the following changes are made to the specifications above (to
apply to ALL systems):
- Use AMD Opteron 250 processors (latest stepping, 1GHz HT clock)
- Use AMD Opteron 252 processors (latest stepping, 1GHz HT clock)
- Use ONE AMD Opteron dual-core 265 processor (latest stepping, 1GHz
HT clock) per motherboard. Processor must be in CPU1 socket, with
CPU2 socket empty and available for future expansion. All memory
must be in the CPU1 memory slots.
- Use of alternative RAID controllers in bidder's responses.
(a) If Areca 1110 controller is specified, give price for systems
delivered with 3ware 9500S-4LP instead.
(b) If 3ware 9500S-4LP is specified, give price for systems
delivered with Areca 1110 instead.
(c) If a different RAID controller is specified, give price for
the Areca alternative.
(d) If a different RAID controller is specified, give price for
the 3ware alternative.
- Include a CD-ROM drive in each system
- Supply systems WITH NO DISK DRIVES (these may be provided by a disk
manufacturer as part of a collaborative research project).
Vendor must provide Mean Time Between Failure (MTBF) data for power
supplies and fans. Additional confidential MTBF data may also be
requested under a non-disclosure agreement before a purchase order is
issued.