
NEMO RAID Server and Rack Specifications

NFS Storage Servers

This is a bid specification for Network File System (NFS) Storage
Servers for use in a Linux Computing Cluster.  These will consist of
rack-mounted RAID storage units.

Bidder should give the cost of a system capable of storing 300TB of
user data, and offering at least 10GB/sec READ or 10GB/sec WRITE
bandwidth to this data.  If units consist of identical components, for
example 50 storage units of 6TB each, then the price per unit should
also be indicated along with the minimum quantity that must be
purchased to get this price, in the event that less than 300TB is
purchased at the outset.

[Note: The entire system will be placed in an existing dedicated
1400-square foot cluster room at UW - Milwaukee.  This room contains
one dedicated Powerware 9315 500kVA/400kW central UPS system and two
dedicated Liebert 225 kVA power distribution units.  The room is
equipped with four dedicated 26 ton Data-Aire down-flow AC units and
an 18-inch raised floor air distribution system. A floor plan is
included.  A Cisco 6500 or Force10 E600/E1200 based ethernet switch
and suitable cabling will be provided by the purchaser.]


[1] NFS Server Specifications

- System will store data using a RAID-6 (preferred) or RAID-5 plus hot
  spare configuration, with native hardware RAID controllers.

- All hard disks (data and OS) will be hot swappable from the front
  panel with no tools.

- System will be capable of offering NFS access to this data.

- System will include redundant power supplies and battery-backup
  protection for hardware RAID write caches.

- System will offer aggregate 10 GB/sec bandwidth to the data, via one
  hundred copper gigabit ports.  These ports must support jumbo
  frames.

- System will include remote management features (eg, IPMI, SNMP).
  These should include remote power cycling, BIOS access, and
  serial-over-lan console IO.

- System will have command-line and web based tools for management,
  observing rebuilds, warning of degraded RAID arrays, initializing
  RAID arrays, etc.

- System must support 'in the background' RAID array rebuilding.

- Systems must be delivered in 42U racks with 12-foot power cables
  (measured from the bottom of the racks).  Racks should fit through a
  standard 7' high wide doorway.

- Systems must be powered from available 120VAC 20A outlets.

- Systems and disk carriers will be clearly and distinctly labeled,
  indicating disk position within a particular unit.

- Systems for which alternative source replacement parts and
  components are available are desirable.

- Systems supporting 'in the background' RAID array initialization are
  desirable.

- Ability to monitor disk SMART data and run disk self-tests with
  Linux tools (eg, smartmontools) is desirable and should be noted.

- Systems supporting on-line RAID migration are desirable.

- Systems whose controller/backplane/hot-swap-bay combo provides
  functioning disk power/activity/failure indicator lights are
  desirable.

- Systems with simple uncrowded internal cabling and good cooling
  airflow are desirable.

- Systems with Native Command Queuing (NCQ) support are desirable.
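
The SMART monitoring item above can be checked on a candidate system
with a short script.  The following is a minimal sketch, not a
required tool: it assumes smartmontools is installed and that the
script runs with enough privilege to query the disks.  The function
names are hypothetical.  The exit-status bit meanings are paraphrased
from the smartctl manual page.  Disks behind a hardware RAID
controller usually need a controller-specific "-d" device type; the
value shown in the usage note is only an example.

```python
import subprocess

# Meanings of the smartctl exit-status bits, paraphrased from the
# smartctl manual page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_exit_status(status):
    """Return the list of conditions flagged in a smartctl exit status."""
    return [msg for bit, msg in EXIT_BITS.items() if status & (1 << bit)]


def health_command(device, device_type=None):
    """Build the 'smartctl -H' command line for one disk."""
    cmd = ["smartctl", "-H"]
    if device_type:
        # Needed for disks that sit behind a hardware RAID controller.
        cmd += ["-d", device_type]
    cmd.append(device)
    return cmd


def check_disk(device, device_type=None):
    """Run the health check and return the flagged conditions, if any."""
    result = subprocess.run(
        health_command(device, device_type),
        capture_output=True,
        text=True,
    )
    return decode_exit_status(result.returncode)
```

For example, check_disk("/dev/sda") would query a directly attached
disk, while check_disk("/dev/twa0", "3ware,0") would query the first
disk behind a 3ware controller.  An empty list means smartctl flagged
nothing.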

Note: we use TB to denote 1,000,000,000,000 bytes of data and GB to
denote 1,000,000,000 bytes of data.
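
Using these decimal units, the bandwidth requirement can be compared
with the raw line rate of the gigabit ports.  The sketch below is
illustrative arithmetic only.  The function names are hypothetical,
and it assumes each port runs at a nominal 1 Gbit/sec in the direction
being measured.

```python
GB = 1_000_000_000  # bytes, using the decimal definition given above


def line_rate_bytes_per_sec(ports, gbit_per_port=1.0):
    """Raw aggregate line rate in bytes/sec (8 bits per byte)."""
    return ports * gbit_per_port * 1e9 / 8


def required_efficiency(target_gb_per_sec, ports, gbit_per_port=1.0):
    """Fraction of the raw line rate needed to reach the target."""
    return target_gb_per_sec * GB / line_rate_bytes_per_sec(ports, gbit_per_port)


raw = line_rate_bytes_per_sec(100)
print("raw line rate, GB/sec:", raw / GB)
print("fraction of line rate needed:", required_efficiency(10, 100))
```

One hundred gigabit ports give 12.5 GB/sec of raw line rate, so the
10 GB/sec target requires delivering 80% of it as payload.  All
protocol and file system overhead has to fit in the remaining 20%.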

Choice of system will be based on price, quality and functionality of
overall design, anticipated reliability, ease of maintenance and
operation in a Linux environment, and life cycle cost factors.

A TYPICAL system design is sketched below.  Alternative designs will
be considered if they meet the functional specifications above.


Typical system design: 50 identical storage nodes.

Each node is a dual-processor Linux computer with two AMD Opteron 248
CPUs (versions with DDR400 support, latest E stepping, 1 GHz
HyperTransport clock).

Each node includes 4 GB of registered ECC memory, PC3200/DDR400.

Each node uses two 12-port SATA controller cards (Areca or 3ware) with
multilane connectors to a hot-swap SATA backplane and battery backup

Each node includes two copper gigabit ethernet ports. Each RAID
controller uses one network port for IO.

Each node has a 5U rack-mount case with 24 hot-swap front-panel SATA
carriers and redundant power supplies.

Each node contains quantity 24 x 320GB Raid Edition SATA drives.

Each controller is configured as 10+2 RAID-6, for total usable storage
of 6.4 TB per node (2 controllers x 10 data disks x 320GB), with 0.4TB
of this used as boot partition and OS storage.

Nodes are delivered with 5 units per 42U rack, leaving some space
for vertical internal access.

Inexpensive power distribution is provided within the racks using (for
example) Wiremold, Belkin, APC, Tripp Lite, or similar power strips.
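
The figures in the typical design can be cross-checked against the
300TB and one-hundred-port requirements.  The following is a sketch of
that arithmetic using only the numbers stated above.  The function and
parameter names are hypothetical, and sizes use the decimal TB and GB
defined earlier.

```python
import math


def typical_design_summary(
    nodes=50,
    controllers_per_node=2,
    disks_per_controller=12,
    parity_disks=2,        # RAID-6 as 10+2
    disk_gb=320,
    os_gb_per_node=400,    # 0.4TB for boot partition and OS
    node_height_u=5,
    nodes_per_rack=5,
    ports_per_node=2,
):
    """Derive capacity, rack, and port totals from the per-node figures."""
    data_disks = disks_per_controller - parity_disks
    usable_gb = controllers_per_node * data_disks * disk_gb
    user_gb = usable_gb - os_gb_per_node
    return {
        "usable_tb_per_node": usable_gb / 1000,
        "user_tb_per_node": user_gb / 1000,
        "user_tb_total": nodes * user_gb / 1000,
        "disks_total": nodes * controllers_per_node * disks_per_controller,
        "racks": math.ceil(nodes / nodes_per_rack),
        "rack_units_used_per_rack": node_height_u * nodes_per_rack,
        "gigabit_ports": nodes * ports_per_node,
    }


for key, value in typical_design_summary().items():
    print(key, value)
```

With the default figures this gives 6.4 TB usable and 6 TB of user
data per node, 300 TB of user data in total, 1200 disks, 10 racks with
25U of each 42U rack occupied by nodes, and 100 gigabit ports.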




Systems must have a three-year warranty on all parts and components.
Warranty will be both from the Vendor and from the original manufacturer.
Note: for node repairs that can not be carried out by project
personnel on-site, systems will be returned by mail to Vendor for
repair.

Vendor will repair or replace any subsystems or components which do
not operate in a reliable fashion with equivalent or better items at
Vendor expense.

Note: disk drive failures are specifically EXCLUDED from this last
requirement, PROVIDED that in conjunction with the hardware RAID
controllers, normal system operation is maintained through single-disk
failures.  A hardware RAID controller that does not provide reliable
operation after an initial burn-in period of one month will be
replaced with a more reliable hardware RAID system at Vendor expense.
Reliable operation is defined as no more than four RAID controller
card failures with data loss in one year of operation.

Vendor must maintain a sufficient stock of spare parts on site at
UWM, to allow immediate onsite replacement of any failed component.
This stock of spare parts will be maintained via replacement if and
as parts fail. Failed parts will be periodically returned to the Vendor.


Vendor response must include:

[0] A description of the overall system design, including manufacturer
    names and part numbers where appropriate (motherboard, CPU, memory,
    disks, RAID controllers, case, power-supply, etc).

[1] If applicable, total number of storage nodes, and cost per storage
    node.

[2] Minimum number of storage nodes which may be purchased for this
    price.

[3] Cost per storage node if ALL disks are supplied by end user.

[4] Total number of racks. Note that there is plenty of floor space
    available.  Weight of rack should be < 1250 pounds.

Vendor may be asked to provide Mean Time Between Failure (MTBF)
information for power supplies, disks, fans, and memory components.