Switching
The cluster's proposed switch topology is as shown below:

[Figure: proposed switch topology; XFig image source file]
This is a network design that is oversubscribed at one or two
locations, depending upon the choice of the core switch on the right.
While it would be possible to design a network with no
oversubscription, this would increase costs by approximately $120k,
and we believe that this would not increase the performance of the
cluster on typical analysis codes.
This design differs from the original proposal, which was fully meshed
and not oversubscribed. However, based on observation of current
analysis codes, we believe it makes more sense to have a network design
that is oversubscribed and to spend the savings on additional nodes.
The network is oversubscribed at the edges by a factor of two. The
edge switches are SMC 8505T (5-port) or 8508T (8-port) unmanaged
layer-two switches. These are based on a Broadcom chipset and cost
approximately $62 and $89 each, respectively. These switches are
non-blocking at all packet sizes and can handle jumbo frames up to
9 kB in size.
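As a rough illustration of what "oversubscribed by a factor of two" means, the sketch below computes the ratio of node-facing capacity to uplink capacity on an edge switch. The port split used in the example (4 node ports, 2 uplinks) is a hypothetical assumption for illustration only; the actual split per edge switch is not stated here. All ports are assumed to run at the same speed.

```python
def oversubscription_ratio(node_ports: int, uplink_ports: int) -> float:
    """Worst-case offered load from the nodes divided by uplink capacity.

    Assumes every port runs at the same speed, so the speed cancels out.
    """
    if node_ports < 0 or uplink_ports <= 0:
        raise ValueError("need node_ports >= 0 and at least one uplink")
    return node_ports / uplink_ports


# Hypothetical split (not taken from the design): 4 node ports, 2 uplinks.
ratio = oversubscription_ratio(node_ports=4, uplink_ports=2)
print(f"oversubscription: {ratio:.1f}:1")
```

A ratio of 1.0 would correspond to a non-oversubscribed edge, as in the original proposal.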
There are a handful of options for the core network switch. These
are:
1. Cisco 6509E chassis. This has 9 slots. One must be occupied by a
   Supervisor 720 card. Five slots are populated with 48-port line
   cards from the 67XX series. Three slots are free. Neither the
   chassis nor the line cards will run at line speed, but they will
   run close to line speed if we include distributed forwarding
   cards. The cost of this core switch is $117k, including redundant
   power supplies and three years of service.
2. Cisco 6509E chassis, exactly as above but WITHOUT the distributed
   forwarding cards. The cost of this core switch is $94k.
3. Force10 E600 chassis. This has 7 slots. Five slots are populated
   with 48-port line cards. Two slots are free. Both the chassis and
   line cards are non-blocking at all packet sizes. The cost of this
   core switch is $126k.
4. Force10 E1200 chassis. This has 14 slots. Five slots are populated
   with 48-port line cards. Nine slots are free. Both the chassis and
   line cards are non-blocking at all packet sizes. The cost of this
   core switch is $141k.
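To make the four options easier to compare, the following sketch works out the cost per populated port, using only the quoted prices and the five 48-port line cards common to every option. The option labels are shorthand introduced here, and the prices are the rounded figures above, so the results are approximate.

```python
# Quoted prices in dollars (rounded figures from the list above).
CORE_OPTIONS = {
    "Cisco 6509E with DFCs": 117_000,
    "Cisco 6509E without DFCs": 94_000,
    "Force10 E600": 126_000,
    "Force10 E1200": 141_000,
}
LINE_CARDS = 5        # populated slots in every option
PORTS_PER_CARD = 48   # ports on each line card


def cost_per_port(total_cost: float,
                  line_cards: int = LINE_CARDS,
                  ports_per_card: int = PORTS_PER_CARD) -> float:
    """Total switch cost divided by the number of populated ports."""
    return total_cost / (line_cards * ports_per_card)


for name, cost in CORE_OPTIONS.items():
    print(f"{name:26s} ${cost:>7,d}  ${cost_per_port(cost):7.2f}/port")
```

This ignores free slots, so it understates the value of the larger chassis if the cluster is expected to grow.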
We have tested the edge switches and on-board Broadcom NICs both
for overall performance and for jumbo frame support. There are two
choices of Linux driver for this NIC. NEED BCM RESULTS.
The following netperf tests were performed using netperf-2.4.0,
available at netperf's home page. For each test, I ran netserver on
one machine (with the command netserver -v 4 -4 -d, where -v 4 turns
up verbosity, -4 specifies IPv4, and -d returns extra debug info),
and then ran the following netperf commands from the remote machine.
In the tests that measure bidirectional speeds, I ran a netserver and
a netperf on each machine.
tg3 driver, crossover cable - The tg3 driver supports frames larger
than 9000 bytes. The limit varied by payload; I saw 9010-9014 byte
frames. I tested by sending frames larger than 9000 bytes, sniffing,
and looking for the fragmentation point. I issued ping -s 9000
192.168.168.3 (note that the Ethernet header adds 14 bytes to IP
datagrams).
netperf -c -C -f K -l 60 -H 192.168.168.3 ("-c" shows local CPU,
"-C" shows remote CPU, "-f K" reports in KB/s, "-l 60" = 60 sec. test)
1. with 9000 MTU, yields 120111.19 KB/s, 7.24% local CPU, 9.00%
   remote CPU
2. with 1500 MTU, yields 114905.40 KB/s, 18.50% local CPU, 40.83%
   remote CPU
3. with 9000 MTU, running the test in each direction simultaneously,
   yields 94587.40/94568.85 KB/s, with 13.08/14.37% and 14.37/13.10%
   CPU
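To put the crossover-cable figures in context, the sketch below converts netperf's KB/s output to Mbit/s and compares it with a rough ceiling for TCP goodput on gigabit Ethernet. It rests on several assumptions that are not stated in the text: that "-f K" uses 1 KB = 1024 bytes, that the link is 1000 Mbit/s, that IPv4 and TCP headers are 20 bytes each with 12 bytes of TCP timestamp options, and that each frame costs 38 extra bytes on the wire (header, FCS, preamble, and inter-frame gap).

```python
def kbytes_to_mbits(kb_per_s: float) -> float:
    """Convert KB/s (assumed 1 KB = 1024 bytes) to Mbit/s (10^6 bits/s)."""
    return kb_per_s * 1024 * 8 / 1e6


def tcp_goodput_ceiling_mbits(mtu: int,
                              line_rate_mbits: float = 1000.0,
                              tcp_option_bytes: int = 12) -> float:
    """Approximate best-case TCP payload rate for a given MTU.

    Assumes 20-byte IPv4 and TCP headers plus tcp_option_bytes of options.
    """
    payload = mtu - 20 - 20 - tcp_option_bytes
    # Ethernet header (14) + FCS (4) + preamble/SFD (8) + inter-frame gap (12)
    on_wire = mtu + 14 + 4 + 8 + 12
    return line_rate_mbits * payload / on_wire


# Measured unidirectional results from the crossover-cable tests above.
for mtu, kb_per_s in [(9000, 120111.19), (1500, 114905.40)]:
    measured = kbytes_to_mbits(kb_per_s)
    ceiling = tcp_goodput_ceiling_mbits(mtu)
    print(f"MTU {mtu}: {measured:.1f} Mbit/s measured, "
          f"{ceiling:.1f} Mbit/s ceiling ({100 * measured / ceiling:.1f}%)")
```

Under these assumptions, both unidirectional runs come out close to the ceiling, which suggests the NIC and driver are not the limiting factor over a direct cable.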
tg3 driver, SMC 8505T - The switch only processes frames <=8996
bytes. I tested this by sending ICMP echo requests of increasing size
until I hit a ceiling; the largest packet was sent with
"ping -s 8954 192.168.168.3". Note that, because of the switch's
8996-byte limit, I had to set the MTU via ifconfig to 8982 for my
netperf tests.
netperf -c -C -f K -l 60 -H 192.168.168.3 ("-c" shows local CPU,
"-C" shows remote CPU, "-f K" reports in KB/s, "-l 60" = 60 sec. test)
1. with 8982 MTU, yields 96132.05 KB/s, 5.97% local CPU, 6.42%
   remote CPU
2. with 1500 MTU, yields 114902.26 KB/s, 18.37% local CPU, 39.56%
   remote CPU
3. with 8982 MTU, running the test in each direction simultaneously,
   yields 49415.38/47241.83 KB/s, with 5.79/6.72% and 6.73/5.78% CPU.
   This test was repeated MANY times, with the total throughput
   between 95 and 98 MB/s (not always evenly distributed).
4. with 1500 MTU, running the test in each direction simultaneously,
   yields 111594.47/110839.93 KB/s, with 69.32/70.52% and
   70.78/69.55% CPU. This test was repeated several times, with the
   total throughput between 222.3 and 222.8 MB/s.
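The relationship between the ping payload, the MTU, and the switch's frame limit in the 8505T test can be checked with a little arithmetic. The sketch below assumes an IPv4 header with no options (20 bytes) and an 8-byte ICMP echo header, and it counts the 14-byte Ethernet header but not the frame check sequence, matching the accounting used in the text. The function names are illustrative.

```python
ICMP_HEADER = 8       # ICMP echo header, bytes
IPV4_HEADER = 20      # IPv4 header without options, bytes
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType, bytes


def ping_sizes(payload: int) -> tuple[int, int]:
    """Return (IP datagram size, Ethernet frame size) for `ping -s payload`."""
    ip_datagram = payload + ICMP_HEADER + IPV4_HEADER
    frame = ip_datagram + ETHERNET_HEADER
    return ip_datagram, frame


def largest_ping_payload(max_frame: int) -> int:
    """Largest `ping -s` value that fits in one frame of max_frame bytes."""
    return max_frame - ETHERNET_HEADER - IPV4_HEADER - ICMP_HEADER


ip_datagram, frame = ping_sizes(8954)
print(f"ping -s 8954 -> {ip_datagram}-byte IP datagram, {frame}-byte frame")
print(f"largest payload for an 8996-byte frame: {largest_ping_payload(8996)}")
```

This reproduces the numbers in the test notes: a payload of 8954 bytes gives an 8982-byte IP datagram, which is the MTU that had to be set, and an 8996-byte frame, which is the switch's ceiling.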
tg3 driver, SMC 8508T - The switch only processes frames <=8996
bytes. I tested this by sending ICMP echo requests of increasing size
until I hit a ceiling; the largest packet was sent with
"ping -s 8954 192.168.168.3". Note that, because of the switch's
8996-byte limit, I had to set the MTU via ifconfig to 8982 for my
netperf tests.
netperf -c -C -f K -l 60 -H 192.168.168.3 ("-c" shows local CPU,
"-C" shows remote CPU, "-f K" reports in KB/s, "-l 60" = 60 sec. test)
1. with 8982 MTU, yields 96013.90 KB/s, 6.11% local CPU, 6.39%
   remote CPU
2. with 1500 MTU, yields 114899.13 KB/s, 18.45% local CPU, 40.26%
   remote CPU
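The test runs above could be scripted along the following lines. This is only a sketch: it builds the command lines without executing them, the interface name "eth0" is an assumption, and the helper names are hypothetical. The netserver and netperf flags are the ones quoted in the text, and the MTU would need to be set on both machines before each run.

```python
import shlex


def netserver_cmd() -> list[str]:
    """Command run on the receiving machine, with the flags quoted above."""
    return ["netserver", "-v", "4", "-4", "-d"]


def netperf_cmd(host: str, seconds: int = 60) -> list[str]:
    """Command run on the sending machine, with the flags quoted above."""
    return ["netperf", "-c", "-C", "-f", "K", "-l", str(seconds), "-H", host]


def mtu_cmd(interface: str, mtu: int) -> list[str]:
    """ifconfig invocation to set the MTU (interface name is an assumption)."""
    return ["ifconfig", interface, "mtu", str(mtu)]


def plan(host: str, interface: str, mtus: list[int]) -> list[list[str]]:
    """For each MTU, set it on the interface and then run one netperf test."""
    steps = []
    for mtu in mtus:
        steps.append(mtu_cmd(interface, mtu))
        steps.append(netperf_cmd(host))
    return steps


print(shlex.join(netserver_cmd()))
for argv in plan("192.168.168.3", "eth0", [8982, 1500]):
    print(shlex.join(argv))
```

Each argument list could be handed to subprocess.run on the appropriate machine; printing them first makes it easy to review the sequence before anything touches the network configuration.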
This graph shows total throughput of two machines, attached via
crossover cable, running netperf. The bottom plot shows the average
CPU % of the two machines, while the top plot shows total throughput:

This graph shows total throughput of two machines, attached via the
8505T switch, running netperf. The bottom plot shows the average
CPU % of the two machines, while the top plot shows total throughput:

This graph shows total throughput of two machines, attached via
crossover cable, running netperf (as described above). The top plot
is the total throughput with the given MTU. The bottom plot shows the
CPU % of the Sunfire x2100, using the Nvidia driver. The middle plot
is the CPU % of the machine controlling the test. I find the results
strange in their uniformity, so I am trying to reproduce/retest:

SMC 8505T Benchmarking
Average CPU Usage: This graph shows total throughput of two machines,
both with Supermicro H8DAR-T motherboards and TCP settings not
optimized, attached via the 8505T switch and running netperf. The
bottom plot shows the average CPU % of the two machines, while the
top plot shows total throughput.
Individual control and slave CPU usage: Same setup. The bottom plots
show each machine's CPU usage, while the top plot shows total
throughput.
Instructions per byte: Same setup. The bottom (blue) plot shows both
machines' CPU usage, the top (red) plot shows total throughput, and
the middle (violet) plot shows instructions per byte.
SMC 8508T Benchmarking
Average CPU Usage: This graph shows total throughput of two machines,
both with Supermicro H8DAR-T motherboards and TCP settings not
optimized, attached via the 8508T switch and running netperf. The
bottom plot shows the average CPU % of the two machines, while the
top plot shows total throughput.
Individual control and slave CPU usage: Same setup. The bottom plots
show each machine's CPU usage, while the top plot shows total
throughput.
Instructions per byte: Same setup. The bottom (blue) plot shows both
machines' CPU usage, the top (red) plot shows total throughput, and
the middle (violet) plot shows instructions per byte.
D-Link DGS-108 Benchmarking
Average CPU Usage: This graph shows total throughput of two machines,
both with Supermicro H8DAR-T motherboards and TCP settings not
optimized, attached via the DGS-108 switch and running netperf. The
bottom plot shows the average CPU % of the two machines, while the
top plot shows total throughput.
Individual control and slave CPU usage: Same setup. The bottom plots
show each machine's CPU usage, while the top plot shows total
throughput.
Instructions per byte: Same setup. The bottom (blue) plot shows both
machines' CPU usage, the top (red) plot shows total throughput, and
the middle (violet) plot shows instructions per byte.
SMCGS5 Benchmarking
Average CPU Usage: This graph shows total throughput of two machines,
both with Supermicro H8DAR-T motherboards and TCP settings not
optimized, attached via the SMCGS5 switch and running netperf. The
bottom plot shows the average CPU % of the two machines, while the
top plot shows total throughput.
Individual control and slave CPU usage: Same setup. The bottom plots
show each machine's CPU usage, while the top plot shows total
throughput.
Instructions per byte: Same setup. The bottom (blue) plot shows both
machines' CPU usage, the top (red) plot shows total throughput, and
the middle (violet) plot shows instructions per byte.
SMCGS8 Benchmarking
Average CPU Usage: This graph shows total throughput of two machines,
both with Supermicro H8DAR-T motherboards and TCP settings not
optimized, attached via the SMCGS8 switch and running netperf. The
bottom plot shows the average CPU % of the two machines, while the
top plot shows total throughput.
Individual control and slave CPU usage: Same setup. The bottom plots
show each machine's CPU usage, while the top plot shows total
throughput.
Instructions per byte: Same setup. The bottom (blue) plot shows both
machines' CPU usage, the top (red) plot shows total throughput, and
the middle (violet) plot shows instructions per byte.
Crossover Cable Benchmarking
Average CPU Usage: This graph shows total throughput of two machines,
both with Supermicro H8DAR-T motherboards and TCP settings not
optimized, attached via a crossover cable and running netperf. The
bottom plot shows the average CPU % of the two machines, while the
top plot shows total throughput.
Individual control and slave CPU usage: Same setup. The bottom plots
show each machine's CPU usage, while the top plot shows total
throughput.
Instructions per byte: Same setup. The bottom (blue) plot shows both
machines' CPU usage, the top (red) plot shows total throughput, and
the middle (violet) plot shows instructions per byte.