
Analysis of MCE errors.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/40555.pdf Use of interleaving for optimizing performance. pp. 37, 47
http://www.amd.com/us-en/0,,3715_13530_13515_8806~85257,00.html FAQ on Opteron 100.
ftp://ftp.software.ibm.com/eserver/benchmarks/wp_x3455_081506.pdf IBM performance testing of Opteron processors, including interleaving. p. 10
http://docs.sun.com/source/817-5248-16/chapter2.html Sun documentation on BIOS tuning. Table 2-3 states the conditions for using node interleaving.
http://ieeexplore.ieee.org/iel5/8535/27072/01202441.pdf?arnumber=1202441 Paper on the current use of interleaving for performance gain.

This graph shows the machines that have fallen down due to a machine check exception (MCE) since 9/26/2006.

Useful dates:
  • 3/07/2007 took delivery of Kingston/Samsung 838C RAM
  • 4/27/2007 took delivery of Kingston/Qimonda RAM
  • 6/15/2007 BIOS updated to enable ECC scrubbing

    Kingston/Samsung 838C RAM down per day:

    Kingston/Samsung 838D RAM down per day:

    Kingston/Qimonda RAM down per day:

    These three graphs show the totals for broken RAM by chipset. Of 304 problem reports, only 121 listed a RAM type. Some reports included two broken sticks of RAM, which these graphs do not show. The other chipsets (Wintech, Kinston/Kingston, Kingston/MT, and ATP) are not included.
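The counting rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `tally` function, and the five sample reports are hypothetical and invented for the example; they are not the actual problem-report data. Only the figures 304 and 121 come from this page.

```python
from collections import Counter

# Hypothetical problem reports, invented for illustration only.
# "ram" lists the type of each broken stick named in the report.
reports = [
    {"id": 1, "ram": ["Kingston/Samsung 838C"]},
    {"id": 2, "ram": []},  # RAM type not listed in the report
    {"id": 3, "ram": ["Kingston/Qimonda", "Kingston/Qimonda"]},  # two sticks
    {"id": 4, "ram": ["ATP"]},  # a type that is not graphed
    {"id": 5, "ram": ["Kingston/Samsung 838D"]},
]

# The three RAM types that have their own graph.
GRAPHED = {
    "Kingston/Samsung 838C",
    "Kingston/Samsung 838D",
    "Kingston/Qimonda",
}


def tally(reports):
    """Return (per-type report counts, number of reports listing a RAM type)."""
    counts = Counter()
    listed = 0
    for report in reports:
        if not report["ram"]:
            continue  # skip reports with no RAM type listed
        listed += 1
        # A report with two sticks of the same type is counted once,
        # mirroring the note that second sticks are not shown.
        for ram_type in set(report["ram"]):
            if ram_type in GRAPHED:
                counts[ram_type] += 1
    return counts, listed


counts, listed = tally(reports)

# Share of real reports that listed a RAM type (figures from this page).
coverage_percent = round(121 / 304 * 100, 1)
```

With the real totals, the three graphs cover at most about 40% of all problem reports (121 of 304), and fewer once the excluded chipsets are taken out.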

    This graph shows the completion time of the lalapps power function under different RAM configurations.
    All of this data was collected from machines shown not to fall down over the course of 2 days.
    Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.