mcelog
Stress testing a PC revisited
I’m still using mostly the same tools for stress testing PCs as when I last wrote about this topic. memtest86+ in particular continues to be very useful. In practice, the instrumentation in most PCs still isn’t good enough to identify which DIMM is failing most of the time. mcelog sometimes suggests which DIMM has failed, and EDAC can also be helpful, but in my experience a lot of hardware out there doesn’t support these tools well. The easiest approach I’ve found to date is to take out one DIMM at a time and re-run memtest86+. When the errors go away, the DIMM you just removed is the likely culprit. Put it back in and re-run to confirm: if the errors return, you’ve identified the problem. If you keep getting errors regardless of which DIMMs are installed, you may be looking at a problem with the memory controller (on the processor or on the motherboard, depending on which type of processor you are using). If you have an identical machine available, try swapping the components into it for further testing.
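Where the EDAC driver does support your hardware, the error counters it exposes through sysfs can be worth checking before you start pulling DIMMs. The following is a minimal sketch rather than a polished tool: it assumes the usual Linux sysfs layout under `/sys/devices/system/edac/mc`, where each memory controller directory (`mc0`, `mc1`, ...) has `ce_count` (corrected errors) and `ue_count` (uncorrected errors) files. The function name `read_edac_counts` is my own, and on machines without EDAC support it simply returns nothing.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int or None, "ue": int or None}}.

    The default path is the usual EDAC sysfs location; pass a different
    root if your system exposes it elsewhere (assumption, not verified
    for every kernel or distribution).
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        # EDAC driver not loaded, or the hardware is not supported.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                # Counter file missing or unreadable on this controller.
                entry[key] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    results = read_edac_counts()
    if not results:
        print("No EDAC memory controllers found")
    for name, c in results.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected-error count that keeps climbing during a stress run is a hint that a DIMM on that controller deserves a closer look, although it won’t necessarily tell you which one.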
Breakin is a tool recently announced on the Beowulf mailing list that looks like it has a lot of potential, and I plan to add it to my stress testing toolkit the next time I encounter what looks like a hardware problem. What looks nice about Breakin is that it tests all of the usual suspects, including processor, memory and hard drives, and it includes support for temperature sensors, MCE logging and EDAC. That makes it attractive as something you can fire up, walk away from and come back to check on 24 hours later.
Finally, we’ve found the Intel MPI Benchmarks (IMB, previously known as the Pallas MPI benchmark) to be pretty good at stress testing systems. Anyone conducting any kind of qualification or UAT on PC hardware, particularly hardware intended for HPC applications, should include an IMB run as part of their tests.