memtestx86+

Stress testing a PC revisited

Thursday, September 25th, 2008 | hardware, linux, useful tools | No Comments

I’m still using mostly the same tools for stress testing PCs as when I last wrote about this topic. memtest86+ in particular continues to be very useful. In practice, the instrumentation in most PCs still isn’t good enough to identify which DIMM is failing most of the time (mcelog sometimes makes a suggestion about which DIMM has failed and EDAC can also be helpful, but in my experience there is lots of hardware out there which doesn’t support these tools well). The easiest approach I’ve found to date is to take out one DIMM at a time and re-run memtest86+ … when the errors go away you’ve found your problematic DIMM – put it back in again and re-run to make sure you’ve identified the problem. If you keep getting the errors regardless of which DIMMs are installed, you may be looking at a problem with the memory controller (either on the processor or the motherboard depending on which type of processor you are using) – if you have identical hardware, you should look at swapping the components into that for further testing.

Breakin is a tool recently announced on the beowulf mailing list which looks like it has a lot of potential also and I plan on adding it to my stress testing toolkit the next time I encounter a problem which looks like a possible hardware problem. What looks nice about Breakin is that it tests all of the usual suspects including processor, memory, hard drives and it includes support for temperature sensors, MCE logging and EDAC. This is attractive from the perspective of being able to fire it up, walk away and come back to check on progress 24 hours later.

Finally, we’ve found the Intel MPI Benchmarks (IMB, previously known as the Pallas MPI benchmark) to be pretty good at stress testing systems. Anyone conducting any kind of qualification or UAT on PC hardware, particularly hardware intended to be used in HPC applications should definitely be including
an IMB run as part of their tests.

Tags: , , , , , ,