Archive for September, 2009

Ubuntu 9.04 Fake RAID problems

Tuesday, September 15th, 2009 | hardware, linux | 2 Comments

So we have RAID, "a technology that allows computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy", to quote the Wikipedia article.

In the beginning, manufacturers created dedicated hardware controllers to which disks were attached. These controllers include their own processor and memory and handle all the RAID functionality within the black box they present to the system (the good ones even include a battery that lets the controller run long enough after a power failure that any data in the controller’s cache memory isn’t lost, but can be written to the drives when the power comes back). As far as the system the controller is attached to is concerned, the RAID controller is one big disk. This is called hardware RAID.

As machines have become more powerful, most of them (certainly most desktop machines) sit idle most of the time, so it has become feasible to provide RAID at the operating system level instead. All mainstream operating systems provide some form of this software RAID, which performs exactly the same function as the hardware RAID controller above but uses the system’s processor and memory. There are advantages and disadvantages to both approaches (I’m increasingly leaning towards software RAID on Linux – low-end hardware RAID controllers aren’t very reliable and tend to be slow from an I/O perspective, while most modern Linux servers have multiple processor cores which sit idle most of the time and are perfectly suited to driving a RAID array), but both work reasonably well.

In between these two comes something described as firmware/driver-based RAID, HostRAID or Fake RAID. This is provided by cheap RAID controllers that do not implement all the RAID functionality themselves (normally they are standard disk controllers with some special firmware) and use the main system processor for most of the heavy lifting. They also rely on dedicated operating system drivers to provide the RAID functionality, hence the name Fake RAID. I’m not a fan of Fake RAID controllers – apart from the fact that their manufacturers rarely make it clear that they are not fully functional RAID controllers, their reliance on elaborate driver software makes them less reliable than hardware RAID but more complex to maintain than true software RAID. They are reasonably well supported under Linux these days using the Device-Mapper Software RAID Tool (aka dmraid), but personally I prefer to use a Fake RAID controller as a standard SATA controller and, if I require RAID on such a system, implement it using Linux’s excellent software RAID support.
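For what it’s worth, setting up software RAID under Linux really is that simple. A minimal sketch of building a two-disk mirror with mdadm (the device names are hypothetical examples – mdadm --create is destructive, so double-check them against your own system first):

```shell
# Create a two-disk RAID1 mirror with Linux software RAID (mdadm).
# WARNING: destructive - /dev/sda and /dev/sdb are hypothetical device names.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# Put a filesystem on the new array and watch the initial resync progress.
mkfs.ext3 /dev/md0
cat /proc/mdstat
```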

Until recently, people who installed Ubuntu and did want to use their Fake RAID controller as a RAID controller ran into the problem of the installer not including dmraid support. With Ubuntu 9.04 (Jaunty), the installer detects at least some Fake RAID controllers and asks whether you want to use the controller via dmraid or not. If you choose not to, you can then use it as a normal SATA controller.

I ran into an interesting problem on a recent reinstall of Ubuntu 9.04 onto a Supermicro X7DVL system, which includes an Intel 631xESB/632xESB I/O controller supporting some sort of Fake RAID (Intel seems to call their Fake RAID Matrix Storage Technology). Given my stance on Fake RAID, I immediately disabled this in the BIOS by changing the controller to compatible mode (the datasheet suggests this should disable RAID). When installing Ubuntu, the installer still detected the Fake RAID volumes and offered to configure dmraid for me. I declined the option, the native (unRAIDed) SATA disks were presented to me, and I partitioned and formatted them.

I thought nothing more of this until I rebooted after completing the installation. The system booted as far as GRUB before dumping the message

No block devices found

It took me a while to figure out what was going on. Google turned up lots of people who had problems with Ubuntu and dmraid, but generally they were having the opposite problem of wanting to use dmraid but the installer not supporting it (like DMRAID on Ubuntu with SATA fakeraid, dmraid missing from livecd and Need dmraid to support fakeraid). Presumably most of these problems have been fixed with the inclusion of dmraid in Jaunty.

This was the clue for me – I finally figured out (with some help from bug 392510 I must admit) that even though I had declined to use dmraid during the install, the newly installed operating system still contained dmraid and was loading the dmraid kernel modules at boot-time. This resulted in the kernel seeing some dmraid volumes rather than the partitions I had created during the OS install.

Once I figured that out, fixing the problem was relatively straightforward,

  1. Reboot with the Ubuntu 9.04 install cd and select Rescue broken system.
  2. When the rescue boot has been configured, select Execute shell and chroot into the installed environment.
  3. aptitude purge dmraid (this removes the dmraid software and the dmraid kernel modules from the initramfs).
  4. Reboot and enjoy your new OS.
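For reference, once the rescue menu has dropped you into a shell chrooted into the installed system, step 3 amounts to something like this (the explicit initramfs rebuild is belt-and-braces – purging the package should trigger it anyway):

```shell
# Inside the chroot of the installed system (via the rescue CD):
aptitude purge dmraid    # removes the dmraid software and its boot-time hooks
update-initramfs -u      # rebuild the initramfs to be sure the modules are gone
```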

Two things that I found misleading here are,

  • I had declined to use dmraid during the Ubuntu install, but the installer still included the dmraid software in the installed system
  • I had disabled SATA RAID in the BIOS, but it was still visible to Ubuntu. I notice a newer version of the BIOS from Supermicro which may fix this problem, but since Supermicro don’t include a change log in their BIOS releases it’s hard to tell without going to the trouble of actually installing the update.

I should probably log a bug against the dmraid package in Ubuntu if I get around to it. Bug 392510 talks about supporting a nodmraid option to the kernel at boot time which would explicitly disable dmraid – I think this would be a good idea (Fedora apparently already does this).
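On a distribution that supports such an option (Fedora, apparently; the bug asks for the same in Ubuntu), it would just be appended to the kernel line in the boot loader configuration. A hypothetical GRUB (legacy, as shipped with Jaunty) entry might look like:

```
title  Ubuntu 9.04, kernel 2.6.28-11-generic (no dmraid)
root   (hd0,0)
kernel /boot/vmlinuz-2.6.28-11-generic root=/dev/sda1 ro quiet nodmraid
initrd /boot/initrd.img-2.6.28-11-generic
```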

Update 1: Bug 311637 already addresses this problem, so I’ve added a comment to it.

Update 2: Upgrading the Supermicro system to the latest BIOS and disabling the Fake RAID controller through the BIOS also seems to fix this problem.


Passing kernel module parameters in Ubuntu 8.10

Wednesday, September 2nd, 2009 | hardware, linux | 2 Comments

Sorry for the mouthful of a title but I wanted to use something that would show up for the kind of queries I was firing into Google yesterday in a vain attempt to solve my problem.

A little background first: I’m working with some Supermicro Twin systems (basically, two system boards in a single 1U chassis sharing a power supply – not as compact as a blade, but not bad) which include an nVidia MCP55V Pro chipset dual-port Ethernet controller. On Ubuntu 8.10 at least, this uses the forcedeth driver (originally a cleanroom implementation that competed with a proprietary offering from Nvidia – it now seems to have superseded that driver).

I noticed while doing large network transfers to or from one of these machines that the load on the machine seemed to spike. Running dmesg showed a lot of these messages,

[1617484.523059] eth0: too many iterations (6) in nv_nic_irq.
[1617484.843113] eth0: too many iterations (6) in nv_nic_irq.
[1617484.869831] eth0: too many iterations (6) in nv_nic_irq.
[1617485.101377] eth0: too many iterations (6) in nv_nic_irq.
[1617485.855067] eth0: too many iterations (6) in nv_nic_irq.
[1617485.896692] eth0: too many iterations (6) in nv_nic_irq.

Google returns lots of results for this message – some people seem to have experienced complete lockups while others noted slowdowns. The proposed solution seems to be to pass the max_interrupt_work option to the forcedeth kernel module to increase the maximum events handled per interrupt. The driver seems to default to running in throughput mode where each packet received or transmitted generates an interrupt (which would presumably ensure the fastest possible transfer of the data) but can also be configured to operate in CPU mode (aka poll mode) where the interrupts are controlled by a timer (I’m assuming this makes for higher latency of transfers but smoother behaviour under high network loads – the documentation on this is a little thin). This behaviour is controlled by the optimization_mode option. You can investigate the options which can be passed to any kernel module by running,

modinfo <module name>

for example,

modinfo forcedeth

So, as an initial pass at tuning the behaviour on the server, I decided to pass the following options to the forcedeth driver.

max_interrupt_work=20 optimization_mode=1

The standard way on Ubuntu 8.10 to set kernel module parameters is to add a line to /etc/modprobe.d/options like the following

options forcedeth max_interrupt_work=20 optimization_mode=1

I tried this and then tried rebooting my system but found it didn’t have any effect (I started seeing the same error messages after running a large network transfer, and the max iterations were still reported as 6 rather than the 20 I should be seeing if my options were being parsed).

After trying a few things and brainstorming with the good people on #ubuntu-uk, I figured out that after modifying /etc/modprobe.d/options you need to run the following command,

sudo update-initramfs -u

This presumably updates the initramfs image to include the specified options. It is possible this is only needed for modules that are loaded early on in the boot process (such as the network driver) – I haven’t had the need to modify other kernel modules recently (and this stuff seems to change subtly between distributions and even versions).
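You can check whether your options actually made it into the rebuilt image by listing its contents (a sketch – the exact path of the copied modprobe configuration inside the image may vary between releases):

```shell
# List the initramfs contents and look for the copied modprobe configuration
zcat /boot/initrd.img-$(uname -r) | cpio -it | grep modprobe
```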

Following another reboot and a large network transfer, the error messages no longer appeared, suggesting that the options are now being parsed (some kernel modules allow you to review the parameters currently in use by looking in /sys/module/<module name>/parameters, but forcedeth doesn’t seem to support this).
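For modules that do expose their parameters through sysfs, the check looks like this (placeholders rather than real names, since forcedeth apparently doesn’t support it):

```shell
# Read the live value of a module parameter from sysfs (for modules that expose them)
cat /sys/module/<module name>/parameters/<parameter name>
```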

I figured the trick with update-initramfs was worth publishing for the next person who runs into this problem. I’d also love to hear from others using the forcedeth driver as to what options they find useful to tune the throughput.
