Linux RAID volume and drive ordering

On some of my systems, I’ve noticed that a RAID logical volume exposed through an LSI RAID card is enumerated by the Linux kernel ‘before’ the internal drives, instead of ‘after’ them. It happens only on some of them.

The system has two internal SATA drives for the OS and data, and an external JBOD connected to an LSI RAID card, hosting a RAID volume. On the affected systems, the kernel assigns /dev/sda to the RAID volume, and /dev/sdb and /dev/sdc to the internal drives.

On other very similar systems, the RAID volume is /dev/sdc, which is fine.

This can be a problem if you have systems, scripts or people that make assumptions about device names. In this setup, the RAID volume is optional, and therefore I want it to ‘always’ appear as /dev/sdc when present.

Investigation

So, how do I go about finding out more about this weird/broken behaviour?

At first, my guess was that this could be due to differing kernel versions. So, I check the kernel versions… 3.10.0… they are the same on all my systems.
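For the record, that’s just a quick check on each box:

$ uname -r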

I ensured the RAID card is plugged into the same motherboard slot on all systems.

I ensured that the internal drives are connected to the same internal motherboard connectors on all systems.

I check the BIOS settings, just in case something there controls this sort of thing. All the same.

So, let’s see what the dmesg command is telling me.

On the broken system:

[    2.227082] sd 0:2:0:0: [sda] 11718813696 512-byte logical blocks: (6.00 TB/5.45 TiB)
[    2.227088] sd 0:2:0:0: [sda] 4096-byte physical blocks
[    2.227092] sd 2:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.227138] sd 3:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)

On the fine system:

[    2.001546] sd 3:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.001561] sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
...
[   10.030057] sd 9:2:0:0: [sdc] 11718813696 512-byte logical blocks: (6.00 TB/5.45 TiB) 

Notice how the 6 TB RAID volume comes up early (around 2 seconds into boot) on the broken system, and fairly late on the fine system. Hmmm.

Also, notice how the leading numbers differ. A bit of googling tells me that they are the SCSI host:bus:target:lun numbers.
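If lsscsi happens to be installed (it isn’t always, on a minimal install), it lists each device together with that tuple, which is easier to compare than raw dmesg output:

$ lsscsi
# each line starts with the [host:bus:target:lun] tuple; on the broken
# system the RAID volume would show up as [0:2:0:0] ... /dev/sda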

Could the host number somehow be tied to the PCI numbering? But these are identical systems, so even the PCI numbering is the same.
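The PCI numbering itself is easy to compare, for example by listing the RAID card’s slot (it shows up as 0000:01:00.0 in the driver messages further down):

$ lspci -s 01:00.0
# or simply: lspci | grep -i raid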

Now, I do know that the LSI RAID card requires a kernel module/driver. It turns out it’s called megaraid_sas, and dmesg can tell me more about it.
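Quick ways to confirm the module is loaded, and where it is loaded from (both tools are standard on CentOS 7):

$ lsmod | grep megaraid_sas
$ modinfo -F filename megaraid_sas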

On the broken system:

[    1.201278] megasas: FW now in Ready state
[    1.201301] megaraid_sas 0000:01:00.0: irq 42 for MSI/MSI-X
[    1.201310] megaraid_sas 0000:01:00.0: firmware supports msix    : (0)
[    1.201312] megaraid_sas 0000:01:00.0: current msix/online cpus  : (1/8)
...
[    1.243024] megaraid_sas 0000:01:00.0: controller type   : MR(512MB)
[    1.243027] megasas_init_mfi: fw_support_ieee=0
[    1.243029] megasas: INIT adapter done
...
[    1.285022] megaraid_sas 0000:01:00.0: pci id        : (0x1000)/(0x0079)/(0x1000)/(0x9277)
[    1.285025] megaraid_sas 0000:01:00.0: unevenspan support    : no
[    1.285026] megaraid_sas 0000:01:00.0: disable ocr       : no
[    1.285027] megaraid_sas 0000:01:00.0: firmware crash dump   : no
[    1.285028] megaraid_sas 0000:01:00.0: secure jbod       : no
[    1.285031] scsi host0: Avago SAS based MegaRAID driver
[    1.287078] scsi 0:2:0:0: Direct-Access     LSI      MR9280-16i4e     2.13 PQ: 0 ANSI: 5

On the fine system:

[    9.930483] megasas: 06.807.10.00-rh1
[    9.931087] megasas: FW now in Ready state
[    9.931113] megaraid_sas 0000:01:00.0: irq 51 for MSI/MSI-X
[    9.931123] megaraid_sas 0000:01:00.0: firmware supports msix    : (0)
[    9.931125] megaraid_sas 0000:01:00.0: current msix/online cpus  : (1/8)
[    9.942798] input: PC Speaker as /devices/platform/pcspkr/input/input8
[    9.973019] megaraid_sas 0000:01:00.0: controller type   : MR(512MB)
[    9.973023] megasas_init_mfi: fw_support_ieee=0
[    9.973036] megasas: INIT adapter done
[   10.015081] megaraid_sas 0000:01:00.0: pci id        : (0x1000)/(0x0079)/(0x1000)/(0x9277)
[   10.015090] megaraid_sas 0000:01:00.0: unevenspan support    : no
[   10.015095] megaraid_sas 0000:01:00.0: disable ocr       : no
[   10.015100] megaraid_sas 0000:01:00.0: firmware crash dump   : no
[   10.015104] megaraid_sas 0000:01:00.0: secure jbod       : no
[   10.015115] scsi host9: Avago SAS based MegaRAID driver
[   10.019701] scsi 9:2:0:0: Direct-Access     LSI      MR9280-16i4e     2.13 PQ: 0 ANSI: 5

Again, look how much earlier the driver comes up on the broken system.

So, what else could cause this?

udev? I know it’s used for persistent device naming. Quick googling tells me that udev can only add names on top of the kernel’s device names, for example by providing symlinks under /dev/disk/by-(path|id|label). So, deciding the kernel device names is not its job.
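Those symlinks are still handy on their own, mind you: scripts can refer to /dev/disk/by-id or /dev/disk/by-path names, which don’t depend on probe order.

$ ls -l /dev/disk/by-id/
$ ls -l /dev/disk/by-path/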

Breakthrough

A bit more thinking: the timings in dmesg are telling me something.

Let’s think about how drivers are loaded. To boot, the kernel needs drivers to work with, and these can be built into the kernel image itself. A quick double-check of the kernel image dates on all systems indicates that this is the exact same kernel, which means the kernel has not been rebuilt.

$ ls -l /boot/
total 136976
...
-rw-r--r--. 1 root root 32891431 Feb  2 13:43 initramfs-3.10.0-327.el7.x86_64.img
...
-rwxr-xr-x. 1 root root  5156528 Nov 19  2015 vmlinuz-3.10.0-327.el7.x86_64

But notice the timestamp on the initramfs, which is newer. Drivers can live there as well. Side note – an initramfs is similar to the older initrd images: it contains the extra bits the kernel needs to boot up properly.

The initramfs can get a later timestamp if it was rebuilt, for example when NVIDIA display drivers are installed – and a rebuilt initramfs can pull in megaraid_sas, which would explain why the driver comes up so early on the broken system.
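On a dracut-based distro like this one (CentOS/RHEL 7), that’s easy to verify: list the initramfs contents and look for the module, something along these lines:

$ lsinitrd /boot/initramfs-3.10.0-327.el7.x86_64.img | grep -i megaraid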

Fix

I’m aware that drivers can be blacklisted from the initramfs. So, I experimented by adding the following parameter to the kernel command line in the GRUB boot entry:

rd.driver.blacklist=megaraid_sas

Whoa! This fixed my broken system.
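A quick sanity check that the parameter really reached the running kernel:

$ cat /proc/cmdline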

From there on, it was a matter of making the fix permanent, by appending the parameter to this line in /etc/default/grub:

GRUB_CMDLINE_LINUX="... rd.driver.blacklist=megaraid_sas"

Followed by:

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg

And reboot/test.
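Before rebooting, it doesn’t hurt to confirm the directive actually landed in the generated config:

$ grep rd.driver.blacklist /boot/grub2/grub.cfg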

SED to the rescue

Making this change by hand on all systems would be such a pain. So, I’d rather script it with sed. At least, I think it should be possible.

directive="rd.driver.blacklist=megaraid_sas"

sed -i'.bak' "s/GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 $directive\"/" /etc/default/grub

Of course, check if the directive is already there in the first place, using grep – a guarded version is sketched below.
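A minimal sketch of that guard, assuming the stock CentOS 7 paths and running as root:

directive="rd.driver.blacklist=megaraid_sas"

# Append the directive only if it is not already there,
# so the script is safe to re-run on every system.
if ! grep -qF -- "$directive" /etc/default/grub; then
    sed -i'.bak' "s/GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 $directive\"/" /etc/default/grub
    grub2-mkconfig -o /boot/grub2/grub.cfg
fi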

 
