linux-kernel - PROBLEM: Kernel panic and system crash during RAID disk failure

Message-ID: <Pine.LNX.4.64.1110300257510.32513@hytron.hytron.net>
Date:	Sun, 30 Oct 2011 03:27:28 -0400 (EDT)
From:	Darko <darko@...ron.net>
To:	Thomas Gleixner <tglx@...utronix.de>
cc:	linux-kernel@...r.kernel.org
Subject: PROBLEM: Kernel panic and system crash during RAID disk failure

Hello,

I have been doing some testing with the md RAID driver and I think I 
discovered a problem with it.
Everything was performed on a system with a single hard drive, using loop 
devices as virtual RAID devices.
Here is the setup:
/dev/sdc is my main drive; it holds the entire Linux OS and has one partition.
In /tmp I created 7 files, 100MB each, and associated them with loop 
devices:

losetup -a
/dev/loop0: [0821]:294820 (/var/tmp/raid-0)
/dev/loop1: [0821]:294857 (/var/tmp/raid-1)
/dev/loop2: [0821]:300120 (/var/tmp/raid-2)
/dev/loop3: [0821]:301073 (/var/tmp/raid-3)
/dev/loop4: [0821]:301074 (/var/tmp/raid-4)
/dev/loop5: [0821]:301075 (/var/tmp/raid-5)
/dev/loop6: [0821]:301076 (/var/tmp/raid-6)
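
The commands that created the backing files and attached them are not shown 
above. As a rough sketch only, they could look like the following. The use of 
dd with bs=1M count=100 and the helper name print_loop_setup are assumptions; 
the paths are taken from the losetup listing. The helper only prints the 
commands, so it is safe to run without root.

```shell
#!/bin/sh
# Sketch only: prints the setup commands instead of running them.
# Assumptions: 100 MiB zero-filled files made with dd; paths follow
# the losetup listing. print_loop_setup is a hypothetical helper.
print_loop_setup() {
    count=$1   # number of backing files / loop devices
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "dd if=/dev/zero of=/var/tmp/raid-$i bs=1M count=100"
        echo "losetup /dev/loop$i /var/tmp/raid-$i"
        i=$((i + 1))
    done
}

print_loop_setup 7
```

Reviewing the printed commands first and only then running them as root keeps 
the sketch from touching loop devices that are already in use.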

The next step was to create a RAID6 array:
mdadm --create /dev/md10 --level=6 --raid-devices=7 /dev/loop[0-6]

Here is how it looks so far:

cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md10 : active raid6 loop6[6] loop5[5] loop4[4] loop3[3] loop2[2] loop1[1] loop0[0]
       499200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
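
As a sanity check on the size reported above: RAID6 spends two members' worth 
of space on parity, so the 499200 blocks (1 KiB each) are spread over 7 - 2 = 5 
data members. A small sketch of that arithmetic follows; the function name is 
made up for illustration.

```shell
#!/bin/sh
# RAID6 keeps two members' worth of parity, so the usable size is
# (members - 2) * per-member data size. This inverts that relation.
raid6_per_member_kib() {
    total_kib=$1   # block count from /proc/mdstat (1 KiB blocks)
    members=$2     # number of devices in the array
    echo $(( total_kib / (members - 2) ))
}

raid6_per_member_kib 499200 7   # prints 99840
```

99840 KiB per member is a little less than the 102400 KiB in a 100MB file; the 
difference is presumably taken by the 1.2 superblock, the data offset and 
rounding to the chunk size.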

Then the filesystem...
mkfs.ext4 -b 4096 -i 4096 -m 0 /dev/md10

Mounting the file system to a folder called 'A' right in the root of my 
system:

mount /dev/md10 /A

Then I copied a few files on that file system. So far everything is good.

Then I purposely failed 2 drives:
mdadm --manage /dev/md10 --fail /dev/loop0
mdadm --manage /dev/md10 --fail /dev/loop1

The array continued to run fine in degraded mode. I was wondering what 
would happen if another drive failed. So while I was doing a write 
operation on that filesystem (/dev/md10) using:
dd if=/dev/zero of=testfile bs=1k count=360000  ...

...I quickly switched to a different console and entered the command:
mdadm --manage /dev/md10 --fail /dev/loop2

...which made 3 failed drives, one more than RAID6 can tolerate, so the 
array can no longer work...
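
The three states the array passes through can be told apart from the status 
map at the end of its /proc/mdstat entry, where each "_" marks a missing 
member. A minimal sketch, assuming the map is passed in as a string; the 
function name and the two degraded sample maps are illustrative and are not 
copied from the test system.

```shell
#!/bin/sh
# Classify a RAID6 array from its mdstat status map, e.g. "[__UUUUU]".
# RAID6 survives up to two missing members; a third one is fatal.
raid6_state() {
    map=$1
    # Keep only the underscores; each one marks a missing member.
    stripped=$(printf '%s' "$map" | tr -cd '_')
    missing=${#stripped}
    if [ "$missing" -eq 0 ]; then
        echo "clean"
    elif [ "$missing" -le 2 ]; then
        echo "degraded"
    else
        echo "failed"
    fi
}

raid6_state "[UUUUUUU]"   # prints clean
raid6_state "[__UUUUU]"   # prints degraded
raid6_state "[___UUUU]"   # prints failed
```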

I would have been happy to see just the array stop working, but the kernel 
panic on both 2.6.37.4 and 3.0.8 makes me worry that this is a serious bug 
that is present in older and newer kernels alike.
I repeated this several times, and most of the time the machine locks up 
with a kernel panic. Once, however, it did not lock up completely, and that 
is how I got the dmesg output.

The attached files include dmesg from the system startup until the bug 
trace, and some additional information regarding my system that might be 
helpful.

If you have any additional questions, please feel free to contact me!

I hope this info helps someone find and resolve the problem in the code.

Thanks,

--
Darko Kraus
Enterprise Network Administrator

Download attachment "Kernel.panic.MD.tgz" of type "APPLICATION/octet-stream" (28728 bytes)