Date:	Fri, 7 Sep 2012 09:40:18 +0800
From:	clplayer <cl.player@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: Content Of Files May Be Changed After One Disk Is Failed In RAID5

I am stress-testing the RAID5 code on my desktop.

I installed 8 hard disks: 4 on the internal SATA ports and the other 4
connected via eSATA.

The operating system on the desktop is Ubuntu 12.04.1 LTS 64-bit.

I have written a script that checks the files on the array after a
disk has been marked as failed.

The actions are as below:

1. Create an 8-disk RAID5 array, with one of the 8 disks set as a spare.
2. Make an ext4 file system on the array and mount it.
3. Generate a 1 GB file from /dev/urandom in the root file system.
4. Calculate the checksum of the file with the command "cksum".
5. Make 10 copies of the file on the array, and calculate the checksum
of each copy.
6. Mark one of the disks in the array as failed after the 10 copies
have been stored and checked.
7. Immediately recalculate the checksums of the copies, in parallel.
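
Steps 3 and 4 are not part of the script below, so here is a minimal
sketch of how the source file and its reference checksum could be
produced. The names SRC, SIZE_MB and REF and their default values are
assumptions for illustration only; the report itself uses a 1 GB file
at /1Gr.

```shell
#!/bin/sh
# Sketch of steps 3 and 4: create a random source file and record its checksum.
# SRC, SIZE_MB and REF are hypothetical names, not taken from the report.
SRC=${SRC:-/tmp/1Gr.sample}
SIZE_MB=${SIZE_MB:-1}          # set SIZE_MB=1024 (and SRC=/1Gr) to match the report
REF=${REF:-/tmp/cksum_ref}

# dd copies SIZE_MB blocks of 1 MiB each from /dev/urandom
dd if=/dev/urandom of="$SRC" bs=1048576 count="$SIZE_MB" 2>/dev/null

# cksum prints "<crc> <size> <name>"; keep only the CRC field
cksum "$SRC" | cut -d' ' -f1 > "$REF"
echo "reference checksum: $(cat "$REF")"
```

Recording the value in a file avoids copying it by hand into the
checking loop later.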

Curiously, several of the files have usually changed: their checksums
no longer match the ones calculated before the disk was failed.
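
A checksum mismatch only says that a file changed, not where. The
following is a rough sketch of a helper that compares a copy against
the source with "cmp" and reports the first differing byte offset and
the number of differing bytes. The name diff_report is hypothetical and
is not part of my script.

```shell
#!/bin/sh
# Sketch: report where a copy first differs from the source and how many bytes differ.
# diff_report is a hypothetical helper, not part of the original script.
diff_report() {
        src=$1
        copy=$2
        if cmp -s "$src" "$copy"
        then
                echo "$copy: identical"
                return 0
        fi
        # cmp -l prints one line per differing byte: 1-based offset, then both values
        summary=$(cmp -l "$src" "$copy" 2>/dev/null | awk '
                NR == 1 { first = $1 }
                END {
                        if (NR == 0)
                                print "no differing bytes in the common part (sizes differ)"
                        else
                                print "first difference at byte " first ", " NR " bytes differ"
                }')
        echo "$copy: $summary"
        return 1
}
```

For example, "diff_report /1Gr /mnt/1Gr.3" would compare the fourth
copy with the source. Whether the reported offsets line up with chunk
or stripe boundaries could help narrow down where the data goes wrong.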

Then I tried the same scenario with an 8-disk array with no spare, and
the result is the same.

I have also tried RAID1 and RAID6, and the checksums stay consistent
at both of those levels.

It looks like there is something wrong in the raid5 code. I am tracing
through raid5.c, but I cannot figure out the root cause yet.

Would someone please suggest any ideas? Thank you very much.

My script is attached below:

#!/bin/sh

TESTSEQ="0 1 2 3 4 5 6 7 8 9"

mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
        --spare-devices=1 /dev/sd[a-h]3 --assume-clean -z 10485760 -f -R

mkfs.ext4 /dev/md0

mount /dev/md0 /mnt

#duplicating the source file and calculating the checksum
for ITEM in $TESTSEQ
do
        echo "copying 1Gr.${ITEM}..."
        cp /1Gr /mnt/1Gr.${ITEM}

        cksum /mnt/1Gr.${ITEM} >> /tmp/cksum_org.${ITEM}
        cat /tmp/cksum_org.${ITEM} | while read tmpline
        do
                orgcksum=${tmpline%% *}
                echo "checksum is ${orgcksum}"
        done
done

sync

sleep 10

mdadm -f /dev/md0 /dev/sdb3

echo "producing checksum..."
for ITEM in $TESTSEQ
do
        cksum /mnt/1Gr.${ITEM} > /tmp/cksum_out.${ITEM} &
done

#wait for the 10 background cksum processes to finish
wait

echo "checking the result..."
for ITEM in $TESTSEQ
do
        cat /tmp/cksum_out.${ITEM} | while read line
        do
                item=${line%% *}

                #the value 2606882893 was pre-calculated manually
                if [ x"$item" != "x2606882893" ]
                then
                        echo "get wrong cksum on ${ITEM}"
                else
                        rm /tmp/cksum_out.${ITEM}
                fi
        done
done
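
The checking loop above compares against a hard-coded value. As an
alternative, here is a minimal sketch that compares each post-failure
checksum with the one recorded at copy time in /tmp/cksum_org.N. The
helper name check_item and the TMPDIR_CK override are assumptions for
illustration; the file names follow the script above.

```shell
#!/bin/sh
# Sketch: compare the post-failure checksum of each copy with the one
# recorded at copy time. check_item and TMPDIR_CK are hypothetical names.
check_item() {
        item=$1
        dir=${TMPDIR_CK:-/tmp}
        # The first field of cksum output is the CRC. cksum_org.N is written
        # with ">>" in the script, so take the most recent line.
        org=$(cut -d' ' -f1 "$dir/cksum_org.$item" | tail -n 1)
        out=$(cut -d' ' -f1 "$dir/cksum_out.$item" | tail -n 1)
        if [ -n "$org" ] && [ "$org" = "$out" ]
        then
                echo "1Gr.$item: ok ($org)"
        else
                # An empty value also lands here, e.g. when cksum itself failed
                echo "1Gr.$item: MISMATCH (before=$org after=$out)"
                return 1
        fi
}
```

It could be called as "for ITEM in $TESTSEQ; do check_item $ITEM; done"
once the background cksum processes have finished. An empty or missing
output file is reported as a mismatch instead of being skipped silently.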

Thanks.
Peng.