linux-kernel - Re: Content Of Files May Be Changed After One Disk Is Failed In RAID5


Message-ID: <20120907123348.798dfc28@notabene.brown>
Date:	Fri, 7 Sep 2012 12:33:48 +1000
From:	NeilBrown <neilb@...e.de>
To:	clplayer <cl.player@...il.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Content Of Files May Be Changed After One Disk Is Failed In
 RAID5

On Fri, 7 Sep 2012 09:40:18 +0800 clplayer <cl.player@...il.com> wrote:

> I am stress-testing the RAID5 code on my desktop.
> 
> I installed 8 hard disks, 4 of which are on the internal SATA ports
> while the others are connected via eSATA.
> 
> The operating system on the desktop is Ubuntu 12.04.1 LTS 64-bit.
> 
> I wrote a script to check the files on the RAID while disks are
> being failed.
> 
> The actions are as below:
> 
> 1. Create an 8-disk RAID array, with one of the 8 disks set as a spare.
> 2. Make an ext4 file system on the array and mount it.
> 3. Generate a 1GB file from /dev/urandom in the root file system.
> 4. Calculate the checksum of the file with the command "cksum".
> 5. Store 10 copies of the file on the array, then calculate the
> checksum of each copy.
> 6. Mark one of the disks in the array as failed once the 10 copies
> are stored and checked.
> 7. Immediately recalculate the checksums of all copies in parallel.
> 
> Curiously, several files have usually changed: their checksums no
> longer match the original.
> 
> Then I tried the same scenario with an 8-disk RAID with no spare, and
> the result is the same.
> 
> I have also tried RAID1 and RAID6, and the checksums stay
> consistent with those two levels.
> 
> It looks like there is something wrong in the RAID5 code. I am
> reading through raid5.c but I cannot figure out the root cause yet.
> 
> Would someone please suggest any ideas? Thank you very much.
> 
> My script is attached below:
> 
> #!/bin/sh
> 
> TESTSEQ="0 1 2 3 4 5 6 7 8 9"
> 
> mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
>         --spare-devices=1 /dev/sd[a-h]3 --assume-clean -z 10485760 -f -R

--assume-clean is not safe with RAID5 unless the array actually is clean.
It is safe with RAID1 and RAID6 due to details of the specific implementation.
So I suspect that is the cause of the corruption.
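
To illustrate why stale parity matters, here is a toy sketch in shell
arithmetic. It is not the md implementation: it assumes a single stripe
with two data bytes and one parity byte, and it assumes a partial-stripe
write updates parity by read-modify-write (new parity = old parity XOR
old data XOR new data). All values and variable names are made up for the
example.

```shell
#!/bin/sh
# Toy RAID5 stripe: two data bytes (d0, d1) and one parity byte.
d0=$((0x5A))
d1=$((0x3C))

# Case 1: parity was never initialised (array created with --assume-clean
# on disks that were not actually clean). The value is arbitrary.
stale_p=$((0xFF))

# Case 2: parity was computed by an initial resync.
good_p=$(( d0 ^ d1 ))

# Overwrite d0 using read-modify-write: the new parity is derived from
# the OLD parity, so an error in the old parity is carried forward.
new_d0=$((0xA7))
rmw_stale_p=$(( stale_p ^ d0 ^ new_d0 ))
rmw_good_p=$(( good_p ^ d0 ^ new_d0 ))

# The disk holding d1 fails: d1 is rebuilt from the surviving data byte
# and the parity byte.
rebuilt_bad=$(( new_d0 ^ rmw_stale_p ))
rebuilt_ok=$(( new_d0 ^ rmw_good_p ))

printf 'original d1:              0x%02X\n' "$d1"
printf 'rebuilt from good parity:  0x%02X\n' "$rebuilt_ok"
printf 'rebuilt from stale parity: 0x%02X\n' "$rebuilt_bad"
```

As long as no disk is missing, reads return d1 directly and the bad
parity goes unnoticed, which would match checksums that only change after
a disk is failed. Dropping --assume-clean and letting the initial resync
finish (for example, waiting with "mdadm --wait /dev/md0") before failing
a disk should avoid this.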

NeilBrown

> 
> mkfs.ext4 /dev/md0
> 
> mount /dev/md0 /mnt
> 
> #duplicating the source file and calculating the checksum
> for ITEM in $TESTSEQ
> do
>         echo "copying 1Gr.${ITEM}..."
>         cp /1Gr /mnt/1Gr.${ITEM}
> 
>         cksum /mnt/1Gr.${ITEM} >> /tmp/cksum_org.${ITEM}
>         cat /tmp/cksum_org.${ITEM} | while read tmpline
>         do
>                 orgcksum=${tmpline%% *}
>                 echo "checksum is ${orgcksum}"
>         done
> done
> 
> sync
> 
> sleep 10
> 
> mdadm -f /dev/md0 /dev/sdb3
> 
> echo "producing checksum..."
> for ITEM in $TESTSEQ
> do
>         cksum /mnt/1Gr.${ITEM} > /tmp/cksum_out.${ITEM} &
> done
> 
> #wait for the 10 cksum processes to finish
> sleep 120
> 
> echo "checking the result..."
> for ITEM in $TESTSEQ
> do
>         cat /tmp/cksum_out.${ITEM} | while read line
>         do
>                 item=${line%% *}
> 
>                 #the value 2606882893 was pre-calculated manually
>                 if [ x"$item" != "x2606882893" ]
>                 then
>                         echo "get wrong cksum on ${ITEM}"
>                 else
>                         rm /tmp/cksum_out.${ITEM}
>                 fi
>         done
> done
> 
> Thanks.
> Peng.
