[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <d00698fb0701181211l480966e5o6f10db9300d4c8aa@mail.gmail.com>
Date: Thu, 18 Jan 2007 21:11:58 +0100
From: noah <noah123@...il.com>
To: linux-kernel@...r.kernel.org
Subject: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2
Hi!
I'm experiencing data corruption in the following setup:
1. mdadm --create /dev/md0 -n3 -lraid5 /dev/hda1 /dev/hdc1 /dev/hde1
2. cryptsetup -c aes-cbc-essiva:sha256 luksFormat /dev/md0 mykey
3. cryptsetup -d mykey luksOpen /dev/md0 cryptvol
4. pvcreate /dev/mapper/cryptvol
5. vgcreate vg0 /dev/cryptvol
6. lvcreate -n root -L10G vg0
7. mkreiserfs -q /dev/vg0/root
8. mkdir /.newroot; mount /dev/vg0/root /.newroot
9. mkdir /.realroot; mount -o bind / /.realroot
10. tar cf - -C /.realroot|tar xvpf - -C /.newroot
With Linux 2.6.18 (it's broken, OK, but there's still something wrong
even in 2.6.19.2 so keep on reading) I started getting warnings from
ReiserFS indicating severe data corruptions. Reiserfsck confirms
this. It usually happened while extracting the Linux source tree.
So after asking around I found out dm-crypt had a bug[1] fixed in
early December.
It got fixed in 2.6.19 and the fix was backported and included in 2.6.18.6[2].
Fine, so I upgraded to 2.6.18.6, rebuilt the array from scratch and
did the whole procedure again.
No messages from reiserfs in dmesg this time, but reiserfsck still
revealed severe data corruption.
I also found compressed archives and ISO-images for which I had
md5sums to be corrupt.
I then upgraded to 2.6.19.2 with the exact same result as with 2.6.18.6.
I even verified this on a fairly new computer with different hardware
(Intel CPU and chipset).
Figured it maybe was some kind of race condition so on my second try
on 2.6.19.2, when recreating the array, I let md finish resyncing it
before copying over the files.
This time, reiserfsck didn't complain.
Just for the sake of fun, I did the whole thing again, rebuilding the
array from scratch, let md resync the third drive and then I started
to copy over all files again. Thinking the cause of the problem was
heavy disk I/O I tried to stress the other LVM volumes residing on md0
using tar during the copy. Everything seemed fine; no problems arose.
Did a few reboots and confirmed that reiserfsck didn't have any
complaints on any of the filesystems residing on the LVM volumes on
md0.
Started using the machine as normal, and half a day later I unmounted
the filesystems and ran reiserfsck just to make sure everything still
was OK. Unfortunately, it wasn't.
The drives in the array are three brand new drives, 2x250GB and one
200GB, all three IDE drives.
According to SMART there's no problems with them. And they worked
fine in my previous RAID1 setup with dm-crypt and LVM, by the way.
The computer itself is an Athlon XP with less than 1GB of RAM on a M/B
with nForce2 chipset FWIW. No memory errors were detected with
memtest86+ (I completed the full test).
I haven't tried using another filesystem as I've got quite a lot of
faith in reiserfs's stability.
Is anybody else experiencing these problems?
Unfortunately I'm only able to do limited testing due to busy days,
but I'd love to help if I can.
[1] Here's a thread on the recently fixed data corruption bug in dm-crypt
http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1974
[2] The backport of the dm-crypt fix for 2.6.18.6 is here
http://uwsg.iu.edu/hypermail/linux/kernel/0612.1/2299.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists