[<prev] [next>] [day] [month] [year] [list]
Message-ID: <6FDE26082D451C41BE1A3742966200B3BAED1F@DR2EX01.hosting.itg>
Date: Tue, 31 Oct 2006 17:09:54 +0100
From: "Andreas Paulsson" <andreas.paulsson@...arden.se>
To: "Neil Brown" <neilb@...e.de>
Cc: <linux-kernel@...r.kernel.org>
Subject: RE: RE: RE: PROBLEM: raid5 just dies
>On Monday October 30, andreas.paulsson@...arden.se wrote:
>> >Exactly how are aes-loop and raid5 connected together?
>>
>> We use 5x300gb drives in a raid5 array, which is then used as a
>> physical disk in an lvm volume, with one logical volume. This logical
>> volume is then encrypted with "losetup -e aes /dev/loop1
>> /dev/vg0/lv0", and then formatted with ReiserFS.
>Thanks.
>
>It could be a hardware problem....
>The symptom is that we try to free some memory and a
>consistency check tells us that the memory wasn't allocated.
>So a single bit error in the address could be thecause.
>Running memtest86 for a while wouldn't hurt if you haven't
>already done that.
I'll see if I get get someone to do a memtest for me (the box is 600km
away).
>You have three layers here: loop over dm over md/raid5.
>So if it is a software problem it could be in any of these layers,
>or in an interaction between two of them.
>1/ how repeatable is this?
cp -arv source target
.. and then wait for a while, and it crashes. That is, 100% repeatable.
>2/ how much room have you got to experiment?
> Could you remake the array without the loop/aes and see if you can
>reproduce the problem?
I could, but then I would need to redo all the work again, since we need
the disk to be encrypted too.
>Could you remake the array without the LVM layer and see if you can
>reproduce the problem?
No, this is impossible, we need to have LVM because we are making one
big volume that consists of raid5 and raid1 devices.
>Do you have CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_SLAB set?
>If not could you >recompile with those set to see if they
>provide more helpful information.
I can try and see if it helps.
>I must admit I am somewhat at a loss.
>I cannot see much room for problems leading to that
>particular point in the code that would not be seen
>by lots more people than just you.
Heh, you're not the only one that's lost.
>NeilBrown
/Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists