Message-ID: <AANLkTikMNw5yMxcUV3a-temcOdWa5gySZ+vTtfuBEAiD@mail.gmail.com>
Date: Tue, 8 Mar 2011 09:41:15 +0100
From: Bastien ROUCARIES <roucaries.bastien@...il.com>
To: Frederic Weisbecker <fweisbec@...il.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
akpm@...ux-foundation.org
Subject: Re: Reiserfs deadlock in 2.6.36
On Mon, Mar 7, 2011 at 8:00 PM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> Hi Bastien,
Cc: Ingo Molnar, because he works a lot on soft lockups and may have
an idea for debugging this.
Cc: Andrew Morton, who is also tracking "File/memory corruption in 2.6.37".
>> It took me more than two days of testing to reproduce this bug with tracing enabled. My filesystem was quite slow, and this bug seems
>> to be timing related.
>>
>> One pattern that triggers this bug is git. Doing a lot of git work on my desktop crashes my machine.
>>
>> Moreover, trying to reproduce this bug leads to data loss. I have rebuilt my / partition twice using --rebuild-tree, and restored
>> my home partition three times from backups.
>>
>> My log is here.
>>
>> Do you need more information?
>
> Yeah, do you have CONFIG_REISERFS_CHECK enabled? I would just
> like to ensure we are not missing this important source of
> information.
Yes, I have it enabled.
> I'm puzzled because, given the traces, your opening and closing of the journal are
> well balanced.
>
> You have a writer queued and stuck, but I see no sign of it in the trace stream.
> I only see well balanced journal operations, including journal closing that would have
> woken your queued writer.
>
> A theory could be that your queued writer was waiting for someone to close the journal,
> which finally happened, but only several minutes later, after many
> journal openings/closings had overwritten the old trace containing the queueing of
> the stuck writer.
Running a loop like "while true; do sync && sleep 1; done" helps a lot.
>
> I don't know what to do yet. I need to think more about it.
>
Could we do what I suggested at first? That is, use lockdep to track
journal open/close using a fake lock?
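
The bookkeeping behind this idea can be illustrated with a minimal
userspace sketch in C. It is only a toy model under stated assumptions:
the names fake_journal_lock, fake_journal_init, fake_journal_open,
fake_journal_close and fake_journal_report are hypothetical and are not
part of reiserfs or of the kernel's lockdep API. Each journal open is
treated as acquiring a fake lock and each close as releasing it, so a
task that never closes the journal, or closes it without having opened
it, becomes visible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MAX_TASKS 8

struct fake_journal_lock {
    int depth[MAX_TASKS];             /* per-task opens not yet closed */
    const char *last_site[MAX_TASKS]; /* where the task last opened the journal */
};

static void fake_journal_init(struct fake_journal_lock *l)
{
    memset(l, 0, sizeof(*l));
}

/* Record that `task` opened the journal at call site `site`. */
static void fake_journal_open(struct fake_journal_lock *l, int task,
                              const char *site)
{
    assert(task >= 0 && task < MAX_TASKS);
    l->depth[task]++;
    l->last_site[task] = site;
}

/* Returns 0 on success, -1 if the task closes a journal it never opened. */
static int fake_journal_close(struct fake_journal_lock *l, int task)
{
    assert(task >= 0 && task < MAX_TASKS);
    if (l->depth[task] == 0)
        return -1;
    l->depth[task]--;
    return 0;
}

/* Print every task that still holds the fake lock; return how many do. */
static int fake_journal_report(const struct fake_journal_lock *l)
{
    int held = 0;

    for (int t = 0; t < MAX_TASKS; t++) {
        if (l->depth[t] > 0) {
            printf("task %d still holds journal (depth %d, opened at %s)\n",
                   t, l->depth[t], l->last_site[t]);
            held++;
        }
    }
    return held;
}
```

In the kernel the same pairing would presumably be expressed with
lockdep annotations rather than hand-kept counters, so that lockdep
itself could report who still holds the fake lock when a writer gets
stuck.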
BTW, it seems that someone is experiencing this condition on ext3. I could
do more testing if you want, and I will run xfstests to see whether I
can reproduce it more quickly.
Bastien