Message-ID: <AANLkTikMNw5yMxcUV3a-temcOdWa5gySZ+vTtfuBEAiD@mail.gmail.com>
Date:	Tue, 8 Mar 2011 09:41:15 +0100
From:	Bastien ROUCARIES <roucaries.bastien@...il.com>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	akpm@...ux-foundation.org
Subject: Re: Reiserfs deadlock in 2.6.36

On Mon, Mar 7, 2011 at 8:00 PM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> Hi Bastien,

Cc: Ingo Molnar, because he works a lot on soft lockups and may have
an idea how to debug this.
Cc: Andrew Morton, who is also tracking "File/memory corruption in 2.6.37".

>> It took me more than two days of testing to reproduce this bug with tracing enabled. My filesystem was quite slow, and this bug seems
>> to be timing-related.
>>
>> One pattern that triggers this bug is git. Doing a lot of git work on my desktop crashes my machine.
>>
>> Moreover, trying to reproduce this bug led to data loss. I have rebuilt my / partition twice using --rebuild-tree, and restored
>> my home partition three times from backups.
>>
>> My log is here.
>>
>> Do you need more information?
>
> Yeah, do you have CONFIG_REISERFS_CHECK? I just would
> like to ensure we are not missing this important source of
> information.

Yes, I have it enabled.

> I'm puzzled because, given the traces, your openings and closings of the journal are
> well balanced.
>
> You have a writer queued and stuck, but I see no trace of it in the trace stream.
> I only see well-balanced journal operations, including journal closings that would have
> woken your queued writer.
>
> One theory is that your queued writer was waiting for someone to close the journal,
> which eventually happened, but several minutes later, after so many
> journal openings/closings that they overwrote the old trace entry recording the queueing of
> the stuck writer.
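
Frederic's theory relies on the trace buffer being a fixed-size ring that overwrites its oldest entries once full. A minimal userspace sketch of that effect, assuming a hypothetical 8-slot buffer; trace_ring, trace_record and trace_contains are illustrative names, not the kernel's ftrace API:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size trace ring that overwrites its oldest entries. */
#define TRACE_SLOTS 8
#define TRACE_MSG_LEN 32

struct trace_ring {
	char events[TRACE_SLOTS][TRACE_MSG_LEN];
	unsigned long head;	/* total number of events ever recorded */
};

static void trace_record(struct trace_ring *r, const char *msg)
{
	/* The slot index wraps, so event number N overwrites event N - TRACE_SLOTS. */
	char *slot = r->events[r->head % TRACE_SLOTS];

	strncpy(slot, msg, TRACE_MSG_LEN - 1);
	slot[TRACE_MSG_LEN - 1] = '\0';
	r->head++;
}

/* Returns 1 if msg is still present in the retained window, else 0. */
static int trace_contains(const struct trace_ring *r, const char *msg)
{
	unsigned long n = r->head < TRACE_SLOTS ? r->head : TRACE_SLOTS;

	for (unsigned long i = 0; i < n; i++)
		if (strcmp(r->events[i], msg) == 0)
			return 1;
	return 0;
}
```

Recording "writer queued" and then sixteen balanced begin/end events leaves no record of the queued writer, even though every remaining entry looks well balanced. A larger trace buffer (ftrace exposes buffer_size_kb for this) widens the window but does not remove the effect.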

Running this loop helps a lot:

    while true; do sync && sleep 1; done

>
> I don't know what to do yet. I need to think more about it.
>

Could we try what I suggested at first? Use lockdep to track
journal open/close with a fake lock?

BTW, it seems that someone has hit this condition on ext3. I can
do more testing if you want, and I will run xfstests to see
whether I can reproduce it more quickly.

Bastien