Date:	Tue, 31 Mar 2009 14:33:07 +0200
From:	Jan Kara <jack@...e.cz>
To:	Alexander Beregalov <a.beregalov@...il.com>
Cc:	Theodore Tso <tytso@....edu>,
	"linux-next@...r.kernel.org" <linux-next@...r.kernel.org>,
	linux-ext4@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	sparclinux@...r.kernel.org
Subject: Re: next-20090310: ext4 hangs

On Tue 31-03-09 14:07:30, Alexander Beregalov wrote:
> 2009/3/31 Jan Kara <jack@...e.cz>:
> > On Thu 26-03-09 01:38:32, Alexander Beregalov wrote:
> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> > On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
> >> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> >> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
> >> >> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> >> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> >> >> >> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> >> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> >> >> >> >>   I've looked into this. Obviously, what's happening is that we delete
> >> >> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds the inode is just under
> >> >> >> >> >> writeout in transaction commit and thus it waits. But it never gets woken
> >> >> >> >> >> up, and because it holds a handle from the transaction, everyone eventually
> >> >> >> >> >> blocks waiting for the transaction to finish.
> >> >> >> >> >>   But I don't really see how that can happen. The code is really
> >> >> >> >> >> straightforward and everything happens under j_list_lock... Strange.
> >> >> >> >> >  BTW: Is the system SMP?
> >> >> >> >> No, it is a UP system.
> >> >> >> >  Even stranger. And do you have CONFIG_PREEMPT set?
> >> >> >> >
> >> >> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
> >> >> >> >  OK, I've sort-of expected this.
> >> >> >>
> >> >> >> CONFIG_PREEMPT_RCU=y
> >> >> >> CONFIG_PREEMPT_RCU_TRACE=y
> >> >> >> # CONFIG_PREEMPT_NONE is not set
> >> >> >> # CONFIG_PREEMPT_VOLUNTARY is not set
> >> >> >> CONFIG_PREEMPT=y
> >> >> >> CONFIG_DEBUG_PREEMPT=y
> >> >> >> # CONFIG_PREEMPT_TRACER is not set
> >> >> >>
> >> >> >> config is attached.
> >> >> >  Thanks for the data. I still don't see how the wakeup can get lost. The
> >> >> > process cannot even be preempted when we are in the section protected by
> >> >> > j_list_lock... Can you send me a disassembly of the functions
> >> >> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
> >> >> > I can see whether the compiler has reordered something unexpectedly?
> >> >  Thanks for the disassembly...
> >> >
> >> >> By default gcc inlines journal_submit_data_buffers().
> >> >> Here is the -fno-inline version. The default version is attached.
> >  <snip>
> >
> >  I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
> > be the only one reporting the bug). Could you please compile and test the kernel
> > with the attached patch? It will print to the kernel log when we go to sleep
> > waiting for inode commit, when we send wakeups, etc. When you hit the
> > deadlock, please send me your kernel log. It should help with debugging why
> > we miss the wakeup. Thanks.
> 
> Which patch?
  Oops. Forgot to attach ;).

										Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR

View attachment "0001-ext4-Debug-sleepers-in-iput.patch" of type "text/x-patch" (1984 bytes)
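
For readers following the thread, the wait/wakeup pattern being discussed can be illustrated with a small user-space analogy. This is only a sketch, not the jbd2 code and not the contents of the attached patch: the class InodeCommitState, its methods, and the log messages are all hypothetical names made up for the illustration. It assumes the general shape described above: the waiter checks an "under writeout" flag and goes to sleep while holding the same lock under which the committer clears the flag and sends the wakeup, and both sides log what they do.

```python
import threading


class InodeCommitState:
    """Hypothetical user-space stand-in for an inode that may be under writeout."""

    def __init__(self):
        self.lock = threading.Lock()               # plays the role of j_list_lock
        self.cond = threading.Condition(self.lock)  # wait queue tied to that lock
        self.in_commit = False                      # "inode is under writeout"
        self.log = []                               # stand-in for the kernel log

    def begin_writeout(self):
        # Commit side: mark the inode as being written out.
        with self.lock:
            self.in_commit = True
            self.log.append("commit: writeout started")

    def end_writeout(self):
        # Commit side: clear the flag and wake waiters under the same lock.
        with self.cond:
            self.in_commit = False
            self.log.append("commit: writeout done, waking waiters")
            self.cond.notify_all()

    def release(self):
        # Release side: the flag is re-checked under the lock before every
        # sleep, so a wakeup sent before or after the check cannot be lost.
        with self.cond:
            while self.in_commit:
                self.log.append("release: sleeping")
                self.cond.wait()
            self.log.append("release: proceeding")


def demo():
    state = InodeCommitState()
    state.begin_writeout()
    waiter = threading.Thread(target=state.release)
    waiter.start()
    state.end_writeout()
    waiter.join(timeout=5)
    return state.log, waiter.is_alive()


if __name__ == "__main__":
    log, still_blocked = demo()
    print("\n".join(log))
    print("waiter still blocked:", still_blocked)
```

With this structure the waiter always ends up proceeding, whether or not it actually slept, which is why a lost wakeup in the real code is surprising. A log like the one collected here is the kind of evidence the debugging patch is meant to gather: it shows whether a sleep was ever followed by a matching wakeup.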
