Message-ID: <20090331100150.GF11808@duck.suse.cz>
Date:	Tue, 31 Mar 2009 12:01:51 +0200
From:	Jan Kara <jack@...e.cz>
To:	Alexander Beregalov <a.beregalov@...il.com>
Cc:	Theodore Tso <tytso@....edu>,
	"linux-next@...r.kernel.org" <linux-next@...r.kernel.org>,
	linux-ext4@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	sparclinux@...r.kernel.org
Subject: Re: next-20090310: ext4 hangs

On Thu 26-03-09 01:38:32, Alexander Beregalov wrote:
> 2009/3/25 Jan Kara <jack@...e.cz>:
> > On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
> >> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> >> >> >> 2009/3/25 Jan Kara <jack@...e.cz>:
> >> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> >> >> >>   I've looked into this. Obviously, what's happening is that we delete
> >> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds the inode is just under
> >> >> >> >> writeout in transaction commit and thus it waits. But it never gets woken
> >> >> >> >> up and because it has a handle from the transaction, everyone eventually
> >> >> >> >> blocks on waiting for a transaction to finish.
> >> >> >> >>   But I don't really see how that can happen. The code is really
> >> >> >> >> straightforward and everything happens under j_list_lock... Strange.
> >> >> >> >  BTW: Is the system SMP?
> >> >> >> No, it is a UP system.
> >> >> >  Even stranger. And do you have CONFIG_PREEMPT set?
> >> >> >
> >> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
> >> >> >  OK, I've sort-of expected this.
> >> >>
> >> >> CONFIG_PREEMPT_RCU=y
> >> >> CONFIG_PREEMPT_RCU_TRACE=y
> >> >> # CONFIG_PREEMPT_NONE is not set
> >> >> # CONFIG_PREEMPT_VOLUNTARY is not set
> >> >> CONFIG_PREEMPT=y
> >> >> CONFIG_DEBUG_PREEMPT=y
> >> >> # CONFIG_PREEMPT_TRACER is not set
> >> >>
> >> >> config is attached.
> >> >  Thanks for the data. I still don't see how the wakeup can get lost. The
> >> > process cannot even be preempted when we are in the section protected by
> >> > j_list_lock... Can you send me a disassembly of functions
> >> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
> >> > I can see whether the compiler has reordered something unexpectedly?
> >  Thanks for the disassembly...
> >
> >> By default gcc inlines journal_submit_data_buffers().
> >> Here is the -fno-inline version. The default version is attached.
  <snip>

  I'm at a loss here. I don't see how we can miss a wakeup (plus you seem to
be the only one reporting the bug). Could you please compile and test the kernel
with the attached patch? It prints to the kernel log when we go to sleep
waiting for the inode commit, when we send wakeups, etc. When you hit the
deadlock, please send me your kernel log. It should help with debugging why
we miss the wakeup. Thanks.
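
  For illustration only, here is a minimal userspace sketch of the
check-sleep-wake pattern that such logging would trace. It is not the
attached patch and not the jbd2 code: the names demo_inode, commit_thread,
release_inode and run_demo are hypothetical, a pthread mutex stands in for
j_list_lock, and a condition variable stands in for the kernel wait queue.
The point is that when the flag is tested and cleared only under the lock,
the waiter either sees the flag already cleared or is asleep when the wakeup
is sent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for an inode that commit is writing out. */
struct demo_inode {
	pthread_mutex_t lock;	/* plays the role of j_list_lock */
	pthread_cond_t wq;	/* plays the role of the wait queue */
	int in_commit;		/* 1 while "commit" is writing the inode */
	int sleeps;		/* how many times the waiter went to sleep */
	int wakeups;		/* how many wakeups were sent */
};

/* "Commit" side: finish writeout, clear the flag, wake any waiter. */
static void *commit_thread(void *arg)
{
	struct demo_inode *ino = arg;

	pthread_mutex_lock(&ino->lock);
	ino->in_commit = 0;
	ino->wakeups++;
	fprintf(stderr, "commit: sending wakeup\n");
	pthread_cond_broadcast(&ino->wq);
	pthread_mutex_unlock(&ino->lock);
	return NULL;
}

/* "Release" side: wait until the inode is no longer under writeout. */
static void release_inode(struct demo_inode *ino)
{
	pthread_mutex_lock(&ino->lock);
	/* Re-check the flag after every wakeup; the lock is held when testing. */
	while (ino->in_commit) {
		ino->sleeps++;
		fprintf(stderr, "release: going to sleep\n");
		pthread_cond_wait(&ino->wq, &ino->lock);
		fprintf(stderr, "release: woken up\n");
	}
	pthread_mutex_unlock(&ino->lock);
}

/* Runs one waiter against one committer; returns the final in_commit flag. */
int run_demo(void)
{
	struct demo_inode ino = { .in_commit = 1 };
	pthread_t t;

	pthread_mutex_init(&ino.lock, NULL);
	pthread_cond_init(&ino.wq, NULL);
	if (pthread_create(&t, NULL, commit_thread, &ino) != 0)
		return -1;
	release_inode(&ino);
	pthread_join(t, NULL);
	assert(ino.wakeups == 1);
	pthread_cond_destroy(&ino.wq);
	pthread_mutex_destroy(&ino.lock);
	return ino.in_commit;
}
```

  Calling run_demo() from a small main() and building with "cc -pthread"
should be enough to try it. The sleep and wakeup messages go to stderr, and
whether the waiter sleeps at all depends on which thread takes the lock
first.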

									Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR