lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110906153601.GA20571@thunk.org>
Date:	Tue, 6 Sep 2011 11:36:01 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Jan Kara <jack@...e.cz>
Cc:	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	Allison Henderson <achender@...ux.vnet.ibm.com>
Subject: Re: [PATCH] ext4: only call ext4_jbd2_file_inode when an inode has
 been extended

On Tue, Sep 06, 2011 at 12:02:53PM +0200, Jan Kara wrote:
>   Umm, I don't quite understand the race. It seems to me we can block on
> lack of journal space during writeback only in ext4_da_writepages() where
> we do ext4_journal_start(). But at that point there shouldn't be any pages
> with PageWriteback set which were not submitted for IO. So it's not clear
> to me why PageWriteback bit does not get cleared when IO finishes and
> kjournald would then continue. Is it because IO completion is somehow
> blocked on journal?

No, you're right.  I looked more closely at it and the problem is
because of the end_io handling.  There was a thead blocked in
mpage_da_map_and_submit path but it was blocked in
ext4_get_wite_access(), and it was not holding PageWriteback at the
time.

> That would seem possible as I'm looking e.g. into
> ext4_convert_unwritten_extents(). But then any waiting for writeback from
> kjournald is prone to deadlocks. In fact regardless of what kjournald does
> there does not seem to be sane lock ordering since we have:
> 
> transaction start -> PageWriteback -> transaction start
>                   ^                ^
>                   |         end_io handling if I'm right
>       enforced by write_cache_pages_da
> 
>   Which is really nasty. We cannot end page writeback until the page is
> readable from disk which means until we have properly updated extent tree.
> But for extent tree update we need a transaction. The only way out I can
> see is to reserve space for extent tree update in a transaction in
> writepages() and holding the transaction open until end_io time when we
> change extent tree and close the transaction. But I'm afraid that's going
> to suck under heavy load... 

Yes, that's a problem with ext4_convert_unwritten_extents() being
called out of end_io handling, and of course dioread_nolock does a lot
more of that, hence why it shows up in that mode.

I think the long-term solution here is that we have to reserve space
and make the allocation decision at writepages() time, but we don't
actually modify any on-disk state, hence we don't have to hold a
transaction open.  We just prevent those disk blocks from getting
allocated anywhere else, and we tentative assoicate physical blocks
with the logical block numbers, but in a memory structure only.  Then
when the pages are written, we can drop PageWriteback.  We don't have
to wait until the extent blocks are written to disk, so long as any
callers of ext4_map_blocks() get the right information (via an
in-memory cache that we would have to add to make this whole thing
first), and so long as fsync() not only calls filemap_fdatawrite(),
but also waits for the metadata updates to the extent tree/indirect
blocks have been completed.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ