linux-ext4 - Re: [PATCH] ext4: don't submit unwritten extent while holding active jbd2 handle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190103031840.GB6908@mit.edu>
Date:   Wed, 2 Jan 2019 22:18:40 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Xiaoguang Wang <xiaoguang.wang@...ux.alibaba.com>
CC:     <linux-ext4@...r.kernel.org>, <harshads@...gle.com>
Subject: Re: [PATCH] ext4: don't submit unwritten extent while holding active
 jbd2 handle

On Tue, Dec 25, 2018 at 10:16:25PM +0800, Xiaoguang Wang wrote:
> In ext4_writepages(), for every iteration, mpage_prepare_extent_to_map()
> will try to find 2048 pages to map and normally one bio can contain 256
> pages at most. If we really found 2048 pages to map, there will be 4 bios
> and 4 ext4_io_submit() calls which are called both in ext4_writepages()
> and mpage_map_and_submit_extent().
> 
> But note that in mpage_map_and_submit_extent(), we hold a valid jbd2 handle,
> when dioread_nolock is enabled and extent is unwritten, jbd2 commit thread
> will wait this handle to finish, so wait the unwritten extent is written to
> disk, this will introduce unnecessary stall time, especially longer when the
> writeback operation is io throttled, need to fix this issue.
> 
> Here for this scene, we accumulate bios in ext4_io_submit's io_bio, and only
> submit these bios after dropping the jbd2 handle.

I think it would be better if we simply don't even try to *start* the
jbd2 handle at all until we after ext4_end_io_rsv_work() is called.  I
suspect the best place to do this will be in ext4_end_io().  The
shorter time we hold the handle, the better the commit completion
latency will be.  (After all, while we are assembling the bios, memory
allocations could block if the system is under heavy memory pressure,
etc.)

This will also make it a bit easier to support dioread_nolock for
block sizes < page size, since we need to start a separate handle for
each extent that needs to be converted, and if the block size is 4k
and the page size is 64k (for example, on the PowerPC), there might be
as many as 16 extents that have to be modified for a heavily
fragmented file system.

Supporting dioread_nolock for 1k file systems on x86 and 4k file
systems on ppc64/ppc64le will allow us to make dioread_nolock the only
supported write path, dropping the current default.  This will give us
only one write path to optimize, debug, and maintain, which will be a
Good Thing.

Cheers,

						- Ted