[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190824230427.GA32012@infradead.org>
Date: Sat, 24 Aug 2019 16:04:27 -0700
From: Christoph Hellwig <hch@...radead.org>
To: "Darrick J. Wong" <darrick.wong@...cle.com>
Cc: Matthew Bobrowski <mbobrowski@...browski.org>,
Ritesh Harjani <riteshh@...ux.ibm.com>, tytso@....edu,
jack@...e.cz, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, hch@...radead.org,
aneesh.kumar@...ux.ibm.com
Subject: Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure
On Fri, Aug 23, 2019 at 08:55:54PM -0700, Darrick J. Wong wrote:
> I'm probably misunderstanding the ext4 extent cache horribly, but I keep
> wondering why any of this is necessary -- why can't ext4 track the
> unwritten status in the extent records directly? And why is there all
> this strange "can merge" logic? If you need to convert blocks X to Y
> to written state because a write to those blocks completed, isn't that
> just manipulation of a bunch of incore records? And can't you just seek
> back and forth in the extent cache to look for adjacent records to merge
> with? <confuseD>
Same here. I'm not an ext4 expert, but here is what we do in XFS, which
hopefully works in some form for ext4 a well:
- when starting a direct I/O we allocate any needed blocks and do so
as unwritten extent. The extent tree code will merge them in
whatever way that seems suitable
- if the IOMAP_DIO_UNWRITTEN is set on the iomap at ->end_io time we
call a function that walks the whole range covered by the ioend,
and convert any unwritten extent to a normal written extent. Any
splitting and merging will be done as needed by the low-level
extent tree code
- this also means we don't need the xfs_ioen structure (which ext4)
copied from for direct I/O at all (we used to have it initially,
though including the time when ext4 copied this code).
- we don't need the equivalent to the ext4_unwritten_wait call in
ext4_file_write_iter because we serialize any non-aligned I/O
instead of trying to optimize for weird corner cases
> (I'd really prefer not to go adding private fields all over the
> place...)
Agreed.
Powered by blists - more mailing lists