[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191008151238.GK5078@quack2.suse.cz>
Date: Tue, 8 Oct 2019 17:12:38 +0200
From: Jan Kara <jack@...e.cz>
To: Matthew Bobrowski <mbobrowski@...browski.org>
Cc: tytso@....edu, jack@...e.cz, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
hch@...radead.org, david@...morbit.com, darrick.wong@...cle.com
Subject: Re: [PATCH v4 8/8] ext4: introduce direct I/O write path using iomap
infrastructure
On Thu 03-10-19 21:35:05, Matthew Bobrowski wrote:
> This patch introduces a new direct I/O write code path implementation
> that makes use of the iomap infrastructure.
>
> All direct I/O write operations are now passed from the ->write_iter()
> callback to the new function ext4_dio_write_iter(). This function is
> responsible for calling into iomap infrastructure via
> iomap_dio_rw(). Snippets of the direct I/O code from within
> ext4_file_write_iter(), such as checking whether the IO request is
> unaligned asynchronous I/O, or whether it will ber overwriting
> allocated and initialized blocks has been moved out and into
> ext4_dio_write_iter().
>
> The block mapping flags that are passed to ext4_map_blocks() from
> within ext4_dio_get_block() and friends have effectively been taken
> out and introduced within the ext4_iomap_alloc().
>
> For inode extension cases, the ->end_io() callback
> ext4_dio_write_end_io() is responsible for calling into
> ext4_handle_inode_extension() and performing the necessary metadata
> updates. Failed writes that we're intended to be extend the inode will
> have blocks truncated accordingly. The ->end_io() handler is also
> responsible for converting allocated unwritten extents to written
> extents.
>
> In the instance of a short write, we fallback to buffered I/O and use
> that method to complete whatever is left over in 'iter'.
> Any blocks
> that have been allocated in preparation for direct I/O write will be
> reused by buffered I/O, so there's no issue with leaving allocated
> blocks beyond EOF.
This actually is not true as ext4_truncate_failed_write() will trim blocks
beyond EOF. Also this would not be 100% reliable as if we crash between DIO
short write succeeding and buffered write happening, we would leave inode
with blocks beyond EOF. So I'd just remove this sentence.
> Existing direct I/O write buffer_head code has been removed as it's
> now redundant.
>
> Signed-off-by: Matthew Bobrowski <mbobrowski@...browski.org>
...
> +static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size,
> + int error, unsigned int flags)
> +{
> + loff_t offset = iocb->ki_pos;
> + struct inode *inode = file_inode(iocb->ki_filp);
> +
> + if (!error && (flags & IOMAP_DIO_UNWRITTEN))
> + error = ext4_convert_unwritten_extents(NULL, inode, offset,
> + size);
> + return ext4_handle_inode_extension(inode, offset, error ? : size,
> + size);
> +}
I was pondering about this and I don't think we have it quite correct.
Still :-|. The problem is that iomap_dio_complete() will pass dio->size as
'size', which is the amount of submitted IO but not necessarily the amount
of blocks that were mapped (that can be larger). Thus
ext4_handle_inode_extension() can miss the fact that there are blocks
beyond EOF that need to be trimmed. And we have no way of finding that out
inside our ->end_io handler. Even iomap_dio_complete() doesn't have that
information so we'd need to add 'original length' to struct iomap_dio and
then pass it do ->end_io.
Seeing how difficult it is when a filesystem wants to complete the iocb
synchronously (regardless whether it is async or sync) and have all the
information in one place for further processing, I think it would be the
easiest to provide iomap_dio_rw_wait() that forces waiting for the iocb to
complete *and* returns the appropriate return value instead of pretty
useless EIOCBQUEUED. It is actually pretty trivial (patch attached). With
this we can then just call iomap_dio_rw_sync() for the inode extension case
with ->end_io doing just the unwritten extent processing and then call
ext4_handle_inode_extension() from ext4_direct_write_iter() where we would
have all the information we need.
Christoph, Darrick, what do you think about extending iomap like in the
attached patch (plus sample use in XFS)?
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
View attachment "0001-iomap-Allow-forcing-of-waiting-for-running-DIO-in-io.patch" of type "text/x-patch" (2967 bytes)
View attachment "0002-xfs-Use-iomap_dio_rw_wait.patch" of type "text/x-patch" (1326 bytes)
Powered by blists - more mailing lists