lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87bjnyg9me.fsf@gmail.com>
Date: Fri, 29 Aug 2025 07:41:21 +0530
From: Ritesh Harjani (IBM) <ritesh.list@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Keith Busch <kbusch@...nel.org>, Jan Kara <jack@...e.cz>, Keith Busch <kbusch@...a.com>, linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org, snitzer@...nel.org, axboe@...nel.dk, dw@...idwei.uk, brauner@...nel.org, hch@....de, martin.petersen@...cle.com, djwong@...nel.org, linux-xfs@...r.kernel.org, viro@...iv.linux.org.uk, Jan Kara <jack@...e.com>, Brian Foster <bfoster@...hat.com>
Subject: Re: [PATCHv3 0/8] direct-io: even more flexible io vectors

Jan Kara <jack@...e.cz> writes:

> On Tue 26-08-25 10:29:58, Ritesh Harjani wrote:
>> Keith Busch <kbusch@...nel.org> writes:
>> 
>> > On Mon, Aug 25, 2025 at 02:07:15PM +0200, Jan Kara wrote:
>> >> On Fri 22-08-25 18:57:08, Ritesh Harjani wrote:
>> >> > Keith Busch <kbusch@...a.com> writes:
>> >> > >
>> >> > >   - EXT4 falls back to buffered io for writes but not for reads.
>> >> > 
>> >> > ++linux-ext4 to get any historical context behind why the difference of
>> >> > behaviour in reads v/s writes for EXT4 DIO. 
>> >> 
>> >> Hum, how did you test? Because in the basic testing I did (with vanilla
>> >> kernel) I get EINVAL when doing unaligned DIO write in ext4... We should be
>> >> falling back to buffered IO only if the underlying file itself does not
>> >> support any kind of direct IO.
>> >
>> > Simple test case (dio-offset-test.c) below.
>> >
>> > I also ran this on vanilla kernel and got these results:
>> >
>> >   # mkfs.ext4 /dev/vda
>> >   # mount /dev/vda /mnt/ext4/
>> >   # make dio-offset-test
>> >   # ./dio-offset-test /mnt/ext4/foobar
>> >   write: Success
>> >   read: Invalid argument
>> >
>> > I tracked the "write: Success" down to ext4's handling for the "special"
>> > -ENOTBLK error after ext4_want_directio_fallback() returns "true".
>> >
>> 
>> Right. Ext4 has fallback only for dio writes but not for DIO reads... 
>> 
>> buffered
>> static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
>> {
>> 	/* must be a directio to fall back to buffered */
>> 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) !=
>> 		    (IOMAP_WRITE | IOMAP_DIRECT))
>> 		return false;
>> 
>>     ...
>> }
>> 
>> So basically the path is ext4_file_[read|write]_iter() -> iomap_dio_rw
>>     -> iomap_dio_bio_iter() -> return -EINVAL. i.e. from...
>> 
>> 
>> 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
>> 	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
>> 		return -EINVAL;
>> 
>> EXT4 then fallsback to buffered-io only for writes, but not for reads. 
>
> Right. And the fallback for writes was actually inadvertedly "added" by
> commit bc264fea0f6f "iomap: support incremental iomap_iter advances". That
> changed the error handling logic. Previously if iomap_dio_bio_iter()
> returned EINVAL, it got propagated to userspace regardless of what
> ->iomap_end() returned. After this commit if ->iomap_end() returns error
> (which is ENOTBLK in ext4 case), it gets propagated to userspace instead of
> the error returned by iomap_dio_bio_iter().
>
> Now both the old and new behavior make some sense so I won't argue that the
> new iomap_iter() behavior is wrong. But I think we should change ext4 back
> to the old behavior of failing unaligned dio writes instead of them falling
> back to buffered IO. I think something like the attached patch should do
> the trick - it makes unaligned dio writes fail again while writes to holes
> of indirect-block mapped files still correctly fall back to buffered IO.
> Once fstests run completes, I'll do a proper submission...
>

Aah, right. So it wasn't EXT4 which had this behaviour of falling back
to buffered I/O for unaligned writes. Earlier EXT4 was assuming an error
code will be detected by iomap and will be passed to it as "written" in
ext4_iomap_end() for such unaligned writes. But I guess that logic
silently got changed with that commit. Thanks for analyzing that. 
I missed looking underneath iomap behaviour change :). 


>
> 								Honza
> -- 
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
> From ce6da00a09647a03013c3f420c2e7ef7489c3de8 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Wed, 27 Aug 2025 14:55:19 +0200
> Subject: [PATCH] ext4: Fail unaligned direct IO write with EINVAL
>
> Commit bc264fea0f6f ("iomap: support incremental iomap_iter advances")
> changed the error handling logic in iomap_iter(). Previously any error
> from iomap_dio_bio_iter() got propagated to userspace, after this commit
> if ->iomap_end returns error, it gets propagated to userspace instead of
> an error from iomap_dio_bio_iter(). This results in unaligned writes to
> ext4 to silently fallback to buffered IO instead of erroring out.
>
> Now returning ENOTBLK for DIO writes from ext4_iomap_end() seems
> unnecessary these days. It is enough to return ENOTBLK from
> ext4_iomap_begin() when we don't support DIO write for that particular
> file offset (due to hole).

Right. This mainly only happens if we have holes in non-extent (indirect
blocks) case.

Also, as I see ext4 always just fallsback to buffered-io for no or
partial writes (unless iomap returned any error code). So, I was just
wondering if that could ever happen for DIO atomic write case. It's good
that we have a WARN_ON_ONCE() check in there to catch it. But I was
wondering if this needs an explicit handling in ext4_dio_write_iter() to
not fallback to buffered-writes for atomic DIO requests?

-ritesh



>
> Fixes: bc264fea0f6f ("iomap: support incremental iomap_iter advances")
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
>  fs/ext4/file.c  |  2 --
>  fs/ext4/inode.c | 35 -----------------------------------
>  2 files changed, 37 deletions(-)
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 93240e35ee36..cf39f57d21e9 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -579,8 +579,6 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		iomap_ops = &ext4_iomap_overwrite_ops;
>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>  			   dio_flags, NULL, 0);
> -	if (ret == -ENOTBLK)
> -		ret = 0;
>  	if (extend) {
>  		/*
>  		 * We always perform extending DIO write synchronously so by
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5b7a15db4953..c3b23c90fd11 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3872,47 +3872,12 @@ static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
>  	return ret;
>  }
>  
> -static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
> -{
> -	/* must be a directio to fall back to buffered */
> -	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) !=
> -		    (IOMAP_WRITE | IOMAP_DIRECT))
> -		return false;
> -
> -	/* atomic writes are all-or-nothing */
> -	if (flags & IOMAP_ATOMIC)
> -		return false;
> -
> -	/* can only try again if we wrote nothing */
> -	return written == 0;
> -}
> -
> -static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
> -			  ssize_t written, unsigned flags, struct iomap *iomap)
> -{
> -	/*
> -	 * Check to see whether an error occurred while writing out the data to
> -	 * the allocated blocks. If so, return the magic error code for
> -	 * non-atomic write so that we fallback to buffered I/O and attempt to
> -	 * complete the remainder of the I/O.
> -	 * For non-atomic writes, any blocks that may have been
> -	 * allocated in preparation for the direct I/O will be reused during
> -	 * buffered I/O. For atomic write, we never fallback to buffered-io.
> -	 */
> -	if (ext4_want_directio_fallback(flags, written))
> -		return -ENOTBLK;
> -
> -	return 0;
> -}
> -
>  const struct iomap_ops ext4_iomap_ops = {
>  	.iomap_begin		= ext4_iomap_begin,
> -	.iomap_end		= ext4_iomap_end,
>  };
>  
>  const struct iomap_ops ext4_iomap_overwrite_ops = {
>  	.iomap_begin		= ext4_iomap_overwrite_begin,
> -	.iomap_end		= ext4_iomap_end,
>  };
>  
>  static int ext4_iomap_begin_report(struct inode *inode, loff_t offset,
> -- 
> 2.43.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ