lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZyFh3uCGqB20+2X2@dread.disaster.area>
Date: Wed, 30 Oct 2024 09:29:50 +1100
From: Dave Chinner <david@...morbit.com>
To: Ritesh Harjani <ritesh.list@...il.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
	Jan Kara <jack@...e.cz>, "Darrick J . Wong" <djwong@...nel.org>,
	Christoph Hellwig <hch@...radead.org>,
	John Garry <john.g.garry@...cle.com>,
	Ojaswin Mujoo <ojaswin@...ux.ibm.com>, linux-kernel@...r.kernel.org,
	linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 4/6] ext4: Warn if we ever fallback to buffered-io for
 DIO atomic writes

On Mon, Oct 28, 2024 at 11:44:00PM +0530, Ritesh Harjani wrote:
> 
> Hi Dave, 
> 
> Dave Chinner <david@...morbit.com> writes:
> 
> > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote:
> >> 
> >> Hi Dave, 
> >> 
> >> Dave Chinner <david@...morbit.com> writes:
> >> 
> >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
> >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
> >> >> also add a WARN_ON_ONCE and return -EIO as a safety net.
> >> >> 
> >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> >> >> ---
> >> >>  fs/ext4/file.c | 10 +++++++++-
> >> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> >> 
> >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> >> >> index f9516121a036..af6ebd0ac0d6 100644
> >> >> --- a/fs/ext4/file.c
> >> >> +++ b/fs/ext4/file.c
> >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >> >>  		iomap_ops = &ext4_iomap_overwrite_ops;
> >> >>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
> >> >>  			   dio_flags, NULL, 0);
> >> >> -	if (ret == -ENOTBLK)
> >> >> +	if (ret == -ENOTBLK) {
> >> >>  		ret = 0;
> >> >> +		/*
> >> >> +		 * iomap will never return -ENOTBLK if write fails for atomic
> >> >> +		 * write. But let's just add a safety net.
> >> >> +		 */
> >> >> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
> >> >> +			ret = -EIO;
> >> >> +	}
> >> >
> >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
> >> > That way we don't have to put this logic into every filesystem.
> >> 
> >> This was origially intended as a safety net hence the WARN_ON_ONCE.
> >> Later Darrick pointed out that we still might have an unconverted
> >> condition in iomap which can return ENOTBLK for DIO atomic writes (page
> >> cache invalidation).
> >
> > Yes. That's my point - iomap knows that it's an atomic write, it
> > knows that invalidation failed, and it knows that there is no such
> > thing as buffered atomic writes. So there is no possible fallback
> > here, and it should be returning EIO in the page cache invalidation
> > failure case and not ENOTBLK.
> >
> 
> So the iomap DIO can return following as return values which can make
> some filesystems fallback to buffered-io (if they implement fallback
> logic) - 
> (1) -ENOTBLK -> this is only returned for pagecache invalidation failure.
> (2) 0 or partial write size -> This can never happen for atomic writes
> (since we are only allowing for single fsblock as of now).

Even when we allow multi-FSB atomic writes, the definition of
atomic write is still "all or nothing". There is no scope for "short
writes" when IOCB_ATOMIC is set - any condition that means we can't
write the entire IO as a single bio, we need to abort and return
EINVAL.

Hence -ENOTBLK should never be returned by iomap for atomic DIO
writes - we need to say -EINVAL if the write could not be issued
atomically for whatever reason it may be so the application knows
that atomic IO submission was not possible for that IO.

> Now looking at XFS, it never fallsback to buffered-io ever except just 2
> cases - 
> 1. When pagecache invalidation fails in iomap (can never happen for
> atomic writes)

Why can't this happen for atomic DIO writes?  It's the same failure
cases as for normal DIO writes, isn't it? (i.e. race with mmap
writes)

My point is that if it's an atomic write, this failure should get
turned into -EINVAL by the iomap code. We do not want a fallback to
buffered IO when this situation happens for atomic IO.

> 2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes)

This path doesn't ever go through iomap - XFS catches that case
before it calls into iomap, so it's not relevant to how iomap
behaves w.r.t atomic IO.

> So it anyways should never happen that XFS ever fallback to buffered-io
> for DIO atomic writes. Even today it does not fallback to buffered-io
> for non-atomic short DIO writes.
> 
> >> You pointed it right that it should be fixed in iomap. However do you
> >> think filesystems can still keep this as safety net (maybe no need of
> >> WARN_ON_ONCE).
> >
> > I don't see any point in adding "impossible to hit" checks into
> > filesystems just in case some core infrastructure has a bug
> > introduced....
> 
> Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short
> writes, though it should not happen for current atomic write case where
> we are only allowing for 1 fsblock. 

Yes, but the -ENOTBLK error returned from ext4_iomap_end() if
nothing was written does not get returned to ext4 from
__iomap_dio_rw(). It is consumed by the iomap code:

	/* magic error code to fall back to buffered I/O */
        if (ret == -ENOTBLK) {
                wait_for_completion = true;
                ret = 0;
	}

This means that all the IO that was issued gets completed before
returning to the caller and that's how the short write comes about.

-ENOTBLK is *not returned to the caller* on a short write -
iomap_dio_rw will return 0 (success).  The caller then has to look
at the iov_iter state to determine if the write was fully completed.
This is exactly what the ext4 code currently does for all DIO
writes, not just those that return -ENOTBLK.

> I would still like to go with a WARN_ON_ONCE where we are calling ext4
> buffered-io handling for DIO fallback writes. This is to catch any bugs
> even in future when we move to multi-fsblock case (until we have atomic
> write support for buffered-io).

Your choice, but please realise that it is not going to catch short
atomic writes at all.

-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ