linux-kernel - Re: [PATCH v3 5/7] iomap: Support restarting direct I/O requests after user copy failures

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <20210726180847.GO20621@quack2.suse.cz>
Date:   Mon, 26 Jul 2021 20:08:48 +0200
From:   Jan Kara <jack@...e.cz>
To:     Andreas Grünbacher 
        <andreas.gruenbacher@...il.com>
Cc:     Jan Kara <jack@...e.cz>, Andreas Gruenbacher <agruenba@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christoph Hellwig <hch@...radead.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        cluster-devel <cluster-devel@...hat.com>,
        Linux FS-devel Mailing List <linux-fsdevel@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        ocfs2-devel@....oracle.com
Subject: Re: [PATCH v3 5/7] iomap: Support restarting direct I/O requests
 after user copy failures

On Mon 26-07-21 19:45:22, Andreas Grünbacher wrote:
> Jan Kara <jack@...e.cz> schrieb am Mo., 26. Juli 2021, 19:21:
> 
> > On Fri 23-07-21 22:58:38, Andreas Gruenbacher wrote:
> > > In __iomap_dio_rw, when iomap_apply returns an -EFAULT error, complete
> > the
> > > request synchronously and reset the iterator to the start position.  This
> > > allows callers to deal with the failure and retry the operation.
> > >
> > > In gfs2, we need to disable page faults while we're holding glocks to
> > prevent
> > > deadlocks.  This patch is the minimum solution I could find to make
> > > iomap_dio_rw work with page faults disabled.  It's still expensive
> > because any
> > > I/O that was carried out before hitting -EFAULT needs to be retried.
> > >
> > > A possible improvement would be to add an IOMAP_DIO_FAULT_RETRY or
> > similar flag
> > > that would allow iomap_dio_rw to return a short result when hitting
> > -EFAULT.
> > > Callers could then retry only the rest of the request after dealing with
> > the
> > > page fault.
> > >
> > > Asynchronous requests turn into synchronous requests up to the point of
> > the
> > > page fault in any case, but they could be retried asynchronously after
> > dealing
> > > with the page fault.  To make that work, the completion notification
> > would have
> > > to include the bytes read or written before the page fault(s) as well,
> > and we'd
> > > need an additional iomap_dio_rw argument for that.
> > >
> > > Signed-off-by: Andreas Gruenbacher <agruenba@...hat.com>
> > > ---
> > >  fs/iomap/direct-io.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index cc0b4bc8861b..b0a494211bb4 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -561,6 +561,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter
> > *iter,
> > >               ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
> > >                               iomap_dio_actor);
> > >               if (ret <= 0) {
> > > +                     if (ret == -EFAULT) {
> > > +                             /*
> > > +                              * To allow retrying the request, fail
> > > +                              * synchronously and reset the iterator.
> > > +                              */
> > > +                             wait_for_completion = true;
> > > +                             iov_iter_revert(dio->submit.iter,
> > dio->size);
> > > +                     }
> > > +
> >
> > Hum, OK, but this means that if userspace submits large enough write, GFS2
> > will livelock trying to complete it? While other filesystems can just
> > submit multiple smaller bios constructed in iomap_apply() (paging in
> > different parts of the buffer) and thus complete the write?
> >
> 
> No. First, this affects reads but not writes. We cannot just blindly repeat
> writes; when a page fault occurs in the middle of a write, the result will
> be a short write. For reads, the plan is to ads a flag to allow
> iomap_dio_rw to return a partial result when a page fault occurs.
> (Currently, it fails the entire request.) Then we can handle the page fault
> and complete the rest of the request.
> 
> The changes needed for that are simple on the iomap side, but we need to go
> through some gymnastics for handling the page fault without giving up the
> glock in the non-contended case. There will still be the potential for
> losing the lock and having to re-acquire it, in which case we'll actually
> have to repeat the entire read.

I've missed you've already sent out v4 (I'm catching up after vacation so
my mailbox is a bit of a mess). What's in there addresses my objection.
I'm sorry for the noise.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR