Message-ID: <CAHpGcMJu47qQEFgzSvodbpJtPz71HFrG66wbrXR_6AdbkZsOOw@mail.gmail.com>
Date: Wed, 4 Jan 2023 20:02:56 +0100
From: Andreas Grünbacher <andreas.gruenbacher@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: Andreas Gruenbacher <agruenba@...hat.com>,
Christoph Hellwig <hch@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Matthew Wilcox <willy@...radead.org>,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-ext4@...r.kernel.org, cluster-devel@...hat.com
Subject: Re: [PATCH v5 7/9] iomap/xfs: Eliminate the iomap_valid handler
On Wed, Jan 4, 2023 at 7:00 PM Darrick J. Wong <djwong@...nel.org> wrote:
> On Sat, Dec 31, 2022 at 04:09:17PM +0100, Andreas Gruenbacher wrote:
> > Eliminate the ->iomap_valid() handler by switching to a ->get_folio()
> > handler and validating the mapping there.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@...hat.com>
> > ---
> > fs/iomap/buffered-io.c | 25 +++++--------------------
> > fs/xfs/xfs_iomap.c | 37 ++++++++++++++++++++++++++-----------
> > include/linux/iomap.h | 22 +++++-----------------
> > 3 files changed, 36 insertions(+), 48 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 4f363d42dbaf..df6fca11f18c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -630,7 +630,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> > const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> > const struct iomap *srcmap = iomap_iter_srcmap(iter);
> > struct folio *folio;
> > - int status = 0;
> > + int status;
> >
> > BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> > if (srcmap != &iter->iomap)
> > @@ -646,27 +646,12 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> > folio = page_ops->get_folio(iter, pos, len);
> > else
> > folio = iomap_get_folio(iter, pos);
> > - if (IS_ERR(folio))
> > - return PTR_ERR(folio);
> > -
> > - /*
> > - * Now we have a locked folio, before we do anything with it we need to
> > - * check that the iomap we have cached is not stale. The inode extent
> > - * mapping can change due to concurrent IO in flight (e.g.
> > - * IOMAP_UNWRITTEN state can change and memory reclaim could have
> > - * reclaimed a previously partially written page at this index after IO
> > - * completion before this write reaches this file offset) and hence we
> > - * could do the wrong thing here (zero a page range incorrectly or fail
> > - * to zero) and corrupt data.
> > - */
> > - if (page_ops && page_ops->iomap_valid) {
> > - bool iomap_valid = page_ops->iomap_valid(iter->inode,
> > - &iter->iomap);
> > - if (!iomap_valid) {
> > + if (IS_ERR(folio)) {
> > + if (folio == ERR_PTR(-ESTALE)) {
> > iter->iomap.flags |= IOMAP_F_STALE;
> > - status = 0;
> > - goto out_unlock;
> > + return 0;
> > }
> > + return PTR_ERR(folio);
>
> I wonder if this should be reworked a bit to reduce indenting:
>
> if (PTR_ERR(folio) == -ESTALE) {
> iter->iomap.flags |= IOMAP_F_STALE;
> return 0;
> }
> if (IS_ERR(folio))
> return PTR_ERR(folio);
>
> But I don't have any strong opinions about that.
This is a relatively hot path; to avoid a redundant check and get the
best possible result, the compiler would have to convert the flattened
version back into the nested version. I'm not sure it would always do
that.
> > }
> >
> > if (pos + len > folio_pos(folio) + folio_size(folio))
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 669c1bc5c3a7..d0bf99539180 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -62,29 +62,44 @@ xfs_iomap_inode_sequence(
> > return cookie | READ_ONCE(ip->i_df.if_seq);
> > }
> >
> > -/*
> > - * Check that the iomap passed to us is still valid for the given offset and
> > - * length.
> > - */
> > -static bool
> > -xfs_iomap_valid(
> > - struct inode *inode,
> > - const struct iomap *iomap)
> > +static struct folio *
> > +xfs_get_folio(
> > + struct iomap_iter *iter,
> > + loff_t pos,
> > + unsigned len)
> > {
> > + struct inode *inode = iter->inode;
> > + struct iomap *iomap = &iter->iomap;
> > struct xfs_inode *ip = XFS_I(inode);
> > + struct folio *folio;
> >
> > + folio = iomap_get_folio(iter, pos);
> > + if (IS_ERR(folio))
> > + return folio;
> > +
> > + /*
> > + * Now that we have a locked folio, we need to check that the iomap we
> > + * have cached is not stale. The inode extent mapping can change due to
> > + * concurrent IO in flight (e.g., IOMAP_UNWRITTEN state can change and
> > + * memory reclaim could have reclaimed a previously partially written
> > + * page at this index after IO completion before this write reaches
> > + * this file offset) and hence we could do the wrong thing here (zero a
> > + * page range incorrectly or fail to zero) and corrupt data.
> > + */
> > if (iomap->validity_cookie !=
> > xfs_iomap_inode_sequence(ip, iomap->flags)) {
> > trace_xfs_iomap_invalid(ip, iomap);
> > - return false;
> > + folio_unlock(folio);
> > + folio_put(folio);
> > + return ERR_PTR(-ESTALE);
> > }
> >
> > XFS_ERRORTAG_DELAY(ip->i_mount, XFS_ERRTAG_WRITE_DELAY_MS);
> > - return true;
> > + return folio;
> > }
> >
> > const struct iomap_page_ops xfs_iomap_page_ops = {
> > - .iomap_valid = xfs_iomap_valid,
> > + .get_folio = xfs_get_folio,
> > };
> >
> > int
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index dd3575ada5d1..6f8e3321e475 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -134,29 +134,17 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
> > * When get_folio succeeds, put_folio will always be called to do any
> > * cleanup work necessary. put_folio is responsible for unlocking and putting
> > * @folio.
> > + *
> > + * When an iomap is created, the filesystem can store internal state (e.g., a
> > + * sequence number) in iomap->validity_cookie. When it then detects in the
>
> I would reword this to:
>
> "When an iomap is created, the filesystem can store internal state (e.g.
> sequence number) in iomap->validity_cookie. The get_folio handler can
> use this validity cookie to detect if the iomap is no longer up to date
> and needs to be refreshed. If this is the case, the function should
> return ERR_PTR(-ESTALE) to retry the operation with a fresh mapping."
Yes, but "e.g." is always followed by a comma, and it should be "when",
not "if", here. So how about:
* When an iomap is created, the filesystem can store internal state (e.g., a
* sequence number) in iomap->validity_cookie. The get_folio handler can use
* this validity cookie to detect when the iomap needs to be refreshed because
* it is no longer up to date. In that case, the function should return
* ERR_PTR(-ESTALE) to retry the operation with a fresh mapping.
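To make "retry the operation with a fresh mapping" concrete, here is a
rough sketch of the buffered write loop (simplified for illustration
only, not the actual code in fs/iomap/buffered-io.c):

	/*
	 * Simplified sketch.  ->get_folio() returning ERR_PTR(-ESTALE)
	 * makes iomap_write_begin() set IOMAP_F_STALE and return 0; the
	 * iterator then calls ->iomap_begin() again for the remaining
	 * range instead of advancing, so the write is retried against a
	 * fresh mapping.
	 */
	while ((ret = iomap_iter(&iter, ops)) > 0)
		iter.processed = iomap_write_iter(&iter, i);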
I've incorporated all your feedback into this branch for now:
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-race
Thank you,
Andreas
> --D
>
> > + * get_folio handler that the iomap is no longer up to date and needs to be
> > + * refreshed, it can return ERR_PTR(-ESTALE) to trigger a retry.
> > */
> > struct iomap_page_ops {
> > struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
> > unsigned len);
> > void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
> > struct folio *folio);
> > -
> > - /*
> > - * Check that the cached iomap still maps correctly to the filesystem's
> > - * internal extent map. FS internal extent maps can change while iomap
> > - * is iterating a cached iomap, so this hook allows iomap to detect that
> > - * the iomap needs to be refreshed during a long running write
> > - * operation.
> > - *
> > - * The filesystem can store internal state (e.g. a sequence number) in
> > - * iomap->validity_cookie when the iomap is first mapped to be able to
> > - * detect changes between mapping time and whenever .iomap_valid() is
> > - * called.
> > - *
> > - * This is called with the folio over the specified file position held
> > - * locked by the iomap code.
> > - */
> > - bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
> > };
> >
> > /*
> > --
> > 2.38.1
> >