Message-ID: <CAHpGcMJu47qQEFgzSvodbpJtPz71HFrG66wbrXR_6AdbkZsOOw@mail.gmail.com>
Date: Wed, 4 Jan 2023 20:02:56 +0100
From: Andreas Grünbacher <andreas.gruenbacher@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: Andreas Gruenbacher <agruenba@...hat.com>,
Christoph Hellwig <hch@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Matthew Wilcox <willy@...radead.org>,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-ext4@...r.kernel.org, cluster-devel@...hat.com
Subject: Re: [PATCH v5 7/9] iomap/xfs: Eliminate the iomap_valid handler
On Wed, Jan 4, 2023 at 7:00 PM Darrick J. Wong <djwong@...nel.org> wrote:
> On Sat, Dec 31, 2022 at 04:09:17PM +0100, Andreas Gruenbacher wrote:
> > Eliminate the ->iomap_valid() handler by switching to a ->get_folio()
> > handler and validating the mapping there.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@...hat.com>
> > ---
> > fs/iomap/buffered-io.c | 25 +++++--------------------
> > fs/xfs/xfs_iomap.c | 37 ++++++++++++++++++++++++++-----------
> > include/linux/iomap.h | 22 +++++-----------------
> > 3 files changed, 36 insertions(+), 48 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 4f363d42dbaf..df6fca11f18c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -630,7 +630,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> > const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> > const struct iomap *srcmap = iomap_iter_srcmap(iter);
> > struct folio *folio;
> > - int status = 0;
> > + int status;
> >
> > BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> > if (srcmap != &iter->iomap)
> > @@ -646,27 +646,12 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> > folio = page_ops->get_folio(iter, pos, len);
> > else
> > folio = iomap_get_folio(iter, pos);
> > - if (IS_ERR(folio))
> > - return PTR_ERR(folio);
> > -
> > - /*
> > - * Now we have a locked folio, before we do anything with it we need to
> > - * check that the iomap we have cached is not stale. The inode extent
> > - * mapping can change due to concurrent IO in flight (e.g.
> > - * IOMAP_UNWRITTEN state can change and memory reclaim could have
> > - * reclaimed a previously partially written page at this index after IO
> > - * completion before this write reaches this file offset) and hence we
> > - * could do the wrong thing here (zero a page range incorrectly or fail
> > - * to zero) and corrupt data.
> > - */
> > - if (page_ops && page_ops->iomap_valid) {
> > - bool iomap_valid = page_ops->iomap_valid(iter->inode,
> > - &iter->iomap);
> > - if (!iomap_valid) {
> > + if (IS_ERR(folio)) {
> > + if (folio == ERR_PTR(-ESTALE)) {
> > iter->iomap.flags |= IOMAP_F_STALE;
> > - status = 0;
> > - goto out_unlock;
> > + return 0;
> > }
> > + return PTR_ERR(folio);
>
> I wonder if this should be reworked a bit to reduce indenting:
>
> if (PTR_ERR(folio) == -ESTALE) {
> iter->iomap.flags |= IOMAP_F_STALE;
> return 0;
> }
> if (IS_ERR(folio))
> return PTR_ERR(folio);
>
> But I don't have any strong opinions about that.
This is a relatively hot path; to avoid a redundant check and get the
best possible result, the compiler would have to convert the flattened
version back into the nested version. I'm not sure it would always do
that.
> > }
> >
> > if (pos + len > folio_pos(folio) + folio_size(folio))
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 669c1bc5c3a7..d0bf99539180 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -62,29 +62,44 @@ xfs_iomap_inode_sequence(
> > return cookie | READ_ONCE(ip->i_df.if_seq);
> > }
> >
> > -/*
> > - * Check that the iomap passed to us is still valid for the given offset and
> > - * length.
> > - */
> > -static bool
> > -xfs_iomap_valid(
> > - struct inode *inode,
> > - const struct iomap *iomap)
> > +static struct folio *
> > +xfs_get_folio(
> > + struct iomap_iter *iter,
> > + loff_t pos,
> > + unsigned len)
> > {
> > + struct inode *inode = iter->inode;
> > + struct iomap *iomap = &iter->iomap;
> > struct xfs_inode *ip = XFS_I(inode);
> > + struct folio *folio;
> >
> > + folio = iomap_get_folio(iter, pos);
> > + if (IS_ERR(folio))
> > + return folio;
> > +
> > + /*
> > + * Now that we have a locked folio, we need to check that the iomap we
> > + * have cached is not stale. The inode extent mapping can change due to
> > + * concurrent IO in flight (e.g., IOMAP_UNWRITTEN state can change and
> > + * memory reclaim could have reclaimed a previously partially written
> > + * page at this index after IO completion before this write reaches
> > + * this file offset) and hence we could do the wrong thing here (zero a
> > + * page range incorrectly or fail to zero) and corrupt data.
> > + */
> > if (iomap->validity_cookie !=
> > xfs_iomap_inode_sequence(ip, iomap->flags)) {
> > trace_xfs_iomap_invalid(ip, iomap);
> > - return false;
> > + folio_unlock(folio);
> > + folio_put(folio);
> > + return ERR_PTR(-ESTALE);
> > }
> >
> > XFS_ERRORTAG_DELAY(ip->i_mount, XFS_ERRTAG_WRITE_DELAY_MS);
> > - return true;
> > + return folio;
> > }
> >
> > const struct iomap_page_ops xfs_iomap_page_ops = {
> > - .iomap_valid = xfs_iomap_valid,
> > + .get_folio = xfs_get_folio,
> > };
> >
> > int
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index dd3575ada5d1..6f8e3321e475 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -134,29 +134,17 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
> > * When get_folio succeeds, put_folio will always be called to do any
> > * cleanup work necessary. put_folio is responsible for unlocking and putting
> > * @folio.
> > + *
> > + * When an iomap is created, the filesystem can store internal state (e.g., a
> > + * sequence number) in iomap->validity_cookie. When it then detects in the
>
> I would reword this to:
>
> "When an iomap is created, the filesystem can store internal state (e.g.
> sequence number) in iomap->validity_cookie. The get_folio handler can
> use this validity cookie to detect if the iomap is no longer up to date
> and needs to be refreshed. If this is the case, the function should
> return ERR_PTR(-ESTALE) to retry the operation with a fresh mapping."
Yes, but "e.g." is always followed by a comma, and it should be "when",
not "if", here. So how about:
* When an iomap is created, the filesystem can store internal state (e.g., a
* sequence number) in iomap->validity_cookie. The get_folio handler can use
* this validity cookie to detect when the iomap needs to be refreshed because
* it is no longer up to date. In that case, the function should return
* ERR_PTR(-ESTALE) to retry the operation with a fresh mapping.
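To make "retry the operation with a fresh mapping" concrete, here is a
rough sketch of the buffered write loop (simplified for illustration
only, not the actual code in fs/iomap/buffered-io.c):

	/*
	 * Simplified sketch.  ->get_folio() returning ERR_PTR(-ESTALE)
	 * makes iomap_write_begin() set IOMAP_F_STALE and return 0; the
	 * iterator then calls ->iomap_begin() again for the remaining
	 * range instead of advancing, so the write is retried against a
	 * fresh mapping.
	 */
	while ((ret = iomap_iter(&iter, ops)) > 0)
		iter.processed = iomap_write_iter(&iter, i);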
I've incorporated all your feedback into this branch for now:
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-race
Thank you,
Andreas
> --D
>
> > + * get_folio handler that the iomap is no longer up to date and needs to be
> > + * refreshed, it can return ERR_PTR(-ESTALE) to trigger a retry.
> > */
> > struct iomap_page_ops {
> > struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
> > unsigned len);
> > void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
> > struct folio *folio);
> > -
> > - /*
> > - * Check that the cached iomap still maps correctly to the filesystem's
> > - * internal extent map. FS internal extent maps can change while iomap
> > - * is iterating a cached iomap, so this hook allows iomap to detect that
> > - * the iomap needs to be refreshed during a long running write
> > - * operation.
> > - *
> > - * The filesystem can store internal state (e.g. a sequence number) in
> > - * iomap->validity_cookie when the iomap is first mapped to be able to
> > - * detect changes between mapping time and whenever .iomap_valid() is
> > - * called.
> > - *
> > - * This is called with the folio over the specified file position held
> > - * locked by the iomap code.
> > - */
> > - bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
> > };
> >
> > /*
> > --
> > 2.38.1
> >