[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cdbd9c6917ab66164596b95dad90625f46221b70.camel@hammerspace.com>
Date: Wed, 5 Oct 2022 13:34:23 +0000
From: Trond Myklebust <trondmy@...merspace.com>
To: "jlayton@...nel.org" <jlayton@...nel.org>,
"neilb@...e.de" <neilb@...e.de>
CC: "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
"djwong@...nel.org" <djwong@...nel.org>,
"xiubli@...hat.com" <xiubli@...hat.com>,
"brauner@...nel.org" <brauner@...nel.org>,
"bfields@...ldses.org" <bfields@...ldses.org>,
"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
"david@...morbit.com" <david@...morbit.com>,
"fweimer@...hat.com" <fweimer@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"chuck.lever@...cle.com" <chuck.lever@...cle.com>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"tytso@....edu" <tytso@....edu>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"jack@...e.cz" <jack@...e.cz>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"lczerner@...hat.com" <lczerner@...hat.com>,
"adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [PATCH v6 6/9] nfsd: use the getattr operation to fetch i_version
On Wed, 2022-10-05 at 06:06 -0400, Jeff Layton wrote:
> On Tue, 2022-10-04 at 10:39 +1100, NeilBrown wrote:
> > On Fri, 30 Sep 2022, Jeff Layton wrote:
> > > Now that we can call into vfs_getattr to get the i_version field,
> > > use
> > > that facility to fetch it instead of doing it in
> > > nfsd4_change_attribute.
> > >
> > > Neil also pointed out recently that IS_I_VERSION directory
> > > operations
> > > are always logged, and so we only need to mitigate the rollback
> > > problem
> > > on regular files. Also, we don't need to factor in the ctime when
> > > reexporting NFS or Ceph.
> > >
> > > Set the STATX_VERSION (and BTIME) bits in the request when we're
> > > dealing
> > > with a v4 request. Then, instead of looking at IS_I_VERSION when
> > > generating the change attr, look at the result mask and only use
> > > it if
> > > STATX_VERSION is set. With this change, we can drop the
> > > fetch_iversion
> > > export operation as well.
> > >
> > > Move nfsd4_change_attribute into nfsfh.c, and change it to only
> > > factor
> > > in the ctime if it's a regular file and the fs doesn't advertise
> > > STATX_ATTR_VERSION_MONOTONIC.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > ---
> > > fs/nfs/export.c | 7 -------
> > > fs/nfsd/nfs4xdr.c | 4 +++-
> > > fs/nfsd/nfsfh.c | 40
> > > ++++++++++++++++++++++++++++++++++++++++
> > > fs/nfsd/nfsfh.h | 29 +----------------------------
> > > fs/nfsd/vfs.h | 7 ++++++-
> > > include/linux/exportfs.h | 1 -
> > > 6 files changed, 50 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > > index 01596f2d0a1e..1a9d5aa51dfb 100644
> > > --- a/fs/nfs/export.c
> > > +++ b/fs/nfs/export.c
> > > @@ -145,17 +145,10 @@ nfs_get_parent(struct dentry *dentry)
> > > return parent;
> > > }
> > >
> > > -static u64 nfs_fetch_iversion(struct inode *inode)
> > > -{
> > > - nfs_revalidate_inode(inode, NFS_INO_INVALID_CHANGE);
> > > - return inode_peek_iversion_raw(inode);
> > > -}
> > > -
> > > const struct export_operations nfs_export_ops = {
> > > .encode_fh = nfs_encode_fh,
> > > .fh_to_dentry = nfs_fh_to_dentry,
> > > .get_parent = nfs_get_parent,
> > > - .fetch_iversion = nfs_fetch_iversion,
> > > .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|
> > > EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS
> > > |
> > > EXPORT_OP_NOATOMIC_ATTR,
> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > index 1e9690a061ec..779c009314c6 100644
> > > --- a/fs/nfsd/nfs4xdr.c
> > > +++ b/fs/nfsd/nfs4xdr.c
> > > @@ -2869,7 +2869,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr,
> > > struct svc_fh *fhp,
> > > goto out;
> > > }
> > >
> > > - err = vfs_getattr(&path, &stat, STATX_BASIC_STATS,
> > > AT_STATX_SYNC_AS_STAT);
> > > + err = vfs_getattr(&path, &stat,
> > > + STATX_BASIC_STATS | STATX_BTIME |
> > > STATX_VERSION,
> > > + AT_STATX_SYNC_AS_STAT);
> > > if (err)
> > > goto out_nfserr;
> > > if (!(stat.result_mask & STATX_BTIME))
> > > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > > index a5b71526cee0..9168bc657378 100644
> > > --- a/fs/nfsd/nfsfh.c
> > > +++ b/fs/nfsd/nfsfh.c
> > > @@ -634,6 +634,10 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
> > > stat.mtime = inode->i_mtime;
> > > stat.ctime = inode->i_ctime;
> > > stat.size = inode->i_size;
> > > + if (v4 && IS_I_VERSION(inode)) {
> > > + stat.version =
> > > inode_query_iversion(inode);
> > > + stat.result_mask |= STATX_VERSION;
> > > + }
> >
> > This is increasingly ugly. I wonder if it is justified at all...
> >
>
> I'm fine with dropping that. So if the getattrs fail, we should just
> not
> offer up pre/post attrs?
>
> > > }
> > > if (v4)
> > > fhp->fh_pre_change =
> > > nfsd4_change_attribute(&stat, inode);
> > > @@ -665,6 +669,8 @@ void fh_fill_post_attrs(struct svc_fh *fhp)
> > > if (err) {
> > > fhp->fh_post_saved = false;
> > > fhp->fh_post_attr.ctime = inode->i_ctime;
> > > + if (v4 && IS_I_VERSION(inode))
> > > + fhp->fh_post_attr.version =
> > > inode_query_iversion(inode);
> >
> > ... ditto ...
> >
> > > } else
> > > fhp->fh_post_saved = true;
> > > if (v4)
> > > @@ -754,3 +760,37 @@ enum fsid_source fsid_source(const struct
> > > svc_fh *fhp)
> > > return FSIDSOURCE_UUID;
> > > return FSIDSOURCE_DEV;
> > > }
> > > +
> > > +/*
> > > + * We could use i_version alone as the change attribute.
> > > However, i_version
> > > + * can go backwards on a regular file after an unclean
> > > shutdown. On its own
> > > + * that doesn't necessarily cause a problem, but if i_version
> > > goes backwards
> > > + * and then is incremented again it could reuse a value that was
> > > previously
> > > + * used before boot, and a client who queried the two values
> > > might incorrectly
> > > + * assume nothing changed.
> > > + *
> > > + * By using both ctime and the i_version counter we guarantee
> > > that as long as
> > > + * time doesn't go backwards we never reuse an old value. If the
> > > filesystem
> > > + * advertises STATX_ATTR_VERSION_MONOTONIC, then this mitigation
> > > is not needed.
> > > + *
> > > + * We only need to do this for regular files as well. For
> > > directories, we
> > > + * assume that the new change attr is always logged to stable
> > > storage in some
> > > + * fashion before the results can be seen.
> > > + */
> > > +u64 nfsd4_change_attribute(struct kstat *stat, struct inode
> > > *inode)
> > > +{
> > > + u64 chattr;
> > > +
> > > + if (stat->result_mask & STATX_VERSION) {
> > > + chattr = stat->version;
> > > +
> > > + if (S_ISREG(inode->i_mode) &&
> > > + !(stat->attributes &
> > > STATX_ATTR_VERSION_MONOTONIC)) {
> >
> > I would really rather that the fs got to make this decision.
> > If it can guarantee that the i_version is monotonic even over a
> > crash
> > (which is probably can for directory, and might need changes to do
> > for
> > files) then it sets STATX_ATTR_VERSION_MONOTONIC and nfsd trusts it
> > completely.
> > If it cannot, then it doesn't set the flag.
> > i.e. the S_ISREG() test should be in the filesystem, not in nfsd.
> >
>
> This sounds reasonable, but for one thing.
>
> From RFC 7862:
>
> While Section 5.4 of [RFC5661] discusses
> per-file system attributes, it is expected that the value of
> change_attr_type will not depend on the value of "homogeneous" and
> will only change in the event of a migration.
>
> The change_attr_type4 must be the same for all filehandles under a
> particular filesystem.
>
> If we do what you suggest though, then it's easily possible for the
> fs
> to set STATX_ATTR_VERSION_MONOTONIC on directories but not files. If
> we
> later want to allow nfsd to advertise a change_attr_type4, we won't
> be
> able to rely on the STATX_ATTR_VERSION_MONOTONIC to tell us how to
> fill
> that out.
That will break clients. So no, that's not acceptable.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
Powered by blists - more mailing lists