Message-ID: <20150912062454.2283d4b6@tlielax.poochiereds.net>
Date: Sat, 12 Sep 2015 06:24:54 -0400
From: Jeff Layton <jlayton@...chiereds.net>
To: "Dilger, Andreas" <andreas.dilger@...el.com>
Cc: "bfields@...ldses.org" <bfields@...ldses.org>,
"cluster-devel@...hat.com" <cluster-devel@...hat.com>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"fuse-devel@...ts.sourceforge.net" <fuse-devel@...ts.sourceforge.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"HPDD-discuss@...ts.01.org" <HPDD-discuss@...ts.01.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
"ocfs2-devel@....oracle.com" <ocfs2-devel@....oracle.com>
Subject: Re: [HPDD-discuss] [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to
struct export_operations
On Sat, 12 Sep 2015 04:41:33 +0000
"Dilger, Andreas" <andreas.dilger@...el.com> wrote:
> On 2015/09/11, 4:20 AM, "HPDD-discuss on behalf of Jeff Layton"
> <hpdd-discuss-bounces@...ts.01.org on behalf of jlayton@...chiereds.net>
> wrote:
>
> >With NFSv3 nfsd will always attempt to send along WCC data to the
> >client. This generally involves saving off the in-core inode information
> >prior to doing the operation on the given filehandle, and then issuing a
> >vfs_getattr to it after the op.
> >
> >Some filesystems (particularly clustered or networked ones) have an
> >expensive ->getattr inode operation. Atomicity is also often difficult
> >or impossible to guarantee on such filesystems. For those, we're best
> >off not trying to provide WCC information to the client at all, and to
> >simply allow it to poll for that information as needed with a GETATTR
> >RPC.
> >
> >This patch adds a new flags field to struct export_operations, and
> >defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> >that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> >also adds a blurb about the new flags field and flag to the exporting
> >documentation.
> >
> >The server will now also skip collecting this information for NFSv2,
> >since that info is never used there anyway.
> >
> >Note that this patch does not add this flag to any filesystem
> >export_operations structures. This was originally developed to allow
> >reexporting nfs via nfsd. That code is not (and may never be) suitable
> >for merging into mainline.
> >
> >Other filesystems may want to consider enabling this flag too. It's hard
> >to tell however which ones have export operations to enable export via
> >knfsd and which ones mostly rely on them for open-by-filehandle support,
> >so I'm leaving that up to the individual maintainers to decide. I am
> >cc'ing the relevant lists for those filesystems that I think may want to
> >consider adding this though.
> >
> >Cc: HPDD-discuss@...ts.01.org
> >Cc: ceph-devel@...r.kernel.org
> >Cc: cluster-devel@...hat.com
> >Cc: fuse-devel@...ts.sourceforge.net
> >Cc: ocfs2-devel@....oracle.com
> >Signed-off-by: Jeff Layton <jeff.layton@...marydata.com>
> >---
> > Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++++++++++++++
> > fs/nfsd/nfs3xdr.c | 5 ++++-
> > fs/nfsd/nfsfh.c | 14 ++++++++++++++
> > fs/nfsd/nfsfh.h | 5 ++++-
> > include/linux/exportfs.h | 2 ++
> > 5 files changed, 51 insertions(+), 2 deletions(-)
> >
> >diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
> >index 520a4becb75c..fa636cde3907 100644
> >--- a/Documentation/filesystems/nfs/Exporting
> >+++ b/Documentation/filesystems/nfs/Exporting
> >@@ -138,6 +138,11 @@ struct which has the following members:
> > to find potential names, and matches inode numbers to find the correct
> > match.
> >
> >+ flags
> >+ Some filesystems may need to be handled differently than others. The
> >+ export_operations struct also includes a flags field that allows the
> >+ filesystem to communicate such information to nfsd. See the Export
> >+ Operations Flags section below for more explanation.
> >
> > A filehandle fragment consists of an array of 1 or more 4byte words,
> > together with a one byte "type".
> >@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have been padded with
> > nuls. Rather, the encode_fh routine should choose a "type" which
> > indicates the decode_fh how much of the filehandle is valid, and how
> > it should be interpreted.
> >+
> >+Export Operations Flags
> >+-----------------------
> >+In addition to the operation vector pointers, struct export_operations also
> >+contains a "flags" field that allows the filesystem to communicate to nfsd
> >+that it may want to do things differently when dealing with it. The
> >+following flags are defined:
> >+
> >+ EXPORT_OP_NOWCC
> >+ RFC 1813 recommends that servers always send weak cache consistency
> >+ (WCC) data to the client after each operation. The server should
> >+ atomically collect attributes about the inode, do an operation on it,
> >+ and then collect the attributes afterward. This allows the client to
> >+ skip issuing GETATTRs in some situations but means that the server
> >+ is calling vfs_getattr for almost all RPCs. On some filesystems
> >+ (particularly those that are clustered or networked) this is expensive
> >+ and atomicity is difficult to guarantee. This flag indicates to nfsd
> >+ that it should skip providing WCC attributes to the client in NFSv3
> >+ replies when doing operations on this filesystem. Consider enabling
> >+ this on filesystems that have an expensive ->getattr inode operation,
> >+ or when atomicity between pre and post operation attribute collection
> >+ is impossible to guarantee.
> >diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> >index 01dcd494f781..c30c8c604e2a 100644
> >--- a/fs/nfsd/nfs3xdr.c
> >+++ b/fs/nfsd/nfs3xdr.c
> >@@ -203,7 +203,7 @@ static __be32 *
> > encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
> > {
> > struct dentry *dentry = fhp->fh_dentry;
> >- if (dentry && d_really_is_positive(dentry)) {
> >+ if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
> > __be32 err;
> > struct kstat stat;
> >
> >@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> > {
> > __be32 err;
> >
> >+ if (fhp->fh_no_wcc)
> >+ return;
> >+
> > if (fhp->fh_post_saved)
> > printk("nfsd: inode locked twice during operation.\n");
> >
> >diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> >index 350041a40fe5..29ae37f62b9b 100644
> >--- a/fs/nfsd/nfsfh.c
> >+++ b/fs/nfsd/nfsfh.c
> >@@ -267,6 +267,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> >
> > fhp->fh_dentry = dentry;
> > fhp->fh_export = exp;
> >+
> >+ switch (rqstp->rq_vers) {
> >+ case 3:
> >+ if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
> >+ break;
> >+ /* Fallthrough */
> >+ case 2:
> >+ fhp->fh_no_wcc = true;
> >+ }
> >+
> > return 0;
> > out:
> > exp_put(exp);
> >@@ -535,6 +545,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > */
> > set_version_and_fsid_type(fhp, exp, ref_fh);
> >
> >+ /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
> >+ fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
> >+
> > if (ref_fh == fhp)
> > fh_put(ref_fh);
> >
> >@@ -641,6 +654,7 @@ fh_put(struct svc_fh *fhp)
> > exp_put(exp);
> > fhp->fh_export = NULL;
> > }
> >+ fhp->fh_no_wcc = false;
> > return;
> > }
> >
> >diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> >index 1e90dad4926b..9ddead4d98f8 100644
> >--- a/fs/nfsd/nfsfh.h
> >+++ b/fs/nfsd/nfsfh.h
> >@@ -32,6 +32,7 @@ typedef struct svc_fh {
> >
> > unsigned char fh_locked; /* inode locked by us */
> > unsigned char fh_want_write; /* remount protection taken */
> >+ bool fh_no_wcc; /* no wcc data needed */
>
> This increases the size of svc_fh because it splits the four unsigned
> chars. You could change all of these (fh_locked, fh_want_write,
> fh_{pre,post}_saved) to be bools to avoid that and make it clearer that
> they are only used as booleans (I verified that they are all only
> assigned 0 or 1).
>
I don't think it matters, at least not on x86_64. bools and chars both
require a byte. pahole does show this adding a new hole, but that's
just because this brings the struct up to 5 flags and the next field
(fh_pre_size) needs to be aligned.
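
To illustrate the alignment point, here is a rough userspace sketch. The
struct and field names are hypothetical, cut-down stand-ins for the flag
area of struct svc_fh, not the real definition, and the offsets it prints
are whatever the local compiler and ABI produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the flag area of struct svc_fh with the
 * new flag added. The real struct has many more members.
 */
struct fh_five_flags {
	unsigned char locked;
	unsigned char want_write;
	bool no_wcc;			/* the newly added flag */
	unsigned char post_saved;
	unsigned char pre_saved;
	uint64_t pre_size;		/* stands in for fh_pre_size */
};

/* The same fields with all five flags folded into a single word. */
struct fh_one_word {
	unsigned int flags;
	uint64_t pre_size;
};

/* Padding bytes the compiler inserts between the last flag and pre_size. */
static size_t pad_five(void)
{
	return offsetof(struct fh_five_flags, pre_size) -
	       (offsetof(struct fh_five_flags, pre_saved) + sizeof(unsigned char));
}

/* Print the layout of both variants for comparison. */
static void fh_layout_report(void)
{
	/* Assumption: bool is one byte, as it is on common ABIs. */
	assert(sizeof(bool) == sizeof(unsigned char));

	printf("five flags: pre_size at offset %zu, %zu pad bytes, sizeof %zu\n",
	       offsetof(struct fh_five_flags, pre_size), pad_five(),
	       sizeof(struct fh_five_flags));
	printf("one word:   pre_size at offset %zu, sizeof %zu\n",
	       offsetof(struct fh_one_word, pre_size),
	       sizeof(struct fh_one_word));
}
```

Calling fh_layout_report() from a small test program prints both layouts.
The gap before the 64-bit member comes from its alignment requirement
rather than from using bool instead of unsigned char, and whether the
overall struct grows depends on the neighbouring fields. pahole on the
real struct remains the authoritative check.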

I do agree that replacing those other unsigned chars with bools is
clearer, however. Maybe we should even replace them all with a single
unsigned int and use bitops to set flags in there. That would be more
space efficient now that we're at 5 flags.
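
A minimal sketch of what the single-word approach could look like, for
illustration only. The flag names and helpers are hypothetical and simply
mirror the existing svc_fh fields. The kernel's own set_bit()/test_bit()
helpers operate on unsigned long, so this userspace version uses plain
masks on an unsigned int instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, one per existing svc_fh boolean field. */
enum fh_flag {
	FH_LOCKED     = 1u << 0,
	FH_WANT_WRITE = 1u << 1,
	FH_NO_WCC     = 1u << 2,
	FH_POST_SAVED = 1u << 3,
	FH_PRE_SAVED  = 1u << 4,
};

/* Stand-in for the part of struct svc_fh that would hold the flags. */
struct fh_state {
	unsigned int flags;
};

static inline void fh_set(struct fh_state *fh, unsigned int flag)
{
	fh->flags |= flag;
}

static inline void fh_clear(struct fh_state *fh, unsigned int flag)
{
	fh->flags &= ~flag;
}

static inline bool fh_test(const struct fh_state *fh, unsigned int flag)
{
	return (fh->flags & flag) != 0;
}

/* Mirrors the early-return test the patch adds to the fill_*_wcc helpers. */
static inline bool fh_should_collect_wcc(const struct fh_state *fh)
{
	return !fh_test(fh, FH_NO_WCC);
}

/* Small self-check that exercises the helpers. */
static inline void fh_flags_demo(void)
{
	struct fh_state fh = { 0 };

	fh_set(&fh, FH_LOCKED);
	fh_set(&fh, FH_NO_WCC);
	assert(fh_test(&fh, FH_LOCKED));
	assert(!fh_should_collect_wcc(&fh));

	/* Equivalent of resetting fh_no_wcc in fh_put(). */
	fh_clear(&fh, FH_NO_WCC);
	assert(fh_should_collect_wcc(&fh));
}
```

Plain masks like these are not atomic. That is fine for a sketch, but
real code would need to consider whether any of the flags can be updated
concurrently.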
> Cheers, Andreas
>
> >
> > #ifdef CONFIG_NFSD_V3
> > unsigned char fh_post_saved; /* post-op attrs saved */
> >@@ -51,7 +52,6 @@ typedef struct svc_fh {
> > struct kstat fh_post_attr; /* full attrs after operation */
> > u64 fh_post_change; /* nfsv4 change; see above */
> > #endif /* CONFIG_NFSD_V3 */
> >-
> > } svc_fh;
> >
> > enum nfsd_fsid {
> >@@ -225,6 +225,9 @@ fill_pre_wcc(struct svc_fh *fhp)
> > {
> > struct inode *inode;
> >
> >+ if (fhp->fh_no_wcc)
> >+ return;
> >+
> > inode = d_inode(fhp->fh_dentry);
> > if (!fhp->fh_pre_saved) {
> > fhp->fh_pre_mtime = inode->i_mtime;
> >diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> >index fa05e04c5531..600c3fccc999 100644
> >--- a/include/linux/exportfs.h
> >+++ b/include/linux/exportfs.h
> >@@ -214,6 +214,8 @@ struct export_operations {
> > bool write, u32 *device_generation);
> > int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
> > int nr_iomaps, struct iattr *iattr);
> >+#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */
> >+ unsigned long flags;
> > };
> >
> > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> >--
> >2.4.3
> >
> >
>
>
> Cheers, Andreas
--
Jeff Layton <jlayton@...chiereds.net>