[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-id: <176420378092.634289.15227073044036379500@noble.neil.brown.name>
Date: Thu, 27 Nov 2025 11:36:20 +1100
From: NeilBrown <neilb@...mail.net>
To: "Benjamin Coddington" <bcodding@...merspace.com>
Cc: "Alexander Viro" <viro@...iv.linux.org.uk>,
"Christian Brauner" <brauner@...nel.org>, "Jan Kara" <jack@...e.cz>,
"Chuck Lever" <chuck.lever@...cle.com>, "Jeff Layton" <jlayton@...nel.org>,
"Olga Kornievskaia" <okorniev@...hat.com>, "Dai Ngo" <Dai.Ngo@...cle.com>,
"Tom Talpey" <tom@...pey.com>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org,
"Trond Myklebust" <trondmy@...nel.org>, "Mike Snitzer" <snitzer@...nel.org>
Subject: Re: [PATCH v1 0/3] Allow knfsd to use atomic_open()
On Thu, 27 Nov 2025, Benjamin Coddington wrote:
> On 26 Nov 2025, at 15:59, NeilBrown wrote:
>
> > On Fri, 21 Nov 2025, Benjamin Coddington wrote:
> >> On 20 Nov 2025, at 17:26, NeilBrown wrote:
> >>
> >>> On Wed, 19 Nov 2025, Benjamin Coddington wrote:
> >>>
> >>>> Ah, it's true. I did not validate knfsd's behaviors, only its interface with
> >>>> VFS. IIUC knfsd gets around needing to pass O_EXCL by holding the directory
> >>>> inode lock over the create, and since it doesn't need to do lookup because
> >>>> it already has a filehandle, I think O_EXCL is moot.
> >>>
> >>> Holding the directory lock is sufficient for providing O_EXCL for local
> >>> filesystems which will be blocked from creating while that lock is held.
> >>> It is *not* sufficient for remote filesystems which are precisely those
> >>> which provide ->atomic_open.
> >>>
> >>> The fact that you are adding support for atomic_open means that O_EXCL
> >>> isn't moot.
> >>
> >> I mean to say: knfsd doesn't need to pass O_EXCL because its already taking
> >> care to produce an exclusive open via nfsv4 semantics.
> >
> > Huh?
> >
> > The interesting circumstance here is an NFS re-export of an NFS
> > filesystem - is that right?
>
> That's right.
>
> > The only way that an exclusive create can be achieved on the target
> > filesystem is if an NFS4_CREATE_EXCLUSIVE4_1 (or similar) create request
> > is sent to the ultimate sever. There is nothing knfsd can do to
> > produce exclusive open semantics on a remote NFS serve except to
> > explicitly request them.
>
> True - but I haven't really been worried about that, so I think I see what
> you're getting at now - you'd like kNFSD to start using O_EXCL when it
> receives NFS4_CREATE_EXCLUSIVE4_1.
>
> I think that's a whole different change on its own, but not necessary
> here because these changes are targeting a very specific problem - the
> problem where open(O_CREAT) is done in two operations on the remote
> filesystem. That problem is solved by this patchset, and I don't think the
> solution is incomplete because we're not passing O_EXCL for the
> NFS4_CREATE_EXCLUSIVE{4_1} case. I think that's a new enhancement - one
> that I haven't thought through (yet) or tested.
>
> Up until now, kNFSD has not bothered fiddling with O_EXCL because of the
> reasons I listed above - for local filesystems or remote.
>
> Do you disagree that the changes here for the open(O_CREAT) problem is
> incomplete without new O_EXCL passing to atomic_open()?
It isn't so much that the change is incomplete. Rather, the change
introduces a regression.
The old code was
- error = vfs_create(mnt_idmap(path->mnt),
- d_inode(path->dentry->d_parent),
- path->dentry, mode, true);
Note the "true" at the end. This instructs nfs_create() to pass O_EXCL
to nfs_do_create() so an over-the-wire exclusive create is performed.
The new code is
+ dentry = atomic_open(path, dentry, file, flags, mode);
Where "flags" is oflags from nfsd4_vfs_create() which is
O_CREAT| O_LARGEFILE | O_(read/write/rdwr)
and no O_EXCL.
(When atomic_open is called by lookup_open, "open_flag" is passed which
might contain O_EXCL).
> If so, do we also
> need to consider passing O_EXCL when kNFSD does vfs_open() for the case when
> the filesystem does not have atomic_open()?
No as vfs_open() doesn't do the create, vfs_create() does that.
And we do need to pass (the equivalent of) O_EXCL when calling
vfs_create(). In fact we do - that 'true' as the large arg means
exactly O_EXCL.
(really we shouldn't be passing 'true' if an exclusive create wasn't
requested, but it only makes a difference for filesystems that support
->atomic_open, so it doesn't actually matter what we pass - and Jeff
has a patch to remove that last arg to vfs_create()).
>
> Thanks for engaging with me,
> Ben
>
Thanks,
NeilBrown
Powered by blists - more mailing lists