[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <174053988113.102979.18024415194793753569@noble.neil.brown.name>
Date: Wed, 26 Feb 2025 14:18:01 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Trond Myklebust" <trondmy@...merspace.com>
Cc: "xiubli@...hat.com" <xiubli@...hat.com>,
"brauner@...nel.org" <brauner@...nel.org>,
"idryomov@...il.com" <idryomov@...il.com>,
"okorniev@...hat.com" <okorniev@...hat.com>,
"linux-cifs@...r.kernel.org" <linux-cifs@...r.kernel.org>,
"Dai.Ngo@...cle.com" <Dai.Ngo@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"johannes@...solutions.net" <johannes@...solutions.net>,
"chuck.lever@...cle.com" <chuck.lever@...cle.com>,
"jlayton@...nel.org" <jlayton@...nel.org>,
"anna@...nel.org" <anna@...nel.org>, "miklos@...redi.hu" <miklos@...redi.hu>,
"anton.ivanov@...bridgegreys.com" <anton.ivanov@...bridgegreys.com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"jack@...e.cz" <jack@...e.cz>, "tom@...pey.com" <tom@...pey.com>,
"richard@....at" <richard@....at>,
"linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"netfs@...ts.linux.dev" <netfs@...ts.linux.dev>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>
Subject:
Re: [PATCH 1/6] Change inode_operations.mkdir to return struct dentry *
On Wed, 26 Feb 2025, Trond Myklebust wrote:
> On Wed, 2025-02-26 at 13:09 +1100, NeilBrown wrote:
> > On Tue, 25 Feb 2025, Trond Myklebust wrote:
> > > On Mon, 2025-02-24 at 14:09 +1100, NeilBrown wrote:
> > > > On Mon, 24 Feb 2025, Al Viro wrote:
> > > > > On Mon, Feb 24, 2025 at 12:34:06PM +1100, NeilBrown wrote:
> > > > > > On Sat, 22 Feb 2025, Al Viro wrote:
> > > > > > > On Fri, Feb 21, 2025 at 10:36:30AM +1100, NeilBrown wrote:
> > > > > > >
> > > > > > > > +In general, filesystems which use d_instantiate_new() to
> > > > > > > > install the new
> > > > > > > > +inode can safely return NULL. Filesystems which may not
> > > > > > > > have an I_NEW inode
> > > > > > > > +should use d_drop();d_splice_alias() and return the
> > > > > > > > result
> > > > > > > > of the latter.
> > > > > > >
> > > > > > > IMO that's a bad pattern, _especially_ if you want to go
> > > > > > > for
> > > > > > > "in-update"
> > > > > > > kind of stuff later.
> > > > > >
> > > > > > Agreed. I have a draft patch to change d_splice_alias() and
> > > > > > d_exact_alias() to work on hashed dentrys. I thought it
> > > > > > should
> > > > > > go after
> > > > > > these mkdir patches rather than before.
> > > > >
> > > > > Could you give a braindump on the things d_exact_alias() is
> > > > > needed
> > > > > for?
> > > > > It's a recurring headache when doing ->d_name/->d_parent
> > > > > audits;
> > > > > see e.g.
> > > > > https://lore.kernel.org/all/20241213080023.GI3387508@ZenIV/ for
> > > > > related
> > > > > mini-rant from the latest iteration.
> > > > >
> > > > > Proof of correctness is bloody awful; it feels like the
> > > > > primitive
> > > > > itself
> > > > > is wrong, but I'd never been able to write anything concise
> > > > > regarding
> > > > > the things we really want there ;-/
> > > > >
> > > >
> > > > As I understand it, it is needed (or wanted) to handle the
> > > > possibility
> > > > of an inode becoming "stale" and then recovering. This could
> > > > happen,
> > > > for example, with a temporarily misconfigured NFS server.
> > > >
> > > > If ->d_revalidate gets a NFSERR_STALE from the server it will
> > > > return
> > > > '0'
> > > > so lookup_fast() and others will call d_invalidate() which will
> > > > d_drop()
> > > > the dentry. There are other paths on which -ESTALE can result in
> > > > d_drop().
> > > >
> > > > If a subsequent attempt to "open" the name successfully finds the
> > > > same
> > > > inode we want to reuse the old dentry rather than create a new
> > > > one.
> > > >
> > > > I don't really understand why. This code was added 20 years ago
> > > > before
> > > > git.
> > > > It was introduced by
> > > >
> > > > commit 89a45174b6b32596ea98fa3f89a243e2c1188a01
> > > > Author: Trond Myklebust <trond.myklebust@....uio.no>
> > > > Date: Tue Jan 4 21:41:37 2005 +0100
> > > >
> > > > VFS: Avoid dentry aliasing problems in filesystems like NFS,
> > > > where
> > > > inodes may be marked as stale in one instance (causing
> > > > the
> > > > dentry
> > > > to be dropped) then re-enabled in the next instance.
> > > >
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@....uio.no>
> > > >
> > > > in history.git
> > > >
> > > > Trond: do you have any memory of this? Can you explain what the
> > > > symptom
> > > > was that you wanted to fix?
> > > >
> > > > The original patch used d_add_unique() for lookup and atomic_open
> > > > and
> > > > readdir prime-dcache. We now only use it for v4 atomic_open.
> > > > Maybe
> > > > we
> > > > don't need it at all? Or maybe we need to restore it to those
> > > > other
> > > > callers?
> > > >
> > >
> > > 2005? That looks like it was trying to deal with the userspace NFS
> > > server. I can't remember when it was given the ability to use the
> > > inode
> > > generation counter, but I'm pretty sure that in 2005 there were
> > > plenty
> > > of setups out there that had the older version that reused
> > > filehandles
> > > (due to inode number reuse). So you would get spurious ESTALE
> > > errors
> > > sometimes due to inode number reuse, sometimes because the
> > > filehandle
> > > fell out of the userspace NFS server's cache.
> >
> > So this was likely done to work-around known weaknesses in NFS
> > servers
> > at the time.
> >
> > The original d_add_unique() was used in nfs_lookup()
> > nfs_atomic_lookup()
> > and nfs_readdir_lookup() but the current descendent d_exact_alias()
> > is
> > only used in _nfs4_open_and_get_state() called only by nfs4_do_open()
> > which is only used in nfs4_atomic_open() and nfs4_proc_create().
> >
> > So the usage in 'lookup' and 'readdir' have fallen by the wayside
> > with
> > no apparent negative consequences.
> > The old NFS servers have probably been fixed.
> >
> > So do you have any concerns with us discarding d_exact_alias() and
> > only
> > using d_splice_alias() in _nfs4_open_get_state() ??
> >
>
> AFAIK, there were never any NFSv4 servers in public use that mimicked
> the quirks of the userspace NFSv2/NFSv3 server. So I'm thinking it
> should be safe to retire d_exact_alias.
Thanks. I'll submit a patch through the VFS tree as I have other VFS
patches in the works that will depend on that so having them together
would be good.
Thanks,
NeilBrown
Powered by blists - more mailing lists