[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50e6b21c644b050a29e159c9484a5e01061434f6.camel@hammerspace.com>
Date: Wed, 26 Feb 2025 02:34:49 +0000
From: Trond Myklebust <trondmy@...merspace.com>
To: "neilb@...e.de" <neilb@...e.de>
CC: "xiubli@...hat.com" <xiubli@...hat.com>, "brauner@...nel.org"
<brauner@...nel.org>, "idryomov@...il.com" <idryomov@...il.com>,
"okorniev@...hat.com" <okorniev@...hat.com>, "linux-cifs@...r.kernel.org"
<linux-cifs@...r.kernel.org>, "Dai.Ngo@...cle.com" <Dai.Ngo@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"johannes@...solutions.net" <johannes@...solutions.net>,
"chuck.lever@...cle.com" <chuck.lever@...cle.com>, "jlayton@...nel.org"
<jlayton@...nel.org>, "anna@...nel.org" <anna@...nel.org>,
"miklos@...redi.hu" <miklos@...redi.hu>, "anton.ivanov@...bridgegreys.com"
<anton.ivanov@...bridgegreys.com>, "viro@...iv.linux.org.uk"
<viro@...iv.linux.org.uk>, "jack@...e.cz" <jack@...e.cz>, "tom@...pey.com"
<tom@...pey.com>, "richard@....at" <richard@....at>,
"linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"netfs@...ts.linux.dev" <netfs@...ts.linux.dev>, "linux-nfs@...r.kernel.org"
<linux-nfs@...r.kernel.org>, "ceph-devel@...r.kernel.org"
<ceph-devel@...r.kernel.org>, "senozhatsky@...omium.org"
<senozhatsky@...omium.org>
Subject: Re: [PATCH 1/6] Change inode_operations.mkdir to return struct dentry
*
On Wed, 2025-02-26 at 13:09 +1100, NeilBrown wrote:
> On Tue, 25 Feb 2025, Trond Myklebust wrote:
> > On Mon, 2025-02-24 at 14:09 +1100, NeilBrown wrote:
> > > On Mon, 24 Feb 2025, Al Viro wrote:
> > > > On Mon, Feb 24, 2025 at 12:34:06PM +1100, NeilBrown wrote:
> > > > > On Sat, 22 Feb 2025, Al Viro wrote:
> > > > > > On Fri, Feb 21, 2025 at 10:36:30AM +1100, NeilBrown wrote:
> > > > > >
> > > > > > > +In general, filesystems which use d_instantiate_new() to
> > > > > > > install the new
> > > > > > > +inode can safely return NULL. Filesystems which may not
> > > > > > > have an I_NEW inode
> > > > > > > +should use d_drop();d_splice_alias() and return the
> > > > > > > result
> > > > > > > of the latter.
> > > > > >
> > > > > > IMO that's a bad pattern, _especially_ if you want to go
> > > > > > for
> > > > > > "in-update"
> > > > > > kind of stuff later.
> > > > >
> > > > > Agreed. I have a draft patch to change d_splice_alias() and
> > > > > d_exact_alias() to work on hashed dentrys. I thought it
> > > > > should
> > > > > go after
> > > > > these mkdir patches rather than before.
> > > >
> > > > Could you give a braindump on the things d_exact_alias() is
> > > > needed
> > > > for?
> > > > It's a recurring headache when doing ->d_name/->d_parent
> > > > audits;
> > > > see e.g.
> > > > https://lore.kernel.org/all/20241213080023.GI3387508@ZenIV/ for
> > > > related
> > > > mini-rant from the latest iteration.
> > > >
> > > > Proof of correctness is bloody awful; it feels like the
> > > > primitive
> > > > itself
> > > > is wrong, but I'd never been able to write anything concise
> > > > regarding
> > > > the things we really want there ;-/
> > > >
> > >
> > > As I understand it, it is needed (or wanted) to handle the
> > > possibility
> > > of an inode becoming "stale" and then recovering. This could
> > > happen,
> > > for example, with a temporarily misconfigured NFS server.
> > >
> > > If ->d_revalidate gets a NFSERR_STALE from the server it will
> > > return
> > > '0'
> > > so lookup_fast() and others will call d_invalidate() which will
> > > d_drop()
> > > the dentry. There are other paths on which -ESTALE can result in
> > > d_drop().
> > >
> > > If a subsequent attempt to "open" the name successfully finds the
> > > same
> > > inode we want to reuse the old dentry rather than create a new
> > > one.
> > >
> > > I don't really understand why. This code was added 20 years ago
> > > before
> > > git.
> > > It was introduced by
> > >
> > > commit 89a45174b6b32596ea98fa3f89a243e2c1188a01
> > > Author: Trond Myklebust <trond.myklebust@....uio.no>
> > > Date: Tue Jan 4 21:41:37 2005 +0100
> > >
> > > VFS: Avoid dentry aliasing problems in filesystems like NFS,
> > > where
> > > inodes may be marked as stale in one instance (causing
> > > the
> > > dentry
> > > to be dropped) then re-enabled in the next instance.
> > >
> > > Signed-off-by: Trond Myklebust <trond.myklebust@....uio.no>
> > >
> > > in history.git
> > >
> > > Trond: do you have any memory of this? Can you explain what the
> > > symptom
> > > was that you wanted to fix?
> > >
> > > The original patch used d_add_unique() for lookup and atomic_open
> > > and
> > > readdir prime-dcache. We now only use it for v4 atomic_open.
> > > Maybe
> > > we
> > > don't need it at all? Or maybe we need to restore it to those
> > > other
> > > callers?
> > >
> >
> > 2005? That looks like it was trying to deal with the userspace NFS
> > server. I can't remember when it was given the ability to use the
> > inode
> > generation counter, but I'm pretty sure that in 2005 there were
> > plenty
> > of setups out there that had the older version that reused
> > filehandles
> > (due to inode number reuse). So you would get spurious ESTALE
> > errors
> > sometimes due to inode number reuse, sometimes because the
> > filehandle
> > fell out of the userspace NFS server's cache.
>
> So this was likely done to work-around known weaknesses in NFS
> servers
> at the time.
>
> The original d_add_unique() was used in nfs_lookup()
> nfs_atomic_lookup()
> and nfs_readdir_lookup() but the current descendent d_exact_alias()
> is
> only used in _nfs4_open_and_get_state() called only by nfs4_do_open()
> which is only used in nfs4_atomic_open() and nfs4_proc_create().
>
> So the usage in 'lookup' and 'readdir' have fallen by the wayside
> with
> no apparent negative consequences.
> The old NFS servers have probably been fixed.
>
> So do you have any concerns with us discarding d_exact_alias() and
> only
> using d_splice_alias() in _nfs4_open_get_state() ??
>
AFAIK, there were never any NFSv4 servers in public use that mimicked
the quirks of the userspace NFSv2/NFSv3 server. So I'm thinking it
should be safe to retire d_exact_alias.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
Powered by blists - more mailing lists