[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87a10c8f00f5e803723811a7c7a67e03c6cbabb8.camel@kernel.org>
Date: Mon, 17 Nov 2025 17:41:49 -0500
From: Trond Myklebust <trondmy@...nel.org>
To: Aiden Lambert <alambert48@...ech.edu>
Cc: anna@...nel.org, chuck.lever@...cle.org, jlayton@...nel.org,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] NFS: ensure nfs_safe_remove() atomic nlink drop
On Mon, 2025-11-17 at 13:57 -0500, Aiden Lambert wrote:
> On Mon, Nov 17, 2025 at 01:24:54PM -0500, Trond Myklebust wrote:
> > On Mon, 2025-11-17 at 13:03 -0500, Aiden Lambert wrote:
> > > A race condition occurs when both unlink() and link() are running
> > > concurrently on the same inode, and the nlink count from the nfs
> > > server
> > > received in link()->nfs_do_access() clobbers the nlink count of
> > > the
> > > inode in nfs_safe_remove() after the "remove" RPC is made to the
> > > server
> > > but before we decrement the link count. If the nlink value from
> > > nfs_do_access() reflects the decremented nlink of the "remove"
> > > RPC, a
> > > double decrement occurs, which can lead to the dropping of the
> > > client
> > > side inode, causing the link call to return ENOENT. To fix this,
> > > we
> > > record an expected nlink value before the "remove" RPC and
> > > compare it
> > > with the value afterwards---if these two are the same, the drop
> > > is
> > > performed. Note that this does not take into account nlink values
> > > that
> > > are a result of multi-client (un)link operations as these are not
> > > guaranteed to be atomic by the NFS spec.
> >
> >
> > Why do we end up running nfs_do_access() at all in the above test?
> > That
> > sounds like a bug. We shouldn't ever need to validate if we can
> > create
> > or delete things using ACCESS. That just ends up producing an
> > unnecessary TOCTOU race.
> >
> >
>
> It seems that the call chain is
> 1. vfs_link()
> 2. may_create()
> 3. inode_permission()/nfs_permission() which fails to get a cached
> value as the cache is
> invalidated by the (un)link operations
> 4. nfs_do_access()
I'm still confused. Why wouldn't the S_IFDIR case in nfs_permission()
be filtering away the call to nfs_do_access()?
Either way, I think this cannot be fixed without treating it as a
generic attribute race. Your patch is sort of correct, but there are
loopholes that it won't catch either, so let's just fix it the way that
we fix all attribute races: by using the generation counter to tell us
when such a race may have occurred.
8<---------------------------------------
>From bd4928ec799b31c492eb63f9f4a0c1e0bb4bb3f7 Mon Sep 17 00:00:00 2001
Message-ID: <bd4928ec799b31c492eb63f9f4a0c1e0bb4bb3f7.1763418720.git.trond.myklebust@...merspace.com>
From: Trond Myklebust <trond.myklebust@...merspace.com>
Date: Mon, 17 Nov 2025 15:28:17 -0500
Subject: [PATCH] NFS: Avoid changing nlink when file removes and attribute
updates race
If a file removal races with another operation that updates its
attributes, then skip the change to nlink, and just mark the attributes
as being stale.
Reported-by: Aiden Lambert <alambert48@...ech.edu>
Fixes: 59a707b0d42e ("NFS: Ensure we revalidate the inode correctly after remove or rename")
Signed-off-by: Trond Myklebust <trond.myklebust@...merspace.com>
---
fs/nfs/dir.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ea9f6ca8f30f..d557b0443e8b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1894,13 +1894,15 @@ static int nfs_dentry_delete(const struct dentry *dentry)
}
/* Ensure that we revalidate inode->i_nlink */
-static void nfs_drop_nlink(struct inode *inode)
+static void nfs_drop_nlink(struct inode *inode, unsigned long gencount)
{
+ struct nfs_inode *nfsi = NFS_I(inode);
+
spin_lock(&inode->i_lock);
/* drop the inode if we're reasonably sure this is the last link */
- if (inode->i_nlink > 0)
+ if (inode->i_nlink > 0 && gencount == nfsi->attr_gencount)
drop_nlink(inode);
- NFS_I(inode)->attr_gencount = nfs_inc_attr_generation_counter();
+ nfsi->attr_gencount = nfs_inc_attr_generation_counter();
nfs_set_cache_invalid(
inode, NFS_INO_INVALID_CHANGE | NFS_INO_INVALID_CTIME |
NFS_INO_INVALID_NLINK);
@@ -1914,8 +1916,9 @@ static void nfs_drop_nlink(struct inode *inode)
static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode)
{
if (dentry->d_flags & DCACHE_NFSFS_RENAMED) {
+ unsigned long gencount = READ_ONCE(NFS_I(inode)->attr_gencount);
nfs_complete_unlink(dentry, inode);
- nfs_drop_nlink(inode);
+ nfs_drop_nlink(inode, gencount);
}
iput(inode);
}
@@ -2507,9 +2510,11 @@ static int nfs_safe_remove(struct dentry *dentry)
trace_nfs_remove_enter(dir, dentry);
if (inode != NULL) {
+ unsigned long gencount = READ_ONCE(NFS_I(inode)->attr_gencount);
+
error = NFS_PROTO(dir)->remove(dir, dentry);
if (error == 0)
- nfs_drop_nlink(inode);
+ nfs_drop_nlink(inode, gencount);
} else
error = NFS_PROTO(dir)->remove(dir, dentry);
if (error == -ENOENT)
@@ -2709,6 +2714,7 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
{
struct inode *old_inode = d_inode(old_dentry);
struct inode *new_inode = d_inode(new_dentry);
+ unsigned long new_gencount = 0;
struct dentry *dentry = NULL;
struct rpc_task *task;
bool must_unblock = false;
@@ -2761,6 +2767,7 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
} else {
block_revalidate(new_dentry);
must_unblock = true;
+ new_gencount = NFS_I(new_inode)->attr_gencount;
spin_unlock(&new_dentry->d_lock);
}
@@ -2800,7 +2807,7 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
new_dir, new_dentry, error);
if (!error) {
if (new_inode != NULL)
- nfs_drop_nlink(new_inode);
+ nfs_drop_nlink(new_inode, new_gencount);
/*
* The d_move() should be here instead of in an async RPC completion
* handler because we need the proper locks to move the dentry. If
--
2.51.1
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@...nel.org, trond.myklebust@...merspace.com
Powered by blists - more mailing lists