Message-id: <165534094600.26404.4349155093299535793@noble.neil.brown.name>
Date: Thu, 16 Jun 2022 10:55:46 +1000
From: "NeilBrown" <neilb@...e.de>
To: "Daire Byrne" <daire@...g.com>
Cc: "Al Viro" <viro@...iv.linux.org.uk>,
"Trond Myklebust" <trond.myklebust@...merspace.com>,
"Chuck Lever" <chuck.lever@...cle.com>,
"Linux NFS Mailing List" <linux-nfs@...r.kernel.org>,
linux-fsdevel@...r.kernel.org,
"LKML" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC 00/12] Allow concurrent directory updates.
On Wed, 15 Jun 2022, Daire Byrne wrote:
...
> With the patch, the aggregate increases to 15 creates/s for 10 clients,
> which again matches the results of a single patched client. Not quite
> a 10x increase, but a healthy improvement nonetheless.
Great!
>
> However, it is at this point that I started to experience some
> stability issues with the re-export server that are not present with
> the vanilla unpatched v5.19-rc2 kernel. In particular the knfsd
> threads start to lock up with stack traces like this:
>
> [ 1234.460696] INFO: task nfsd:5514 blocked for more than 123 seconds.
> [ 1234.461481] Tainted: G W E 5.19.0-1.dneg.x86_64 #1
> [ 1234.462289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1234.463227] task:nfsd state:D stack: 0 pid: 5514
> ppid: 2 flags:0x00004000
> [ 1234.464212] Call Trace:
> [ 1234.464677] <TASK>
> [ 1234.465104] __schedule+0x2a9/0x8a0
> [ 1234.465663] schedule+0x55/0xc0
> [ 1234.466183] ? nfs_lookup_revalidate_dentry+0x3a0/0x3a0 [nfs]
> [ 1234.466995] __nfs_lookup_revalidate+0xdf/0x120 [nfs]
I can see the cause of this - I forgot a wakeup. This patch should fix
it, though I hope to find a better solution.
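
For context, the waiting side looks roughly like this - a minimal
sketch, assuming the series serialises on DCACHE_PAR_UPDATE with the
generic wait_var_event()/wake_up_var() pairing from
<linux/wait_bit.h>; the helper name here is hypothetical:

/* Hypothetical helper: sleep until DCACHE_PAR_UPDATE clears.
 * wait_var_event() only re-checks its condition after a matching
 * wake_up_var(&dentry->d_flags), so clearing the flag without that
 * wakeup leaves sleepers stuck in D state, as in the trace above.
 */
static void wait_on_par_update(struct dentry *dentry)
{
	wait_var_event(&dentry->d_flags,
		       !(READ_ONCE(dentry->d_flags) & DCACHE_PAR_UPDATE));
}

The fix below restores the wakeup on the release side: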
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 54c2c7adcd56..072130d000c4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2483,17 +2483,16 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
 	if (!(dentry->d_flags & DCACHE_PAR_UPDATE)) {
 		/* Must have exclusive lock on parent */
 		did_set_par_update = true;
+		lock_acquire_exclusive(&dentry->d_update_map, 0,
+				       0, NULL, _THIS_IP_);
 		dentry->d_flags |= DCACHE_PAR_UPDATE;
 	}
 
 	spin_unlock(&dentry->d_lock);
 	error = nfs_safe_remove(dentry);
 	nfs_dentry_remove_handle_error(dir, dentry, error);
-	if (did_set_par_update) {
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags &= ~DCACHE_PAR_UPDATE;
-		spin_unlock(&dentry->d_lock);
-	}
+	if (did_set_par_update)
+		d_unlock_update(dentry);
 out:
 	trace_nfs_unlink_exit(dir, dentry, error);
 	return error;
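
For reference, a rough sketch of what d_unlock_update() (added earlier
in the series) needs to do - not the exact series code:

/* Sketch of d_unlock_update(): drop the lockdep map taken via
 * lock_acquire_exclusive() above, clear DCACHE_PAR_UPDATE under
 * d_lock, and issue the wakeup that the open-coded version forgot.
 */
static void d_unlock_update(struct dentry *dentry)
{
	lock_map_release(&dentry->d_update_map);
	spin_lock(&dentry->d_lock);
	dentry->d_flags &= ~DCACHE_PAR_UPDATE;
	spin_unlock(&dentry->d_lock);
	wake_up_var(&dentry->d_flags);	/* the missing wakeup */
}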
>
> So all in all, the performance improvements in the knfsd re-export
> case are looking great, and we have real-world use cases that this
> helps with (batch processing workloads with latencies >10ms). If we
> can figure out the hanging knfsd threads, then I can test it more
> heavily.
Hopefully the above patch will allow the heavier testing to continue.
In any case, thanks a lot for the testing so far,
NeilBrown