Message-Id: <4E2EB304-EE09-424C-9939-48A9BE1C539A@oracle.com>
Date: Tue, 1 Dec 2020 11:07:17 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Yi Wang <wang.yi59@....com.cn>
Cc: Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna.schumaker@...app.com>,
Bruce Fields <bfields@...ldses.org>,
Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
xue.zhihong@....com.cn, wang.liang82@....com.cn,
Cheng Lin <cheng.lin130@....com.cn>
Subject: Re: [PATCH] nfs_common: need lock during iterate through the list
Hello!
> On Dec 1, 2020, at 7:06 AM, Yi Wang <wang.yi59@....com.cn> wrote:
>
> From: Cheng Lin <cheng.lin130@....com.cn>
>
> If an element is deleted while the list is being iterated, the
> iteration will fall into an endless loop.
>
> kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [nfsd:17137]
>
> PID: 17137 TASK: ffff8818d93c0000 CPU: 4 COMMAND: "nfsd"
> [exception RIP: __state_in_grace+76]
> RIP: ffffffffc00e817c RSP: ffff8818d3aefc98 RFLAGS: 00000246
> RAX: ffff881dc0c38298 RBX: ffffffff81b03580 RCX: ffff881dc02c9f50
> RDX: ffff881e3fce8500 RSI: 0000000000000001 RDI: ffffffff81b03580
> RBP: ffff8818d3aefca0 R8: 0000000000000020 R9: ffff8818d3aefd40
> R10: ffff88017fc03800 R11: ffff8818e83933c0 R12: ffff8818d3aefd40
> R13: 0000000000000000 R14: ffff8818e8391068 R15: ffff8818fa6e4000
> CS: 0010 SS: 0018
> #0 [ffff8818d3aefc98] opens_in_grace at ffffffffc00e81e3 [grace]
> #1 [ffff8818d3aefca8] nfs4_preprocess_stateid_op at ffffffffc02a3e6c [nfsd]
> #2 [ffff8818d3aefd18] nfsd4_write at ffffffffc028ed5b [nfsd]
> #3 [ffff8818d3aefd80] nfsd4_proc_compound at ffffffffc0290a0d [nfsd]
> #4 [ffff8818d3aefdd0] nfsd_dispatch at ffffffffc027b800 [nfsd]
> #5 [ffff8818d3aefe08] svc_process_common at ffffffffc02017f3 [sunrpc]
> #6 [ffff8818d3aefe70] svc_process at ffffffffc0201ce3 [sunrpc]
> #7 [ffff8818d3aefe98] nfsd at ffffffffc027b117 [nfsd]
> #8 [ffff8818d3aefec8] kthread at ffffffff810b88c1
> #9 [ffff8818d3aeff50] ret_from_fork at ffffffff816d1607
>
> The offending element:
> crash> lock_manager ffff881dc0c38298
> struct lock_manager {
>   list = {
>     next = 0xffff881dc0c38298,
>     prev = 0xffff881dc0c38298
>   },
>   block_opens = false
> }
>
> Signed-off-by: Cheng Lin <cheng.lin130@....com.cn>
> Signed-off-by: Yi Wang <wang.yi59@....com.cn>
> ---
> fs/nfs_common/grace.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs_common/grace.c b/fs/nfs_common/grace.c
> index b73d9dd37..26f2a50ec 100644
> --- a/fs/nfs_common/grace.c
> +++ b/fs/nfs_common/grace.c
> @@ -69,10 +69,14 @@ __state_in_grace(struct net *net, bool open)
>  	if (!open)
>  		return !list_empty(grace_list);
>
> +	spin_lock(&grace_lock);
>  	list_for_each_entry(lm, grace_list, list) {
> -		if (lm->block_opens)
> +		if (lm->block_opens) {
> +			spin_unlock(&grace_lock);
>  			return true;
> +		}
>  	}
> +	spin_unlock(&grace_lock);
>  	return false;
>  }
>
> --

This looks most closely related to NFSD, so I've applied it to
my NFSD tree for the next merge window. I've also added

Fixes: c87fb4a378f9 ("lockd: NLM grace period shouldn't block NFSv4 opens")

You can find it in the cel-next topic branch in my kernel repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Incidentally, the e-mail encoding mangled the white space and I
don't see the e-mail showing up on lore.kernel.org. I applied it
by hand since it was small, but this should be addressed for
future patches so our patch handling infrastructure can deal
properly with your submissions. Thanks!

--
Chuck Lever
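
For readers following the crash dump in the quoted patch, here is a minimal userspace sketch of the failure mode. It assumes the entry is unlinked with a list_del_init()-style operation, which is what the self-pointing next/prev values in the dump suggest. The list helpers are simplified stand-ins for the kernel's, and steps_to_head() with its limit parameter is a hypothetical helper introduced only so the demonstration terminates.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and make it point at itself, as list_del_init() does. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/*
 * Walk forward from 'pos' until 'head' is reached, the way an unlocked
 * reader that is already standing on 'pos' would. Returns the number of
 * steps taken, or -1 if 'head' was not reached within 'limit' steps.
 * The limit exists only for this demo; the kernel loop has no such bound.
 */
static long steps_to_head(const struct list_head *pos,
			  const struct list_head *head, long limit)
{
	long steps = 0;

	assert(limit >= 0);
	while (pos != head) {
		if (steps == limit)
			return -1;
		pos = pos->next;
		steps++;
	}
	return steps;
}
```

With a list of head, a, b, calling steps_to_head(&a, &head, 1000) returns 2 before list_del_init(&a) and -1 afterwards: once the node points at itself, a reader positioned on it can never get back to the list head. Holding the same lock in both the reader and the removal path, as the patch does with grace_lock, keeps a reader from ever standing on a node while it is being unlinked.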