Message-ID: <A9DA4F5A-9BA8-4395-82CF-24C071AF1F8C@oracle.com>
Date: Tue, 29 Jun 2021 18:38:22 +0000
From: Chuck Lever III <chuck.lever@...cle.com>
To: "Marciniszyn, Mike" <mike.marciniszyn@...nelisnetworks.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
Jason Gunthorpe <jgg@...dia.com>,
"Hillman, Richie" <Richie.Hillman@...nelisnetworks.com>,
"Dalessandro, Dennis" <dennis.dalessandro@...nelisnetworks.com>,
Linux NFS Mailing List <linux-nfs@...r.kernel.org>
Subject: Re: NFS trace new to 5.13.0 (GA)
Hi Mike-
> On Jun 29, 2021, at 2:28 PM, Marciniszyn, Mike <mike.marciniszyn@...nelisnetworks.com> wrote:
>
> During continuous integration testing on the 5.13.0 kernel, our NFS tests trip the following trace on the client:
>
> [32936.156848] INFO: task kworker/9:1:519 blocked for more than 122 seconds.
> [32936.165201] Tainted: G S 5.13.0 #1
> [32936.171562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [32936.180773] task:kworker/9:1 state:D stack: 0 pid: 519 ppid: 2 flags:0x00004000
> [32936.190565] Workqueue: events xprt_destroy_cb [sunrpc]
> [32936.196854] Call Trace:
> [32936.200107] __schedule+0x38e/0x8b0
> [32936.204482] schedule+0x3c/0xa0
> [32936.208464] schedule_timeout+0x215/0x2b0
> [32936.213401] ? check_preempt_curr+0x3f/0x70
> [32936.218518] ? ttwu_do_wakeup+0x17/0x140
> [32936.223336] wait_for_completion+0x98/0xf0
> [32936.228396] __flush_work+0x128/0x1e0
> [32936.232942] ? worker_attach_to_pool+0xb0/0xb0
> [32936.238351] ? work_busy+0x80/0x80
> [32936.242555] __cancel_work_timer+0x110/0x1a0
> [32936.247726] ? xprt_rdma_bc_destroy+0xc6/0xe0 [rpcrdma]
> [32936.254034] xprt_rdma_destroy+0x15/0x50 [rpcrdma]
> [32936.259873] process_one_work+0x1cb/0x360
> [32936.264788] ? process_one_work+0x360/0x360
> [32936.269915] worker_thread+0x30/0x370
> [32936.274436] ? process_one_work+0x360/0x360
> [32936.279526] kthread+0x116/0x130
> [32936.283534] ? __kthread_cancel_work+0x40/0x40
> [32936.288924] ret_from_fork+0x22/0x30
>
> The same tests and same servers see no such issue from rc4 to rc7, so the failure seems new.
>
> Any thoughts?
>
> I'm currently rerunning rc7 just to be sure.
The NFS server in v5.13 is afflicted by a late-breaking bug-fix
to the alloc_pages_bulk_array() API. It's been fixed in Linus'
tree, but that tree is otherwise unstable for me.
Have a look at commit 66d9282523b for a one-liner fix; it should
apply cleanly to v5.13.
--
Chuck Lever