linux-kernel - Re: NFS trace new to 5.13.0 (GA)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <A9DA4F5A-9BA8-4395-82CF-24C071AF1F8C@oracle.com>
Date:   Tue, 29 Jun 2021 18:38:22 +0000
From:   Chuck Lever III <chuck.lever@...cle.com>
To:     "Marciniszyn, Mike" <mike.marciniszyn@...nelisnetworks.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        Jason Gunthorpe <jgg@...dia.com>,
        "Hillman, Richie" <Richie.Hillman@...nelisnetworks.com>,
        "Dalessandro, Dennis" <dennis.dalessandro@...nelisnetworks.com>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>
Subject: Re: NFS trace new to 5.13.0 (GA)

Hi Mike-

> On Jun 29, 2021, at 2:28 PM, Marciniszyn, Mike <mike.marciniszyn@...nelisnetworks.com> wrote:
> 
> During our continuous integration testing on 5.13.0 kernel our testing trips on NFS testing with the following trace on the client:
> 
> [32936.156848] INFO: task kworker/9:1:519 blocked for more than 122 seconds.
> [32936.165201]       Tainted: G S                5.13.0 #1
> [32936.171562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [32936.180773] task:kworker/9:1     state:D stack:    0 pid:  519 ppid:     2 flags:0x00004000
> [32936.190565] Workqueue: events xprt_destroy_cb [sunrpc]
> [32936.196854] Call Trace:
> [32936.200107]  __schedule+0x38e/0x8b0
> [32936.204482]  schedule+0x3c/0xa0
> [32936.208464]  schedule_timeout+0x215/0x2b0
> [32936.213401]  ? check_preempt_curr+0x3f/0x70
> [32936.218518]  ? ttwu_do_wakeup+0x17/0x140
> [32936.223336]  wait_for_completion+0x98/0xf0
> [32936.228396]  __flush_work+0x128/0x1e0
> [32936.232942]  ? worker_attach_to_pool+0xb0/0xb0
> [32936.238351]  ? work_busy+0x80/0x80
> [32936.242555]  __cancel_work_timer+0x110/0x1a0
> [32936.247726]  ? xprt_rdma_bc_destroy+0xc6/0xe0 [rpcrdma]
> [32936.254034]  xprt_rdma_destroy+0x15/0x50 [rpcrdma]
> [32936.259873]  process_one_work+0x1cb/0x360
> [32936.264788]  ? process_one_work+0x360/0x360
> [32936.269915]  worker_thread+0x30/0x370
> [32936.274436]  ? process_one_work+0x360/0x360
> [32936.279526]  kthread+0x116/0x130
> [32936.283534]  ? __kthread_cancel_work+0x40/0x40
> [32936.288924]  ret_from_fork+0x22/0x30
> 
> The same tests and same servers see no such issue from rc4 to rc7, so the failure seems new.
> 
> Any thoughts?
> 
> I'm currently rerunning rc7 just to be sure.

The NFS server in v5.13 is afflicted by a late-breaking bug-fix
to the alloc_pages_bulk_array() API. It's been fixed in Linus'
tree, but that tree is otherwise unstable for me.

Have a look at commit 66d9282523b for a one-liner fix, it should
apply cleanly to v5.13.

--
Chuck Lever