Message-ID: <355c8355-a6bc-181f-73e7-1baf7749f984@huaweicloud.com>
Date: Thu, 6 Mar 2025 10:40:32 +0800
From: Li Nan <linan666@...weicloud.com>
To: Chuck Lever <chuck.lever@...cle.com>, Dai Ngo <Dai.Ngo@...cle.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 linux-nfs@...r.kernel.org, trondmy@...merspace.com, sagi@...mberg.me,
 cel@...nel.org, "wanghai (M)" <wanghai38@...wei.com>, yanhaitao2@...wei.com,
 chengjike.cheng@...wei.com, dingming09@...wei.com
Subject: Re: [Bug report] NULL pointer dereference in frwr_unmap_sync()



On 2025/3/5 22:02, Chuck Lever wrote:
> On 3/4/25 9:43 PM, Li Nan wrote:
>> We found the following problem in kernel 5.10, and the same problem likely
>> also exists in mainline:
>>
>> During an NFS mount using the 'soft' option over a RoCE network, we observed
>> a kernel crash with the trace below when network issues (congestion or
>> disconnect) occurred:
>>    nfs: server 10.10.253.211 not responding, timed out
>>    BUG: kernel NULL pointer dereference, address: 00000000000000a0
>>    RIP: 0010:frwr_unmap_sync+0x77/0x200 [rpcrdma]
>>    Call Trace:
>>     ? __die_body.cold+0x8/0xd
>>     ? no_context+0x155/0x230
>>     ? __bad_area_nosemaphore+0x52/0x1a0
>>     ? exc_page_fault+0x2dc/0x550
>>     ? asm_exc_page_fault+0x1e/0x30
>>     ? frwr_unmap_sync+0x77/0x200 [rpcrdma]
>>     xprt_release+0x9e/0x1a0 [sunrpc]
>>     rpc_release_resources_task+0xe/0x50 [sunrpc]
>>     rpc_release_task+0x19/0xa0 [sunrpc]
>>     rpc_async_schedule+0x29/0x40 [sunrpc]
>>     process_one_work+0x1b2/0x350
>>     worker_thread+0x49/0x310
>>     ? rescuer_thread+0x380/0x380
>>     kthread+0xfb/0x140
>>
>> Problem analysis:
>> The crash happens in frwr_unmap_sync() when accessing the req->rl_registered
>> list, caused either by a NULL pointer or by access to freed MR resources.
>> There is a race condition between:
>> T1
>> __ib_process_cq
>>   wc->wr_cqe->done (frwr_wc_localinv)
>>    rpcrdma_flush_disconnect
>>     rpcrdma_force_disconnect
>>      xprt_force_disconnect
>>       xprt_autoclose
>>        xprt_rdma_close
>>         rpcrdma_xprt_disconnect
>>          rpcrdma_reqs_reset
>>           frwr_reset
>>            rpcrdma_mr_pop(&req->rl_registered)
>> T2
>> rpc_async_schedule
>>   rpc_release_task
>>    rpc_release_resources_task
>>     xprt_release
>>      xprt_rdma_free
>>       frwr_unmap_sync
>>        rpcrdma_mr_pop(&req->rl_registered)
>>
>> This problem also exists in function rpcrdma_mrs_destroy().
>>
> 
> Dai, is this the same as the system test problem you've been looking at?
> 

Thank you for looking into it. Is there a patch that needs to be tested? We
are happy to help with the testing.

-- 
Thanks,
Nan

