Message-ID: <c718d1c2-6c7e-47df-a3f4-097f7cadbbbf@linux.dev>
Date: Sun, 25 Jan 2026 13:24:39 -0800
From: Zhu Yanjun <yanjun.zhu@...ux.dev>
To: Leon Romanovsky <leon@...nel.org>, Li Zhijian <lizhijian@...itsu.com>
Cc: linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
 zyjzyj2000@...il.com, jgg@...pe.ca
Subject: Re: [PATCH] RDMA/rxe: Fix race condition in QP timer handlers

On 2026/1/25 6:08, Leon Romanovsky wrote:
> On Tue, Jan 20, 2026 at 03:44:37PM +0800, Li Zhijian wrote:
>> I encountered the following warning:
>>   WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0
>> ...
>>    libsha1 [last unloaded: ip6_udp_tunnel]
>>   CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G         C          6.19.0-rc5-64k-v8+ #37 PREEMPT
>>   Tainted: [C]=CRAP
>>   Hardware name: Raspberry Pi 4 Model B Rev 1.2
>>   Call trace:
>>    rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P)
>>    retransmit_timer+0x130/0x188 [rdma_rxe]
>>    call_timer_fn+0x68/0x4d0
>>    __run_timers+0x630/0x888
>> ...
>>   WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0
>> ...
>>   WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400
>> ...
>>   refcount_t: underflow; use-after-free.
>>   WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400
>>
>> The issue is caused by a race condition between retransmit_timer() and
>> rxe_destroy_qp(), leading to the Queue Pair's (QP) reference count dropping
>> to zero during timer handler execution.
>>
>> It seems this warning is harmless because rxe_qp_do_cleanup() will flush
>> all pending timers and requests.
>>
>> Example of flow causing the issue:
>>
>> CPU0                                   CPU1
>> retransmit_timer() {
>>      spin_lock_irqsave
>>                             rxe_destroy_qp()
>>                              __rxe_cleanup()
>>                                __rxe_put() // qp->ref_count decreases to 0
>>                              rxe_qp_do_cleanup() {
>>      if (qp->valid) {
>>          rxe_sched_task() {
>>              WARN_ON(rxe_read(task->qp) <= 0);
>>          }
>>      }
>>      spin_unlock_irqrestore
>> }
>>                                spin_lock_irqsave
>>                                qp->valid = 0
>>                                spin_unlock_irqrestore
>>                              }
>>
>> Ensure the QP's reference count is maintained and its validity is checked
>> within the timer callbacks by adding calls to rxe_get(qp) and corresponding
>> rxe_put(qp) after use.
>>
>> Signed-off-by: Li Zhijian <lizhijian@...itsu.com>
> 
> Fixes line?

Should the Fixes line be the following?

Fixes: 8700e3e7c485 ("Soft RoCE driver")

Best Regards,
Zhu Yanjun

> 
> Thanks
> 
>> ---
>>   drivers/infiniband/sw/rxe/rxe_comp.c | 3 +++
>>   drivers/infiniband/sw/rxe/rxe_req.c  | 3 +++
>>   2 files changed, 6 insertions(+)
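
The fix described in the quoted commit message (take a reference at the start of the timer callback, check validity, drop the reference afterwards) can be modelled in userspace. What follows is only a minimal sketch under stated assumptions: struct fake_qp, fake_get(), fake_put(), fake_retransmit_timer() and fake_demo() are hypothetical names invented for illustration, the refcount is modelled with C11 atomics, and none of this is the actual patch or the rxe pool code. It assumes that taking the reference fails once the count has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a QP; not the real struct rxe_qp. */
struct fake_qp {
    atomic_int ref_count;   /* models the QP reference count */
    bool valid;             /* models qp->valid */
    int scheduled;          /* counts how often work was scheduled */
};

/* Take a reference only if the count is still non-zero. */
static bool fake_get(struct fake_qp *qp)
{
    int old = atomic_load(&qp->ref_count);

    while (old > 0) {
        /* On failure, 'old' is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&qp->ref_count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when it was the last one. */
static bool fake_put(struct fake_qp *qp)
{
    return atomic_fetch_sub(&qp->ref_count, 1) == 1;
}

/* Models a timer callback that guards its work with a reference. */
static void fake_retransmit_timer(struct fake_qp *qp)
{
    if (!fake_get(qp))
        return;             /* count already hit zero: back off */

    if (qp->valid)
        qp->scheduled++;    /* stands in for scheduling the task */

    /* A real implementation would run cleanup if this was the last put. */
    fake_put(qp);
}

/* Runs both scenarios; returns 0 when they behave as expected. */
static int fake_demo(void)
{
    struct fake_qp live = { .ref_count = 1, .valid = true };
    /* Race window from the diagram: count is 0 but valid is still set. */
    struct fake_qp dying = { .ref_count = 0, .valid = true };

    fake_retransmit_timer(&live);
    fake_retransmit_timer(&dying);

    assert(live.scheduled == 1 && atomic_load(&live.ref_count) == 1);
    assert(dying.scheduled == 0 && atomic_load(&dying.ref_count) == 0);
    return 0;
}
```

Calling fake_demo() from a main() exercises both cases: with a live reference the handler schedules work and leaves the count unchanged, while in the race window shown in the diagram (count already 0, valid still set) the handler returns without scheduling anything.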

