Message-ID: <20260125140812.GE13967@unreal>
Date: Sun, 25 Jan 2026 16:08:12 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Li Zhijian <lizhijian@...itsu.com>
Cc: linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
zyjzyj2000@...il.com, jgg@...pe.ca
Subject: Re: [PATCH] RDMA/rxe: Fix race condition in QP timer handlers
On Tue, Jan 20, 2026 at 03:44:37PM +0800, Li Zhijian wrote:
> I encountered the following warning:
> WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0
> ...
> libsha1 [last unloaded: ip6_udp_tunnel]
> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.19.0-rc5-64k-v8+ #37 PREEMPT
> Tainted: [C]=CRAP
> Hardware name: Raspberry Pi 4 Model B Rev 1.2
> Call trace:
> rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P)
> retransmit_timer+0x130/0x188 [rdma_rxe]
> call_timer_fn+0x68/0x4d0
> __run_timers+0x630/0x888
> ...
> WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0
> ...
> WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400
> ...
> refcount_t: underflow; use-after-free.
> WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400
>
> The issue is caused by a race condition between retransmit_timer() and
> rxe_destroy_qp(), which lets the Queue Pair's (QP) reference count drop
> to zero while the timer handler is still executing.
>
> It seems this warning is harmless because rxe_qp_do_cleanup() will flush
> all pending timers and requests.
>
> Example of flow causing the issue:
>
>       CPU0                            CPU1
>  retransmit_timer() {
>    spin_lock_irqsave
>                                  rxe_destroy_qp()
>                                    __rxe_cleanup()
>                                      __rxe_put() // qp->ref_count drops to 0
>                                      rxe_qp_do_cleanup() {
>    if (qp->valid) {
>      rxe_sched_task() {
>        WARN_ON(rxe_read(task->qp) <= 0);
>      }
>    }
>    spin_unlock_irqrestore
>  }
>                                        spin_lock_irqsave
>                                        qp->valid = 0
>                                        spin_unlock_irqrestore
>                                      }
>
> Ensure the QP's reference count is maintained and its validity is checked
> within the timer callbacks by adding calls to rxe_get(qp) and corresponding
> rxe_put(qp) after use.
>
> Signed-off-by: Li Zhijian <lizhijian@...itsu.com>

Fixes line?

Thanks

> ---
> drivers/infiniband/sw/rxe/rxe_comp.c | 3 +++
> drivers/infiniband/sw/rxe/rxe_req.c | 3 +++
> 2 files changed, 6 insertions(+)
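
For readers following the thread, the idea in the quoted commit message (take a reference in the timer callback, check validity, drop the reference after use) can be sketched in userspace C. This is only an illustration under stated assumptions, not the diff from the patch, which is not shown in this message. Every name below (model_qp, model_get, model_put, model_retransmit_timer) is hypothetical. The sketch further assumes that taking a reference fails once the count has already reached zero, and it models the reference count with a C11 atomic rather than the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the QP; not the real struct rxe_qp. */
struct model_qp {
	atomic_int ref_cnt;	/* models the QP reference count */
	bool valid;		/* models qp->valid */
	int tasks_scheduled;	/* counts calls that would reach rxe_sched_task() */
};

/* Take a reference only if the count is still non-zero (assumed semantics). */
static bool model_get(struct model_qp *qp)
{
	int old = atomic_load(&qp->ref_cnt);

	while (old > 0) {
		/* On failure, 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&qp->ref_cnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool model_put(struct model_qp *qp)
{
	return atomic_fetch_sub(&qp->ref_cnt, 1) == 1;
}

/* Shape of a timer handler that pins the QP while it runs. */
static void model_retransmit_timer(struct model_qp *qp)
{
	if (!model_get(qp))
		return;			/* QP is already being torn down */

	if (qp->valid)
		qp->tasks_scheduled++;	/* stands in for rxe_sched_task() */

	model_put(qp);
}
```

With one reference held, a call to model_retransmit_timer() schedules work and leaves the count unchanged. After the last model_put(), a late timer call returns early and never reaches the scheduling step, which is the situation the WARN_ON in the trace above complains about.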