Message-ID: <620f8611-1e95-4ebd-9db2-eb7231cfb3f2@gmail.com>
Date: Thu, 14 Aug 2025 23:07:33 +0900
From: Daisuke Matsuda <dskmtsd@...il.com>
To: Zhu Yanjun <yanjun.zhu@...ux.dev>,
Philipp Reisner <philipp.reisner@...bit.com>
Cc: Zhu Yanjun <zyjzyj2000@...il.com>, Jason Gunthorpe <jgg@...pe.ca>,
Leon Romanovsky <leon@...nel.org>, linux-rdma@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rdma_rxe: call comp_handler without holding cq->cq_lock
On 2025/08/14 14:33, Zhu Yanjun wrote:
> On 2025/8/12 8:54, Daisuke Matsuda wrote:
>> On 2025/08/11 22:48, Zhu Yanjun wrote:
>>> On 2025/8/10 22:26, Philipp Reisner wrote:
>>>> On Thu, Aug 7, 2025 at 3:09 AM Zhu Yanjun <yanjun.zhu@...ux.dev> wrote:
>>>>>
>>>>> On 2025/8/6 5:39, Philipp Reisner wrote:
>>>>>> Allow the comp_handler callback implementation to call ib_poll_cq().
>>>>>> A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
>>>>>> And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
>>>>>>
>>>>>> The Mellanox and Intel drivers allow a comp_handler callback
>>>>>> implementation to call ib_poll_cq().
>>>>>>
>>>>>> Avoid the deadlock by calling the comp_handler callback without
>>>>>> holding cq->cq_lock.
>>>>>>
>>>>>> Signed-off-by: Philipp Reisner <philipp.reisner@...bit.com>
>>>>>
>>>>> ERROR: test_resize_cq (tests.test_cq.CQTest.test_resize_cq)
>>>>> Test resize CQ, start with specific value and then increase and decrease
>>>>> ----------------------------------------------------------------------
>>>>> Traceback (most recent call last):
>>>>> File "/root/deb/rdma-core/tests/test_cq.py", line 135, in test_resize_cq
>>>>> u.poll_cq(self.client.cq)
>>>>> File "/root/deb/rdma-core/tests/utils.py", line 687, in poll_cq
>>>>> wcs = _poll_cq(cq, count, data)
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> File "/root/deb/rdma-core/tests/utils.py", line 669, in _poll_cq
>>>>> raise PyverbsError(f'Got timeout on polling ({count} CQEs remaining)')
>>>>> pyverbs.pyverbs_error.PyverbsError: Got timeout on polling (1 CQEs
>>>>> remaining)
>>>>>
>>>>> After I applied your patch on kernel v6.16, I got the above errors.
>>>>>
>>>>> Zhu Yanjun
>>>>>
>>>>
>>>> Hello Zhu,
>>>>
>>>> When I run the test_resize_cq test in a loop (100 runs each) on the
>>>> original code and with my patch, I get about the same failure rate.
>>>
>>> Add Daisuke Matsuda
>>>
>>> If I remember correctly, when Daisuke and I discussed the ODP patches, we both ran tests with rxe. Judging from our test results, this test_resize_cq error did not occur.
>>
>> Hi Zhu and Philipp,
>>
>> As far as I know, this error has been present for some time.
>> It might be possible to investigate further by capturing a memory dump while the polling is stuck, but I have not had time to do that yet.
>> At least, I can confirm that this is not a regression caused by Philipp's patch.
>
> Hi, Daisuke
>
> Thanks a lot. I’m now able to consistently reproduce this problem. I have created a commit here: https://github.com/zhuyj/linux/commit/8db3abc00bf49cac6ea1d5718d28c6516c94fb4e.
>
> After applying this commit, I ran test_resize_cq 10,000 times, and the problem did not occur.
>
> I’m not sure if there’s a better way to fix this issue. If anyone has a better solution, please share it.
Hi Zhu,
Thank you very much for the investigation.
I agree that the issue can be worked around by adding a delay in the rxe completer path.
However, since the issue is easily reproducible, introducing an explicit sleep might
add unnecessary overhead. I think a short busy-wait would be a more desirable alternative.
The intermediate change below does make the issue disappear on my node, but I don't think
this is a complete solution. In particular, it appears that ibcq->event_handler()
(typically ib_uverbs_cq_event_handler()) is not re-entrant, so simply spinning like this
could be risky.
===
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index a5b2b62f596b..a10a173e53cf 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -454,7 +454,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
queue_advance_consumer(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
if (post)
- rxe_cq_post(qp->scq, &cqe, 0);
+ while (rxe_cq_post(qp->scq, &cqe, 0) == -EBUSY);
if (wqe->wr.opcode == IB_WR_SEND ||
wqe->wr.opcode == IB_WR_SEND_WITH_IMM ||
===
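To illustrate the retry idea outside the kernel, here is a rough userspace sketch of a
bounded busy-wait. It is not rxe code: struct mock_cq, mock_cq_post() and
post_with_bounded_retry() are hypothetical names invented for this example, and the only
thing taken from the diff above is the convention that the post call returns -EBUSY when
it cannot accept the completion. The retry cap is my assumption about how an unbounded
spin could be avoided. It is not part of the intermediate change.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a completion queue that is temporarily busy. */
struct mock_cq {
	int busy_polls;	/* upcoming post attempts that will return -EBUSY */
	int posted;	/* completions accepted so far */
};

/* Hypothetical stand-in for a post call that reports -EBUSY while busy. */
static int mock_cq_post(struct mock_cq *cq)
{
	if (cq->busy_polls > 0) {
		cq->busy_polls--;
		return -EBUSY;
	}
	cq->posted++;
	return 0;
}

/*
 * Retry the post while it returns -EBUSY, but give up after max_tries
 * attempts so that the caller can never spin forever.
 * Returns the result of the last attempt.
 */
static int post_with_bounded_retry(struct mock_cq *cq, int max_tries)
{
	int ret = -EBUSY;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = mock_cq_post(cq);

	return ret;
}
```

With a queue that is busy for two attempts, post_with_bounded_retry(&cq, 5) succeeds on
the third attempt. With a queue that stays busy longer than the cap, it returns -EBUSY
and the caller has to decide what to do with the completion.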
If you agree with this direction, I can take some time in the next week or so to make a
formal patch. Of course, you are welcome to take over this idea if you prefer.
Thanks,
Daisuke
>
> Thanks a lot.
> Zhu Yanjun
>
>>
>> Thanks,
>> Daisuke
>>
>