Message-ID: <dc2efb0b-68be-102b-a041-47c799361d35@fujitsu.com>
Date: Mon, 27 Jun 2022 03:41:57 +0000
From: "lizhijian@...itsu.com" <lizhijian@...itsu.com>
To: Bob Pearson <rpearsonhpe@...il.com>,
Yanjun Zhu <yanjun.zhu@...ux.dev>,
Jason Gunthorpe <jgg@...pe.ca>,
Haakon Bugge <haakon.bugge@...cle.com>,
Cheng Xu <chengyou@...ux.alibaba.com>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion
On 27/06/2022 05:51, Bob Pearson wrote:
> On 5/15/22 20:53, Li Zhijian wrote:
>> Previously, if user space keeps sending abnormal wqe, queue.prod will
>> keep increasing while queue.index doesn't. Once
>> queue.index==queue.prod in next round, req_next_wqe() will treat queue
>> as empty. In such case, no new completion would be generated.
>>
>> Update wqe_index for each wqe completion so that req_next_wqe() can get
>> next wqe properly.
>>
>> Signed-off-by: Li Zhijian <lizhijian@...itsu.com>
>> ---
>> drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index a0d5e57f73c1..8bdd0b6b578f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -773,6 +773,8 @@ int rxe_requester(void *arg)
>> if (ah)
>> rxe_put(ah);
>> err:
>> + /* update wqe_index for each wqe completion */
>> + qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
>> wqe->state = wqe_state_error;
>> __rxe_do_task(&qp->comp.task);
>>
> This change looks plausible, but I am not sure if it will make a difference since the qp
> will get transitioned to the error state very shortly.
>
> In order for it to matter the requester must be a ways ahead of the completer in the send queue
> and someone must be actively posting new wqes, which will reschedule the requester. Currently it
> will fail on the same wqe again unless the error described above occurs but if we post a new valid
> wqe it will get executed even though we have detected an error that should have stopped the qp.
>
> It looks like the intent was to keep the qp in the non error state until all the old
> wqes get completed before making the transition.
Not really. My original intent was just to let req_next_wqe() return a wqe whenever the queue is not empty.
Currently, if rxe_requester() keeps taking the error path for some reason, req_next_wqe() will
wrongly see the queue as empty on the next round even though it is almost full.
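
To make the failure mode concrete, here is a toy userspace model of the three
indices involved. It is a sketch, not the rxe code: struct toy_sq, its helpers,
the 8-slot ring and the rule that each error completion retires exactly one slot
are all simplifying assumptions for illustration. Only the "wqe_index == prod
means empty" test mirrors the check in req_next_wqe().

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SLOTS 8u                  /* hypothetical size, must be a power of two */
#define INDEX_MASK  (QUEUE_SLOTS - 1u)

/* Hypothetical stand-in for the send queue indices, not the rxe structs. */
struct toy_sq {
	unsigned int prod;      /* next slot the poster writes */
	unsigned int cons;      /* next slot the completer retires */
	unsigned int wqe_index; /* next slot the requester reads */
};

static unsigned int next_index(unsigned int index)
{
	return (index + 1u) & INDEX_MASK;
}

static unsigned int queue_count(const struct toy_sq *q)
{
	return (q->prod - q->cons) & INDEX_MASK;
}

/* Post wqes until the ring is full (one slot always stays unused). */
static void post_until_full(struct toy_sq *q)
{
	while (next_index(q->prod) != q->cons)
		q->prod = next_index(q->prod);
}

/*
 * One requester pass in which the wqe always fails.
 * Returns false if the requester believes the queue is empty.
 */
static bool requester_round(struct toy_sq *q, bool advance_on_error)
{
	if (q->wqe_index == q->prod)
		return false;                   /* the "index == prod" empty test */

	if (advance_on_error)
		q->wqe_index = next_index(q->wqe_index);

	/* Assumption: one error completion retires one slot. */
	q->cons = next_index(q->cons);
	return true;
}

/* Returns how many rounds produced a completion before a false "empty". */
static int simulate(bool advance_on_error, int max_rounds)
{
	struct toy_sq q = { 0, 0, 0 };
	int rounds = 0;

	assert((QUEUE_SLOTS & INDEX_MASK) == 0);
	while (rounds < max_rounds) {
		post_until_full(&q);
		if (!requester_round(&q, advance_on_error)) {
			/* Reported empty while every usable slot is occupied. */
			assert(queue_count(&q) == QUEUE_SLOTS - 1u);
			break;
		}
		rounds++;
	}
	return rounds;
}
```

Under these assumptions, simulate(false, 100) stops after a single round, because
wqe_index never moves and prod wraps around onto it, while simulate(true, 100)
runs all 100 rounds since wqe_index keeps pace with the completions.
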
BTW, I will review your new private patches.
Thanks
Zhijian
> But we should disable the requester
> from processing new wqes in this case. That seems like a safer solution to the problem.
>
> Bob
>