Message-ID: <TYCPR01MB845576AC0D07947D560F77DCE5449@TYCPR01MB8455.jpnprd01.prod.outlook.com>
Date: Mon, 12 Sep 2022 08:27:02 +0000
From: "matsuda-daisuke@...itsu.com" <matsuda-daisuke@...itsu.com>
To: 'Bob Pearson' <rpearsonhpe@...il.com>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"leonro@...dia.com" <leonro@...dia.com>,
"jgg@...dia.com" <jgg@...dia.com>,
"zyjzyj2000@...il.com" <zyjzyj2000@...il.com>,
"Ziemba, Ian" <ian.ziemba@....com>, Frank Zago <frank.zago@....com>
CC: "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"yangx.jy@...itsu.com" <yangx.jy@...itsu.com>,
"lizhijian@...itsu.com" <lizhijian@...itsu.com>,
"y-goto@...itsu.com" <y-goto@...itsu.com>
Subject: RE: [RFC PATCH 2/7] RDMA/rxe: Convert the triple tasklets to
workqueues
On Sat, Sep 10, 2022 4:39 AM Bob Pearson wrote:
> On 9/6/22 21:43, Daisuke Matsuda wrote:
> > In order to implement On-Demand Paging on the rxe driver, triple tasklets
> > (requester, responder, and completer) must be allowed to sleep so that they
> > can trigger page faults when the pages being accessed are not present.
> >
> > This patch replaces the tasklets with workqueues, but still allows the work
> > functions to be called directly from softirq context if it is obvious that MRs
> > are not going to be accessed and no work is being processed in the workqueue.
> >
> > Since counterparts to tasklet_disable() and tasklet_enable() are missing for
> > workqueues, an atomic value is introduced to keep the work suspended while a
> > qp reset is in progress.
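(Note for reviewers skimming the thread: the mechanism described above amounts
to roughly the sketch below. It is deliberately simplified, and the structure
and function names are illustrative rather than taken verbatim from the patch.)

#include <linux/workqueue.h>
#include <linux/atomic.h>

struct rxe_task {
	struct work_struct	work;
	struct workqueue_struct	*wq;
	atomic_t		suspended;	/* counterpart of tasklet_disable() */
	int			(*func)(void *arg);
	void			*arg;
};

static void rxe_do_work(struct work_struct *w)
{
	struct rxe_task *task = container_of(w, struct rxe_task, work);

	if (atomic_read(&task->suspended))	/* qp reset in progress */
		return;

	while (task->func(task->arg) == 0)
		;	/* loop until the handler reports nothing left to do */
}

/*
 * Call the work function directly when it is safe to do so from softirq
 * context (no MR access, hence no page fault, and nothing already queued);
 * otherwise defer to the workqueue, whose workers are allowed to sleep.
 */
static void rxe_run_task(struct rxe_task *task, bool may_access_mr)
{
	if (!may_access_mr && !work_pending(&task->work))
		rxe_do_work(&task->work);
	else
		queue_work(task->wq, &task->work);
}

/* Suspend/resume in place of tasklet_disable()/tasklet_enable(). */
static void rxe_suspend_task(struct rxe_task *task)
{
	atomic_inc(&task->suspended);
	flush_workqueue(task->wq);	/* wait for anything already running */
}

static void rxe_resume_task(struct rxe_task *task)
{
	atomic_dec(&task->suspended);
}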
> >
> > For reference, the performance change was measured using the ib_send_bw and
> > ib_send_lat commands over an RC connection. Both the client and the server
> > were placed on the same KVM host. The option "-n 100000" was given to the
> > respective commands to run 100000 iterations.
> >
> > Before applying this patch:
> > [ib_send_bw]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
> > 65536 100000 0.00 203.13 0.003250
> > ---------------------------------------------------------------------------------------
> > [ib_send_lat]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
> > 2 100000 30.91 1519.05 34.23 36.06 5.15 48.49 82.37
> > ---------------------------------------------------------------------------------------
> >
> > After applying this patch:
> > [ib_send_bw]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
> > 65536 100000 0.00 240.88 0.003854
> > ---------------------------------------------------------------------------------------
> > [ib_send_lat]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
> > 2 100000 40.88 2994.82 47.80 50.25 13.70 76.42 185.04
> > ---------------------------------------------------------------------------------------
> >
> > The test was conducted 3 times for each kernel, and the results with the median
> > "BW average" and "t_typical" are shown above. They show that the conversion
> > improves bandwidth while causing higher latency.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@...itsu.com>
> > ---
> > drivers/infiniband/sw/rxe/Makefile | 2 +-
> > drivers/infiniband/sw/rxe/rxe_comp.c | 42 ++++---
> > drivers/infiniband/sw/rxe/rxe_loc.h | 2 +-
> > drivers/infiniband/sw/rxe/rxe_net.c | 4 +-
> > drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
> > drivers/infiniband/sw/rxe/rxe_qp.c | 68 +++++------
> > drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
> > drivers/infiniband/sw/rxe/rxe_req.c | 14 +--
> > drivers/infiniband/sw/rxe/rxe_resp.c | 16 +--
> > drivers/infiniband/sw/rxe/rxe_task.c | 152 ------------------------
> > drivers/infiniband/sw/rxe/rxe_task.h | 69 -----------
> > drivers/infiniband/sw/rxe/rxe_verbs.c | 8 +-
> > drivers/infiniband/sw/rxe/rxe_verbs.h | 8 +-
> > drivers/infiniband/sw/rxe/rxe_wq.c | 161 ++++++++++++++++++++++++++
> > drivers/infiniband/sw/rxe/rxe_wq.h | 71 ++++++++++++
> > 15 files changed, 322 insertions(+), 299 deletions(-)
> > delete mode 100644 drivers/infiniband/sw/rxe/rxe_task.c
> > delete mode 100644 drivers/infiniband/sw/rxe/rxe_task.h
> > create mode 100644 drivers/infiniband/sw/rxe/rxe_wq.c
> > create mode 100644 drivers/infiniband/sw/rxe/rxe_wq.h
<...>
>
> Daisuke,
>
> We (HPE) have an internal patch set that also allows using work queues instead of tasklets.
> There is a certain amount of work to allow control over CPU assignment from the ULP or
> application level. This is critical for high performance in multithreaded IO applications. It
> would be in both of our interests if we could find a common implementation that works for everyone.
I agree with you, and I am interested in your work.
Would you post the patch set? It may be possible to rebase my work on your implementation.
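I suppose the CPU assignment part would end up using something like queue_work_on(),
so that the ULP or application can choose which CPU a given QP's work runs on. The
fragment below is purely my guess (illustrative names, not from any posted patch):

#include <linux/workqueue.h>

struct rxe_qp_work {
	struct work_struct	work;
	int			cpu;	/* chosen by the ULP/application, -1 = don't care */
};

static void rxe_qp_dispatch(struct workqueue_struct *wq, struct rxe_qp_work *qw)
{
	if (qw->cpu >= 0)
		queue_work_on(qw->cpu, wq, &qw->work);	/* pin to the requested CPU */
	else
		queue_work(wq, &qw->work);		/* let the workqueue decide */
}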
I simply replaced the tasklets mechanically and modified the conditions for dispatching work
items to the responder and completer workqueues; the changes are not so complicated (see the
rough sketch below). I will be away between 9/15 and 9/20; after that, I can take the time to
look into it.
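To give an idea of the mechanical part, the per-QP changes look roughly like this
(simplified, with stand-in names rather than the actual diff):

#include <linux/workqueue.h>

struct rxe_qp_stub {				/* stand-in for struct rxe_qp */
	struct work_struct	req_work;	/* was a requester tasklet */
};

static void rxe_requester_work(struct work_struct *w)
{
	/* formerly the tasklet body; as a work item it may now sleep,
	 * e.g. while resolving an ODP page fault
	 */
}

static void rxe_qp_stub_init(struct rxe_qp_stub *qp)
{
	INIT_WORK(&qp->req_work, rxe_requester_work);	/* was tasklet_setup() */
}

static void rxe_qp_stub_cleanup(struct rxe_qp_stub *qp)
{
	cancel_work_sync(&qp->req_work);	/* was tasklet_kill() */
}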
By the way, I would also like you to join in the discussion about whether to preserve tasklets or not.
Cf. https://lore.kernel.org/all/TYCPR01MB8455D739FC9FB034E3485C87E5449@TYCPR01MB8455.jpnprd01.prod.outlook.com/
>
> Perhaps we could arrange for a call to discuss?
This is an important change that may affect all rxe users,
so I think the discussion should be open to everybody, at least at this stage.
Thanks,
Daisuke