Message-ID: <20240104145629.GY50406@nvidia.com>
Date: Thu, 4 Jan 2024 10:56:29 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@...itsu.com>,
"rpearsonhpe@...il.com" <rpearsonhpe@...il.com>
Cc: 'Zhu Yanjun' <yanjun.zhu@...ux.dev>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"leon@...nel.org" <leon@...nel.org>,
"zyjzyj2000@...il.com" <zyjzyj2000@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Xiao Yang (Fujitsu)" <yangx.jy@...itsu.com>,
"Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>,
"Yasunori Gotou (Fujitsu)" <y-goto@...itsu.com>
Subject: Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On Thu, Dec 07, 2023 at 06:37:13AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> >
> > On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> > >>
> > >> Daisuke Matsuda (7):
> > >> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> > >> RDMA/rxe: Make MR functions accessible from other rxe source code
> > >> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> > >> RDMA/rxe: Add page invalidation support
> > >> RDMA/rxe: Allow registering MRs for On-Demand Paging
> > >> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> > >> RDMA/rxe: Add support for the traditional Atomic operations with ODP
> > >
> > > What is the current situation with rxe? I don't recall seeing the bugs
> > > that were reported get fixed?
>
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not appear to be specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.

Bob? Is that what we think?

> There is another issue that causes a kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

This is more understandable, and the fix of matching the MTT size to
the PAGE_SIZE seems reasonable to me.

Jason