Message-ID: <20250103150546.GD26854@ziepe.ca>
Date: Fri, 3 Jan 2025 11:05:46 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@...itsu.com>,
'Joe Klein' <joe.klein812@...il.com>
Cc: "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"leon@...nel.org" <leon@...nel.org>,
"zyjzyj2000@...il.com" <zyjzyj2000@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"rpearsonhpe@...il.com" <rpearsonhpe@...il.com>,
"Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>
Subject: Re: [PATCH for-next v9 0/5] On-Demand Paging on SoftRoCE

On Tue, Dec 24, 2024 at 08:52:24AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Mon, Dec 23, 2024 10:55 AM Daisuke Matsuda (Fujitsu) <matsuda-daisuke@...itsu.com> wrote:
> > On Mon, Dec 23, 2024 2:25 AM Joe Klein <joe.klein812@...il.com> wrote:
> > > We have tested this patchset and had a lot of problems, even without using the ODP option in softroce. I don't know if
> > > others have done similar tests. If we have to merge this patchset into upstream, is it possible to add a kernel option to
> > > enable/disable this patchset?
> >
> > Hi Joe,
> >
> > Can you clarify the test and the problems you observed?
> > I wonder if you tried the test with the latest tree WITHOUT my patches.
> >
> > As far as I know, there is something wrong with the upstream right now.
> > It does not complete the rdma-core testcases, and 'segmentation fault' is observed
> > in the middle of the full test run, which did not happen before October 2024.
>
> It appears that the root cause of this issue lies within the userspace components.
> My report yesterday was based on experiments conducted on Ubuntu 24.04.1 LTS (x86_64).
> It seems to me that rxe is somehow broken regardless of kernel version
> as long as userspace components are provided by Ubuntu 24.04.1 LTS.
> I built and tried linux-6.11, linux-6.10, and linux-6.8, and they all failed as I reported.
>
> I switched to Ubuntu 22.04.5 LTS (aarch64) to test with the older libraries.
> All tests available passed using the rdma for-next tree without any problem.
> Then, I applied my ODP patches onto it, and everything is still fine.
> ####################
> ubuntu@...a-aarch64:~/rdma-core$ git branch -v
> * master fb965e2d0 Merge pull request #1531 from selvintxavier/pbuf_optimization
> ubuntu@...a-aarch64:~/rdma-core$ ./build/bin/run_tests.py
> ..........ss..........ssssssssss..............ssssssssssssssssssssssssss.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss........ssssss..ss....s.sssssss....ss....ss..............s......................ss.............sss...ssss
> ----------------------------------------------------------------------
> Ran 321 tests in 3.599s
>
> OK (skipped=211)
> ubuntu@...a-aarch64:~/rdma-core$ ./build/bin/run_tests.py -k odp
> sssssssss..ss....s.s
> ----------------------------------------------------------------------
> Ran 20 tests in 0.269s
>
> OK (skipped=13)
> ####################
>
> Possibly, there was a regression in libibverbs between v39.0-1 and v50.0-2build2.
> We need to take a closer look to resolve the malfunction of rxe on Ubuntu 24.04.

That's distressing.

> In conclusion, I believe there is nothing in my ODP patches that could cause
> the rxe driver to fail. I would appreciate any feedback on potential improvements.

What am I supposed to do with this though?

Joe, can you please answer Daisuke's questions on what problems you
observed and if you observe them without the ODP patches?

Jason