lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID:
 <OS3PR01MB98654FDD5E833D1C409B9C2CE5022@OS3PR01MB9865.jpnprd01.prod.outlook.com>
Date: Mon, 23 Dec 2024 01:54:53 +0000
From: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@...itsu.com>
To: 'Joe Klein' <joe.klein812@...il.com>
CC: "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
	"leon@...nel.org" <leon@...nel.org>, "jgg@...pe.ca" <jgg@...pe.ca>,
	"zyjzyj2000@...il.com" <zyjzyj2000@...il.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "rpearsonhpe@...il.com"
	<rpearsonhpe@...il.com>, "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>
Subject: RE: [PATCH for-next v9 0/5] On-Demand Paging on SoftRoCE

On Mon, Dec 23, 2024 2:25 AM Joe Klein <joe.klein812@...il.com> wrote:
> We have tested this patchset and had a lot of problems, even without using the ODP option in softroce. I don't know if others have done similar tests. If we have to merge this patchset into upstream, is it possible to add a kernel option to enable/disable this patchset?

Hi Joe,

Could you describe the tests you ran and the problems you observed?
Did you also run the tests on the latest tree WITHOUT my patches?

As far as I know, something is currently broken upstream.
The upstream tree does not complete the rdma-core test cases: a segmentation fault
occurs partway through the full test run, which did not happen before October 2024.
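One way to compare runs on the latest tree with and without the ODP patches is to record whether the suite finished normally or was killed by a signal. The following is a minimal Python sketch, not part of rdma-core: `classify_exit` and `run_suite` are hypothetical helper names, and the command is assumed to be `./build/bin/run_tests.py` run from the top of an rdma-core checkout. It relies on Python's `subprocess` module reporting death by a signal as a negative return code, so a run ending in "Segmentation fault (core dumped)" comes back as `-SIGSEGV`.

```python
import signal
import subprocess


def classify_exit(returncode: int) -> str:
    """Map a subprocess return code to a short verdict.

    On POSIX, subprocess reports a child killed by signal N as -N,
    so a segfaulting python3 process shows up as -SIGSEGV.
    """
    if returncode == 0:
        return "passed"
    if returncode == -signal.SIGSEGV:
        return "segfault"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal numbers without an enum member (e.g. SIGRTMIN+1).
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"failed (exit status {returncode})"


def run_suite(cmd: list[str]) -> str:
    """Run the test command (output is not captured) and classify how it ended."""
    result = subprocess.run(cmd)
    return classify_exit(result.returncode)
```

For example, `run_suite(["./build/bin/run_tests.py"])`, repeated on each kernel being compared, would return "segfault" for a crash like the one described above, and "failed (exit status N)" when the suite merely reports test failures.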

Here are the details of the issue:
===== test log =====
ubuntu@...a-dev:~$ sudo rdma link add rxe_ens3 type rxe netdev ens3
ubuntu@...a-dev:~$ cd rdma-core
ubuntu@...a-dev:~/rdma-core$ uname -r
6.13.0-rc1+
ubuntu@...a-dev:~/rdma-core$ pwd
/home/ubuntu/rdma-core
ubuntu@...a-dev:~/rdma-core$ ./build/bin/run_tests.py
..........ss.../usr/lib/python3.12/_weakrefset.py:39: ResourceWarning: unclosed file <_io.FileIO name='/tmp/tmpe7nsitov' mode='rb+' closefd=True>
  def _remove(item, selfref=ref(self)):
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.12/_weakrefset.py:39: ResourceWarning: unclosed file <_io.FileIO name='/tmp/tmpid85cbou' mode='rb+' closefd=True>
  def _remove(item, selfref=ref(self)):
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.......ssssss/usr/lib/python3.12/contextlib.py:141: ResourceWarning: unclosed file <_io.FileIO name='/tmp/tmp9pgb7zo8' mode='rb+' closefd=True>
  def __exit__(self, typ, value, traceback):
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ssss..............ssssssssssssssssssssssssss.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss........ssssssssssssssssss/usr/lib/python3.12/_weakrefset.py:39: ResourceWarning: unclosed file <_io.FileIO name='/tmp/tmpate1loci' mode='rb+' closefd=True>
  def _remove(item, selfref=ref(self)):
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
Exception ignored in: 'pyverbs.pd.PD.__dealloc__'
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
ssssTraceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
Exception ignored in: 'pyverbs.pd.PD.__dealloc__'
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
Exception ignored in: 'pyverbs.pd.PD.__dealloc__'
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
Exception ignored in: 'pyverbs.pd.PD.__dealloc__'
Traceback (most recent call last):
  File "pd.pyx", line 120, in pyverbs.pd.PD.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to dealloc PD. Errno: 9, Bad file descriptor
s....ssSegmentation fault (core dumped)
===========

=====dmesg=====
[  147.464243] rxe_ens3: qp#21 make_send_cqe: non-flush error status = 4
[  147.473843] rxe_ens3: qp#23 make_send_cqe: non-flush error status = 10
[  147.484540] rxe_ens3: qp#25 make_send_cqe: non-flush error status = 9
[  147.494541] rxe_ens3: qp#27 make_send_cqe: non-flush error status = 10
[  147.524080] rxe_ens3: rxe_create_cq: returned err = -22
[  147.574197] rxe_ens3: cq#26 rxe_resize_cq: returned err = -22
[  147.605719] rxe_ens3: rxe_create_cq: returned err = -95
[  147.606454] rxe_ens3: rxe_create_cq: returned err = -22
[  148.803131] rxe_ens3: qp#51 make_send_cqe: non-flush error status = 10
[  148.831587] rxe_ens3: qp#57 make_send_cqe: non-flush error status = 10
[  148.841627] rxe_ens3: qp#59 make_send_cqe: non-flush error status = 10
[  148.851719] rxe_ens3: qp#61 make_send_cqe: non-flush error status = 10
[  149.104223] python3[1702]: segfault at d0 ip 00007ff95ced16c7 sp 00007fff5e775de0 error 4 in libibverbs.so.1.14.56.0[e6c7,7ff95ceca000+14000] likely on CPU 2 (core 0, socket 2)
[  149.104235] Code: 00 00 c1 e0 04 8b bf 08 01 00 00 48 8d 53 20 48 c7 43 28 00 00 00 00 83 c0 18 c7 43 34 00 00 00 00 be 01 1b 18 c0 66 89 43 20 <49> 8b 80 d0 00 00 00 8b 40 10 89 43 30 31 c0 e8 05 99 ff ff 41 89
=====
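The `segfault` line in the dmesg output can be decoded to locate the crash inside libibverbs. The following is a hypothetical helper (`parse_segfault` and `decode_error` are illustrative names), assuming the x86-64 message format shown above, where the error code is printed in hex and the bracketed fields read `[file offset,mapping start+mapping size]`. For this line, ip minus the mapping start is 0x76c7, while the reported file offset is 0xe6c7; the latter is the value to look up in the library. In the `Code:` bytes, the marked instruction `49 8b 80 d0 00 00 00` decodes to `mov rax, [r8+0xd0]`, so a fault at address 0xd0 with error 4 (a user-mode read of a non-present page) suggests that r8 held a NULL pointer.

```python
import re
from dataclasses import dataclass

# Assumed format of the x86 "segfault at ..." kernel message, as in the
# dmesg excerpt above; other kernel versions or architectures may differ.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>\S+?)"
    r"\[(?P<off>[0-9a-f]+),(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


@dataclass
class SegfaultInfo:
    fault_addr: int      # data address whose access faulted
    ip: int              # instruction pointer at the time of the fault
    lib: str             # file backing the mapping that contains ip
    file_offset: int     # ip expressed as an offset into lib (as printed)
    mapping_offset: int  # ip relative to the start of the mapping
    access: list[str]    # decoded page-fault error-code bits


def decode_error(code: int) -> list[str]:
    """Decode the low x86 page-fault error-code bits."""
    bits = [
        "protection violation" if code & 0x1 else "page not present",
        "write" if code & 0x2 else "read",
        "user mode" if code & 0x4 else "kernel mode",
    ]
    if code & 0x10:
        bits.append("instruction fetch")
    return bits


def parse_segfault(line: str) -> SegfaultInfo:
    """Extract the fields of one segfault line; all numbers are hex."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    return SegfaultInfo(
        fault_addr=int(m["addr"], 16),
        ip=ip,
        lib=m["lib"],
        file_offset=int(m["off"], 16),
        mapping_offset=ip - int(m["start"], 16),
        access=decode_error(int(m["err"], 16)),
    )
```

With debug symbols installed, the file offset could then be resolved with binutils, e.g. `addr2line -f -e libibverbs.so.1.14.56.0 0xe6c7`, assuming the library's executable segment uses matching virtual addresses and file offsets (worth confirming with `readelf -l`).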

If you encounter any problems that definitely come from my ODP patches, please let me know what symptoms you are seeing.
I would also appreciate any help you can offer in fixing the upstream issue.

Thanks,
Daisuke
