Message-ID: <dd98c1417b0c5027da8e712154eea99807fc4286.camel@linux.ibm.com>
Date: Mon, 22 Jan 2024 12:10:23 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>,
        netdev@...r.kernel.org, linux-kernel <linux-kernel@...r.kernel.org>,
        linux-rdma <linux-rdma@...r.kernel.org>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Alexandra Winter <wintera@...ux.ibm.com>
Subject: mlx5: Regression VFs fail to probe on v6.8-rc1

Hi Saeed, Hi Leon,

On current v6.8-rc1, on both s390x and an Intel x86_64 test system with
a ConnectX-6 DX, the mlx5 driver fails to probe VFs. On x86, running
"echo 1 > /sys/bus/pci/devices/<dev>/sriov_numvfs" after a fresh boot
is enough to trigger it, and it is 100% reproducible.

In dmesg I see the following messages (from the Intel server but it's
basically the same on s390x):

[  110.443950] mlx5_core 0000:6f:00.1: E-Switch: Enable: mode(LEGACY), nvfs(1), necvfs(0), active vports(2)
[  110.546248] pci 0000:6f:08.2: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[  110.546340] pci 0000:6f:08.2: enabling Extended Tags
[  110.547626] pci 0000:6f:08.2: Adding to iommu group 115
[  110.553328] mlx5_core 0000:6f:08.2: enabling device (0000 -> 0002)
[  110.553478] mlx5_core 0000:6f:08.2: firmware version: 22.36.1010
[  110.718748] mlx5_core 0000:6f:08.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  110.730136] mlx5_core 0000:6f:08.2: Assigned random MAC address ce:a6:ec:9e:70:49
[  110.734351] mlx5_core 0000:6f:08.2: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22)
[  110.735776] mlx5_core 0000:6f:08.2: mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22
[  110.736819] mlx5_core 0000:6f:08.2: _mlx5e_probe:6076:(pid 650): mlx5e_resume failed, -22
[  110.749146] mlx5_core.eth: probe of mlx5_core.eth.2 failed with error -22
[  110.776533] mlx5_core 0000:6f:08.2: is_dpll_supported:213:(pid 650): Missing SyncE capability
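A minimal sketch of the reproduction step and the log check described
above. The function names enable_vfs and probe_failures are hypothetical
helpers introduced here for illustration, and the PF address is an
assumption supplied by the caller. Only the standard sriov_numvfs sysfs
attribute and the "probe of ... failed with error" message from the log
above are relied on.

```shell
# Hypothetical helper: enable N VFs on the PF given as a PCI address
# (e.g. 0000:6f:00.1). Needs root and a real SR-IOV capable device.
enable_vfs() {
    pf="$1"
    n="${2:-1}"
    attr="/sys/bus/pci/devices/$pf/sriov_numvfs"
    [ -w "$attr" ] || { echo "cannot write $attr" >&2; return 1; }
    # Reset to 0 first; changing an already nonzero VF count directly
    # is typically refused by the PCI core.
    echo 0 > "$attr"
    echo "$n" > "$attr"
}

# Hypothetical helper: read kernel log text on stdin and print only the
# lines that report a failed driver probe. Prints nothing if none match.
probe_failures() {
    grep -E 'probe of .* failed with error' || true
}
```

Possible usage on a test system, assuming the PF from the log above:
enable_vfs 0000:6f:00.1 1 && dmesg | probe_failures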

I actually encountered this problem before, on December 21 on
linux-next, but didn't investigate further then, as the holidays were
coming up and it was affecting x86 as well. It was gone after the
holidays on next-20240104. Somehow it's now back on both linux-next and
v6.8-rc1. The same configuration works fine on v6.7. On s390x at least,
this also affects ConnectX-4 and ConnectX-5, and it also occurs when the
VF is passed through to a different logical partition from the one
controlling the PF.

One point of difference to other common setups may be that this Intel
Sapphire Rapids server as well as s390x are running with IOMMU enabled
and no pass-through for kernel code, i.e. on the Intel server my kernel
command line includes "iommu=nopt intel_iommu=on".
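For anyone comparing their setup against this one, a small sketch that
lists the IOMMU-related parameters in a kernel command line. The
function name iommu_params is a hypothetical helper, and the set of
parameter prefixes it matches is an assumption that may not cover every
architecture.

```shell
# Hypothetical helper: print the IOMMU-related parameters, one per line,
# found in the kernel command line passed as the first argument.
iommu_params() {
    # Intentionally unquoted so the command line splits into words.
    for word in $1; do
        case "$word" in
            iommu=*|intel_iommu=*|iommu.passthrough=*) echo "$word" ;;
        esac
    done
}
```

Possible usage on a running system: iommu_params "$(cat /proc/cmdline)"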

Thanks,
Niklas
