Open Source and information security mailing list archives
 
Date:   Mon, 20 Sep 2021 20:22:44 +0000
From:   <Patrick.Mclean@...y.com>
To:     <stable@...r.kernel.org>
CC:     <regressions@...ts.linux.dev>, <ayal@...dia.com>,
        <saeedm@...dia.com>, <netdev@...r.kernel.org>, <leonro@...dia.com>,
        <Aaron.U'ren@...y.com>, <Russell.Brown@...y.com>,
        <Victor.Payno@...y.com>
Subject: mlx5_core 5.10 stable series regression starting at 5.10.65

Since 5.10.65, certain mlx5 cards are no longer usable on 5.10 stable kernels (the relevant dmesg logs and lspci output are pasted below).

Bisecting tracks the problem down to this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=fe6322774ca28669868a7e231e173e09f7422118

Here is how lspci -nn identifies the cards:
41:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
41:00.1 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
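With -nn, lspci prints numeric IDs in brackets: the class code after the class name and the vendor:device pair after the device name. Here 15b3 is the Mellanox vendor ID and 1017 is the device ID lspci reports as ConnectX-5. A minimal sketch for pulling those IDs out of such a line, e.g. to match affected cards across hosts; the helper name parse_lspci_nn is hypothetical and the regex assumes the exact line layout shown above:

```python
import re

# Assumes the "slot class-name [cccc]: device-name [vvvv:dddd]" layout printed by lspci -nn.
# Class code is the first 4-hex-digit bracket followed by ":"; vendor:device is the last
# "[xxxx:xxxx]" bracket on the line (a trailing "(rev NN)" would still match).
LSPCI_NN_RE = re.compile(
    r"^(?P<slot>\S+) .*?\[(?P<cls>[0-9a-f]{4})\]: "
    r".*\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def parse_lspci_nn(line: str) -> dict:
    """Return slot, class code, vendor ID and device ID from one lspci -nn line."""
    m = LSPCI_NN_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized lspci -nn line: {line!r}")
    return m.groupdict()


line = ("41:00.0 Ethernet controller [0200]: Mellanox Technologies "
        "MT27800 Family [ConnectX-5] [15b3:1017]")
print(parse_lspci_nn(line))
```

The non-hex "[ConnectX-5]" bracket is skipped because the pattern only accepts four hex digits on each side of the colon.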

Here are the relevant dmesg logs:
[   13.409473] mlx5_core 0000:41:00.0: firmware version: 16.31.1014
[   13.415944] mlx5_core 0000:41:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
[   13.707425] mlx5_core 0000:41:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps
[   13.718221] mlx5_core 0000:41:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[   13.740607] mlx5_core 0000:41:00.0: Port module event: module 0, Cable plugged
[   13.759857] mlx5_core 0000:41:00.0: mlx5_pcie_event:294:(pid 586): PCIe slot advertised sufficient power (75W).
[   17.986973] mlx5_core 0000:41:00.0: E-Switch: cleanup
[   18.686204] mlx5_core 0000:41:00.0: init_one:1371:(pid 803): mlx5_load_one failed with error code -22
[   18.701352] mlx5_core: probe of 0000:41:00.0 failed with error -22
[   18.727364] mlx5_core 0000:41:00.1: firmware version: 16.31.1014
[   18.743853] mlx5_core 0000:41:00.1: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
[   19.015349] mlx5_core 0000:41:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps
[   19.025157] mlx5_core 0000:41:00.1: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[   19.053569] mlx5_core 0000:41:00.1: Port module event: module 1, Cable unplugged
[   19.062093] mlx5_core 0000:41:00.1: mlx5_pcie_event:294:(pid 591): PCIe slot advertised sufficient power (75W).
[   22.826932] mlx5_core 0000:41:00.1: E-Switch: cleanup
[   23.544747] mlx5_core 0000:41:00.1: init_one:1371:(pid 803): mlx5_load_one failed with error code -22
[   23.555071] mlx5_core: probe of 0000:41:00.1 failed with error -22
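For reference, the -22 reported by both probes is a negated errno value (kernel functions return -errno on failure); on Linux, errno 22 is EINVAL ("Invalid argument"). A small sketch that decodes such values, assuming a Linux host where the userspace errno numbering matches the kernel's; the helper name decode_kernel_error is hypothetical:

```python
import errno
import os


def decode_kernel_error(rc: int) -> str:
    """Map a negated kernel return code (e.g. -22) to its errno name and message."""
    code = -rc  # kernel functions report failures as -errno
    name = errno.errorcode.get(code, "UNKNOWN")  # symbolic name, e.g. "EINVAL"
    return f"{rc} = -{name} ({os.strerror(code)})"


# Value seen in "mlx5_load_one failed with error code -22"
print(decode_kernel_error(-22))
```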

Please let me know if I can provide any further information.
