Message-ID: <YUrLfMhATS3u6jq5@unreal>
Date:   Wed, 22 Sep 2021 09:21:48 +0300
From:   Leon Romanovsky <leonro@...dia.com>
To:     <Patrick.Mclean@...y.com>
CC:     <greg@...ah.com>, <stable@...r.kernel.org>,
        <regressions@...ts.linux.dev>, <ayal@...dia.com>,
        <saeedm@...dia.com>, <netdev@...r.kernel.org>,
        <Aaron.U'ren@...y.com>, <Russell.Brown@...y.com>,
        <Victor.Payno@...y.com>
Subject: Re: mlx5_core 5.10 stable series regression starting at 5.10.65

On Tue, Sep 21, 2021 at 10:22:57PM +0000, Patrick.Mclean@...y.com wrote:
> > On Mon, Sep 20, 2021 at 08:22:44PM +0000, Patrick.Mclean@...y.com wrote:
> > > In 5.10 stable kernels since 5.10.65 certain mlx5 cards are no longer usable (relevant dmesg logs and lspci output are pasted below).
> > >
> > > Bisecting the problem tracks the problem down to this commit:
> > > https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=fe6322774ca28669868a7e231e173e09f7422118__;!!JmoZiZGBv3RvKRSx!phUrsR595UusBY2Q9eNJQS7-VNtnb72Rcvhe-W0QKDPir1WY9mvWOkLLfe63k-6Uvw$
> > >
> > > Here is how lspci -nn identifies the cards:
> > > 41:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
> > > 41:00.1 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
> > >
> > > Here are the relevant dmesg logs:
> > > [   13.409473] mlx5_core 0000:41:00.0: firmware version: 16.31.1014
> > > [   13.415944] mlx5_core 0000:41:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
> > > [   13.707425] mlx5_core 0000:41:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps
> > > [   13.718221] mlx5_core 0000:41:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
> > > [   13.740607] mlx5_core 0000:41:00.0: Port module event: module 0, Cable plugged
> > > [   13.759857] mlx5_core 0000:41:00.0: mlx5_pcie_event:294:(pid 586): PCIe slot advertised sufficient power (75W).
> > > [   17.986973] mlx5_core 0000:41:00.0: E-Switch: cleanup
> > > [   18.686204] mlx5_core 0000:41:00.0: init_one:1371:(pid 803): mlx5_load_one failed with error code -22
> > > [   18.701352] mlx5_core: probe of 0000:41:00.0 failed with error -22
> > > [   18.727364] mlx5_core 0000:41:00.1: firmware version: 16.31.1014
> > > [   18.743853] mlx5_core 0000:41:00.1: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
> > > [   19.015349] mlx5_core 0000:41:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps
> > > [   19.025157] mlx5_core 0000:41:00.1: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
> > > [   19.053569] mlx5_core 0000:41:00.1: Port module event: module 1, Cable unplugged
> > > [   19.062093] mlx5_core 0000:41:00.1: mlx5_pcie_event:294:(pid 591): PCIe slot advertised sufficient power (75W).
> > > [   22.826932] mlx5_core 0000:41:00.1: E-Switch: cleanup
> > > [   23.544747] mlx5_core 0000:41:00.1: init_one:1371:(pid 803): mlx5_load_one failed with error code -22
> > > [   23.555071] mlx5_core: probe of 0000:41:00.1 failed with error -22
> > >
> > > Please let me know if I can provide any further information.
> > 
> > If you revert that single change, do things work properly?
> 
> Yes, things work properly after reverting that single change (tested with 5.10.67).

The stable@ kernel is missing commit 3d347b1b19da ("net/mlx5: Add support
for devlink traps in mlx5 core driver"), which added the mlx5 devlink
callbacks (.trap_init and .trap_fini).

I don't know why the commit that you reverted was added to stable@ in
the first place. It doesn't fix any bug and has no Fixes tag.
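
For anyone triaging similar reports, the probe failures in the dmesg
output quoted above can be pulled out mechanically. The following is
only a rough sketch, not part of any kernel tooling: the function name
find_probe_failures and the SAMPLE text are made up for illustration,
and it assumes the host's errno table matches the kernel's (true for
22, which is EINVAL).

```python
import errno
import re

# Matches driver-core lines such as:
#   [   18.701352] mlx5_core: probe of 0000:41:00.0 failed with error -22
PROBE_FAIL = re.compile(
    r"^\s*\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+): probe of (?P<dev>\S+) "
    r"failed with error (?P<err>-?\d+)\s*$"
)


def find_probe_failures(log_text):
    """Return (timestamp, driver, device, code, errno_name) per failed probe."""
    failures = []
    for line in log_text.splitlines():
        # Drop mail quote markers ("> > > ") if the log was pasted from a reply.
        line = re.sub(r"^(?:>\s*)+", "", line)
        m = PROBE_FAIL.match(line)
        if m:
            code = int(m.group("err"))
            # Assumption: the host errno table agrees with the kernel's.
            name = errno.errorcode.get(abs(code), "UNKNOWN")
            failures.append(
                (float(m.group("ts")), m.group("driver"), m.group("dev"), code, name)
            )
    return failures


# Hypothetical sample, copied from the log lines quoted in this thread.
SAMPLE = """\
[   18.686204] mlx5_core 0000:41:00.0: init_one:1371:(pid 803): mlx5_load_one failed with error code -22
[   18.701352] mlx5_core: probe of 0000:41:00.0 failed with error -22
[   23.555071] mlx5_core: probe of 0000:41:00.1 failed with error -22
"""

if __name__ == "__main__":
    for ts, driver, dev, code, name in find_probe_failures(SAMPLE):
        print(f"{ts:9.3f}s {driver} {dev}: {code} ({name})")
```

Only the final "probe of ... failed" lines are matched; the preceding
mlx5_load_one line carries the same code but is skipped, so each device
is reported once.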

Thanks
