lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <CAHjRaAcO866yePi_XJdPm5R05bkyLVYyzzMRCbPNwko5d=rY1A@mail.gmail.com>
Date: Wed, 25 Dec 2024 06:11:30 +0100
From: Joe Klein <joe.klein812@...il.com>
To: Zhu Yanjun <yanjun.zhu@...ux.dev>
Cc: Holger Kiehl <Holger.Kiehl@....de>, Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>, 
	linux-rdma@...r.kernel.org, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: failed to allocate device WQ

On Sat, Dec 21, 2024 at 9:38 AM Zhu Yanjun <yanjun.zhu@...ux.dev> wrote:
>
> On 2024/12/20 18:10, Holger Kiehl wrote:
> > Hello,
> >
> > Since upgrading from kernel 6.10 to 6.11 (and also 6.12), one InfiniBand
> > card sometimes hits this error:
> >
> >     kernel: workqueue: Failed to create a rescuer kthread for wq "ipoib_wq": -EINTR
> >     kernel: ib0: failed to allocate device WQ
> >     kernel: mlx5_1: failed to initialize device: ib0 port 1 (ret = -12)
> >     kernel: mlx5_1: couldn't register ipoib port 1; error -12
> >
> > The system has two cards:
> >
> >     41:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
> >     c4:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
> >
> > If that happens, the card cannot be used for TCP/IP communication. It does
> > not always happen, but when it does, it is always the second card, mlx5_1,
> > never mlx5_0. This happens on four different systems.
> >
> > Any idea what I can do to stop this from happening?
> >
> > Regards,
> > Holger
> >
> > PS: Firmware for both cards is 20.41.1000
>
> It is quite possible that the FW is not compatible with the driver. IMO, you
> could test with Mellanox OFED.
>
> If the driver is compatible with the FW, this problem should disappear.

Thanks, Zhu. We were in the same boat with a similar problem, and your
suggestion fixed it. We appreciate your help.
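When testing a suspected firmware/driver mismatch like the one Zhu describes, it may help to confirm which firmware each HCA actually reports before and after the change. A minimal sketch, assuming the usual Linux RDMA sysfs layout in which every device under /sys/class/infiniband/ exposes a fw_ver attribute; the function name ib_firmware_versions is hypothetical, and the code only reads versions, it does not judge compatibility with any driver.

```python
from pathlib import Path


def ib_firmware_versions(root: str = "/sys/class/infiniband") -> dict[str, str]:
    """Map each RDMA device name (e.g. mlx5_0, mlx5_1) to its reported firmware.

    Assumes the sysfs layout <root>/<device>/fw_ver (hypothetical helper).
    Returns an empty dict if the directory does not exist, e.g. when no
    RDMA devices are present or the host is not Linux.
    """
    versions: dict[str, str] = {}
    base = Path(root)
    if not base.is_dir():
        return versions
    # Entries are symlinks to the PCI device directories; Path follows them.
    for dev in sorted(base.iterdir()):
        fw_file = dev / "fw_ver"
        if fw_file.is_file():
            versions[dev.name] = fw_file.read_text().strip()
    return versions


if __name__ == "__main__":
    for name, fw in ib_firmware_versions().items():
        print(f"{name}: {fw}")
```

If mlx5_0 and mlx5_1 both report the same version (the original report says 20.41.1000 on both), a difference between the two cards' firmware is unlikely to explain why only mlx5_1 fails, although this says nothing about compatibility with the kernel driver itself.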

>
> Zhu Yanjun
>
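The kernel log in the original report mixes a symbolic error code (-EINTR, from creating the workqueue rescuer kthread) with numeric ones (ret = -12, error -12). Kernel functions return negated errno values, and on Linux 12 is ENOMEM, so the chain appears to report the interrupted kthread creation upward as a generic allocation failure. A small sketch for decoding such codes from a log excerpt with Python's errno module and os.strerror; the function name decode_kernel_errors and its regex are illustrative assumptions, not part of any kernel tooling, and the regex will also match unrelated hyphen-number text (for example a model name like ConnectX-6) if given lines other than the error messages.

```python
import errno
import os
import re

# Matches either a symbolic code like "-EINTR" or a numeric one like "-12".
# Illustrative only: any "-<digits>" in the text will be picked up.
_ERR_RE = re.compile(r"-(E[A-Z0-9]+)\b|-(\d+)\b")


def decode_kernel_errors(log: str) -> list[tuple[str, int, str]]:
    """Return (symbol, number, message) for each errno-style code in log."""
    results = []
    for sym, num in _ERR_RE.findall(log):
        code = getattr(errno, sym, None) if sym else int(num)
        if not isinstance(code, int):
            continue  # unknown symbolic name; skip it
        name = errno.errorcode.get(code, "?")
        results.append((name, code, os.strerror(code)))
    return results


if __name__ == "__main__":
    excerpt = """\
kernel: workqueue: Failed to create a rescuer kthread for wq "ipoib_wq": -EINTR
kernel: mlx5_1: failed to initialize device: ib0 port 1 (ret = -12)
kernel: mlx5_1: couldn't register ipoib port 1; error -12"""
    for name, code, msg in decode_kernel_errors(excerpt):
        print(f"-{code:<3} {name:<7} {msg}")
```

The exact message text from os.strerror depends on the C library, so only the symbolic names and numbers should be relied on when comparing logs across systems.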
