Message-ID: <nvs4i2v7o6vn6zhmtq4sgazy2hu5kiulukxcntdelggmznnl7h@so3oul6uwgbl>
Date: Thu, 21 Nov 2024 11:04:13 +0100
From: Uwe Kleine-König <ukleinek@...ian.org>
To: Francesco Poli <invernomuto@...anoici.org>, 1086520@...s.debian.org
Cc: Mark Zhang <markzhang@...dia.com>, Leon Romanovsky <leonro@...dia.com>, 
	linux-rdma@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: Bug#1086520: linux-image-6.11.2-amd64: makes opensm fail to start

Hello Francesco,

[For the newcomers: this is about a regression in 6.11. Details are
available at https://bugs.debian.org/1086520. The TL;DR is that on
6.10.11 opensm works as expected, while it fails to start on 6.11.7.]

On Mon, Nov 18, 2024 at 08:06:16PM +0100, Francesco Poli wrote:
> On Mon, 18 Nov 2024 09:58:03 +0100 Uwe Kleine-König wrote:
> 
> [...]
> > On Wed, Nov 13, 2024 at 11:15:03PM +0100, Francesco Poli wrote:
> > > On Mon, 11 Nov 2024 11:22:26 +0100 Uwe Kleine-König wrote:
> [...]
> > > > I guess the kernel provides a directory "/sys/class/infiniband_mad". Do
> > > > its contents look different on 6.10.x and 6.11.x?
> > > 
> > > I will look into this as soon as I can reboot the cluster head node.
> 
> I looked into this, while testing the new Debian Linux kernel that has
> just migrated to testing (which, once again, makes opensm fail to
> start, just like other 6.11.x versions).
> 
> With a working kernel:
> 
>   $ uname -v
>   #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22)
>   $ ls -altrF /sys/class/infiniband_mad/
>   total 0
>   lrwxrwxrwx  1 root root    0 Nov  4 15:58 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
>   lrwxrwxrwx  1 root root    0 Nov  4 15:58 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
>   lrwxrwxrwx  1 root root    0 Nov 11 15:54 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
>   lrwxrwxrwx  1 root root    0 Nov 11 15:54 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
>   drwxr-xr-x  2 root root    0 Nov 11 15:54 ./
>   drwxr-xr-x 72 root root    0 Nov 11 15:54 ../
>   -r--r--r--  1 root root 4096 Nov 11 15:54 abi_version
>   $ cat /sys/class/infiniband_mad/abi_version 
>   5
> 
> With a kernel that makes opensm fail to start:
> 
>   $ uname -v
>   #1 SMP PREEMPT_DYNAMIC Debian 6.11.7-1 (2024-11-09)
>   $ ls -altrF /sys/class/infiniband_mad/
>   total 0
>   drwxr-xr-x 73 root root    0 Nov 18 09:41 ../
>   -r--r--r--  1 root root 4096 Nov 18 09:41 abi_version
>   lrwxrwxrwx  1 root root    0 Nov 18 09:41 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
>   lrwxrwxrwx  1 root root    0 Nov 18 09:41 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
>   drwxr-xr-x  2 root root    0 Nov 18 09:43 ./
>   $ cat /sys/class/infiniband_mad/abi_version
>   5
> 
> As you can see, a couple of files (symlinks) are missing here...

It looks like the commit that is biting you is

https://git.kernel.org/linus/50660c5197f52b8137e223dc3ba8d43661179a1d

So if you bisect, try 50660c5197f52b8137e223dc3ba8d43661179a1d and its
parent 24943dcdc156cf294d97a36bf5c51168bf574c22 first.

I don't know about InfiniBand, but I'd say: either your machine doesn't
have these issmX devices and opensm should cope with that, or these
issmX devices are available, in which case
50660c5197f52b8137e223dc3ba8d43661179a1d is buggy.
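
To compare the two kernels quickly, something like the following might
help. It is only a sketch: the function name check_issm is made up for
illustration, and it assumes that every umadN entry is expected to have
an issmN sibling, as in the 6.10.11 listing above. It only inspects
sysfs, so it cannot tell why an entry is missing.

```shell
# Report, for each umadN entry in an infiniband_mad class directory,
# whether the matching issmN entry exists.
# check_issm is a hypothetical helper name, not part of any existing tool.
# Usage: check_issm [DIR]   (DIR defaults to /sys/class/infiniband_mad)
# Returns 1 if at least one issmN entry is missing, 0 otherwise.
check_issm() {
    dir=${1:-/sys/class/infiniband_mad}
    missing=0
    for umad in "$dir"/umad*; do
        # Skip the unexpanded glob pattern when no umad entries exist.
        [ -e "$umad" ] || [ -L "$umad" ] || continue
        # Strip everything up to and including "umad" to get the index.
        n=${umad##*/umad}
        if [ -e "$dir/issm$n" ] || [ -L "$dir/issm$n" ]; then
            echo "umad$n: issm$n present"
        else
            echo "umad$n: issm$n MISSING"
            missing=1
        fi
    done
    return "$missing"
}
```

Running "check_issm" without arguments looks at
/sys/class/infiniband_mad; passing a directory as the first argument
checks that directory instead.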

> Does this ring a bell?

It doesn't for me, but maybe Mark Zhang or someone else among the new
recipients has an idea?

Best regards
Uwe

