Message-Id: <20241125195443.0ddf0d0176d7c34bd29942c7@paranoici.org>
Date: Mon, 25 Nov 2024 19:54:43 +0100
From: Francesco Poli <invernomuto@...anoici.org>
To: Uwe Kleine-König <ukleinek@...ian.org>
Cc: 1086520@...s.debian.org, Mark Zhang <markzhang@...dia.com>, Leon
Romanovsky <leonro@...dia.com>, linux-rdma@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: Bug#1086520: linux-image-6.11.2-amd64: makes opensm fail to
start
On Thu, 21 Nov 2024 11:04:13 +0100 Uwe Kleine-König wrote:
[...]
> It looks like the commit that is biting you is
>
> https://git.kernel.org/linus/50660c5197f52b8137e223dc3ba8d43661179a1d
>
> So if you bisect, try 50660c5197f52b8137e223dc3ba8d43661179a1d and its
> parent 24943dcdc156cf294d97a36bf5c51168bf574c22 first.
I started to bisect.
The first surprise is that 50660c5197f52b8137e223dc3ba8d43661179a1d is
good... :-o
$ git checkout 50660c5197f52b8137e223dc3ba8d43661179a1d
$ make -j 12 my_defconfig bindeb-pkg
[install and reboot with this kernel version]
# ls /sys/class/infiniband_mad/ -altrF
total 0
drwxr-xr-x 70 root root 0 Nov 25 12:05 ../
-r--r--r-- 1 root root 4096 Nov 25 12:05 abi_version
lrwxrwxrwx 1 root root 0 Nov 25 12:05 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
lrwxrwxrwx 1 root root 0 Nov 25 12:05 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
lrwxrwxrwx 1 root root 0 Nov 25 12:08 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
lrwxrwxrwx 1 root root 0 Nov 25 12:08 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
drwxr-xr-x 2 root root 0 Nov 25 12:08 ./
[InfiniBand works]
$ git bisect start
$ git bisect good
$ git checkout v6.11
$ make -j 12 my_defconfig bindeb-pkg
[install and reboot with this kernel version]
# ls /sys/class/infiniband_mad/ -altrF
total 0
drwxr-xr-x 70 root root 0 Nov 25 12:29 ../
-r--r--r-- 1 root root 4096 Nov 25 12:29 abi_version
lrwxrwxrwx 1 root root 0 Nov 25 12:29 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
lrwxrwxrwx 1 root root 0 Nov 25 12:29 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
drwxr-xr-x 2 root root 0 Nov 25 12:30 ./
[InfiniBand fails, because OpenSM fails to start]
$ git bisect bad
Bisecting: 7036 revisions left to test after this (roughly 13 steps)
[b3ce7a30847a54a7f96a35e609303d8afecd460b] Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel
$ make -j 12 my_defconfig bindeb-pkg
Whoa, 13 steps is a lot...
I kept going until roughly 10 steps were left:
[test b3ce7a30847a54a7f96a35e609303d8afecd460b]
$ git bisect good
Bisecting: 3385 revisions left to test after this (roughly 12 steps)
[fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c] Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
[test fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c]
$ git bisect bad
Bisecting: 1763 revisions left to test after this (roughly 11 steps)
[09ea8089abb5d851ce08a9b1a43706e42ef39db2] Merge tag 'staging-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
[test 09ea8089abb5d851ce08a9b1a43706e42ef39db2]
$ git bisect bad
Bisecting: 910 revisions left to test after this (roughly 10 steps)
[4305ca0087dd99c3c3e0e2ac8a228b7e53a21c78] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Since I could not afford to keep the cluster out of service any longer
(each step takes at least 20 or 25 minutes: build + install + reboot +
check InfiniBand), I decided to return the cluster to service.
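For a rough sense of the cost: the step counts git reports above are consistent with a simple halving model (the number of halvings needed to bring N revisions down to one), and multiplying by the 20 to 25 minutes per step gives the remaining time. The sketch below is only that model, not git's actual estimator; the function names halving_steps and minutes_for are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git.

# Number of halvings needed to narrow N candidate revisions down to one.
halving_steps() {
    n=$1
    steps=0
    pow=1
    while [ "$pow" -lt "$n" ]; do
        pow=$((pow * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Total minutes for a number of steps at a given per-step cost.
minutes_for() {
    echo $(( $1 * $2 ))
}

steps=$(halving_steps 910)
echo "steps left: $steps"                                  # steps left: 10
echo "minutes at 20 per step: $(minutes_for "$steps" 20)"  # 200
echo "minutes at 25 per step: $(minutes_for "$steps" 25)"  # 250
```

With 910 revisions left, that is about 10 more steps, or somewhere between three and a bit over four hours of build, install, reboot and check cycles.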
I will try to continue bisecting by testing the resulting kernels on a
compute node: there is no OpenSM there, and it cannot run there anyway
if another OpenSM is already on the same InfiniBand network.
However, I can check whether those issm* symlinks are created in
/sys/class/infiniband_mad/ on that node.
I really hope that this is enough to pinpoint the first bad
commit...
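The check on the compute node could be scripted along these lines. This is a minimal sketch: the function name issm_verdict is made up, and it assumes that the presence of issm* entries on a compute node tracks the OpenSM failure seen on the head node, which is exactly the hypothesis still to be confirmed.

```shell
#!/bin/sh
# Hypothetical helper: print "good" if at least one issm* entry exists
# in the given directory (default: /sys/class/infiniband_mad), else "bad".
issm_verdict() {
    dir="${1:-/sys/class/infiniband_mad}"
    for entry in "$dir"/issm*; do
        # -e follows symlinks; -L also catches a dangling symlink.
        # If the glob matches nothing, the literal pattern fails both tests.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            echo good
            return 0
        fi
    done
    echo bad
    return 1
}
```

After booting a test kernel, running issm_verdict on the node prints the verdict to pass on by hand, for example as "git bisect good" or "git bisect bad" in the build tree.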
Any better ideas?
--
http://www.inventati.org/frx/
There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
GnuPG key fpr == CA01 1147 9CD2 EFDF FB82 3925 3E1C 27E1 1F69 BFFE