Message-Id: <20241125195443.0ddf0d0176d7c34bd29942c7@paranoici.org>
Date: Mon, 25 Nov 2024 19:54:43 +0100
From: Francesco Poli <invernomuto@...anoici.org>
To: Uwe Kleine-König <ukleinek@...ian.org>
Cc: 1086520@...s.debian.org, Mark Zhang <markzhang@...dia.com>, Leon
 Romanovsky <leonro@...dia.com>, linux-rdma@...r.kernel.org,
 netdev@...r.kernel.org
Subject: Re: Bug#1086520: linux-image-6.11.2-amd64: makes opensm fail to
 start

On Thu, 21 Nov 2024 11:04:13 +0100 Uwe Kleine-König wrote:

[...]
> It looks like the commit that is biting you is
> 
> https://git.kernel.org/linus/50660c5197f52b8137e223dc3ba8d43661179a1d
> 
> So if you bisect, try 50660c5197f52b8137e223dc3ba8d43661179a1d and its
> parent 24943dcdc156cf294d97a36bf5c51168bf574c22 first.

I started to bisect.

The first surprise is that 50660c5197f52b8137e223dc3ba8d43661179a1d is
good...   :-o

  $ git checkout 50660c5197f52b8137e223dc3ba8d43661179a1d
  $ make -j 12 my_defconfig bindeb-pkg

  [install and reboot with this kernel version]

  # ls /sys/class/infiniband_mad/ -altrF
  total 0
  drwxr-xr-x 70 root root    0 Nov 25 12:05 ../
  -r--r--r--  1 root root 4096 Nov 25 12:05 abi_version
  lrwxrwxrwx  1 root root    0 Nov 25 12:05 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 25 12:05 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  lrwxrwxrwx  1 root root    0 Nov 25 12:08 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
  lrwxrwxrwx  1 root root    0 Nov 25 12:08 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
  drwxr-xr-x  2 root root    0 Nov 25 12:08 ./

  [InfiniBand works]

  $ git bisect start
  $ git bisect good
  $ git checkout v6.11
  $ make -j 12 my_defconfig bindeb-pkg

  [install and reboot with this kernel version]

  # ls /sys/class/infiniband_mad/ -altrF
  total 0
  drwxr-xr-x 70 root root    0 Nov 25 12:29 ../
  -r--r--r--  1 root root 4096 Nov 25 12:29 abi_version
  lrwxrwxrwx  1 root root    0 Nov 25 12:29 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 25 12:29 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  drwxr-xr-x  2 root root    0 Nov 25 12:30 ./

  [InfiniBand fails, because OpenSM fails to start]

  $ git bisect bad
  Bisecting: 7036 revisions left to test after this (roughly 13 steps)
  [b3ce7a30847a54a7f96a35e609303d8afecd460b] Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel
  $ make -j 12 my_defconfig bindeb-pkg


Woooha, 13 steps are a lot...

I went on until roughly 10 steps were left:

  [test b3ce7a30847a54a7f96a35e609303d8afecd460b]
  $ git bisect good
  Bisecting: 3385 revisions left to test after this (roughly 12 steps)
  [fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c] Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
  
  [test fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c]
  $ git bisect bad
  Bisecting: 1763 revisions left to test after this (roughly 11 steps)
  [09ea8089abb5d851ce08a9b1a43706e42ef39db2] Merge tag 'staging-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

  [test 09ea8089abb5d851ce08a9b1a43706e42ef39db2]
  $ git bisect bad
  Bisecting: 910 revisions left to test after this (roughly 10 steps)
  [4305ca0087dd99c3c3e0e2ac8a228b7e53a21c78] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi


Since I could not afford to keep the cluster out of service any longer
(each step takes at least 20 to 25 minutes: build + install + reboot +
check InfiniBand), I decided to return it to service.

I will try to continue bisecting by testing the resulting kernels on a
compute node: there's no OpenSM there, and it could not run anyway as
long as another OpenSM is active on the same InfiniBand network.
However, I can check whether those issm* symlinks are created in
/sys/class/infiniband_mad/.
I really hope that this is enough to pinpoint the first bad
commit...
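
A minimal sketch of that check, to be run by hand after booting each
test kernel. The function name check_issm and the "good"/"bad" wording
are made up for illustration; the only thing taken from the listings
above is the directory and the issm* naming. Whether the presence of
issm* entries on a node without OpenSM is a reliable indicator is still
an assumption.

```shell
#!/bin/sh
# Hypothetical helper: report whether any issm* entries exist in the
# given directory (default: /sys/class/infiniband_mad).
check_issm() {
    dir="${1:-/sys/class/infiniband_mad}"
    for entry in "$dir"/issm*; do
        # If the glob matched nothing, $entry is the literal pattern,
        # so verify that something is really there. -e follows
        # symlinks, -L also accepts a dangling symlink.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            echo "good: found ${entry##*/}"
            return 0
        fi
    done
    echo "bad: no issm* entries in $dir"
    return 1
}
```

Calling check_issm without arguments inspects the real sysfs directory;
the exit status (0 or 1) only mirrors the printed verdict, and marking
the commit with git bisect good or git bisect bad remains a manual
step.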

Any better ideas?


-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82  3925 3E1C 27E1 1F69 BFFE
