Date: Wed, 10 Jan 2024 09:42:47 +0530
From: Pavan Chebbi <pavan.chebbi@...adcom.com>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Andrea Fois <andrea.fois@...ntsense.it>, Michael Chan <mchan@...adcom.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	George Shuklin <george.shuklin@...il.com>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] tg3: add new module param to force device power down on reboot

On Wed, Jan 10, 2024 at 2:01 AM Heiner Kallweit <hkallweit1@...il.com> wrote:
>
> On 09.01.2024 20:45, Andrea Fois wrote:
> > The bug #1917471 was fixed in commit 2ca1c94ce0b6 ("tg3: Disable tg3
> > device on system reboot to avoid triggering AER") but was reintroduced
> > by commit 9fc3bc764334 ("tg3: power down device only on
> > SYSTEM_POWER_OFF").
> >
> > The problem described in #1917471 is still consistently replicable on
> > reboots on Dell Servers (i.e. R750xs with BCM5720 LOM), causing NMIs
> > (i.e. NMI received for unknown reason 38 on cpu 0) after 9fc3bc764334
> > was committed.
> >
> > The problem is detected also by the Lifecycle controller and logged as
> > a PCI Bus Error for the device.
> >
> > As the problems addressed by 2ca1c94ce0b6 and by 9fc3bc764334 requires
> > opposite strategies, a new module param "force_pwr_down_on_reboot"
> > <bool> is introduced to fix both scenarios:
> >
> Adding module parameters is discouraged. What I see could try:
Ack.
>
> - limit 9fc3bc764334 to the specific machine type mentioned in the
>   commit message (based DMI info)
> - 2ca1c94ce0b6 performs two actions: power down tg3 and disable device
>   Based on the commit description disabling the device might be sufficient.

I think the second suggestion could be the better solution, since it
would also help with the issue 9fc3bc764334 is trying to fix.
But I am not sure how easy it is to test. As I recall, George was
unable to reach out to the author of 2ca1c94ce0b6 when he wanted to
test his patch for regression.
We did discuss the risk of this regression.
https://patchwork.kernel.org/project/netdevbpf/patch/20231101130418.44164-1-george.shuklin@gmail.com/
Unfortunately, looks like it has come true :(
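
To make the two suggestions above easier to compare, here is a rough
userspace model of the shutdown decision. It is not tg3 code: every name
in it (the model_* identifiers and the dmi_quirk_match flag) is
hypothetical, and it assumes the device is disabled in every case, which
neither commit message quoted here confirms.

```c
/*
 * Userspace model only. None of these identifiers exist in the tg3
 * driver; they are made up for illustration.
 */
#include <stdbool.h>

enum model_sys_state { MODEL_RESTART, MODEL_POWER_OFF };

enum model_action {
	MODEL_ACT_NONE       = 0,
	MODEL_ACT_DISABLE    = 1 << 0,	/* disable the PCI device */
	MODEL_ACT_POWER_DOWN = 1 << 1,	/* put the chip into low power */
};

/*
 * Suggestion 1: apply the 9fc3bc764334 behaviour (power down only on
 * power off) only to machines matched by a hypothetical DMI quirk.
 * All other machines power down on every shutdown path.
 */
static unsigned int model_shutdown_dmi(enum model_sys_state state,
				       bool dmi_quirk_match)
{
	unsigned int acts = MODEL_ACT_DISABLE;	/* assumption: always disable */

	if (state == MODEL_POWER_OFF || !dmi_quirk_match)
		acts |= MODEL_ACT_POWER_DOWN;

	return acts;
}

/*
 * Suggestion 2: rely on disabling the device alone for reboot, and
 * power down only when the system is actually powering off.
 */
static unsigned int model_shutdown_disable_only(enum model_sys_state state)
{
	unsigned int acts = MODEL_ACT_DISABLE;

	if (state == MODEL_POWER_OFF)
		acts |= MODEL_ACT_POWER_DOWN;

	return acts;
}
```

In this model the second variant needs no per-machine table, but whether
disabling alone is enough to avoid the NMI on reboot is exactly what
would still have to be tested on the affected hardware.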
