Message-ID: <769c50c5-51ac-42d1-9c8e-97783a621a0e@intel.com>
Date: Fri, 31 Jan 2025 13:56:00 +0100
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Lenny Szubowicz <lszubowi@...hat.com>, <pavan.chebbi@...adcom.com>,
	<mchan@...adcom.com>, <andrew+netdev@...n.ch>, <davem@...emloft.net>,
	<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<george.shuklin@...il.com>, <andrea.fois@...ntsense.it>
CC: <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Yue Zhao
	<yue.zhao@...pee.com>, <chunguang.xu@...pee.com>, <haifeng.xu@...pee.com>,
	Dawid Osuchowski <dawid.osuchowski@...ux.intel.com>
Subject: Re: [patch v2] tg3: Disable tg3 PCIe AER on system reboot

On 11/29/24 21:36, Lenny Szubowicz wrote:
> Disable PCIe AER on the tg3 device on system reboot on a limited
> list of Dell PowerEdge systems. This prevents a fatal PCIe AER event
> on the tg3 device during the ACPI _PTS (prepare to sleep) method for
> S5 on those systems. The _PTS is invoked by acpi_enter_sleep_state_prep()
> as part of the kernel's reboot sequence as a result of commit
> 38f34dba806a ("PM: ACPI: reboot: Reinstate S5 for reboot").
> 
> There was an earlier fix for this problem by commit 2ca1c94ce0b6
> ("tg3: Disable tg3 device on system reboot to avoid triggering AER").
> But it was discovered that this earlier fix caused a reboot hang
> when some Dell PowerEdge servers were booted via ipxe. To address
> this reboot hang, the earlier fix was essentially reverted by commit
> 9fc3bc764334 ("tg3: power down device only on SYSTEM_POWER_OFF").
> This re-exposed the tg3 PCIe AER on reboot problem.
> 
> This fix is not an ideal solution because the root cause of the AER
> is in system firmware. Instead, it's a targeted work-around in the
> tg3 driver.
> 
> Note also that the PCIe AER must be disabled on the tg3 device even
> if the system is configured to use "firmware first" error handling.
> 
> Fixes: 9fc3bc764334 ("tg3: power down device only on SYSTEM_POWER_OFF")
> Signed-off-by: Lenny Szubowicz <lszubowi@...hat.com>

The bug also occurs with Intel drivers; we even had nearly the same fix
proposed:
https://lore.kernel.org/netdev/20241227035459.90602-1-yue.zhao@shopee.com/T/

I believe that such a fix should be centralized instead of being repeated
in each driver, especially since the list of platforms is likely to be
extended in the future.
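
As a rough illustration of the centralized approach, here is a minimal sketch in plain user-space C, not kernel code. Every name in it (the struct, the table, both functions) is hypothetical, and the table entries are placeholders rather than the platform list from the patch. An in-kernel version would presumably build on the existing DMI matching helpers (struct dmi_system_id, dmi_check_system()) instead of comparing strings by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: identification strings as a platform reports them. */
struct aer_reboot_quirk {
	const char *vendor;
	const char *product;
};

/* Placeholder entries only; these are NOT the systems named in the patch. */
static const struct aer_reboot_quirk aer_reboot_quirks[] = {
	{ "Example Vendor", "Example Server A" },
	{ "Example Vendor", "Example Server B" },
};

/*
 * Single shared lookup: returns true when the platform is on the quirk list.
 * Extending the list of affected platforms means touching only this table.
 */
bool platform_needs_aer_disable_on_reboot(const char *vendor,
					   const char *product)
{
	size_t i;

	if (!vendor || !product)
		return false;

	for (i = 0; i < sizeof(aer_reboot_quirks) / sizeof(aer_reboot_quirks[0]); i++) {
		if (strcmp(aer_reboot_quirks[i].vendor, vendor) == 0 &&
		    strcmp(aer_reboot_quirks[i].product, product) == 0)
			return true;
	}
	return false;
}

/*
 * Hypothetical driver-side decision: each driver asks the shared helper
 * instead of carrying its own copy of the platform list.
 */
bool driver_should_disable_aer(bool rebooting, const char *vendor,
			       const char *product)
{
	return rebooting && platform_needs_aer_disable_on_reboot(vendor, product);
}
```

With something of this shape, tg3 and the Intel drivers would each make one call in their shutdown path, and the platform list would live in a single place.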

It's unfortunate that we don't have Dell CCed here; I'm trying to get some
relevant contacts, but without success so far.
