Message-ID: <20250502150027.GA818097@bhelgaas>
Date: Fri, 2 May 2025 10:00:27 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: hans.zhang@...tech.com
Cc: kbusch@...nel.org, axboe@...nel.dk, hch@....de, sagi@...mberg.me,
	manivannan.sadhasivam@...aro.org, linux-nvme@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH] nvme-pci: Fix system hang when ASPM L1 is enabled during
 suspend

On Fri, May 02, 2025 at 11:20:51AM +0800, hans.zhang@...tech.com wrote:
> From: Hans Zhang <hans.zhang@...tech.com>
> 
> When PCIe ASPM L1 is enabled (CONFIG_PCIEASPM_POWERSAVE=y), certain

CONFIG_PCIEASPM_POWERSAVE=y only sets the default.  L1 can be enabled
dynamically regardless of the config.
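
To illustrate the point about the runtime setting: the currently selected
ASPM policy is exposed in /sys/module/pcie_aspm/parameters/policy, with the
active entry shown in brackets.  Below is a minimal userspace sketch,
assuming that file format; the helper names active_aspm_policy() and
read_aspm_policy() are hypothetical and not part of any kernel or library
API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) entry from a policy line such as
 * "default performance [powersave] powersupersave" into out.
 * Returns 0 on success, -1 if there is no bracketed entry or it
 * does not fit in out.  (Hypothetical helper, for illustration.)
 */
int active_aspm_policy(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close || outlen == 0)
		return -1;

	len = (size_t)(close - open - 1);
	if (len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Read the first line of the policy file at path and extract the
 * active policy.  Returns 0 on success, -1 on any failure.
 */
int read_aspm_policy(const char *path, char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return active_aspm_policy(line, out, outlen);
}
```

Calling read_aspm_policy("/sys/module/pcie_aspm/parameters/policy", buf,
sizeof(buf)) on a running system would report the policy in effect at that
moment, which may differ from the build-time default chosen by the
CONFIG_PCIEASPM_* options.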

> NVMe controllers fail to release LPI MSI-X interrupts during system
> suspend, leading to a system hang. This occurs because the driver's
> existing power management path does not fully disable the device
> when ASPM is active.

I have no idea what this has to do with ASPM L1.  I do see that
nvme_suspend() tests pcie_aspm_enabled(pdev) (which seems kind of
janky and racy).  But this doesn't explain anything about what would
cause a system hang.

> The fix adds an explicit device disable and reset preparation step
> in the suspend path after successfully setting the power state.
> This ensures proper cleanup of interrupt resources even when ASPM
> L1 is enabled, preventing the system from hanging during suspend.

Maybe there's a clue in the 600 lines of debug output that I trimmed,
but without some interpretation, I have no idea how to find it.

Unless you see similar problems on other systems, I would suspect an
issue with the SoC or the SoC driver where you do see problems.

Bjorn
