lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <qfrffijnf37qkdgvbsbfpzm3dsud7q3fvzrm33lzj7hwttqw3g@coah4xzo7zbs>
Date: Tue, 28 Oct 2025 11:03:28 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, 
	Manivannan Sadhasivam <manivannan.sadhasivam@....qualcomm.com>, Krzysztof Wilczyński <kwilczynski@...nel.org>, 
	Johan Hovold <johan@...nel.org>, Frank Li <Frank.li@....com>, 
	Shawn Lin <shawn.lin@...k-chips.com>, Rob Herring <robh@...nel.org>, 
	"David E . Box" <david.e.box@...ux.intel.com>, Kai-Heng Feng <kai.heng.feng@...onical.com>, 
	"Rafael J . Wysocki" <rafael@...nel.org>, Heiner Kallweit <hkallweit1@...il.com>, 
	Chia-Lin Kao <acelan.kao@...onical.com>, Gustavo Pimentel <Gustavo.Pimentel@...opsys.com>, 
	Han Jingoo <jingoohan1@...il.com>, Bjorn Andersson <andersson@...nel.org>, 
	Konrad Dybcio <konrad.dybcio@....qualcomm.com>, Bartosz Golaszewski <brgl@...ev.pl>, 
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH] Revert "PCI: qcom: Remove custom ASPM enablement code"

On Mon, Oct 27, 2025 at 06:07:03PM -0500, Bjorn Helgaas wrote:
> On Mon, Oct 27, 2025 at 05:21:30PM +0530, Manivannan Sadhasivam wrote:
> > On Sun, Oct 26, 2025 at 02:37:54PM -0500, Bjorn Helgaas wrote:
> > > On Sun, Oct 26, 2025 at 08:58:29PM +0530, Manivannan Sadhasivam wrote:
> > > > On Fri, Oct 24, 2025 at 04:04:57PM -0500, Bjorn Helgaas wrote:
> > > > > From: Bjorn Helgaas <bhelgaas@...gle.com>
> > > > > 
> > > > > This reverts commit a729c16646198872e345bf6c48dbe540ad8a9753.
> > > > > 
> > > > > Prior to a729c1664619 ("PCI: qcom: Remove custom ASPM enablement code"),
> > > > > the qcom controller driver enabled ASPM, including L0s, L1, and L1 PM
> > > > > Substates, for all devices powered on at the time the controller driver
> > > > > enumerates them.
> > > > > 
> > > > > ASPM was *not* enabled for devices powered on later by pwrctrl (unless the
> > > > > kernel was built with PCIEASPM_POWERSAVE or PCIEASPM_POWER_SUPERSAVE, or
> > > > > the user enabled ASPM via module parameter or sysfs).
> > > > > 
> > > > > After f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
> > > > > devicetree platforms"), the PCI core enabled all ASPM states for all
> > > > > devices whether powered on initially or by pwrctrl, so a729c1664619 was
> > > > > unnecessary and reverted.
> > > > > 
> > > > > But f3ac2ff14834 was too aggressive and broke platforms that didn't support
> > > > > CLKREQ# or required device-specific configuration for L1 Substates, so
> > > > > df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
> > > > > enabled only L0s and L1.
> > > > > 
> > > > > On Qualcomm platforms, this left L1 Substates disabled, which was a
> > > > > regression.  Revert a729c1664619 so L1 Substates will be enabled on devices
> > > > > that are initially powered on.  Devices powered on by pwrctrl will be
> > > > > addressed later.
> > ...
> 
> > > I have some heartburn about both the revert and the
> > > pci_host_set_default_pcie_link_state() approach because they apply to
> > > the entire hierarchy under a qcom or VMD root port, potentially
> > > including add-in cards with switches.  CLKREQ# (and possibly more) is
> > > required to enable L1SS, and I don't know if we can assume it's
> > > supported on add-in links.
> > 
> > I don't think we can assume, but at the same time, I don't think we
> > will ever be able to come up with a logical way to enable L1ss on
> > all devices. But if we leave the decision to the host controller
> > drivers, they can at least guarantee that the CLKREQ# and other
> > requirements are satisfied from the host perspective for L1ss. Then,
> > if any device exhibit erratic behavior, we will for sure know that
> > the device is at fault and we can quirk them.
> 
> If we can figure out that an endpoint is defective, a quirk is great.
> But the issue might be something in the path, e.g., some connector in
> the path leading to the endpoint doesn't include CLKREQ#, and we can't
> quirk the endpoint then.
> 

I got your concern, in this case, we can have board specific quirks for those
with broken connectors and such. Let's say if the SoC supports CLKREQ#, but one
of the downstream connectors of a switch connected to the Root Port doesn't,
then we can have a quirk based on the board compatible and the switch combo.
It is implied that the person who is designing the board is aware of this issue
and they can add the quirk by themselves (without affecting users).

> To me it sounds like the mainline kernel should be safe and only
> enable L1SS when it has a clear signal that it is safe, either via
> devicetree, ACPI, or L1SS configuration inherited from firmware.  I
> don't want a future of telling users to boot with "pcie_aspm=off" if
> a device doesn't work.
> 

It is a trade-off. I completely agree that we do not want to break users, but at
the same time, the kernel should provide optimum power savings by default also
(if it has the knowledge).

The SoCs targetting embedded/laptop usecases want to make use of L1ss to prevent
draining battery too fast. Currently, on Qcom platforms, we are asking our users
to enable ASPM through Kconfig/cmdline to make sure their laptop battery last
long. And we cannot ask distros to enable these for users as distros would only
use a common kernel/config for all platforms they support. So the burden
essentially lies on the user. All this could've been fixed if the BIOS enabled
L1ss, but it doesn't unfortunately.

Also, it is not possible to rely on DT/ACPI or other platform description to
tell the OS about CLKREQ# routing for all the devices in the topology. They can
give the information about Root Ports and other devices attached to the board in
the enclosure, but not about the devices that will get attached to the open slot
dynamically.

Also, for a discoverable bus like PCI, DT do not necessarily describe all
devices unless they requires some special handling like power sequencing.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ