Message-ID: <20250923232712.GA2092207@bhelgaas>
Date: Tue, 23 Sep 2025 18:27:12 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Brian Norris <briannorris@...omium.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, stable@...r.kernel.org,
Ethan Zhao <etzhao1900@...il.com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>
Subject: Re: [PATCH] PCI/sysfs: Ensure devices are powered for config reads

On Tue, Sep 23, 2025 at 04:07:29PM -0700, Brian Norris wrote:
> On Tue, Sep 23, 2025 at 02:02:31PM -0500, Bjorn Helgaas wrote:
> > On Wed, Aug 20, 2025 at 10:26:08AM -0700, Brian Norris wrote:
> > > From: Brian Norris <briannorris@...gle.com>
> > >
> > > max_link_speed, max_link_width, current_link_speed, current_link_width,
> > > secondary_bus_number, and subordinate_bus_number all access config
> > > registers, but they don't check the runtime PM state. If the device is
> > > in D3cold, we may see -EINVAL or even bogus values.
> > >
> > > Wrap these accesses in pci_config_pm_runtime_{get,put}() like most of the
> > > rest of the similar sysfs attributes.
> >
> > Protecting the config reads seems right to me.
> >
> > If the device is in D3cold, a config read will result in a Completion
> > Timeout. On most x86 platforms that's "fine" and merely results in ~0
> > data. But that's merely convention, not a PCIe spec requirement.
> >
> > I think it's a potential issue with PCIe controllers used on arm64 and
> > might result in an SError or synchronous abort from which we don't
> > recover well. I'd love to hear actual experience about how reading
> > "current_link_speed" works on a device in D3cold in an arm64 system.
>
> I'm working on a few such arm64 systems :) (pcie-qcom Chromebooks, and
> non-upstream DWC-based Pixel phones; I have a little more knowledge of
> the latter.) The answers may vary by SoC, and especially by PCIe
> implementation. ARM SoCs are notoriously ... diverse.
>
> To my knowledge, it can be several of the above on arm64 + DWC.
>
> * pci_generic_config_read() -> pci_ops::map_bus() may return NULL, in
> which case we get PCIBIOS_DEVICE_NOT_FOUND / -EINVAL. And specifically
> on arm64 with DWC PCIe, dw_pcie_other_conf_map_bus() may see the link
> down on a suspended bridge and return NULL.
>
> * The map_bus() check is admittedly racy, so we might still *actually*
> hit the hardware, at which point this gets even more
> implementation-defined:
>
> (a) if the PCIe HW is not powered (for example, if we put the link to
> L3 and fully powered off the controller to save power), we might
> not even get a completion timeout, and it depends on how the
> SoC is wired up. But I believe this tends to be SError, and a
> crash.
>
> (b) if the PCIe HW is powered but something else is down (e.g., link
> in L2, device in D3cold, etc.), we'll get a Completion Timeout,
> and a ~0 response. I also was under the impression a ~0 response
> is not spec-mandated, but I believe it's noted in the Synopsys
> documentation.

The ~0 response is not required by the PCIe spec, although there's at
least one implementation note to the effect that a Root Complex
intended for use with software that depends on ~0 data when a config
request fails with Unsupported Request must synthesize that value
(this one is from PCIe r7.0, sec 2.3.2).

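To make the ~0 convention concrete, here is a minimal userspace sketch
of how a caller might treat an all-ones Link Capabilities value as a
failed read. The helper names are hypothetical and are not kernel APIs.
The only spec-derived detail is that Max Link Width is bits 9:4 of Link
Capabilities. It assumes the platform synthesizes ~0 at all, which, as
noted above, is not guaranteed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: true if a 32-bit config
 * read returned all ones, the value many Root Complexes synthesize when
 * the request fails.
 */
static bool cfg_read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/*
 * Hypothetical decode of Max Link Width (bits 9:4 of Link Capabilities).
 * Returns 0 when the read looks like it failed, so a caller does not
 * report a bogus width of 63.
 */
static unsigned int lnkcap_max_width(uint32_t lnkcap)
{
	if (cfg_read_looks_failed(lnkcap))
		return 0;

	return (lnkcap >> 4) & 0x3f;
}
```

A check like this only helps in case (b) above. It does nothing for
platforms where the access ends in an SError instead of ~0 data, which
is why powering the device before the read is the safer approach.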
> NB: I'm not sure there is really great upstream support for arm64 +
> D3cold yet. If they're not using ACPI (as few arm64 systems do), they
> probably don't have the appropriate platform_pci_* hooks to really
> manage it properly. There have been some prior attempts at adding
> non-x86/ACPI hooks for this, although for different reasons:
>
> https://lore.kernel.org/linux-pci/a38e76d6f3a90d7c968c32cee97604f3c41cbccf.camel@mediatek.com/
> [PATCH] PCI:PM: Support platforms that do not implement ACPI
>
> That submission stalled because it didn't really have the whole picture
> (in that case, the wwan/modem driver in question).
>
> > As Ethan and Andrey pointed out, we could skip max_link_speed_show()
> > because pcie_get_speed_cap() already uses a cached value and doesn't
> > do a config access.
>
> Ack, I'll drop that part of the change.
>
> > max_link_width_show() is similar and also comes from PCI_EXP_LNKCAP
> > but is not currently cached, so I think we do need that one. Worth a
> > comment to explain the non-obvious difference.
>
> Sure, I'll add a comment for max_link_width.
>
> > PCI_EXP_LNKCAP is ostensibly read-only and could conceivably be
> > cached, but the ASPM exit latencies can change based on the Common
> > Clock Configuration.
>
> I'll plan not to add additional caching, unless excess wakeups become a
> problem.

Perfect, thanks, I'll watch for this.
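
For anybody following along, the shape of the change for the uncached
max_link_width case is roughly "take a runtime PM reference, read, drop
the reference". The sketch below is a userspace mock of that pattern
only. Every name in it is made up for illustration, and it is not the
patch or the kernel's pci_config_pm_runtime_{get,put}() implementation.
Real runtime PM also does not suspend synchronously on the last put the
way this mock does.

```c
#include <stdint.h>

/* Userspace mock only; none of these names are kernel APIs. */
enum mock_power { MOCK_D0, MOCK_D3COLD };

struct mock_dev {
	enum mock_power state;
	int usage;		/* outstanding mock_pm_get() calls */
	uint32_t lnkcap;	/* value the hardware returns while in D0 */
};

/* Take a reference and make sure the device is powered. */
static void mock_pm_get(struct mock_dev *dev)
{
	dev->usage++;
	dev->state = MOCK_D0;
}

/* Drop a reference; the mock powers down on the last put. */
static void mock_pm_put(struct mock_dev *dev)
{
	if (--dev->usage == 0)
		dev->state = MOCK_D3COLD;
}

/* An unpowered read yields ~0, mimicking the common (not mandated) case. */
static uint32_t mock_read_lnkcap(const struct mock_dev *dev)
{
	return dev->state == MOCK_D0 ? dev->lnkcap : UINT32_MAX;
}

/* Bracket the config read so it never hits an unpowered device. */
static unsigned int mock_max_link_width(struct mock_dev *dev)
{
	uint32_t lnkcap;

	mock_pm_get(dev);
	lnkcap = mock_read_lnkcap(dev);
	mock_pm_put(dev);

	return (lnkcap >> 4) & 0x3f;	/* Max Link Width, bits 9:4 */
}
```

With a mock device that starts in D3cold, a bare mock_read_lnkcap()
returns ~0, while mock_max_link_width() returns the real width and
leaves the device in D3cold again afterwards.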
Bjorn