[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <048bd3c4-887c-4d17-9636-354cc626afa3@gmail.com>
Date: Thu, 21 Aug 2025 20:41:28 +0800
From: Ethan Zhao <etzhao1900@...il.com>
To: Brian Norris <briannorris@...omium.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] PCI/sysfs: Ensure devices are powered for config reads
On 8/21/2025 10:56 AM, Brian Norris wrote:
> On Thu, Aug 21, 2025 at 08:54:52AM +0800, Ethan Zhao wrote:
>> On 8/21/2025 1:26 AM, Brian Norris wrote:
>>> From: Brian Norris <briannorris@...gle.com>
>>>
>>> max_link_speed, max_link_width, current_link_speed, current_link_width,
>>> secondary_bus_number, and subordinate_bus_number all access config
>>> registers, but they don't check the runtime PM state. If the device is
>>> in D3cold, we may see -EINVAL or even bogus values.
>> My understanding, if your device is in D3cold, returning of -EINVAL is
>> the right behavior.
>
> That's not the guaranteed result though. Some hosts don't properly
> return PCIBIOS_DEVICE_NOT_FOUND, for one. But also, it's racy -- because
> we don't even try to hold a pm_runtime reference, the device could
> possibly enter D3cold while we're in the middle of reading from it. If
> you're lucky, that'll get you a completion timeout and an all-1's
> result, and we'll return a garbage result.
>
> So if we want to purposely not resume the device and retain "I can't
> give you what you asked for" behavior, we'd at least need a
> pm_runtime_get_noresume() or similar.
I understand you just want the stable result of these caps, meanwhile
you don't want the side effect either.>
>>> Wrap these access in pci_config_pm_runtime_{get,put}() like most of the
>>> rest of the similar sysfs attributes.
>>>
>>> Fixes: 56c1af4606f0 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc")
>>> Cc: stable@...r.kernel.org
>>> Signed-off-by: Brian Norris <briannorris@...gle.com>
>>> Signed-off-by: Brian Norris <briannorris@...omium.org>
>>> ---
>>>
>>> drivers/pci/pci-sysfs.c | 32 +++++++++++++++++++++++++++++---
>>> 1 file changed, 29 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>> index 5eea14c1f7f5..160df897dc5e 100644
>>> --- a/drivers/pci/pci-sysfs.c
>>> +++ b/drivers/pci/pci-sysfs.c
>>> @@ -191,9 +191,16 @@ static ssize_t max_link_speed_show(struct device *dev,
>>> struct device_attribute *attr, char *buf)
>>> {
>>> struct pci_dev *pdev = to_pci_dev(dev);
>>> + ssize_t ret;
>>> +
>>> + pci_config_pm_runtime_get(pdev);
>> This function would potentially change the power state of device,
>> that would be a complex process, beyond the meaning of
>> max_link_speed_show(), given the semantics of these functions (
>> max_link_speed_show()/max_link_width_show()/current_link_speed_show()/
>> ....),
>> this cannot be done !
>
> What makes this different than the 'config' attribute (i.e., "read
> config register")? Why shouldn't that just return -EINVAL? I don't
> really buy your reasoning -- "it's a complex process" is not a reason
It is a reason to know there is side effect to be taken into account.>
not to do something. The user asked for the link speed; why not give it?
> If the user wanted to know if the device was powered, they could check
> the 'power_state' attribute instead.
>
> (Side note: these attributes don't show up anywhere in Documentation/,
> so it's also a bit hard to declare "best" semantics for them.)
>
> To flip this question around a bit: if I have a system that aggressively
> suspends devices when there's no recent activity, how am I supposed to
> check what the link speed is? Probabilistically hammer the file while
> hoping some other activity wakes the device, so I can find the small
> windows of time where it's RPM_ACTIVE? Disable runtime_pm for the device
> while I check?
Hold a PM reference by pci_config_pm_runtime_get() and then write some
data to the PCIe config space, no objection.
To know about the linkspeed etc capabilities/not status, how about
creating a cached version of these caps, no need to change their
power state.
If there is aggressive power saving requirement, and the polling
of these caps will make up wakeup/poweron bugs.
Thanks,
Ethan
>
> Brian
Powered by blists - more mailing lists