[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4a973fe7-e801-49cc-88b8-77d3d0ba3673@amd.com>
Date:   Tue, 5 Sep 2023 17:16:46 -0500
From:   Mario Limonciello <mario.limonciello@....com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     hdegoede@...hat.com, bhelgaas@...gle.com, rafael@...nel.org,
        Shyam-sundar.S-k@....com, linux-kernel@...r.kernel.org,
        linux-acpi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        Iain Lane <iain@...ngesquash.org.uk>
Subject: Re: [PATCH v16 3/3] platform/x86/amd: pmc: Don't let PCIe root ports
 go into D3
On 9/5/2023 15:51, Bjorn Helgaas wrote:
> [+cc Hans]
> 
> On Tue, Aug 29, 2023 at 12:12:12PM -0500, Mario Limonciello wrote:
>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>> changed pci_bridge_d3_possible() so that any vendor's PCIe ports
>> from modern machines (>=2015) are allowed to be put into D3.
>>
>> Iain reports that USB devices can't be used to wake a Lenovo Z13
>> from suspend. This is because the PCIe root port has been put
>> into D3 and AMD's platform can't handle USB devices waking from
>> a hardware sleep state in this case.
> 
> Can you be specific in the subject and commit log about whether "D3"
> refers to "D3hot", "D3cold", or both?  It's probably obvious to PM
> folks, but it's always a stumbling block for me.
> 
> I assume "can't handle USB devices waking" does not refer to a problem
> with the USB adapter and whatever mechanism it uses to send a wakeup
> event to request that power be turned on, but rather it means that the
> wakeup event doesn't get propagated through the Root Port?
> 
> Is this actually specific to USB devices?  Or could a NIC below the
> Root Port suffer the same problem when a wake-on-lan packet causes it
> to send a wakeup event?  It seems like we've had this conversation
> before; sorry to ask the same questions again.
> 
> If it's not specific to USB, I would say something like "when the Root
> Port is in D3cold, wakeup events from devices below it are lost" (or
> whatever the actual problem is).
The problem is specific to the root port in D3hot over s2idle after the 
hardware has entered the deepest state.
It's fine any other time.
This particular root port only connects to the XHCI controllers and USB4 
controllers, so I can't confirm whether anything else is affected.
> 
>> This problem only occurs on Linux, and only when the AMD PMC driver
>> is utilized to put the device into a hardware sleep state.
> 
> Is the AMD PMC driver doing something magic that can't be done via
> other power management paths?  That's what "only when the AMD PMC
> driver is utilized" suggests.  But if the problem occurs when the Root
> Port is put into D3cold via *any* means, just say that.
> 
> And if you can say a specific PCI power state instead of "hardware
> sleep state", that would be good, too.
Yes; the AMD PMC driver does a notification to the platform that the OS 
is ready for it to go into a hardware sleep state [1].
If the AMD PMC driver isn't used, the platform is not notified that the 
OS is ready for it to go into hardware sleep state, and this issue will 
not occur.
So the PCI root port being in D3 while the hardware is in a sleep state 
is very accurate.
[1] 
https://github.com/torvalds/linux/blob/v6.5/drivers/platform/x86/amd/pmc.c#L816
> 
>> Comparing
>> the behavior on Windows and Linux, Windows doesn't put the root ports
>> into D3.
>>
>> A variety of approaches were discussed to change PCI core to handle this
>> case generically but no consensus was reached. To limit the scope of
>> effect only to the affected machines introduce a workaround into the
>> amd-pmc driver to only apply to the PCI root ports in affected machines
>> when going into hardware sleep.
> 
>> +/* only allow PCIe root ports with a LPS0 constraint configured to go to D3 */
>> +static int amd_pmc_rp_wa(struct amd_pmc_dev *pdev)
>> +{
>> +	struct pci_dev *pci_dev = NULL;
>> +
>> +	while ((pci_dev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_ANY_ID, pci_dev))) {
> 
> I hate to add more uses of pci_get_device() because it doesn't account
> for hot-added devices.  Maybe there's no need to support hot-add of
> AMD Root Ports, but that's not obvious to readers here.
This function is only called during suspend, so it should cover hot 
added / hot removed devices.
If this ends up staying for v17 as is I'll add more verbose comments.
> 
> One mechanism to avoid pci_get_device() is to use quirks, although it
> might be hard to deal with PCI/ACPI ordering issues
I did quirks in an earlier version of this series, but you had feedback 
that the solution isn't scalable.  That's why it's morphed into this 
approach, which I'd like to think is more scalable as it looks at the 
constraints advertised by the platform in an AMD specific driver.
> 
>> +		struct acpi_device *adev;
>> +		int constraint;
>> +
>> +		if (!pci_is_pcie(pci_dev) ||
>> +		    !(pci_pcie_type(pci_dev) == PCI_EXP_TYPE_ROOT_PORT))
>> +			continue;
>> +
>> +		if (pci_dev->current_state == PCI_D3hot ||
>> +		    pci_dev->current_state == PCI_D3cold)
>> +			continue;
> 
> If we're trying to determine a property of the device, why does the
> current power state make a difference?
Hans left feedback in v15 that if the device was already in D3 at the 
time of this function it wouldn't work properly.  So I excluded those 
devices.
> 
> It looks like this loop runs every time we suspend (from
> amd_pmc_suspend_handler()), even though this is something we should
> know at boot-time, so we only need it once.
It's was because pci_bridge_d3_update() can be called and change it 
again in other places.
I think if we want to optimize it to only run a single time we need a 
new variable or bit in the pci_dev structure that can be used to mark 
such an exclusion which pci_bridge_d3_update() could take into account.
This could fit in well with Hans' idea of drivers could register a 
callback to "veto" D3 support.  It could be something like 
pci_bridge_d3_update() is called whenever a new driver 
registers/unregisters the callback.  It might also fit in well with your 
previous comments about how you want to separate "spec compliant" things 
and "quirk" things in pci_bridge_d3_possible().
Could you comment on that?  He suggested it in the cover letter responses.
> 
>> +		adev = ACPI_COMPANION(&pci_dev->dev);
>> +		if (!adev)
>> +			continue;
>> +
>> +		constraint = acpi_get_lps0_constraint(adev);
>> +		if (constraint != ACPI_STATE_UNKNOWN &&
>> +		    constraint >= ACPI_STATE_S3)
>> +			continue;
>> +
>> +		if (pci_dev->bridge_d3 == 0)
>> +			continue;
>> +		pci_dev->bridge_d3 = 0;
>> +		dev_info(&pci_dev->dev, "Disabling D3 on PCIe root port due lack of constraint\n");
> 
> D3hot?  D3cold?  Both?  "lack of constraint"?
It's disabling both, which is why I left it as D3 to cover both.  The 
lack of constraint can't be explained in a single line message.  If this 
is too noisy for a user and you think would cause more questions than 
help I'm fine to downgrade it to debug.
> 
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int amd_pmc_verify_czn_rtc(struct amd_pmc_dev *pdev, u32 *arg)
>>   {
>>   	struct rtc_device *rtc_device;
>> @@ -893,6 +928,10 @@ static int amd_pmc_suspend_handler(struct device *dev)
>>   	case AMD_CPU_ID_CZN:
>>   		rc = amd_pmc_czn_wa_irq1(pdev);
>>   		break;
>> +	case AMD_CPU_ID_YC:
>> +	case AMD_CPU_ID_PS:
>> +		rc = amd_pmc_rp_wa(pdev);
>> +		break;
>>   	default:
>>   		break;
>>   	}
>> -- 
>> 2.34.1
>>
Powered by blists - more mailing lists
 
