Message-ID: <12857e01-f6cc-4489-935b-7e6c354706e9@amd.com>
Date: Fri, 13 Sep 2024 15:33:33 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>,
 Kai-Heng Feng <kaihengfeng@...il.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, bhelgaas@...gle.com,
 linux-pci@...r.kernel.org, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org, "Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: [PATCH] PCI/PM: Put devices to low power state on shutdown

On 9/13/2024 03:01, Mika Westerberg wrote:
> Hi,
> 
> On Fri, Sep 13, 2024 at 02:00:58PM +0800, Kai-Heng Feng wrote:
>> On Fri, Sep 13, 2024 at 12:57 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>>>
>>> [+cc Rafael]
>>>
>>> On Thu, Sep 12, 2024 at 11:00:43AM +0800, Kai-Heng Feng wrote:
>>>> On Thu, Sep 12, 2024 at 3:05 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>>>>> On Fri, Jul 12, 2024 at 02:24:11PM +0800, Kai-Heng Feng wrote:
>>>>>> Some laptops wake up after poweroff when an HP Thunderbolt Dock G4 is
>>>>>> connected.
>>>>>>
>>>>>> The following error message can be found during shutdown:
>>>>>> pcieport 0000:00:1d.0: AER: Correctable error message received from 0000:09:04.0
>>>>>> pcieport 0000:09:04.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
>>>>>> pcieport 0000:09:04.0:   device [8086:0b26] error status/mask=00000080/00002000
>>>>>> pcieport 0000:09:04.0:    [ 7] BadDLLP
>>>>>>
>>>>>> Calling aer_remove() during shutdown can quiesce the error message;
>>>>>> however, the spurious wakeup still happens.
>>>>>>
>>>>>> The issue doesn't happen if the device is in D3 before system shutdown,
>>>>>> so put the device into a low power state before shutdown to solve it.
>>>>>>
>>>>>> I don't have a sniffer, so this is purely guesswork; however, I believe
>>>>>> putting the device into a low power state is the right thing to do.
>>>>>
>>>>> My objection here is that we don't have an explanation of why this
>>>>> should matter or a pointer to any spec language about this situation,
>>>>> so it feels a little bit random.
>>>>
>>>> I have the same feeling too. The PCIe spec doesn't specify the correct
>>>> power state for shutdown.
>>>> So we can only reason "logically" that software should put devices
>>>> into a low power state during shutdown.
>>>>
>>>>> I suppose the problem wouldn't happen if AER interrupts were disabled?
>>>>> We already do disable them in aer_suspend(), but maybe that's not used
>>>>> in the shutdown path?
>>>>
>>>> That was my first thought, so I modified pcie_port_shutdown_service()
>>>> to disable the AER interrupt.
>>>> That approach didn't work, though.
>>>>
>>>>> My understanding is that .shutdown() should turn off device interrupts
>>>>> and stop DMA.  So maybe we need an aer_shutdown() that disables
>>>>> interrupts?
>>>>
>>>> Logically we should do that. However, that approach doesn't solve this issue.
>>>
>>> I'm not completely clear on the semantics of the .shutdown()
>>> interface.  The doc at
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/device/driver.h?id=v6.10#n73
>>> says "@shutdown: Called at shut-down time to quiesce the device"
>>>
>>> Turning off device interrupts and DMA *would* fit within the idea of
>>> quiescing the device.  Does that also include changing the device
>>> power state?  I dunno.  The power state isn't *mentioned* in the
>>> .shutdown() context, while it *is* mentioned for .suspend().
>>
>> IMO putting a device into a low power state also qualifies as "quiesce the device".
>>
>>>
>>> IIUC, this patch and commit log uses "shutdown" to refer to a
>>> system-wide *poweroff*, which is a different concept despite using the
>>> same "shutdown" name.
>>
>> For ACPI-based systems, there are .suspend for S3/s2idle, .poweroff for
>> S4, and .shutdown for S5.
>> Unless we want to introduce a new callback for S5, I think the concept
>> is quite similar.
>>
>> For DT-based systems, the OS should also do the same thing, as
>> there's no firmware to clean up the power state.
>>
>> We could also move .shutdown into pm_ops, but I don't think
>> it's necessary.
>>
>>>
>>> So should the system poweroff procedure use .suspend()?  Should it use
>>> both .shutdown() and .suspend()?  I think it only uses .shutdown()
>>> today:
>>>
>>>    kernel_power_off
>>>      kernel_shutdown_prepare(SYSTEM_POWER_OFF)
>>>        device_shutdown
>>>          while (!list_empty(&devices_kset->list))
>>>            dev->bus->shutdown(dev)
>>>              pci_device_shutdown
>>>
>>> There are several driver .shutdown() methods that do things like this:
>>>
>>>    e1000_shutdown
>>>      if (system_state == SYSTEM_POWER_OFF)
>>>        pci_set_power_state(pdev, PCI_D3hot)
>>>
>>> Maybe that's the right thing and should be done by the PCI core, which
>>> is similar to what you propose here.  But I think it muddies the
>>> definition of .shutdown() a bit by mixing in power management stuff.
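The call chain and the e1000 pattern above can be modeled in a few lines of userspace C. This is a sketch, not the kernel's code: `struct fake_dev`, `fake_device_shutdown()` and `nic_shutdown()` are hypothetical, and `enum sys_state` and `enum pci_power` are local stand-ins that only borrow the kernel's names. The walk starts from the tail of the device list, as `device_shutdown()` does, so devices registered later (typically children) are handled before those registered earlier, and the hook drops to D3hot only when the system is powering off.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins; only the names mirror the kernel's. */
enum sys_state { SYSTEM_RUNNING, SYSTEM_RESTART, SYSTEM_POWER_OFF };
enum pci_power { PCI_D0 = 0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold };

struct fake_dev {
	const char *name;
	enum pci_power state;
	void (*shutdown)(struct fake_dev *dev, enum sys_state s);
};

/* e1000-style hook: lower the power state only on a real poweroff. */
void nic_shutdown(struct fake_dev *dev, enum sys_state s)
{
	/* A real driver would stop DMA and interrupts before this point. */
	if (s == SYSTEM_POWER_OFF)
		dev->state = PCI_D3hot;
}

/* Walk the list from the tail, calling each device's shutdown hook. */
void fake_device_shutdown(struct fake_dev *devs, size_t n, enum sys_state s)
{
	assert(devs != NULL || n == 0);
	for (size_t i = n; i-- > 0; )
		if (devs[i].shutdown)
			devs[i].shutdown(&devs[i], s);
}
```

Moving the `SYSTEM_POWER_OFF` check out of individual drivers and into the PCI core is essentially what the patch under discussion proposes; the sketch keeps it in the driver hook to mirror e1000.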
>>
>> Do you think adding a new "low power state" callback to be called
>> after .shutdown is a good idea?
>> That would make the concept of .shutdown different from .suspend and
>> .poweroff. I personally see .suspend, .poweroff and .shutdown as the same
>> action targeting different power states.
> 
> I don't mean to confuse you guys, but with this one too, I wonder if you
> tried to "disable" the device instead of putting it into D3? On another
> thread (Mario at least is aware of this) I mentioned that our PCIe SV
> folks identified a couple of issues in the Linux implementation around
> power management, and one thing we are missing is disabling I/O and MMIO
> upon entering D3.
>
> I know this is about entering S5 (power off), but I wonder if simply
> disabling the device (I/O, MMIO and bus mastering) could stop it from
> waking up?

To me, it's a two-fold problem: the device consumes too much power, and
the device issues interrupts when the system is in S5.

Putting it in D3 should address both; disabling the device might help
with the latter.

I did the same thing a vendor did for KH: I double-checked the
waveform at S5 and could see the devices were still in D0.

Or do you think that leaving the device in D0 but disabled would be
enough to reduce power?

> To my understanding this can be interpreted as quiescing too :)
> Something like the patch below (it also includes the runtime suspend
> path, which should not matter here; this is similar to the patch I
> shared in another thread).
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f412ef73a6e4..79406814699d 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -514,11 +514,9 @@ static void pci_device_shutdown(struct device *dev)
>   	 * If this is a kexec reboot, turn off Bus Master bit on the
>   	 * device to tell it to not continue to do DMA. Don't touch
>   	 * devices in D3cold or unknown states.
> -	 * If it is not a kexec reboot, firmware will hit the PCI
> -	 * devices with big hammer and stop their DMA any way.
>   	 */
> -	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> -		pci_clear_master(pci_dev);
> +	if (pci_dev->current_state <= PCI_D3hot)
> +		pci_disable_device(pci_dev);
>   }
>   
>   #ifdef CONFIG_PM_SLEEP
> @@ -1332,6 +1330,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
>   
>   	if (!pci_dev->state_saved) {
>   		pci_save_state(pci_dev);
> +		pci_pm_default_suspend(pci_dev);
>   		pci_finish_runtime_suspend(pci_dev);
>   	}
>   
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ffaaca0978cb..91f4e7a03c94 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2218,6 +2218,13 @@ static void do_pci_disable_device(struct pci_dev *dev)
>   		pci_command &= ~PCI_COMMAND_MASTER;
>   		pci_write_config_word(dev, PCI_COMMAND, pci_command);
>   	}
> +	/*
> +	 * PCI PM 1.2 sec 8.2.2 says that when a function is put into D3
> +	 * the OS needs to disable I/O and MMIO space in addition to bus
> +	 * mastering so do that here.
> +	 */
> +	pci_command &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
> +	pci_write_config_word(dev, PCI_COMMAND, pci_command);
>   
>   	pcibios_disable_device(dev);
>   }
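The combined effect of the pci_device_shutdown() and do_pci_disable_device() hunks on the Command register can be summarized with a small userspace model. It assumes the `PCI_COMMAND_*` values from `include/uapi/linux/pci_regs.h` and a local copy of the power-state ordering; `shutdown_cmd_before()` and `shutdown_cmd_after()` are hypothetical names. The model ignores the enable count that `pci_disable_device()` maintains, so it shows the outcome only for the case where that count drops to zero and the device is actually disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI_COMMAND bits (assumed values, as in pci_regs.h) */
#define PCI_COMMAND_IO     0x1 /* Respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2 /* Respond to memory space accesses */
#define PCI_COMMAND_MASTER 0x4 /* Enable bus mastering */

/* Local copy of the power-state ordering; "<= PCI_D3hot" means known and not D3cold. */
enum pci_power { PCI_D0 = 0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold, PCI_UNKNOWN };

/* Before the patch: only a kexec reboot clears Bus Master. */
uint16_t shutdown_cmd_before(uint16_t cmd, enum pci_power st, bool kexec)
{
	if (kexec && st <= PCI_D3hot)
		cmd &= (uint16_t)~PCI_COMMAND_MASTER;
	return cmd;
}

/* After the patch: every shutdown clears Bus Master, I/O and MMIO decode. */
uint16_t shutdown_cmd_after(uint16_t cmd, enum pci_power st)
{
	if (st <= PCI_D3hot)
		cmd &= (uint16_t)~(PCI_COMMAND_MASTER | PCI_COMMAND_IO |
				   PCI_COMMAND_MEMORY);
	return cmd;
}
```

For a device in D0 with I/O, MMIO and bus mastering enabled (`0x0007`), a normal poweroff leaves the register at `0x0007` before the patch and `0x0000` after it; a device already in D3cold is left alone in both cases.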

