Message-ID: <CAAd53p4W3Amee9dJN0usG=spHfg=s1KZM3cdJ_rJjCgDhEymAw@mail.gmail.com>
Date:   Fri, 11 Aug 2023 16:00:00 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     sathyanarayanan.kuppuswamy@...ux.intel.com,
        linux-pci@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
        linux-kernel@...r.kernel.org, koba.ko@...onical.com,
        "Oliver O'Halloran" <oohall@...il.com>, bhelgaas@...gle.com,
        mika.westerberg@...ux.intel.com
Subject: Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

On Thu, Aug 10, 2023 at 6:51 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Thu, Aug 10, 2023 at 04:17:21PM +0800, Kai-Heng Feng wrote:
> > On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote:
> > > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> > > > > > PCIe services that share an IRQ with PME, such as AER or DPC,
> > > > > > may cause a spurious wakeup on system suspend. To prevent this,
> > > > > > disable the AER interrupt notification during the system suspend
> > > > > > process.
> > > > >
> > > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share
> > > > > the same IRQ, but I don't think this is true in general.
> > > > >
> > > > > Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
> > > > > Interrupt Message Number in the PCIe Capability, but AER uses the one
> > > > > in the AER Root Error Status register, and DPC uses the one in the DPC
> > > > > Capability register.  Those potentially correspond to three distinct
> > > > > MSI/MSI-X vectors.
> > > > >
> > > > > I think this probably has nothing to do with the IRQ being *shared*,
> > > > > but just that putting the downstream component into D3cold, where the
> > > > > link state is L3, may cause the upstream component to log and signal a
> > > > > link-related error as the link goes completely down.
> > > >
> > > > That's quite likely a better explanation than my wording.
> > > > Assuming the AER IRQ and the PME IRQ are not shared, does the system get
> > > > woken up by the AER IRQ?
> > >
> > > Rafael could answer this better than I can, but
> > > Documentation/power/suspend-and-interrupts.rst says device interrupts
> > > are generally disabled during suspend after the "late" phase of
> > > suspending devices, i.e.,
> > >
> > >   dpm_suspend_noirq
> > >     suspend_device_irqs           <-- disable non-wakeup IRQs
> > >     dpm_noirq_suspend_devices
> > >       ...
> > >         pci_pm_suspend_noirq      # (I assume)
> > >           pci_prepare_to_sleep
> > >
> > > I think the downstream component would be put in D3cold by
> > > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by
> > > then.
> > >
> > > I assume PME would generally *not* be disabled since it's needed for
> > > wakeup, so I think any interrupt that shares the PME IRQ and occurs
> > > during suspend may cause a spurious wakeup.
> >
> > Yes, that's the case here.
> >
> > > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing
> > > the PME IRQ may cause spurious wakeups, and we would have to disable
> > > those other interrupts at the source, e.g., by clearing
> > > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does).
> >
> > So is the series good to be merged now?
>
> If we merge as-is, won't we disable AER & DPC interrupts unnecessarily
> in the case where the link goes to D3hot?  In that case, there's no
> reason to expect interrupts related to the link going down, but things
> like PTM messages still work, and they may cause errors that we should
> know about.

Because the issue can be observed in D3hot as well [0].
The Root Port in [0] is power managed by ACPI, so I wonder whether it's
reasonable to disable AER & DPC for devices that are power managed by
firmware?

[0] https://bugzilla.kernel.org/show_bug.cgi?id=216295#c3
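
To make the shared-IRQ reasoning quoted above concrete, here is a minimal
userspace sketch. It is not kernel code: struct model_irq and the model_*
functions are hypothetical names invented for illustration, and the only
behavior assumed is what the thread describes, namely that non-wakeup IRQs
are disabled during suspend while the PME IRQ stays armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one IRQ line; this is not a kernel structure. */
struct model_irq {
	bool enabled;	/* line currently delivers interrupts */
	bool wakeup;	/* line is armed as a wakeup source (e.g. PME) */
};

/* Models the effect attributed to suspend_device_irqs(): non-wakeup lines off. */
static void model_suspend_device_irqs(struct model_irq *irqs, size_t n)
{
	assert(irqs != NULL);

	for (size_t i = 0; i < n; i++) {
		if (!irqs[i].wakeup)
			irqs[i].enabled = false;
	}
}

/*
 * Any event arriving on a line that is still enabled and wakeup-armed wakes
 * the system, regardless of which service (PME, AER, DPC) raised it.
 */
static bool model_event_wakes_system(const struct model_irq *irq)
{
	return irq->enabled && irq->wakeup;
}
```

In this model, an AER event raised on a line shared with PME still wakes the
system after model_suspend_device_irqs(), while the same event on a dedicated
non-wakeup line does not. That is why the interrupt would have to be masked
at its source when the line is shared.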

Kai-Heng

>
> > > > > I don't think D0-D3hot should be relevant here because in all those
> > > > > states, the link should be active because the downstream config space
> > > > > remains accessible.  So I'm not sure if it's possible, but I wonder if
> > > > > there's a more targeted place we could do this, e.g., in the path that
> > > > > puts downstream devices in D3cold.
> > > >
> > > > Let me try to work on this.
> > > >
> > > > Kai-Heng
> > > >
> > > > >
> > > > > > > As per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> > > > > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> > > > > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> > > > > > > notification during suspend and re-enabling it during the resume process
> > > > > > > should not affect the basic functionality.
> > > > > >
> > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > > > > Reviewed-by: Mika Westerberg <mika.westerberg@...ux.intel.com>
> > > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> > > > > > ---
> > > > > > v6:
> > > > > > v5:
> > > > > >  - Wording.
> > > > > >
> > > > > > v4:
> > > > > > v3:
> > > > > >  - No change.
> > > > > >
> > > > > > v2:
> > > > > >  - Only disable AER IRQ.
> > > > > >  - No more check on PME IRQ#.
> > > > > >  - Use helper.
> > > > > >
> > > > > >  drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++
> > > > > >  1 file changed, 22 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > > > > index 1420e1f27105..9c07fdbeb52d 100644
> > > > > > --- a/drivers/pci/pcie/aer.c
> > > > > > +++ b/drivers/pci/pcie/aer.c
> > > > > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
> > > > > >       return 0;
> > > > > >  }
> > > > > >
> > > > > > +static int aer_suspend(struct pcie_device *dev)
> > > > > > +{
> > > > > > +     struct aer_rpc *rpc = get_service_data(dev);
> > > > > > +     struct pci_dev *pdev = rpc->rpd;
> > > > > > +
> > > > > > +     aer_disable_irq(pdev);
> > > > > > +
> > > > > > +     return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int aer_resume(struct pcie_device *dev)
> > > > > > +{
> > > > > > +     struct aer_rpc *rpc = get_service_data(dev);
> > > > > > +     struct pci_dev *pdev = rpc->rpd;
> > > > > > +
> > > > > > +     aer_enable_irq(pdev);
> > > > > > +
> > > > > > +     return 0;
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > >   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> > > > > >   * @dev: pointer to Root Port, RCEC, or RCiEP
> > > > > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
> > > > > >       .service        = PCIE_PORT_SERVICE_AER,
> > > > > >
> > > > > >       .probe          = aer_probe,
> > > > > > +     .suspend        = aer_suspend,
> > > > > > +     .resume         = aer_resume,
> > > > > >       .remove         = aer_remove,
> > > > > >  };
> > > > > >
> > > > > > --
> > > > > > 2.34.1
> > > > > >
