lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAA85sZszpwU4MNKzb=W3MaK8YHwfuEmdXkc03TJkDtOD6a3L0A@mail.gmail.com>
Date:   Fri, 9 Oct 2020 16:34:30 +0200
From:   Ian Kumlien <ian.kumlien@...il.com>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Derrick, Jonathan" <jonathan.derrick@...el.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        linux-pci <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] PCI: vmd: Enable ASPM for mobile platforms

On Thu, Oct 8, 2020 at 6:19 AM Kai-Heng Feng
<kai.heng.feng@...onical.com> wrote:
>
>
>
> > On Oct 7, 2020, at 21:30, Bjorn Helgaas <helgaas@...nel.org> wrote:
> >
> > On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
> >>> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@...nel.org> wrote:
> >>> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> >>>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@...nel.org> wrote:
> >>>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> >
> > ...
> >>>> I wonder whether other devices that add PCIe domain have the same
> >>>> behavior?  Maybe it's not a special case at all...
> >>>
> >>> What other devices are these?
> >>
> >> Controllers which add PCIe domain.
> >
> > I was looking for specific examples, not just a restatement of what
> > you said before.  I'm just curious because there are a lot of
> > controllers I'm not familiar with, and I can't think of an example.
> >
> >>>> I understand the end goal is to keep consistency for the entire ASPM
> >>>> logic. However I can't think of any possible solution right now.
> >>>>
> >>>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> >>>>>  SoC power state problem?
> >>>>
> >>>> Yes.
> >>>>
> >>>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> >>>>
> >>>> This will break many systems, at least for the 1st Gen Ryzen
> >>>> desktops and laptops.
> >>>>
> >>>> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> >>>> freeze once ASPM is enabled.
> >>>
> >>> That indicates a defect in the Linux ASPM code.  We should fix that.
> >>> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
> >>
> >> On those systems ASPM are also not enabled on Windows. So I think
> >> ASPM are disabled for a reason.
> >
> > If the platform knows ASPM needs to be disabled, it should be using
> > ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it.  And if
> > CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
> > shouldn't, that's a Linux bug that we need to fix.
>
> Yes that's a bug which fixed by Ian's new patch.
>
> >
> >>> Are there bug reports for these? The info we would need to start with
> >>> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> >>> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> >>> might be interesting, too.  We'll likely need to add some
> >>> instrumentation and do some experimentation, but in principle, this
> >>> should be fixable.
> >>
> >> Doing this is asking users to use hardware settings that ODM/OEM
> >> never tested, and I think the risk is really high.
> >
> > What?  That's not what I said at all.  I'm asking for information
> > about these hangs so we can fix them.  I'm not suggesting that you
> > should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.
>
> Ah, I thought your suggestion is switching to CONFIG_PCIEASPM_POWERSAVE=y, because I sense you want to use that to cover the VMD ASPM this patch tries to solve.
>
> Do we have a conclusion how to enable VMD ASPM with CONFIG_PCIEASPM_DEFAULT=y?
>
> >
> > Let's back up.  You said:
> >
> >  CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
> >  for the 1st Gen Ryzen desktops and laptops.
> >
> >  All PCIe ASPM are not enabled by BIOS, and those systems immediately
> >  freeze once ASPM is enabled.
> >
> > These system hangs might be caused by (1) some hardware issue that
> > causes a hang when ASPM is enabled even if it is configured correctly
> > or (2) Linux configuring ASPM incorrectly.
>
> It's (2) here.
>
> >
> > For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
> > to prevent the OS from enabling ASPM.  Linux should pay attention to
> > that even when CONFIG_PCIEASPM_POWERSAVE=y.
> >
> > If the platform *should* use these mechanisms but doesn't, the
> > solution is a quirk, not the folklore that "we can't use
> > CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."
>
> The platform in question doesn't prevent OS from enabling ASPM.
>
> >
> > For case (2), we should fix Linux so it configures ASPM correctly.
> >
> > We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
> > hangs.  We need to fix the Linux run-time code so the system operates
> > correctly no matter what CONFIG_PCIEASPM setting is used.
> >
> > We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
> > sysfs attributes for controlling ASPM link states")).  They can do the
> > same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
> > build-time.  If those knobs cause hangs on 1st Gen Ryzen systems, we
> > need to fix that.
>
> Ian's patch solves the issue, at least for the systems I have.

Could you add:
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 15d64832a988..cd9f2101f9a2 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -482,7 +482,12 @@ static void pcie_aspm_check_latency(struct
pci_dev *endpoint)
                        latency = max_t(u32, link->latency_up.l1,
link->latency_dw.l1);
                        l1_max_latency = max_t(u32, latency, l1_max_latency);
                        if (l1_max_latency + l1_switch_latency > acceptable->l1)
+                       {
+                               pci_info(endpoint, "L1 latency
exceeded - path: %i - max: %i\n", l1_switch_latency, l1_max_latency);
+                               pci_info(link->pdev, "Upstream device
- %i\n", link->latency_up.l1);
+                               pci_info(link->downstream, "Downstream
device - %i\n", link->latency_dw.l1);
                                link->aspm_capable &= ~ASPM_STATE_L1;
+                       }

                        l1_switch_latency += 1000;
                }

So we can see what device triggers what links to be disabled?

I think your use-case is much more important than mine - mine fixes
something as a side effect

Also, please send me the lspci -vvv output as well as lspci -PP -s
<device id> for the device id:s mentioned in dmesg with the patch
above applied ;)

> Kai-Heng
>
> >
> > Bjorn
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ