lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221216174820.GA209655@bhelgaas>
Date:   Fri, 16 Dec 2022 11:48:20 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     "Lee, Ron" <ron.lee@...el.com>
Cc:     "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "lmajczak@...gle.com" <lmajczak@...gle.com>,
        "Jain, Rajat" <rajatja@...gle.com>,
        Ron Lee <ron.lee.intel@...il.com>
Subject: Re: [PATCH v2] PCI: Fix up L1SS capability for Intel Apollolake PCIe
 bridge

On Fri, Dec 16, 2022 at 04:29:39PM +0000, Lee, Ron wrote:
> > On Thu, Dec 15, 2022 at 05:13:57PM +0800, Ron Lee wrote:
> > > On Google Coral and Reef family chromebooks, the PCIe bridge lost its
> > > L1 PM Substates capability after resumed from D3cold, and identify
> > > that the pointer to the this capability and capapability header are
> > > missing from the capability list.

> > This should say what problem we're solving.  I assume some devices
> > used L1 PM Substates before suspend, but after resume they do not,
> > so the user-visible effect is that battery life is worse after
> > resume.
>
> This bug has existed since these series of Chromebooks was shipping,
> it seems no harm for system execution, and we didn't identified
> battery life drop after resume. However we still expect this issue
> could be solved and follow spec criteria as per PCIe spec rev6.0,
> section 5.5.4 L1 PM Substates Configuration
> 
>     An L1 PM Substate enable bit must only be Set in the Upstream
>     and Downstream Ports on a Link when the corresponding supported
>     capability bit is Set by both the Upstream and Downstream Ports
>     on that Link, otherwise the behavior is undefined

Even if you haven't seen a battery life issue, I suspect you might be
able to measure a power consumption difference if you looked for it
and likely could see issues with manual ASPM enable/disable using
sysfs.  That might be a legitimate reason for this quirk, and if it
is, we should mention it here.

> The following merged commit can save/restore the
> L1SubCap/L1SubCtl1/L1SubCtl2 registers for this bridge, However this
> bridge not only lost its capability contents but also lost the link
> to this capability.
> 
>     PCI/ASPM: Save/restore L1SS Capability for suspend/resume
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/pcie/aspm.c?h=v6.1&id=4257f7e008ea394fcecc050f1569c3503b8bcc15

The current version of that code:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aspm.c?id=v6.1#n760
doesn't search for the L1SS capability; it uses dev->l1ss just like
your patch does.  So it should restore the capability even though the
linked list is broken.

> > > This patch fix up the header and the pointer to the L1SS
> > > capability after resuming from D3Cold.
> > 
> > The main problem here is that this patch covers up an issue
> > without saying what the root cause is.  Presumably this is a
> > firmware issue.  Has that been identified?  Has it been fixed for
> > future firmware releases?
>
> This issue could be and should be fixed by BIOS, however the
> manufacturers have no resource for firmware validation and it's
> risky for firmware update per their assessment.

This fix is risky, too, because it writes to random places in config
space and there's no guarantee that this is safe or even that the
capabilities are at those locations.

> > Is there a bug report for this issue?  Include the URL here.
> > 
> > Is there a bug report for the firmware?
> > 
> There is a Google's internal issue tracker for this bug, seems not
> available for public.

Maybe you can make a public report with any secret details removed?
A simple email would be enough.  I haven't seen the internal issue;
hopefully it has more details than are in this patch.

> Actually this bug had a discussion on this thread, and Lukasz
> Majczak identified this issue on Apollo Lake platform.
> https://patchwork.kernel.org/project/linux-pci/patch/20220705060014.10050-1-vidyas@nvidia.com/

That patch mentions Dell XPS 13, not a Chromebook, so your patch
wouldn't affect it.  Are you saying this issue is common across all
Apollo Lake platforms?  If so, maybe a fix should be more generic?

> > > +#ifdef CONFIG_PCIEASPM
> > > +static void chromeos_fixup_apl_bridge_l1ss_capability(struct pci_dev
> > > +*pdev) {
> > > +	if (!dmi_match(DMI_SYS_VENDOR, "Google") ||
> > > +		(!dmi_match(DMI_PRODUCT_FAMILY, "Google_Coral") &&
> > > +		 !dmi_match(DMI_PRODUCT_FAMILY, "Google_Reef")))
> > > +		return;
> > > +
> > > +	pci_info(pdev, "Fix up L1SS Capability\n");
> > > +	/* Fix up the L1SS Capability Header*/
> > > +	pci_write_config_dword(pdev, pdev->l1ss, (0x220 << 20) | (1 << 16) |
> > > +(PCI_EXT_CAP_ID_L1SS));
> > 
> > This looks like it adds a link to another capability at offset
> > 0x220.  What is that, and how do we know this is safe?
>
> The following is the dump of this bridge config before suspend, the
> L1SS capability is at offset 0x200 and it point to offset 0x220
> which is a null capability. This patch just add it to keep
> consistent during suspend/resume.
> ...

My point is that there is no PCI requirement for capabilities to be at
0x150 and 0x220.  The only defined way to find these capabilities is
to traverse the list starting at offset 0x100.

The L1SS capability at pdev->l1ss is reasonable since we found it by
traversing that list, but 0x150 and 0x220 are magic numbers with no
justification.  We have no reason to believe there are capabilities
there.

We might know this based on device-specific knowledge of the Root
Port.  If that's the case, please cite the Intel spec for device 5ad6
so we can tell this quirk can't be blindly applied to other Root
Ports.

> > > +	/* Fix up the pointer to L1SS Capability*/
> > > +	pci_write_config_dword(pdev, 0x150, pdev->l1ss << 20); }
> > > +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6,
> > > +chromeos_fixup_apl_bridge_l1ss_capability);
> > > +#endif

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ