lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BL0PR11MB34577E415B692A2B1CB000ACFEE69@BL0PR11MB3457.namprd11.prod.outlook.com>
Date:   Fri, 16 Dec 2022 16:29:39 +0000
From:   "Lee, Ron" <ron.lee@...el.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
CC:     "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "lmajczak@...gle.com" <lmajczak@...gle.com>,
        "Jain, Rajat" <rajatja@...gle.com>,
        Ron Lee <ron.lee.intel@...il.com>
Subject: RE: [PATCH v2] PCI: Fix up L1SS capability for Intel Apollolake PCIe
 bridge

> On Thu, Dec 15, 2022 at 05:13:57PM +0800, Ron Lee wrote:
> > On Google Coral and Reef family chromebooks, the PCIe bridge lost its
> > L1 PM Substates capability after resumed from D3cold, and identify
> > that the pointer to the this capability and capapability header are
> > missing from the capability list.
> 
> s/chromebooks/Chromebooks/
> s/to the this/this/
> s/capapability/capability/
I will submit patch v3 to correct these errors.
> 
> This should say what problem we're solving.  I assume some devices used L1 PM
> Substates before suspend, but after resume they do not, so the user-visible
> effect is that battery life is worse after resume.
This bug has existed since these series of Chromebooks was shipping, it seems no harm for system execution, 
and we didn't identified battery life drop after resume. However we still expect this issue could be solved and 
follow spec criteria as per PCIe spec rev6.0, section 5.5.4 L1 PM Substates Configuration

    An L1 PM Substate enable bit must only be Set in the Upstream and Downstream Ports on a Link when the
    corresponding supported capability bit is Set by both the Upstream and Downstream Ports on that Link, otherwise the
    behavior is undefined

> 
> > Capabilities: [150 v0] Null
> > Capabilities: [200 v1] L1 PM Substates
> >         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ ...
> >                   PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >         L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >                    T_CommonMode=40us LTR1.2_Threshold=98304ns
> >         L1SubCtl2: T_PwrOn=60us
> 
> I'm not sure what this snippet is telling me.  Based on the patch, I guess before
> suspend, lspci would show:
> 
>   Capabilities: [150 v0] Null
>   Capabilities: [200 v1] L1 PM Substates
>   Capabilities: [220] <some other valid capability?>
> 
> but after resume, you see only:
> 
>   Capabilities: [150 v0] Null
> 
> Right?
Yes, but there is a difference in this case.
Before suspend:
        ....
        Capabilities: [140 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [150 v0] Null
        Capabilities: [200 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=40us LTR1.2_Threshold=98304ns
                L1SubCtl2: T_PwrOn=60us
        Kernel driver in use: pcieport

After resume:
        ....
        Capabilities: [140 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Kernel driver in use: pcieport

The following merged commit can save/restore the L1SubCap/L1SubCtl1/L1SubCtl2 registers for this bridge, 
However this bridge not only lost its capability contents but also lost the link to this capability.

    PCI/ASPM: Save/restore L1SS Capability for suspend/resume
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/pcie/aspm.c?h=v6.1&id=4257f7e008ea394fcecc050f1569c3503b8bcc15

> 
> > This patch fix up the header and the pointer to the L1SS capability
> > after resuming from D3Cold.
> 
> The main problem here is that this patch covers up an issue without saying what
> the root cause is.  Presumably this is a firmware issue.
> Has that been identified?  Has it been fixed for future firmware releases?
This issue could be and should be fixed by BIOS, however the manufacturers have no resource for firmware validation and it's risky for firmware update per their assessment.
> 
> s/D3Cold/D3cold/ to match above.
> 
> Is there a bug report for this issue?  Include the URL here.
> 
> Is there a bug report for the firmware?
> 
There is a Google's internal issue tracker for this bug, seems not available for public.
Actually this bug had a discussion on this thread, and Lukasz Majczak identified this issue on Apollo Lake platform.
https://patchwork.kernel.org/project/linux-pci/patch/20220705060014.10050-1-vidyas@nvidia.com/

> > Signed-off-by: Ron Lee <ron.lee@...el.com>
> > Reported-by: kernel test robot <lkp@...el.com>
> > ---
> 
> Nits:
> 
>   - Use "Apollo Lake" to match Intel usage.
> 
>   - Below the "---" line, mention what changed between v1 and v2 (I
>     see that you added the "#ifdef CONFIG_PCIEASPM", but you should
>     save readers the effort of figuring that out).
> 
>   - For work-in-progress, the "Reported-by: kernel test robot" is
>     pointless and I will remove it.  This quirk is not fixing a bug
>     reported by the robot.
> 
> >  drivers/pci/quirks.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> > 285acc4aaccc..fc959be17a9d 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5992,3 +5992,20 @@
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,
> > 0x9a2d, dpc_log_size);  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,
> > 0x9a2f, dpc_log_size);  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,
> > 0x9a31, dpc_log_size);  #endif
> > +
> > +#ifdef CONFIG_PCIEASPM
> > +static void chromeos_fixup_apl_bridge_l1ss_capability(struct pci_dev
> > +*pdev) {
> > +	if (!dmi_match(DMI_SYS_VENDOR, "Google") ||
> > +		(!dmi_match(DMI_PRODUCT_FAMILY, "Google_Coral") &&
> > +		 !dmi_match(DMI_PRODUCT_FAMILY, "Google_Reef")))
> > +		return;
> > +
> > +	pci_info(pdev, "Fix up L1SS Capability\n");
> > +	/* Fix up the L1SS Capability Header*/
> > +	pci_write_config_dword(pdev, pdev->l1ss, (0x220 << 20) | (1 << 16) |
> > +(PCI_EXT_CAP_ID_L1SS));
> 
> This looks like it adds a link to another capability at offset 0x220.
> What is that, and how do we know this is safe?
The following is the dump of this bridge config before suspend, the L1SS capability is at offset 0x200 and 
it point to offset 0x220 which is a null capability. This patch just add it to keep consistent during suspend/resume.
...
150: 00 00 00 20 00 04 00 00 00 00 00 00 00 00 00 00
160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 1e 00 01 22 1f 28 28 00 0f 28 03 60 f0 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 7f 7f 7f 7f...

> 
> These registers are read-only per spec (PCIe r6.0, sec 7.8.3.1), but I guess you
> have device-specific knowledge that they are writable?
> 
> > +	/* Fix up the pointer to L1SS Capability*/
> > +	pci_write_config_dword(pdev, 0x150, pdev->l1ss << 20); }
> > +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6,
> > +chromeos_fixup_apl_bridge_l1ss_capability);
> > +#endif
> >
> > base-commit: e2ca6ba6ba0152361aa4fcbf6067db71b2c7a770
> > --
> > 2.17.1
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ