lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc3fce1400541a269ecee2b2ee33d8f1ec8e52e7.camel@linux.intel.com>
Date: Mon, 05 Feb 2024 11:37:16 -0800
From: "David E. Box" <david.e.box@...ux.intel.com>
To: Bjorn Helgaas <helgaas@...nel.org>, Jian-Hong Pan <jhp@...lessos.org>
Cc: Johan Hovold <johan@...nel.org>, Mika Westerberg
 <mika.westerberg@...ux.intel.com>, Damien Le Moal <dlemoal@...nel.org>, 
 Niklas Cassel <cassel@...nel.org>, Nirmal Patel
 <nirmal.patel@...ux.intel.com>, Jonathan Derrick
 <jonathan.derrick@...ux.dev>, linux-pci@...r.kernel.org, 
 linux-kernel@...r.kernel.org, linux@...lessos.org
Subject: Re: [PATCH v2] PCI: vmd: Enable PCI PM's L1 substates of remapped
 PCIe Root Port and NVMe

On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > The remapped PCIe Root Port and NVMe have PCI PM L1 substates
> > capability, but they are disabled originally:
> > 
> > Here is an example on ASUS B1400CEAE:
> > 
> > Capabilities: [900 v1] L1 PM Substates
> >         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > L1_PM_Substates+
> >                   PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> >         L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> >                    T_CommonMode=0us LTR1.2_Threshold=0ns
> >         L1SubCtl2: T_PwrOn=10us
> > 
> > Power on all of the VMD remapped PCI devices and quirk max snoop LTR
> > before enable PCI-PM L1 PM Substates by following "Section 5.5.4 of PCIe
> > Base Spec Revision 6.0". Then, PCI PM's L1 substates control are
> > initialized & enabled accordingly.
> 
> > Also, update the comments of
> > pci_enable_link_state() and pci_enable_link_state_locked() for
> > kernel-doc, too.
> 
> The aspm.c changes should be in a separate patch.  Presumably the
> aspm.c code change fixes a defect, and that defect can be described in
> that patch.  That fix may be needed for non-VMD situations, and having
> it in this vmd patch means it won't be as easy to find and backport.
> 
> Nit: rewrap commit log to fill 75 columns.
> 
> > @@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev,
> > void *userdata)
> >  	if (!(features & VMD_FEAT_BIOS_PM_QUIRK))
> >  		return 0;
> >  
> > -	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
> > -
> >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
> >  	if (!pos)
> > -		return 0;
> > +		goto out_enable_link_state;
> >  
> >  	/*
> >  	 * Skip if the max snoop LTR is non-zero, indicating BIOS has set
> > it
> > @@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev,
> > void *userdata)
> >  	 */
> >  	pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg);
> >  	if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK)))
> > -		return 0;
> > +		goto out_enable_link_state;
> >  
> >  	/*
> >  	 * Set the default values to the maximum required by the platform
> > to
> > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev,
> > void *userdata)
> >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
> 
> You're not changing this part, and I don't understand exactly how LTR
> works, but it makes me a little bit queasy to read "set the LTR value
> to the maximum required to allow the deepest power management
> savings" and then we set the max snoop values to a fixed constant.
> 
> I don't think the goal is to "allow the deepest power savings"; I
> think it's to enable L1.2 *when the device has enough buffering to
> absorb L1.2 entry/exit latencies*.
> 
> The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
> the platform's maximum supported latency or less," so it seems like
> that value must be platform-dependent, not fixed.
> 
> And I assume the "_DSM for Latency Tolerance Reporting" is part of the
> way to get those platform-dependent values, but Linux doesn't actually
> use that yet.

This may indeed be the best way but we need to double check with our BIOS folks.
AFAIK BIOS writes the LTR values directly so there hasn't been a need to use
this _DSM. But under VMD the ports are hidden from BIOS which is why we added it
here. I've brought up the question internally to find out how Windows handles
the DSM and to get a recommendation from our firmware leads.

> 
> I assume that setting the max values incorrectly may lead to either
> being too conservative, so we don't use L1.2 when we could, or too
> aggressive, so we use L1.2 when we shouldn't, and the device loses
> data because it doesn't have enough internal buffering to absorb the
> entry/exit delays.
> 
> This paper has a lot of background and might help answer some of my
> questions:
> https://www.intel.co.za/content/dam/doc/white-paper/energy-efficient-platforms-white-paper.pdf
> 
> > +out_enable_link_state:
> > +	/*
> > +	 * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from
> > +	 * Section 5.5.4 of PCIe Base Spec Revision 6.0
> > +	 */
> > +	pci_set_power_state_locked(pdev, PCI_D0);
> > +	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
> 
> Hmmm.  PCIE_LINK_STATE_ALL includes ASPM L1.2.  We may do this even if
> the device doesn't have an LTR Capability.  ASPM L1.2 cannot work
> without LTR.
> 
> I only took a quick look but was not convinced that
> pci_enable_link_state() does the right thing when we enable
> PCIE_LINK_STATE_ALL (including ASPM L1.2) on a device that doesn't
> have LTR.  I think it *should* decline to set PCI_L1SS_CTL1_ASPM_L1_2,
> but I'm not sure it does.  Can you double check that path?  Maybe
> that's another defect in aspm.c.

It doesn't currently decline. The same scenario happens when the user selects
powersupersave. If a device advertises L1.2 with the capable registers set, it
should also have the LTR register present. But it doesn't hurt to check.

David

> 
> > @@ -1164,6 +1164,8 @@ static int __pci_enable_link_state(struct pci_dev
> > *pdev, int state, bool locked)
> >  		link->aspm_default |= ASPM_STATE_L1_1_PCIPM |
> > ASPM_STATE_L1;
> >  	if (state & PCIE_LINK_STATE_L1_2_PCIPM)
> >  		link->aspm_default |= ASPM_STATE_L1_2_PCIPM |
> > ASPM_STATE_L1;
> > +	if (state & ASPM_STATE_L1_2_MASK)
> > +		aspm_l1ss_init(link);
> >  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
> >  
> >  	link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0;


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ