Message-ID: <Y6MfXltck34gSwU9@google.com>
Date:   Wed, 21 Dec 2022 14:59:42 +0000
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc:     Krishna chaitanya chundru <quic_krichai@...cinc.com>,
        helgaas@...nel.org, linux-pci@...r.kernel.org,
        linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
        quic_vbadigan@...cinc.com, quic_hemantk@...cinc.com,
        quic_nitegupt@...cinc.com, quic_skananth@...cinc.com,
        quic_ramkri@...cinc.com, swboyd@...omium.org,
        dmitry.baryshkov@...aro.org,
        Prasad Malisetty <quic_pmaliset@...cinc.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Saheed O. Bolarinwa" <refactormyself@...il.com>,
        Vidya Sagar <vidyas@...dia.com>,
        Krzysztof Wilczyński <kw@...ux.com>,
        Kai-Heng Feng <kai.heng.feng@...onical.com>
Subject: Re: [PATCH v7] PCI/ASPM: Update LTR threshold based upon reported
 max latencies

On Wed, Dec 21, 2022 at 11:19:53AM +0530, Manivannan Sadhasivam wrote:
> On Mon, Dec 05, 2022 at 06:18:36PM +0000, Matthias Kaehlcke wrote:
> > On Mon, Dec 05, 2022 at 04:55:00PM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> > > > In ASPM driver, LTR threshold scale and value are updated based on
> > > > tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> > > > LTR threshold scale and value are greater values than max snoop/non-snoop
> > > > value.
> > > > 
> > > > Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> > > > reported snoop/no-snoop values is greater than or equal to
> > > > LTR_L1.2_THRESHOLD value.
> > > > 
> > > > Signed-off-by: Prasad Malisetty  <quic_pmaliset@...cinc.com>
> > > > Signed-off-by: Krishna chaitanya chundru <quic_krichai@...cinc.com>
> > > > Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
> > > 
> > > I take my Ack back... Sorry that I did not look into this patch closer.
> > > 
> > > > ---
> > > > 
> > > > I am taking this patch forward as prasad is no more working with our org.
> > > > changes since v6:
> > > > 	- Rebasing with pci/next.
> > > > changes since v5:
> > > > 	- no changes, just reposting as standalone patch instead of reply to
> > > > 	  previous patch.
> > > > Changes since v4:
> > > > 	- Replaced conditional statements with min and max.
> > > > changes since v3:
> > > > 	- Changed the logic to include this condition "snoop/nosnoop
> > > > 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> > > > Changes since v2:
> > > > 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> > > > Changes since v1:
> > > > 	- Added missing variable declaration in v1 patch
> > > > ---
> > > >  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 30 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > index 928bf64..2bb8470 100644
> > > > --- a/drivers/pci/pcie/aspm.c
> > > > +++ b/drivers/pci/pcie/aspm.c
> > > > @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > > >  {
> > > >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> > > >  	u32 val1, val2, scale1, scale2;
> > > > +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
> > > >  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
> > > >  	u32 ctl1 = 0, ctl2 = 0;
> > > >  	u32 pctl1, pctl2, cctl1, cctl2;
> > > > +	u16 ltr;
> > > > +	u16 max_snoop_lat, max_nosnoop_lat;
> > > >  
> > > >  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
> > > >  		return;
> > > >  
> > > > +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> > > > +	if (!ltr)
> > > > +		return;
> > > > +
> > > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> > > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> > > > +
> > > > +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > > +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> > > > +
> > > > +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > > +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> > > > +
> > > > +	/* choose the greater max scale value between snoop and no snoop value*/
> > > > +	max_scale = max(max_snp_scale, max_nsnp_scale);
> > > > +
> > > > +	/* choose the greater max value between snoop and no snoop scales */
> > > > +	max_val = max(max_snp_val, max_nsnp_val);
> > > > +
> > > >  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
> > > >  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > >  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > > @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > > >  	 */
> > > >  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
> > > >  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> > > > +
> > > > +	/*
> > > > +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> > > > +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> > > 
> > > Apart from the bug in calculating the LTR_Threshold as reported by Matthias
> > > and Bjorn, I'm wondering if we are covering up for the device firmware issue.
> > 
> > Yes, I think the patch is doing exactly that.
> > 
> > > As per section 6.18, if the device reports snoop/no-snoop scale/value as 0, then
> > > it implies that the device won't tolerate any additional delays from the host.
> > >
> > > In that case, how can we allow the link to go into L1.2 since that would incur
> > > high delay compared to L1.1?
> > 
> > I had the same doubt, a value of 0 doesn't make sense, if it literally means
> > 'max delay of 0ns'. I did some debugging around this issue. One thing I found
> > is that there are NVMe models that don't have issues with entering L1.2 with
> > max (no-)snoop latencies of 0. From that I infer that a value of 0 does not
> > literally mean a max delay of 0ns.
> > 
> 
> This is interesting.
> 
> > The PCIe spec doesn't say specifically what a value of 0 in those registers
> > means, but chapter "6.18 Latency Tolerance Reporting (LTR) Mechanism" of the
> > PCIe 4.0 base spec says something about the latency requirements in LTR
> > messages:
> > 
> >   Setting the value and scale fields to all 0’s indicates that the device will
> >   be impacted by any delay and that the best possible service is requested.
> > 
> > With that and the fact that several NVMe's don't have issues with all 0 values
> > I deduce that all 0's means 'best possible service' and not 'max latency of
> > 0ns'. It seems the Kioxia firmware has a bug which interprets all 0 values as
> > a max latency of 0ns.
> > 
> > Another finding is that the Kioxia NVMe can enter L1.2 if the max latencies
> > are set to values >= the LTR threshold. Unfortunately that isn't a viable
> > fix for existing devices in the field, devices under development could possibly
> > adjust the latencies in the BIOS (coreboot code [1] suggests that this is done
> > at least in some cases).
> > 
> 
> I fully agree that it is a firmware issue. And yes, we should refrain to fixes
> in the bootloader if possible.
> 
> Another option would be to add a quirk for specific devices in the ASPM code.
> But in that case, I'm not sure what would be the optimal snoop/no-snoop value
> that could be used.

I had/have the same doubt.

> There is another issue where if we have some other device on the same bus
> that explicitly requires 0ns latency.

Would that be a reasonable requirement, i.e. can 0ns latency ever be achieved?
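
For reference, here is a minimal sketch of how the Max Snoop/No-Snoop Latency registers discussed above decode into nanoseconds. It assumes the usual LTR layout (value in bits 9:0, scale in bits 12:10, latency = value * 2^(5 * scale) ns), which is what the PCI_LTR_* masks in the quoted patch select. The helper names ltr_decode_ns() and ltr_max_ns() are hypothetical and not part of aspm.c, and returning 0 for a non-permitted scale is an assumption made only for this illustration.

```c
#include <stdint.h>

/* Field layout of a 16-bit LTR latency register (local copies of the masks). */
#define LTR_VALUE_MASK  0x03ffu	/* bits 9:0 */
#define LTR_SCALE_MASK  0x1c00u	/* bits 12:10 */
#define LTR_SCALE_SHIFT 10

/* Hypothetical helper: decode an LTR latency register into nanoseconds. */
static uint64_t ltr_decode_ns(uint16_t reg)
{
	uint32_t value = reg & LTR_VALUE_MASK;
	uint32_t scale = (reg & LTR_SCALE_MASK) >> LTR_SCALE_SHIFT;

	/* Scales 6 and 7 are not permitted; treating them as 0 is an assumption. */
	if (scale > 5)
		return 0;

	/* Each scale step multiplies the value by 32 (2^5). */
	return (uint64_t)value << (5 * scale);
}

/* Hypothetical helper: the larger of the two latencies, compared in ns. */
static uint64_t ltr_max_ns(uint16_t snoop, uint16_t nosnoop)
{
	uint64_t a = ltr_decode_ns(snoop);
	uint64_t b = ltr_decode_ns(nosnoop);

	return a > b ? a : b;
}
```

As an example, a snoop register with scale 1 and value 1000 decodes to 32000 ns, and a no-snoop register with scale 2 and value 1 decodes to 1024 ns, so the larger latency is 32000 ns. Taking the larger scale (2) and the larger value (1000) separately would instead describe 1024000 ns, which matches neither register. An all-zero register decodes to 0, which is the case whose meaning is debated in this thread.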
