Message-ID: <487786e702b8418b43c7f799c58d1948efd3d85c.camel@intel.com>
Date:   Tue, 29 Jan 2019 07:05:09 +0000
From:   "Grumbach, Emmanuel" <emmanuel.grumbach@...el.com>
To:     "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
CC:     "linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "bjorn@...gaas.com" <bjorn@...gaas.com>,
        "russianneuromancer@...ru" <russianneuromancer@...ru>,
        "david.ward@...mit.edu" <david.ward@...mit.edu>,
        "Shaish, Dor" <dor.shaish@...el.com>,
        "Wysocki, Rafael J" <rafael.j.wysocki@...el.com>
Subject: PCI LTR - ASPM handling upon suspend / resume cycle. Regression
 since 4.18

Hi,

Lately we (Intel) have received a few bug reports about suspend /
resume. The complaint is that our device becomes unavailable after a
suspend / resume cycle. The bug for which we have the most data is [1].

The original submitter reported a regression since commit
9ab105deb60fa76d66cae5548819b4e8703d2056:

   PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR

   When in the ASPM L1.0 state (but not the PCI-PM L1.0 state), the
   most recent LTR value and the LTR_L1.2_THRESHOLD determines whether
   the link enters the L1.2 substate.

   If we don't have LTR enabled, prevent the use of ASPM L1.2.

   PCI-PM L1.2 may still be used because it doesn't depend on
   LTR_L1.2_THRESHOLD (see PCIe r4.0, sec 5.5.1).
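
The rule stated in that commit message can be modelled as a small
sketch. This is only an illustration of the described policy, not the
kernel code: the function name and the string labels (borrowed from the
lspci flag names) are hypothetical.

```python
def allowed_l1_substates(supported, ltr_enabled):
    """Model of the policy described in the commit message (hypothetical helper).

    supported:   iterable of substate labels the link supports, e.g.
                 {"PCI-PM_L1.2", "PCI-PM_L1.1", "ASPM_L1.2", "ASPM_L1.1"}
    ltr_enabled: whether LTR is enabled for the device
    """
    allowed = set(supported)
    if not ltr_enabled:
        # Without LTR, ASPM L1.2 entry cannot be evaluated against
        # LTR_L1.2_THRESHOLD, so only that substate is dropped.
        # PCI-PM L1.2 does not depend on the threshold and is kept.
        allowed.discard("ASPM_L1.2")
    return allowed


if __name__ == "__main__":
    all_substates = {"PCI-PM_L1.2", "PCI-PM_L1.1", "ASPM_L1.2", "ASPM_L1.1"}
    print(sorted(allowed_l1_substates(all_substates, ltr_enabled=False)))
```

With LTR disabled, the sketch leaves PCI-PM L1.2, PCI-PM L1.1 and
ASPM L1.1 enabled, which matches the post-resume state reported here.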


After this commit, L1.2 is disabled upon resume:
	L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+
		   T_CommonMode=0us LTR1.2_Threshold=163840ns

Whereas it wasn't before this commit:
	L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
		   T_CommonMode=0us LTR1.2_Threshold=163840ns
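
For anyone comparing such lspci dumps across a suspend / resume cycle,
here is a minimal sketch that extracts the +/- flags from an
"L1SubCtl1:" line and reports which ones changed. The helper names are
hypothetical, and it assumes the line format shown in the two excerpts
above.

```python
def parse_l1ss_flags(line):
    """Return {flag_name: enabled} for an lspci 'L1SubCtl1:' line."""
    _, _, rest = line.partition("L1SubCtl1:")
    flags = {}
    for token in rest.split():
        # Each flag is printed as NAME+ (enabled) or NAME- (disabled).
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


def changed_flags(before, after):
    """Return {flag_name: (before, after)} for flags whose state differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


if __name__ == "__main__":
    before = parse_l1ss_flags(
        "L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+")
    after = parse_l1ss_flags(
        "L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+")
    print(changed_flags(before, after))
```

On the two lines quoted in this mail, the only flag reported as changed
is ASPM_L1.2.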

I am copying here an initial analysis by Bjorn (from [2]):

1) Linux has no support for saving/restoring the Max Latency values in
the LTR Capability.  This results in the latencies being zero after you
resume, as you see in the lspci output.  The device still *works* after
resume, but power consumption should increase because the device is
effectively requesting the best possible service, so we probably don't
use the L1.2 state at all.

2) Linux has no support for programming the Max Latency values for hot-
added devices.  When using ACPI hotplug, firmware may do this, but for
native PCIe hotplug (pciehp), the new device should again be requesting
the best possible service, resulting in more power consumption than
necessary.  The platform is supposed to supply a _DSM method with
information required to program these values.
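
To make the "latencies being zero" point concrete, here is a sketch of
how an LTR latency field decodes, assuming the value/scale encoding as
I understand it from the PCIe spec: a 10-bit value in bits 9:0 and a
3-bit scale in bits 12:10 of each Max Latency register, with the
latency in ns being value * 2^(5 * scale). The function names are
hypothetical; please check the field layout against the spec before
relying on it.

```python
def decode_ltr_latency(value, scale):
    """Latency in ns for a (value, scale) pair, assuming the LTR encoding."""
    if not 0 <= value <= 0x3FF:
        raise ValueError("value must fit in 10 bits")
    if not 0 <= scale <= 5:
        raise ValueError("scale must be in the range 0..5")
    # Assumed scale multipliers: 1, 32, 1024, 32768, 1048576, 33554432 ns.
    return value * (1 << (5 * scale))


def decode_max_latency_register(reg16):
    """Decode a 16-bit Max Snoop / Max No-Snoop Latency register value."""
    value = reg16 & 0x3FF         # assumed: bits 9:0
    scale = (reg16 >> 10) & 0x7   # assumed: bits 12:10
    return decode_ltr_latency(value, scale)


if __name__ == "__main__":
    # A register that was not restored reads back as zero, i.e. 0 ns.
    print(decode_max_latency_register(0x0000))
    # One (value, scale) pair that yields the 163840 ns threshold above.
    print(decode_ltr_latency(160, 2))
```

A register value of zero decodes to 0 ns, which is the "best possible
service" request described in point 1. The pair (160, 2) is only one
encoding that gives 163840 ns; (5, 3) gives the same result, and the
lspci output alone does not say which one the device uses.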


A second user found a different commit that affects his device after
suspend / resume:
commit 6f9db69ad93cd6ab77d5571cf748ff7cdcfb0285

    ACPI / PM: Default to s2idle in all machines supporting LP S0
    
    The Dell Venue Pro 7140 supports the Low Power S0 Idle state, but
    does not support any of the _DSM functions that the current
    heuristic checks for.
    
    Since suspend-to-mem can not be safely performed on this machine,
    and since the bitfield check can't cover this case, it is safer
    to enable s2idle by default by checking for the presence of the
    _DSM alone and removing the bitfield check.

This user confirmed that using suspend-to-mem instead of
suspend-to-idle works for him.

A user contacted me privately to let me know that he has issues with
devices from other vendors, although I can't tell whether the problem
is the same.

Note that this problem started with kernel 4.18.

Thank you.

[1] - https://bugzilla.kernel.org/show_bug.cgi?id=201469
[2] - https://bugzilla.kernel.org/show_bug.cgi?id=201469#c26
