Message-ID: <20210127155023.GA2988674@bjorn-Precision-5520>
Date:   Wed, 27 Jan 2021 09:50:23 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     "Kenneth R. Crudup" <kenny@...ix.com>
Cc:     Vidya Sagar <vidyas@...dia.com>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Commit 4257f7e0 ("PCI/ASPM: Save/restore L1SS Capability for
 suspend/resume") causing hibernate resume failures

On Fri, Jan 22, 2021 at 12:11:08PM -0800, Kenneth R. Crudup wrote:
> > > From: Kenneth R. Crudup <kenny@...ix.com>
> > > I've been running Linus' master branch on my laptop (Dell XPS 13
> > > 2-in-1).  With this commit in place, after resuming from hibernate
> > > my machine is essentially useless, with a torrent of disk I/O errors
> > > on my NVMe device (at least, and possibly other devices affected)
> > > until a reboot.
> > >
> > > I do use tlp to set the PCIe ASPM to "performance" on AC and
> > > "powersupersave" on battery.
> 
> On Sun, 27 Dec 2020, Bjorn Helgaas wrote:
> 
> > Thanks a lot for the report, and sorry for the breakage.
> > 4257f7e008ea restores PCI_L1SS_CTL1, then PCI_L1SS_CTL2.  I think it
> > should do those in the reverse order, since the Enable bits are in
> > PCI_L1SS_CTL1.  It also restores L1SS state (potentially enabling
> > L1.x) before we restore the PCIe Capability (potentially enabling ASPM
> > as a whole).  Those probably should also be in the other order.
> 
> Any news on this? Disabling "tlp" (which just shifts the problem around
> on my machine) shouldn't be a solution for this issue.

Agreed; disabling "tlp" is a workaround but not a solution.

> I'd thought it might have been tied to some of the PM regressions of the
> last week of December, but all of those have been fixed and this one remains.

I haven't seen anything yet and haven't had a chance to look into it
more myself.

We're at v5.11-rc5 already, so I guess we'll have to think about
reverting 4257f7e008ea ("PCI/ASPM: Save/restore L1SS Capability for
suspend/resume") before v5.11-final unless we can make some progress.

That would mean ASPM L1 substate configuration would be lost by a
suspend/resume, so we'd give up some power saving.  But that's better
than the regression you're seeing.

I'll tentatively queue up a revert on for-linus pending progress on a
better fix.  For some reason I can't find your initial report of the
regression.  The first thing I can find is this:

https://lore.kernel.org/linux-pci/20201228040513.GA611645@bjorn-Precision-5520/

Do you have a URL for your initial report that I could include in the
revert commit log?

Bjorn
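
The restore ordering suggested in the quoted 27 Dec 2020 message (PCIe
Capability before L1SS state, and PCI_L1SS_CTL2 before PCI_L1SS_CTL1) can
be sketched in user space as below.  This is only an illustration of that
hypothesis, not kernel code and not a confirmed fix: struct saved_state,
struct mock_dev, mock_write() and restore_in_suggested_order() are
hypothetical names made up for this sketch.  Only the register offsets
come from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCTL  0x10  /* Link Control, relative to the PCIe Capability */
#define PCI_L1SS_CTL1   0x08  /* L1SS Control 1 (holds the L1.x Enable bits) */
#define PCI_L1SS_CTL2   0x0c  /* L1SS Control 2 */

/* Hypothetical saved state; not the kernel's own save-buffer layout. */
struct saved_state {
	uint16_t lnkctl;
	uint32_t l1ss_ctl1;
	uint32_t l1ss_ctl2;
};

enum reg_id { REG_LNKCTL, REG_L1SS_CTL2, REG_L1SS_CTL1, REG_COUNT };

/* Mock device that records the order in which registers are written. */
struct mock_dev {
	enum reg_id order[REG_COUNT];
	size_t nwrites;
	uint32_t regs[REG_COUNT];
};

/* Stand-in for a config-space write; it only logs and stores the value. */
static void mock_write(struct mock_dev *dev, enum reg_id reg, uint32_t val)
{
	assert(dev->nwrites < REG_COUNT);
	dev->order[dev->nwrites++] = reg;
	dev->regs[reg] = val;
}

/*
 * Restore in the order the message proposes:
 *   1. PCIe Capability (Link Control, which may enable ASPM as a whole),
 *   2. PCI_L1SS_CTL2,
 *   3. PCI_L1SS_CTL1 last, because it carries the L1.x Enable bits.
 */
static void restore_in_suggested_order(struct mock_dev *dev,
				       const struct saved_state *s)
{
	mock_write(dev, REG_LNKCTL, s->lnkctl);
	mock_write(dev, REG_L1SS_CTL2, s->l1ss_ctl2);
	mock_write(dev, REG_L1SS_CTL1, s->l1ss_ctl1);
}
```

Calling restore_in_suggested_order() on a zero-initialized struct mock_dev
leaves dev.order[] holding REG_LNKCTL, REG_L1SS_CTL2, REG_L1SS_CTL1 in that
sequence.  Per the message, 4257f7e008ea instead restores CTL1 before CTL2
and L1SS state before the PCIe Capability.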
