Date: Tue, 23 Jan 2024 16:36:48 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Johan Hovold <johan@...nel.org>
Cc: Michael Schaller <michael@...aller.de>,
	Kai-Heng Feng <kai.heng.feng@...onical.com>,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	regressions@...ts.linux.dev,
	"Maciej W . Rozycki" <macro@...am.me.uk>,
	Ajay Agarwal <ajayagarwal@...gle.com>,
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Heiner Kallweit <hkallweit1@...il.com>,
	Johan Hovold <johan+linaro@...nel.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>, stable@...r.kernel.org,
	regressions@...mhuis.info
Subject: Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH]
 Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")

On Tue, Jan 23, 2024 at 06:25:52PM +0100, Johan Hovold wrote:
> On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
> > On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> > > I never got a reply to this one so resending with updated Subject in
> > > case it got buried in your inbox.
> > 
> > I did see it but decided it was better to fix the problem with resume
> > causing an unintended reboot, even though fixing that meant breaking
> > lockdep again, since I don't think we have user reports of the
> > potential deadlock lockdep finds.
> 
> That may be because I fixed the previous regression in 6.7-rc1 before
> any users had a chance to hit the deadlock on Qualcomm platforms.
> 
> I can easily trigger a deadlock on the X13s by instrumenting 6.7-final
> with a delay to increase the race window.
> 
> And any user hitting this occasionally is likely not going to be able to
> track it down to this lock inversion (unless they have lockdep enabled).

I agree, it's a problem we need to fix.

> > 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
> > start at fixing other problems and also improving the ASPM style, so I
> > hope somebody steps up to fix both it and the lockdep issue.  I
> > haven't looked at it enough to have a preference for *how* to fix it.
> 
> Ok, but since you were the one introducing the locking regression in
> 6.7-final shouldn't you look into fixing it?
> 
> Especially if there were alternatives to restoring the offending commit
> which would solve the underlying issue for the resume failure without
> breaking other platforms.

Did somebody propose an alternate patch?  If so, I missed it, but we
could look at it now.

> I don't want to spend more time on this if the offending commit could
> simply be reverted.

I don't quite follow.  By simply reverting, do you mean to revert
f93e71aea6c6 ("Revert "PCI/ASPM: Remove
pcie_aspm_pm_state_change()"")?  IIUC that would break Michael's
machine again.

Bjorn
