lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aIp_Z9IdwSjMtDho@wunner.de>
Date: Wed, 30 Jul 2025 22:24:07 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
	Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
	Linas Vepstas <linasvepstas@...il.com>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	Manivannan Sadhasivam <mani@...nel.org>,
	Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
	Heiko Carstens <hca@...ux.ibm.com>,
	Vasily Gorbik <gor@...ux.ibm.com>,
	Alexander Gordeev <agordeev@...ux.ibm.com>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>,
	Sven Schnelle <svens@...ux.ibm.com>,
	Peter Oberparleiter <oberpar@...ux.ibm.com>,
	Matthew Rosato <mjrosato@...ux.ibm.com>,
	Oliver O'Halloran <oohall@...il.com>, Sinan Kaya <okaya@...nel.org>,
	linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
	Keith Busch <kbusch@...nel.org>
Subject: Re: [PATCH v3 1/2] PCI/AER: Fix missing uevent on recovery when a
 reset is requested

On Wed, Jul 30, 2025 at 10:01:50PM +0200, Lukas Wunner wrote:
> On Wed, Jul 30, 2025 at 01:20:57PM +0200, Niklas Schnelle wrote:
> > Since commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery
> > status for udev") AER uses the result of error_detected() as parameter
> > to pci_uevent_ers(). As pci_uevent_ers() however does not handle
> > PCI_ERS_RESULT_NEED_RESET this results in a missing uevent for the
> > beginning of recovery if drivers request a reset. Fix this by treating
> > PCI_ERS_RESULT_NEED_RESET as beginning recovery.
> [...]
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1592,6 +1592,7 @@ void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type)
> >  	switch (err_type) {
> >  	case PCI_ERS_RESULT_NONE:
> >  	case PCI_ERS_RESULT_CAN_RECOVER:
> > +	case PCI_ERS_RESULT_NEED_RESET:
> >  		envp[idx++] = "ERROR_EVENT=BEGIN_RECOVERY";
> >  		envp[idx++] = "DEVICE_ONLINE=0";
> >  		break;
> 
> I note that PCI_ERS_RESULT_NO_AER_DRIVER is also missing in that
> switch/case statement.  I guess for the patch to be complete,
> it needs to be added to the PCI_ERS_RESULT_DISCONNECT case.
> Do you agree?

I realize now there's a bigger problem here:  In pcie_do_recovery(),
when control reaches the "failed:" label, a uevent is only signaled
for the *bridge*.  Shouldn't a uevent instead be signaled for every
device *below* the bridge?  (And possibly the bridge itself if it was
the device reporting the error.)

In that case you don't need to add PCI_ERS_RESULT_NO_AER_DRIVER to
the switch/case statement because we wouldn't want to have multiple
uevents reporting disconnect, so the one emitted below the "failed:"
label would be sufficient.

Right now we may report BEGIN_RECOVERY to user space, but we fail to
later on signal FAILED_RECOVERY (unless I'm missing something).

This all looks so broken that I'm starting to wonder if there's any
user space application at all that takes advantage of these uevents?

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ