lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aJL9IBagGNUagEbv@wunner.de>
Date: Wed, 6 Aug 2025 08:58:40 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Brett Creeley <bcreeley@....com>
Cc: Shannon Nelson <shannon.nelson@....com>, netdev@...r.kernel.org,
	davem@...emloft.net, kuba@...nel.org, edumazet@...gle.com,
	pabeni@...hat.com, brett.creeley@....com, drivers@...sando.io
Subject: Re: [PATCH net-next 1/3] pds_core: add simple AER handler

On Tue, Aug 05, 2025 at 03:10:19PM -0700, Brett Creeley wrote:
> On 8/5/2025 8:01 AM, Lukas Wunner wrote:
> > On Fri, Feb 16, 2024 at 02:29:50PM -0800, Shannon Nelson wrote:
> > > Set up the pci_error_handlers error_detected and resume to be
> > > useful in handling AER events.
> > 
> > The above was committed as d740f4be7cf0 ("pds_core: add simple
> > AER handler").
> > 
> > Just noticed the following while inspecting the pci_error_handlers
> > of this driver:
> > 
> > > +static pci_ers_result_t pdsc_pci_error_detected(struct pci_dev *pdev,
> > > +                                             pci_channel_state_t error)
> > > +{
> > > +     if (error == pci_channel_io_frozen) {
> > > +             pdsc_reset_prepare(pdev);
> > > +             return PCI_ERS_RESULT_NEED_RESET;
> > > +     }
> > > +
> > > +     return PCI_ERS_RESULT_NONE;
> > > +}
> > 
> > The ->error_detected() callback of this driver invokes
> > pdsc_reset_prepare(), which unmaps BARs and calls pci_disable_device(),
> > but there is no corresponding ->slot_reset() callback which would invoke
> > pdsc_reset_done() to re-enable the device after reset recovery.
> 
> Thanks for the note. It's been a bit since I have looked at this, but I
> believe that it's working in the following way:
> 
> 1. pds_core's pci_error_handlers.error_detected callback returns
> PCI_ERS_RESULT_NEED_RESET
> 2. status is initialized to PCI_ERS_RESULT_RECOVERED in the pci core and
> since pds_core doesn't have a slot_reset callback then status remains
> PCI_ERS_RESULT_RECOVERED
> 3. pds_core's pci_error_handlers.resume callback is called, which will
> attempt reset/recover the device to a functional state

My point is, you're calling pdsc_reset_prepare() but you're never calling
pdsc_reset_done().  The former performs various teardown steps and calls
pci_disable_device(), which disables MMIO access to the device.  Since
you're never calling pdsc_reset_done(), you're not re-enabling MMIO
access to the device and re-initializing the device.  So I'd expect
any subsequent device access to fail.

Normally you'd have a ->slot_reset() callback which would call
pdsc_reset_done().  Then the code would look sane.

Moreover, the AER driver in the PCI core performs an unconditional 
Secondary Bus Reset on Fatal Errors (channel state pci_channel_io_frozen).
You're performing an additional reset of the PCI Function in
pdsc_pci_error_resume().  At least for Fatal Errors, this seems
superfluous.

You're only resetting the PCI Function if the PDSC_S_FW_DEAD bit is set,
irrespective whether you're dealing with a Fatal or Non-Fatal Error.

Normally I'd expect that you need to perform some re-initialization
after resetting the PCI Function, so this also looks weird.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ