[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aJL9IBagGNUagEbv@wunner.de>
Date: Wed, 6 Aug 2025 08:58:40 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Brett Creeley <bcreeley@....com>
Cc: Shannon Nelson <shannon.nelson@....com>, netdev@...r.kernel.org,
davem@...emloft.net, kuba@...nel.org, edumazet@...gle.com,
pabeni@...hat.com, brett.creeley@....com, drivers@...sando.io
Subject: Re: [PATCH net-next 1/3] pds_core: add simple AER handler
On Tue, Aug 05, 2025 at 03:10:19PM -0700, Brett Creeley wrote:
> On 8/5/2025 8:01 AM, Lukas Wunner wrote:
> > On Fri, Feb 16, 2024 at 02:29:50PM -0800, Shannon Nelson wrote:
> > > Set up the pci_error_handlers error_detected and resume to be
> > > useful in handling AER events.
> >
> > The above was committed as d740f4be7cf0 ("pds_core: add simple
> > AER handler").
> >
> > Just noticed the following while inspecting the pci_error_handlers
> > of this driver:
> >
> > > +static pci_ers_result_t pdsc_pci_error_detected(struct pci_dev *pdev,
> > > + pci_channel_state_t error)
> > > +{
> > > + if (error == pci_channel_io_frozen) {
> > > + pdsc_reset_prepare(pdev);
> > > + return PCI_ERS_RESULT_NEED_RESET;
> > > + }
> > > +
> > > + return PCI_ERS_RESULT_NONE;
> > > +}
> >
> > The ->error_detected() callback of this driver invokes
> > pdsc_reset_prepare(), which unmaps BARs and calls pci_disable_device(),
> > but there is no corresponding ->slot_reset() callback which would invoke
> > pdsc_reset_done() to re-enable the device after reset recovery.
>
> Thanks for the note. It's been a bit since I have looked at this, but I
> believe that it's working in the following way:
>
> 1. pds_core's pci_error_handlers.error_detected callback returns
> PCI_ERS_RESULT_NEED_RESET
> 2. status is initialized to PCI_ERS_RESULT_RECOVERED in the pci core and
> since pds_core doesn't have a slot_reset callback then status remains
> PCI_ERS_RESULT_RECOVERED
> 3. pds_core's pci_error_handlers.resume callback is called, which will
> attempt reset/recover the device to a functional state
My point is, you're calling pdsc_reset_prepare() but you're never calling
pdsc_reset_done(). The former performs various teardown steps and calls
pci_disable_device(), which disables MMIO access to the device. Since
you're never calling pdsc_reset_done(), you're not re-enabling MMIO
access to the device and re-initializing the device. So I'd expect
any subsequent device access to fail.
Normally you'd have a ->slot_reset() callback which would call
pdsc_reset_done(). Then the code would look sane.
Moreover, the AER driver in the PCI core performs an unconditional
Secondary Bus Reset on Fatal Errors (channel state pci_channel_io_frozen).
You're performing an additional reset of the PCI Function in
pdsc_pci_error_resume(). At least for Fatal Errors, this seems
superfluous.
You're only resetting the PCI Function if the PDSC_S_FW_DEAD bit is set,
irrespective whether you're dealing with a Fatal or Non-Fatal Error.
Normally I'd expect that you need to perform some re-initialization
after resetting the PCI Function, so this also looks weird.
Thanks,
Lukas
Powered by blists - more mailing lists