[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220313195220.GA436941@bhelgaas>
Date: Sun, 13 Mar 2022 14:52:20 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
Russell Currey <ruscur@...sell.cc>,
Oliver OHalloran <oohall@...il.com>,
Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
Ashok Raj <ashok.raj@...el.com>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org,
Eric Badger <ebadger@...estorage.com>,
linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v1] PCI/AER: Handle Multi UnCorrectable/Correctable
errors properly
On Fri, Mar 11, 2022 at 02:58:07AM +0000, Kuppuswamy Sathyanarayanan wrote:
> Currently the aer_irq() handler returns IRQ_NONE for cases without bits
> PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this
> assumption is incorrect.
>
> Consider a scenario where aer_irq() is triggered for a correctable
> error, and while we process the error and before we clear the error
> status in "Root Error Status" register, if the same kind of error
> is triggered again, since aer_irq() only clears events it saw, the
> multi-bit error is left in tact. This will cause the interrupt to fire
> again, resulting in entering aer_irq() with just the multi-bit error
> logged in the "Root Error Status" register.
>
> Repeated AER recovery test has revealed this condition does happen
> and this prevents any new interrupt from being triggered. Allow to
> process interrupt even if only multi-correctable (BIT 1) or
> multi-uncorrectable bit (BIT 3) is set.
>
> Reported-by: Eric Badger <ebadger@...estorage.com>
Is there a bug report with any concrete details (dmesg, lspci, etc)
that we can include here?
Powered by blists - more mailing lists