[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240326211954.GA1497572@bhelgaas>
Date: Tue, 26 Mar 2024 16:19:54 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc: adrian.hunter@...el.com, ulf.hansson@...aro.org,
Victor Shih <victor.shih@...esyslogic.com.tw>,
Ben Chuang <benchuanggli@...il.com>, linux-mmc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mmc: sdhci-pci-gli: GL975x: Mask rootport's replay
timer timeout during suspend
On Tue, Mar 26, 2024 at 09:52:28AM +0800, Kai-Heng Feng wrote:
> On Tue, Mar 26, 2024 at 3:02 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > On Mon, Mar 25, 2024 at 10:02:27AM +0800, Kai-Heng Feng wrote:
> > > On Sat, Mar 23, 2024 at 12:43 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
> ...
> > > > If that's the case, why do the
> > > > masking in the suspend/resume callbacks?
> > >
> > > Because there's no functional impact when the error happens, other
> > > than suspend/resume.
> >
> > Oh, I think I see. Is this accurate?
> >
> > Due to a hardware defect in GL975x, config accesses when ASPM is
> > enabled frequently cause Replay Timer Timeouts in the Port leading
> > to the device.
> >
> > These are Correctable Errors, so the Downstream Port logs it in its
> > PCI_ERR_COR_STATUS and, when the error is not masked, sends an
> > ERR_COR message upstream. The message terminates at a Root Port,
> > which may generate an AER interrupt so the OS can log it.
> >
> > The Correctable Error logging is an annoyance but normally not a
> > major issue. But when the AER interrupt happens during suspend, it
> > can prevent the system from suspending.
>
> That's totally the case here.
>
> This brings up another different but related topic - should the port
> driver disable AER/DPC IRQ during suspend?
> We've discussed this many times, I still think that's the right
> approach to "quiesce" many unexpected errors during system state
> transition.
Maybe so. We can continue that in the context of that patch. Maybe
it needs to be reposted; I can't remember where it's at right now.
> > I think we should log a hint in dmesg that we're masking
> > PCI_ERR_COR_REP_TIMER because the error will still be logged in the
> > PCI_ERR_COR_STATUS register, and that will be visible via lspci, and a
> > dmesg hint will save debugging time when people report that.
>
> Sure. Where do you think it's a better place to implement the quirk? I
> Assume PCI quirk is a better place than driver's probe routine?
Yes, I think drivers/pci/quirks.c is a better place so we can mask it
even if the driver isn't loaded. Users can still run lspci and see
these errors even if the driver isn't loaded.
Bjorn
Powered by blists - more mailing lists