Message-ID: <CA+-6iNzq_BV_fK9T4LK0ncZuufqp9E9+DUyFU3jKCnSCjN8n-w@mail.gmail.com>
Date: Wed, 6 Aug 2025 14:38:12 -0400
From: Jim Quinlan <james.quinlan@...adcom.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, Nicolas Saenz Julienne <nsaenz@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Cyril Brulebois <kibi@...ian.org>, bcm-kernel-feedback-list@...adcom.com,
jim2101024@...il.com, Florian Fainelli <florian.fainelli@...adcom.com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>, Krzysztof Wilczyński <kwilczynski@...nel.org>,
Manivannan Sadhasivam <mani@...nel.org>, Rob Herring <robh@...nel.org>,
"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE" <linux-rpi-kernel@...ts.infradead.org>,
"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE" <linux-arm-kernel@...ts.infradead.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] PCI: brcmstb: Add panic/die handler to driver
On Wed, Aug 6, 2025 at 2:15 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Fri, Jun 13, 2025 at 06:08:43PM -0400, Jim Quinlan wrote:
> > Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> > by default Broadcom's STB PCIe controller effects an abort. Some SoCs --
> > 7216 and its descendants -- have new HW that identifies error details.
>
> What's the long term plan for this? This abort is a huge problem that
> we're seeing across arm64 platforms. Forcing a panic and reboot for
> every uncorrectable error is pretty hard to deal with.
Hello Bjorn,
Are you referring to STB/CM systems, Rpi, or something else altogether?
>
> Is there a plan to someday recover from these aborts? Or change the
> hardware so it can at least be configured to return ~0 data after
> logging the error in the hardware registers?
Some of our upcoming chips will have the ability to do nothing on
errant PCIe writes and return 0xffffffff on errant PCIe reads. But
none of our STB/CM chips do this currently. I've been asking for
this behavior for years but I have limited influence on what happens
in HW.
>
>
> > This simple handler determines if the PCIe controller was the cause of the
> > abort and if so, prints out diagnostic info. Unfortunately, an abort still
> > occurs.
> >
> > Care is taken to read the error registers only when the PCIe bridge is
> > active and the PCIe registers are accessible. Otherwise, a "die" event
> > caused by something other than the PCIe could cause an abort if the PCIe
> > "die" handler tried to access registers when the bridge is off.
>
> Checking whether the bridge is active is a "mostly-works" situation
> since it's always racy.
I'm not sure I understand the "racy" comment. If the PCIe bridge is
off, we do not read the PCIe error registers. In this case, PCIe is
probably not the cause of the panic. In the rare case the PCIe
bridge is off and it was the PCIe that caused the panic, nothing gets
reported, and this is where we are without this commit. Perhaps this
is what you mean by "mostly-works". But this is the best that can be
done with SW given our HW.
Regards,
Jim Quinlan
Broadcom STB/CM
>
>
> > Example error output:
> > brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> > brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0
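The guard described in the quoted patch text (read the error registers only while the bridge is active, then decode them into lines like the example output) can be sketched in plain C. This is a hypothetical user-space model, not the brcmstb driver code. The `fake_pcie` struct stands in for memory-mapped registers, and the `ERR_*` bit positions and the `pcie_die_report()` name are invented, because the thread does not give the real register layout. In the kernel, logic like this would presumably run from a callback registered on the panic and die notifier chains (`panic_notifier_list`, `register_die_notifier()`), and would print through `dev_err()`, which adds the `brcm-pcie 8b20000.pcie:` prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL error-type bits; the real brcmstb layout is not in this thread. */
#define ERR_TIMEOUT      (1u << 0) /* TO */
#define ERR_ABORT        (1u << 1) /* Abt */
#define ERR_UNSUPP_REQ   (1u << 2) /* UnspReq */
#define ERR_ACC_DISABLED (1u << 3) /* AccDsble */
#define ERR_BAD_ADDR     (1u << 4) /* BadAddr */
#define ERR_ANY (ERR_TIMEOUT | ERR_ABORT | ERR_UNSUPP_REQ | \
		 ERR_ACC_DISABLED | ERR_BAD_ADDR)

/* Stand-in for the controller state and its error-capture registers. */
struct fake_pcie {
	bool bridge_active;  /* false: bridge off, register reads would abort */
	uint32_t err_status; /* latched error-type bits */
	uint32_t err_addr;   /* address of the offending access */
	unsigned int width;  /* access width in bits */
	bool is_read;        /* read or write access */
};

/*
 * Body of a hypothetical panic/die hook. Writes a report into buf and
 * returns its length, or returns 0 if nothing should be reported.
 */
static size_t pcie_die_report(const struct fake_pcie *p, char *buf, size_t len)
{
	assert(p != NULL && buf != NULL && len > 0);
	buf[0] = '\0';

	/* Bridge off: never touch the registers; PCIe is probably not the cause. */
	if (!p->bridge_active)
		return 0;

	/* No error bits latched: the abort came from somewhere else. */
	uint32_t st = p->err_status;
	if (!(st & ERR_ANY))
		return 0;

	int n = snprintf(buf, len,
			 "Error: Mem Acc: %ubit, %s, @0x%08x\n"
			 "Type: TO=%d Abt=%d UnspReq=%d AccDsble=%d BadAddr=%d\n",
			 p->width, p->is_read ? "Read" : "Write",
			 (unsigned int)p->err_addr,
			 !!(st & ERR_TIMEOUT), !!(st & ERR_ABORT),
			 !!(st & ERR_UNSUPP_REQ), !!(st & ERR_ACC_DISABLED),
			 !!(st & ERR_BAD_ADDR));
	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	/* snprintf returns the untruncated length; report what actually fit. */
	return strlen(buf);
}
```

Note that the `bridge_active` test and the register read are separate steps, so the bridge state can change between them. One reading of the "racy" remark above is exactly this check-then-access window, which a handler running during a panic cannot close by taking a lock.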