Message-ID: <5356a01c-5aab-fbff-b0a9-157b961c66ee@arm.com>
Date: Fri, 25 Jun 2021 00:51:16 +0100
From: Robin Murphy <robin.murphy@....com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Javier Martinez Canillas <javierm@...hat.com>,
linux-kernel@...r.kernel.org,
Peter Robinson <pbrobinson@...il.com>,
Shawn Lin <shawn.lin@...k-chips.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Rob Herring <robh@...nel.org>,
linux-arm-kernel@...ts.infradead.org, linux-pci@...r.kernel.org,
linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with
clocks gated
On 2021-06-25 00:28, Bjorn Helgaas wrote:
> On Fri, Jun 25, 2021 at 12:18:48AM +0100, Robin Murphy wrote:
>> On 2021-06-24 22:57, Bjorn Helgaas wrote:
>>> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>>>> IRQ handlers that are registered for shared interrupts can be called at
>>>> any time after they have been registered using the request_irq() function.
>>>>
>>>> It's up to drivers to ensure that it's always safe for these to be called.
>>>>
>>>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>>>> their handlers are registered very early in the probe function, an error
>>>> later can lead to these handlers being executed before all the required
>>>> resources have been properly set up.
>>>>
>>>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>>>> expects that some PCIe clocks will already be enabled; otherwise, trying
>>>> to access the PCIe registers causes the read to hang and never return.
>>>
>>> The read *never* completes? That might be a bit problematic because
>>> it implies that we may not be able to recover from PCIe errors. Most
>>> controllers will time out eventually, log an error, and either
>>> fabricate some data (typically ~0) to complete the CPU's read or cause
>>> some kind of abort or machine check.
>>>
>>> Just asking in case there's some controller configuration that should
>>> be tweaked.
>>
>> If I'm following correctly, that'll be a read transaction to the native side
>> of the controller itself; it can't complete that read, or do anything else
>> either, because it's clock-gated, and thus completely oblivious (it might be
>> that if another CPU was able to enable the clocks then everything would
>> carry on as normal, or it might end up totally deadlocking the SoC
>> interconnect). I think it's safe to assume that in that state nothing of
>> importance would be happening on the PCIe side, and even if it was we'd
>> never get to know about it.
>
> Oh, right, that makes sense. I was thinking about the PCIe side, but
> if the controller itself isn't working, of course we wouldn't get that
> far.
>
> I would expect that the CPU itself would have some kind of timeout for
> the read, but that's far outside of the PCI world.
Nah, in AMBA I'm not sure if it's even legal to abandon a transaction
without waiting for the handshake to complete. If you're lucky, the
interconnect might have a clock/power domain bridge which can reply with
an error when it knows its other side isn't running; otherwise the
initiator will just happily sit there waiting for a response to come
back "in a timely manner" :)
Robin.