Message-ID: <5356a01c-5aab-fbff-b0a9-157b961c66ee@arm.com>
Date: Fri, 25 Jun 2021 00:51:16 +0100
From: Robin Murphy <robin.murphy@....com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Javier Martinez Canillas <javierm@...hat.com>,
linux-kernel@...r.kernel.org,
Peter Robinson <pbrobinson@...il.com>,
Shawn Lin <shawn.lin@...k-chips.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Rob Herring <robh@...nel.org>,
linux-arm-kernel@...ts.infradead.org, linux-pci@...r.kernel.org,
linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with
clocks gated
On 2021-06-25 00:28, Bjorn Helgaas wrote:
> On Fri, Jun 25, 2021 at 12:18:48AM +0100, Robin Murphy wrote:
>> On 2021-06-24 22:57, Bjorn Helgaas wrote:
>>> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>>>> IRQ handlers that are registered for shared interrupts can be called at
>>>> any time after they have been registered using the request_irq() function.
>>>>
>>>> It's up to drivers to ensure that it's always safe for these to be called.
>>>>
>>>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>>>> their handlers are registered very early in the probe function, an error
>>>> later can lead to these handlers being executed before all the required
>>>> resources have been properly set up.
>>>>
>>>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>>>> expects that some PCIe clocks will already be enabled; otherwise, trying
>>>> to access the PCIe registers causes the read to hang and never return.
>>>
>>> The read *never* completes? That might be a bit problematic because
>>> it implies that we may not be able to recover from PCIe errors. Most
>>> controllers will time out eventually, log an error, and either
>>> fabricate some data (typically ~0) to complete the CPU's read or cause
>>> some kind of abort or machine check.
>>>
>>> Just asking in case there's some controller configuration that should
>>> be tweaked.
>>
>> If I'm following correctly, that'll be a read transaction to the native side
>> of the controller itself; it can't complete that read, or do anything else
>> either, because it's clock-gated, and thus completely oblivious (it might be
>> that if another CPU was able to enable the clocks then everything would
>> carry on as normal, or it might end up totally deadlocking the SoC
>> interconnect). I think it's safe to assume that in that state nothing of
>> importance would be happening on the PCIe side, and even if it was we'd
>> never get to know about it.
>
> Oh, right, that makes sense. I was thinking about the PCIe side, but
> if the controller itself isn't working, of course we wouldn't get that
> far.
>
> I would expect that the CPU itself would have some kind of timeout for
> the read, but that's far outside of the PCI world.
Nah, in AMBA I'm not sure if it's even legal to abandon a transaction
without waiting for the handshake to complete. If you're lucky, the
interconnect might have a clock/power domain bridge which can reply with
an error when it knows its other side isn't running; otherwise the
initiator will just happily sit there waiting for a response to come
back "in a timely manner" :)
Robin.