linux-kernel - Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated

Message-ID: <44c551d7-fee4-13cf-2929-6d2383dd5497@arm.com>
Date:   Fri, 25 Jun 2021 00:18:48 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Bjorn Helgaas <helgaas@...nel.org>,
        Javier Martinez Canillas <javierm@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        Peter Robinson <pbrobinson@...il.com>,
        Shawn Lin <shawn.lin@...k-chips.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Heiko Stuebner <heiko@...ech.de>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        linux-arm-kernel@...ts.infradead.org, linux-pci@...r.kernel.org,
        linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with
 clocks gated

On 2021-06-24 22:57, Bjorn Helgaas wrote:
> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>> IRQ handlers that are registered for shared interrupts can be called at
>> any time after they have been registered using the request_irq() function.
>>
>> It's up to drivers to ensure that it's always safe for these to be called.
>>
>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>> their handlers are registered very early in the probe function, an error
>> later can lead to these handlers being executed before all the required
>> resources have been properly set up.
>>
>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>> expects that some PCIe clocks will already be enabled, otherwise trying
>> to access the PCIe registers causes the read to hang and never return.
> 
> The read *never* completes?  That might be a bit problematic because
> it implies that we may not be able to recover from PCIe errors.  Most
> controllers will timeout eventually, log an error, and either
> fabricate some data (typically ~0) to complete the CPU's read or cause
> some kind of abort or machine check.
> 
> Just asking in case there's some controller configuration that should
> be tweaked.

If I'm following correctly, that'll be a read transaction to the native
side of the controller itself. It can't complete that read, or do
anything else either, because it's clock-gated and thus completely
oblivious. (It might be that if another CPU were able to enable the
clocks then everything would carry on as normal, or it might end up
totally deadlocking the SoC interconnect.) I think it's safe to assume
that in that state nothing of importance would be happening on the PCIe
side, and even if it were, we'd never get to know about it.

The only relevant configuration would be "don't turn the clocks off if 
you're using the thing", which in actual operation can be taken for 
granted. It's a fairly typical bug to register an IRQ as shared but 
assume in the handler that you'll only ever be called for your own 
device's IRQ while it's powered up/clocked/etc. in its normal 
operational state, hence CONFIG_DEBUG_SHIRQ helps flush those kinds of 
unreliable assumptions out.
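
To make the ordering problem concrete, here is a minimal userspace
sketch of it. This is a model only, not kernel code and not the actual
patch: struct fake_pcie, fake_read(), shared_line_fires() and the two
probe functions are all hypothetical names made up for illustration,
and a register read with the clocks gated is modelled by bumping a
counter rather than by actually hanging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct fake_pcie {
	bool clocks_on;       /* models the controller's clock gate */
	bool irq_registered;  /* models the shared handler being installed */
	uint32_t status_reg;  /* models a controller status register */
	int hung_reads;       /* reads that would never return on hardware */
};

/* On real hardware a gated read hangs; the model just counts it. */
static uint32_t fake_read(struct fake_pcie *p)
{
	if (!p->clocks_on) {
		p->hung_reads++;
		return 0;
	}
	return p->status_reg;
}

/* Shared handler: reads a register to see if the IRQ is ours. */
static bool fake_irq_handler(struct fake_pcie *p)
{
	return fake_read(p) != 0;
}

/* Another device on the same line interrupts; every handler runs. */
static void shared_line_fires(struct fake_pcie *p)
{
	if (p->irq_registered)
		fake_irq_handler(p);
}

/* Buggy order: the handler is live before the clocks are enabled. */
static int probe_buggy(struct fake_pcie *p)
{
	p->irq_registered = true;
	shared_line_fires(p);   /* neighbour interrupts during probe */
	p->clocks_on = true;
	return p->hung_reads;
}

/* Safer order: clocks first, then expose the handler. */
static int probe_fixed(struct fake_pcie *p)
{
	p->clocks_on = true;
	p->irq_registered = true;
	shared_line_fires(p);
	return p->hung_reads;
}
```

Calling probe_buggy() on a zeroed struct fake_pcie records one read
that would have hung, while probe_fixed() records none. The same
reasoning applies in reverse on teardown: the handler has to go away
before the clocks do.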

Robin.

(this reminds me of the "fun" I once had where a machine was locking up 
during boot, but simply connecting an external debugger to find out 
exactly where it was stuck happened to automatically enable the 
offending power domain and un-stick it)