linux-kernel - Re: [PATCH] iommu/arm-smmu: Use pm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d6bc7f38-b41d-4e89-b484-0e699981b8b4@arm.com>
Date: Wed, 28 Jan 2026 18:44:35 +0000
From: Robin Murphy <robin.murphy@....com>
To: Prakash Gupta <prakash.gupta@....qualcomm.com>,
 Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
 Rob Clark <robin.clark@....qualcomm.com>, Connor Abbott <cwabbott0@...il.com>
Cc: linux-arm-msm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
 Akhil P Oommen <akhilpo@....qualcomm.com>,
 Pratyush Brahma <pratyush.brahma@....qualcomm.com>,
 Pranjal Shrivastava <praan@...gle.com>
Subject: Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers

[ +Pranjal as this might matter for v3 too... ]

On 28/01/2026 5:56 am, Prakash Gupta wrote:
> 
> On 1/27/2026 9:35 PM, Robin Murphy wrote:
>> On 2026-01-27 12:11 pm, Prakash Gupta wrote:
>>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the
>>> driver")
>>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU
>>> sits in a power domain, all register accesses must be done while the
>>> device is runtime-resumed to avoid unclocked register reads and
>>> potential NoC errors.
>>>
>>> So far, this has not been an issue for most SMMU clients because
>>> stall-on-fault is enabled by default. While a translation fault is
>>> being handled, the SMMU stalls further translations for that context
>>> bank, so the fault handler would not race with a powered-down
>>> SMMU.
>>>
>>> Adreno SMMU now disables stall-on-fault in the presence of fault
>>> storms to avoid saturating SMMU resources and hanging the GMU. With
>>> stall-on-fault disabled, the SMMU can generate faults while its power
>>> domain may no longer be enabled, which makes unclocked accesses to
>>> fault-status registers in the SMMU fault handlers possible.
>>
>> At face value, that sounds wrong - how does an SMMU generate a fault,
>> or indeed do anything, when it's powered off? In principle it's
>> possible that the SMMU might signal an interrupt, and is _then_
>> suspended (with the interrupt line somehow remaining asserted, so
>> probably more clock-gated than completely powered down) before the
>> interrupt hander runs, but we rather assume that we're not going to
>> have an unhandled hardware IRQ hanging around for longer than the
>> autosuspend delay.
>>
>> So, judging by the diff below, I guess what you really mean is that in
>> the case of a threaded context IRQ handler, it can take long enough
>> between handling the hardware IRQ and the thread actually running that
>> the SMMU may have suspended in between?
> 
> You are correct that the SMMU cannot generate a fault while powered
> down. A more accurate description of the race condition is as follows:
> When stall-on-fault is disabled, the faulting transaction does is
> terminated. This allows the master (the GPU) to complete its work, drop
> its power vote for the SMMU, and allow the SMMU to suspend. However, the
> SMMU fault handler may still be waiting to execute on the CPU.
> If the SMMU suspends before the handler reads the fault registers, an
> unclocked access occurs. This scenario is significantly more likely when
> using threaded IRQs due to the scheduling latency involved. I will
> update the next iteration to reflect this.
> 
>>
>>> Guard the context and global fault handlers with arm_smmu_rpm_get() /
>>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done
>>> with the SMMU powered.
>>>
>>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault
>>> after a page fault")
>>> Co-developed-by: Pratyush Brahma <pratyush.brahma@....qualcomm.com>
>>> Signed-off-by: Pratyush Brahma <pratyush.brahma@....qualcomm.com>
>>> Signed-off-by: Prakash Gupta <prakash.gupta@....qualcomm.com>
>>> ---
>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++-
>>>    drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53
>>> ++++++++++++++++++++++--------
>>>    2 files changed, 43 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> index 573085349df3..2d03df72612d 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct
>>> arm_smmu_domain *smmu_domain,
>>>        struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>        struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>>        const struct of_device_id *client_match;
>>> +    const struct arm_smmu_impl *impl = qsmmu->data->impl;
>>>        int cbndx = smmu_domain->cfg.cbndx;
>>>        struct adreno_smmu_priv *priv;
>>>    @@ -350,10 +351,12 @@ static int
>>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>>        priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>>>        priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>>>        priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
>>> -    priv->set_stall = qcom_adreno_smmu_set_stall;
>>>        priv->set_prr_bit = NULL;
>>>        priv->set_prr_addr = NULL;
>>>    +    if (impl->context_fault_needs_threaded_irq)
>>> +        priv->set_stall = qcom_adreno_smmu_set_stall;
>>> +
>>>        if (of_device_is_compatible(np, "qcom,smmu-500") &&
>>>            !of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
>>>            of_device_is_compatible(np, "qcom,adreno-smmu")) {
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> index 5e690cf85ec9..183f12e45b02 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int
>>> irq, void *dev)
>>>        int idx = smmu_domain->cfg.cbndx;
>>>        int ret;
>>>    +    if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) {
>>
>> Why is this conditional on being threaded, if the global fault handler
>> that can never be threaded at all apparently needs it unconditionally?
> Synchronous runtime PM calls can sleep, which would cause issue if
> called within a hard IRQ context. This is why I added the conditional
> check for threaded IRQs.
> Furthermore, this change only allow the driver to override the
> stall-on-fault setting when context_fault_needs_threaded_irq is true.
> Since the unclocked access issue is tied to disabling stall-on-fault,
> the fix is only logically required for the threaded IRQ path.
> For the Global Fault handler, which runs in a hard IRQ context, you are
> right—we cannot safely vote for power there. I will remove the runtime
> PM call from that section.

Hmm, but then how *do* we actually guarantee that autosuspend doesn't 
happen to kick in and power down the SMMU just as a hardirq handler 
runs, when there's some unexpected event? I fear there's a horrible can 
of worms here...

Thanks,
Robin.