linux-kernel - Re: [PATCH] iomm/arm-smmu: Add stall implementation hook

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1ced023b-157c-21a0-ac75-1adef7f029f0@arm.com>
Date:   Thu, 7 May 2020 11:55:54 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>,
        Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
        Rob Clark <robdclark@...il.com>,
        Jordan Crouse <jcrouse@...eaurora.org>
Cc:     iommu@...ts.linux-foundation.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org
Subject: Re: [PATCH] iomm/arm-smmu: Add stall implementation hook

On 2020-05-07 11:14 am, Sai Prakash Ranjan wrote:
> Hi Will, Robin
> 
> On 2020-04-22 01:50, Sai Prakash Ranjan wrote:
>> Add stall implementation hook to enable stalling
>> faults on QCOM platforms which supports it without
>> causing any kind of hardware mishaps. Without this
>> on QCOM platforms, GPU faults can cause unrelated
>> GPU memory accesses to return zeroes. This has the
>> unfortunate result of command-stream reads from CP
>> getting invalid data, causing a cascade of fail.

I think this came up before, but something about this rationale doesn't 
add up - we're not *using* stalls at all, we're still terminating 
faulting transactions unconditionally; we're just using CFCFG to 
terminate them with a slight delay, rather than immediately. It's really 
not clear how or why that makes a difference. Is it a GPU bug? Or an 
SMMU bug? Is this reliable (or even a documented workaround for 
something), or might things start blowing up again if any other 
behaviour subtly changes? I'm not dead set against adding this, but I'd 
*really* like to have a lot more confidence in it.

>> Suggested-by: Rob Clark <robdclark@...il.com>
>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
>> ---
>> This has been attempted previously by Rob Clark in 2017, 2018.
>> Hopefully we can get something concluded in 2020.
>>  * https://patchwork.kernel.org/patch/9953803/
>>  * https://patchwork.kernel.org/patch/10618713/
>> ---
>>  drivers/iommu/arm-smmu-qcom.c | 1 +
>>  drivers/iommu/arm-smmu.c      | 7 +++++++
>>  drivers/iommu/arm-smmu.h      | 1 +
>>  3 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-qcom.c 
>> b/drivers/iommu/arm-smmu-qcom.c
>> index 24c071c1d8b0..a13b229389d4 100644
>> --- a/drivers/iommu/arm-smmu-qcom.c
>> +++ b/drivers/iommu/arm-smmu-qcom.c
>> @@ -32,6 +32,7 @@ static int qcom_sdm845_smmu500_reset(struct
>> arm_smmu_device *smmu)
>>
>>  static const struct arm_smmu_impl qcom_smmu_impl = {
>>      .reset = qcom_sdm845_smmu500_reset,
>> +    .stall = true,
>>  };
>>
>>  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device 
>> *smmu)
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index e622f4e33379..16b03fca9966 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -488,6 +488,11 @@ static irqreturn_t arm_smmu_context_fault(int
>> irq, void *dev)
>>                  fsr, iova, fsynr, cbfrsynra, idx);
>>
>>      arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>> +
>> +    if (smmu->impl && smmu->impl->stall && (fsr & ARM_SMMU_FSR_SS))
>> +        arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
>> +                  ARM_SMMU_RESUME_TERMINATE);

Shouldn't this be *before* the write to FSR, in case the outstanding 
fault causes that to be immediately reasserted before we write CB_RESUME 
and we end up immediately taking the IRQ a second time?

(The overall enablement being in impl is sound, but you still don't get 
to play "works on my machine" in the architectural code :P)

Robin.

>> +
>>      return IRQ_HANDLED;
>>  }
>>
>> @@ -659,6 +664,8 @@ static void arm_smmu_write_context_bank(struct
>> arm_smmu_device *smmu, int idx)
>>          reg |= ARM_SMMU_SCTLR_S1_ASIDPNE;
>>      if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>>          reg |= ARM_SMMU_SCTLR_E;
>> +    if (smmu->impl && smmu->impl->stall)
>> +        reg |= ARM_SMMU_SCTLR_CFCFG;
>>
>>      arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
>>  }
>> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
>> index 8d1cd54d82a6..d5134e0d5cce 100644
>> --- a/drivers/iommu/arm-smmu.h
>> +++ b/drivers/iommu/arm-smmu.h
>> @@ -386,6 +386,7 @@ struct arm_smmu_impl {
>>      int (*init_context)(struct arm_smmu_domain *smmu_domain);
>>      void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
>>               int status);
>> +    bool stall;
>>  };
>>
>>  static inline void __iomem *arm_smmu_page(struct arm_smmu_device 
>> *smmu, int n)
> 
> Any comments on this patch?
> 
> Thanks,
> Sai
>