[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1ced023b-157c-21a0-ac75-1adef7f029f0@arm.com>
Date: Thu, 7 May 2020 11:55:54 +0100
From: Robin Murphy <robin.murphy@....com>
To: Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>,
Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
Rob Clark <robdclark@...il.com>,
Jordan Crouse <jcrouse@...eaurora.org>
Cc: iommu@...ts.linux-foundation.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-arm-msm@...r.kernel.org
Subject: Re: [PATCH] iomm/arm-smmu: Add stall implementation hook
On 2020-05-07 11:14 am, Sai Prakash Ranjan wrote:
> Hi Will, Robin
>
> On 2020-04-22 01:50, Sai Prakash Ranjan wrote:
>> Add stall implementation hook to enable stalling
>> faults on QCOM platforms which supports it without
>> causing any kind of hardware mishaps. Without this
>> on QCOM platforms, GPU faults can cause unrelated
>> GPU memory accesses to return zeroes. This has the
>> unfortunate result of command-stream reads from CP
>> getting invalid data, causing a cascade of fail.
I think this came up before, but something about this rationale doesn't
add up - we're not *using* stalls at all, we're still terminating
faulting transactions unconditionally; we're just using CFCFG to
terminate them with a slight delay, rather than immediately. It's really
not clear how or why that makes a difference. Is it a GPU bug? Or an
SMMU bug? Is this reliable (or even a documented workaround for
something), or might things start blowing up again if any other
behaviour subtly changes? I'm not dead set against adding this, but I'd
*really* like to have a lot more confidence in it.
>> Suggested-by: Rob Clark <robdclark@...il.com>
>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
>> ---
>> This has been attempted previously by Rob Clark in 2017, 2018.
>> Hopefully we can get something concluded in 2020.
>> * https://patchwork.kernel.org/patch/9953803/
>> * https://patchwork.kernel.org/patch/10618713/
>> ---
>> drivers/iommu/arm-smmu-qcom.c | 1 +
>> drivers/iommu/arm-smmu.c | 7 +++++++
>> drivers/iommu/arm-smmu.h | 1 +
>> 3 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-qcom.c
>> b/drivers/iommu/arm-smmu-qcom.c
>> index 24c071c1d8b0..a13b229389d4 100644
>> --- a/drivers/iommu/arm-smmu-qcom.c
>> +++ b/drivers/iommu/arm-smmu-qcom.c
>> @@ -32,6 +32,7 @@ static int qcom_sdm845_smmu500_reset(struct
>> arm_smmu_device *smmu)
>>
>> static const struct arm_smmu_impl qcom_smmu_impl = {
>> .reset = qcom_sdm845_smmu500_reset,
>> + .stall = true,
>> };
>>
>> struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device
>> *smmu)
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index e622f4e33379..16b03fca9966 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -488,6 +488,11 @@ static irqreturn_t arm_smmu_context_fault(int
>> irq, void *dev)
>> fsr, iova, fsynr, cbfrsynra, idx);
>>
>> arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>> +
>> + if (smmu->impl && smmu->impl->stall && (fsr & ARM_SMMU_FSR_SS))
>> + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
>> + ARM_SMMU_RESUME_TERMINATE);
Shouldn't this be *before* the write to FSR, in case the outstanding
fault causes that to be immediately reasserted before we write CB_RESUME
and we end up immediately taking the IRQ a second time?
(The overall enablement being in impl is sound, but you still don't get
to play "works on my machine" in the architectural code :P)
Robin.
>> +
>> return IRQ_HANDLED;
>> }
>>
>> @@ -659,6 +664,8 @@ static void arm_smmu_write_context_bank(struct
>> arm_smmu_device *smmu, int idx)
>> reg |= ARM_SMMU_SCTLR_S1_ASIDPNE;
>> if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>> reg |= ARM_SMMU_SCTLR_E;
>> + if (smmu->impl && smmu->impl->stall)
>> + reg |= ARM_SMMU_SCTLR_CFCFG;
>>
>> arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
>> }
>> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
>> index 8d1cd54d82a6..d5134e0d5cce 100644
>> --- a/drivers/iommu/arm-smmu.h
>> +++ b/drivers/iommu/arm-smmu.h
>> @@ -386,6 +386,7 @@ struct arm_smmu_impl {
>> int (*init_context)(struct arm_smmu_domain *smmu_domain);
>> void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
>> int status);
>> + bool stall;
>> };
>>
>> static inline void __iomem *arm_smmu_page(struct arm_smmu_device
>> *smmu, int n)
>
> Any comments on this patch?
>
> Thanks,
> Sai
>
Powered by blists - more mailing lists