lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fad5dc096a2bd9404341ba8738ba8fc9@codeaurora.org>
Date:   Thu, 07 May 2020 17:36:41 +0530
From:   Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
To:     Robin Murphy <robin.murphy@....com>,
        Rob Clark <robdclark@...il.com>
Cc:     Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
        Jordan Crouse <jcrouse@...eaurora.org>,
        iommu@...ts.linux-foundation.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org
Subject: Re: [PATCH] iomm/arm-smmu: Add stall implementation hook

Hi Robin,

On 2020-05-07 16:25, Robin Murphy wrote:
> On 2020-05-07 11:14 am, Sai Prakash Ranjan wrote:
>> Hi Will, Robin
>> 
>> On 2020-04-22 01:50, Sai Prakash Ranjan wrote:
>>> Add stall implementation hook to enable stalling
>>> faults on QCOM platforms which supports it without
>>> causing any kind of hardware mishaps. Without this
>>> on QCOM platforms, GPU faults can cause unrelated
>>> GPU memory accesses to return zeroes. This has the
>>> unfortunate result of command-stream reads from CP
>>> getting invalid data, causing a cascade of fail.
> 
> I think this came up before, but something about this rationale
> doesn't add up - we're not *using* stalls at all, we're still
> terminating faulting transactions unconditionally; we're just using
> CFCFG to terminate them with a slight delay, rather than immediately.
> It's really not clear how or why that makes a difference. Is it a GPU
> bug? Or an SMMU bug? Is this reliable (or even a documented workaround
> for something), or might things start blowing up again if any other
> behaviour subtly changes? I'm not dead set against adding this, but
> I'd *really* like to have a lot more confidence in it.
> 

Yes it has come up before, you can find details in below links.
  - https://patchwork.kernel.org/patch/9953803/
  - https://patchwork.kernel.org/patch/10618713/

Rob Clark can add more details on this probably for the GPU faults.
As for the reliability, downstream kernel(I mean kernels with which 
android
devices with QCOM chipsets are shipped) has stalling enabled for a long 
time
now and has been stable in the field. So we can say that we are safe 
with
this enabled in QCOM implementation.

>>> Suggested-by: Rob Clark <robdclark@...il.com>
>>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
>>> ---
>>> This has been attempted previously by Rob Clark in 2017, 2018.
>>> Hopefully we can get something concluded in 2020.
>>>  * https://patchwork.kernel.org/patch/9953803/
>>>  * https://patchwork.kernel.org/patch/10618713/
>>> ---
>>>  drivers/iommu/arm-smmu-qcom.c | 1 +
>>>  drivers/iommu/arm-smmu.c      | 7 +++++++
>>>  drivers/iommu/arm-smmu.h      | 1 +
>>>  3 files changed, 9 insertions(+)
>>> 
>>> diff --git a/drivers/iommu/arm-smmu-qcom.c 
>>> b/drivers/iommu/arm-smmu-qcom.c
>>> index 24c071c1d8b0..a13b229389d4 100644
>>> --- a/drivers/iommu/arm-smmu-qcom.c
>>> +++ b/drivers/iommu/arm-smmu-qcom.c
>>> @@ -32,6 +32,7 @@ static int qcom_sdm845_smmu500_reset(struct
>>> arm_smmu_device *smmu)
>>> 
>>>  static const struct arm_smmu_impl qcom_smmu_impl = {
>>>      .reset = qcom_sdm845_smmu500_reset,
>>> +    .stall = true,
>>>  };
>>> 
>>>  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device 
>>> *smmu)
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index e622f4e33379..16b03fca9966 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -488,6 +488,11 @@ static irqreturn_t arm_smmu_context_fault(int
>>> irq, void *dev)
>>>                  fsr, iova, fsynr, cbfrsynra, idx);
>>> 
>>>      arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>>> +
>>> +    if (smmu->impl && smmu->impl->stall && (fsr & ARM_SMMU_FSR_SS))
>>> +        arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME,
>>> +                  ARM_SMMU_RESUME_TERMINATE);
> 
> Shouldn't this be *before* the write to FSR, in case the outstanding
> fault causes that to be immediately reasserted before we write
> CB_RESUME and we end up immediately taking the IRQ a second time?
> 

Yes, I will fixup this in the next version.

> (The overall enablement being in impl is sound, but you still don't
> get to play "works on my machine" in the architectural code :P)
> 

We could have our own context fault handler in QCOM implementation,
but that would just be duplicating things from arm-smmu context fault
handler. So I did not think it makes much sense to have our own
fault handler in qcom impl just for enabling stall model.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ