lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2a7098cb-5f36-d0ab-4f74-ac6e4ab7e0fd@arm.com>
Date:   Mon, 18 Mar 2019 17:24:17 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc:     Marc Gonzalez <marc.w.gonzalez@...e.fr>,
        Marc Zyngier <marc.zyngier@....com>,
        Will Deacon <will.deacon@....com>,
        Jens Axboe <axboe@...nel.dk>,
        Jeffrey Hugo <jhugo@...eaurora.org>,
        Catalin Marinas <catalin.marinas@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        MSM <linux-arm-msm@...r.kernel.org>,
        AngeloGioacchino Del Regno <kholk11@...il.com>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH] arm64/io: Don't use WZR in writel

On 18/03/2019 17:19, Robin Murphy wrote:
> On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
>> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>>
>>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>>
>>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>>
>>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>>
>>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>>
>>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>>> have the same stupid bug.
>>>>>>>>
>>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>>>> bother with shipping one).
>>>>>>>>
>>>>>>>> In all honesty, we need some information from QC here: which 
>>>>>>>> SoCs are
>>>>>>>> affected, what is the exact nature of the bug, can it be 
>>>>>>>> triggered from
>>>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>>
>>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>>> virtualization of the SMMU registers. It was identified and fixed 
>>>>>>> for
>>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>>
>>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>>>> a workaround in the shipped kernel?
>>>>>
>>>>> I mean that it's fixed in the firmware.
>>>>>
>>>>>> If the former, is this part of an official QC statement, with an
>>>>>> associated erratum number?
>>>>>
>>>>> I don't know, will get back to you on this one.
>>>>>
>>>>>> Is this really limited to the SMMU accesses?
>>>>>
>>>>> Yes.
>>>>>
>>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good 
>>>>>>> support
>>>>>>> upstream, if we can figure out a way around this issue.
>>>>>>
>>>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>>> who'd know about it).
>>>>>
>>>>> I will try to compose a list.
>>>>
>>>> FWIW, I have just been bitten by this issue. I needed to enable an 
>>>> SMMU to
>>>> filter PCIe EP accesses to system RAM (or something). I'm using an 
>>>> APQ8098
>>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>>
>>>>     /* Invalidate the TLB, just in case */
>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>
>>>>
>>>> With the 'Z' constraint, gcc generates:
>>>>
>>>>     str wzr, [x0]
>>>>
>>>> without the 'Z' constraint, gcc generates:
>>>>
>>>>     mov    w1, 0
>>>>     str w1, [x0]
>>>>
>>>>
>>>> I can work around the problem using the following patch:
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 045d93884164..93117519aed8 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -59,6 +59,11 @@
>>>>    #include "arm-smmu-regs.h"
>>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>>> +{
>>>> +    asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>>> +}
>>>> +
>>>>    #define ARM_MMU500_ACTLR_CPRE        (1 << 1)
>>>>    #define ARM_MMU500_ACR_CACHE_LOCK    (1 << 26)
>>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>> arm_smmu_device *smmu,
>>>>    {
>>>>        unsigned int spin_cnt, delay;
>>>> -    writel_relaxed(0, sync);
>>>> +    qcom_writel(0, sync);
>>>>        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>            for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>                if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
>>>> arm_smmu_device *smmu)
>>>>        }
>>>>        /* Invalidate the TLB, just in case */
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>
>>>>
>>>>
>>>> Can a quirk be used to work around the issue?
>>>> Or can we just "pessimize" the 3 writes for everybody?
>>>> (Might be cheaper than a test anyway)
>>>
>>> If it really is just the SMMU driver which is affected, we can work 
>>> around
>>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>>> course). If the diff below works as expected, I'll write it up properly.
>>>
>>> Robin.
>>> ----->8-----
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 045d93884164..7ff29e33298f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
>>> arm_smmu_device
>>> *smmu,
>>>   {
>>>       unsigned int spin_cnt, delay;
>>>
>>> -    writel_relaxed(0, sync);
>>> +    writel_relaxed((unsigned long)sync, sync);
>>>       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>           for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>               if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>>> arm_smmu_device *smmu, int idx)
>>>
>>>       /* Unassigned context banks only need disabling */
>>>       if (!cfg) {
>>> -        writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>>> +        /*
>>> +         * For Qualcomm reasons, we want to guarantee that we write a
>>> +         * zero from a register which is not WZR. Fortunately, the cfg
>>> +         * logic here plays right into our hands...
>>> +         */
>>> +        writel_relaxed((unsigned long)cfg, cb_base + 
>>> ARM_SMMU_CB_SCTLR);
>>>           return;
>>>       }
>>>
>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>>> arm_smmu_device *smmu)
>>>       }
>>>
>>>       /* Invalidate the TLB, just in case */
>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>
>>>       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>
>>
>> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
>> you really sure that the above will not result in Clang still spotting
>> that the value is zero and using a wzr for all these cases?
> 
> The trick is that in the write-only TLBI cases the variable we're 
> passing in really is nonzero, so that can't possibly happen. For the 
> context bank reset, yes, I am assuming that no complier will ever be 
> perverse enough to detect that cfg is not written after the NULL check 
> and immediately reallocate it to XZR for no good reason. I'd like to 
> think that assumption is going to hold for the reasonable scope of this 
> particular workaround, though.

Well, crap. So much for that hubris...


00000000000000f0 <arm_smmu_write_context_bank>:
       f0:       52800504        mov     w4, #0x28 
// #40
       f4:       f940240a        ldr     x10, [x0,#72]
       f8:       a9411402        ldp     x2, x5, [x0,#16]
       fc:       9b247c24        smull   x4, w1, w4
      100:       8b040148        add     x8, x10, x4
      104:       1ac52023        lsl     w3, w1, w5
      108:       8b23c042        add     x2, x2, w3, sxtw
      10c:       f9401107        ldr     x7, [x8,#32]
      110:       b5000067        cbnz    x7, 11c 
<arm_smmu_write_context_bank+0x2c>
      114:       b900005f        str     wzr, [x2]


Time to come up with a better SCTLR reset value, I guess.

Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ