linux-kernel - Re: [PATCH 0/5] arm64: Add workaround for Cortex-A77 erratum 1542418

Message-ID: <14773d6b-96d5-b894-7fc4-17c54f15ee30@arm.com>
Date:   Fri, 15 Nov 2019 01:14:07 +0000
From:   Suzuki K Poulose <suzuki.poulose@....com>
To:     will@...nel.org
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        james.morse@....com, catalin.marinas@....com, mark.rutland@....com,
        maz@...nel.org
Subject: Re: [PATCH 0/5] arm64: Add workaround for Cortex-A77 erratum 1542418

Hi Will

On 11/14/2019 04:39 PM, Will Deacon wrote:
> Hi Suzuki,
> 
> On Thu, Nov 14, 2019 at 02:59:13PM +0000, Suzuki K Poulose wrote:
>> This series adds a workaround for Arm erratum 1542418 which affects
> 
> Searching for that erratum number doesn't find me a description :(

I believe this was published in the Cortex-A77 SDEN v9.0. I will chase
it internally.

> 
>> Cortex-A77 cores (r0p0 - r1p0). Affected cores may execute stale
>> instructions from the L0 macro-op cache violating the
>> prefetch-speculation-protection guaranteed by the architecture.
>> This happens when the branch predictor bases its prediction for a
>> branch at this address on stale history due to ASID or VMID reuse.
> 
> Two immediate questions:
> 
>   1. Can we disable the L0 MOP cache?

Yes, but it hurts performance.

>   2. Can we invalidate the branch predictor? If Spectre-v2 taught us
>      anything it's that removing those instructions was a mistake!

The suggested workaround does actually invalidate the branch history,
but in a costly way. I am not aware of any other way to do it.

> Moving on...
> 
> Have you reproduced this at top-level? If I recall the
> prefetch-speculation-protection, it's designed to protect against the
> case where you have a direct branch:

No, see below.

> 
> addr:	B	foo
> 
> and another CPU writes out a new function:
> 
> bar:
> 	insn0
> 	...
> 	insnN
> 
> before doing any necessary maintenance and then patches the original
> branch to:
> 
> addr:	B	bar
> 
> The idea is that a concurrently executing CPU could mispredict the original
> branch to point at 'bar', fetch the instructions before they've been written
> out and then confirm the prediction by looking at the newly written branch
> instruction. Even without the prefetch-speculation-protection, that's
> fairly difficult to achieve in practice: you'd need to be doing something
> like reusing memory to hold the instructions so that the initial
> misprediction occurs.
> 
> How does A77 stop this from occurring when the ASID is not reallocated (e.g.
> the example above)? Is the MOP cache flushed somehow?

IIUC, the MOP cache is flushed on I-cache invalidate, so that case is fine.
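
For reference, the writer-side ordering in the example above (write out
'bar', do the maintenance, then patch the branch at 'addr') can be
sketched in C roughly as below. This is only an illustration under stated
assumptions, not the kernel's patching code: a64_encode_b() and
publish_and_repoint() are hypothetical helper names, and the compiler
builtin __builtin___clear_cache() stands in for the required cache
maintenance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode the AArch64 unconditional branch "B <target>" for an
 * instruction located at address pc (hypothetical helper).
 */
static uint32_t a64_encode_b(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	/* B takes a word-aligned offset within +/-128MB. */
	assert((offset & 3) == 0);
	assert(offset >= -(INT64_C(1) << 27) && offset < (INT64_C(1) << 27));

	/* Opcode 0b000101 in bits [31:26], imm26 = offset / 4. */
	return UINT32_C(0x14000000) |
	       (uint32_t)(((uint64_t)offset >> 2) & UINT32_C(0x03FFFFFF));
}

/*
 * Hypothetical writer: publish the new function at 'bar' first and
 * only then repoint the branch at 'addr' to it.
 */
static void publish_and_repoint(uint32_t *addr, uint32_t *bar,
				const uint32_t *insns, size_t n)
{
	/* 1. Write out the new function: insn0 ... insnN. */
	memcpy(bar, insns, n * sizeof(*insns));

	/* 2. Maintenance for the new instructions (user-space stand-in). */
	__builtin___clear_cache((char *)bar, (char *)(bar + n));

	/* 3. Patch the original branch so that it targets 'bar'. */
	*addr = a64_encode_b((uint64_t)(uintptr_t)addr,
			     (uint64_t)(uintptr_t)bar);
	__builtin___clear_cache((char *)addr, (char *)(addr + 1));
}
```

The point of the ordering is that the I-cache invalidate for 'bar'
happens before the new branch becomes visible, which is the step that
flushes the MOP cache as described above.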

> 
> With this erratum, it sounds like you have to end up reusing an ASID from
> a task that had a branch at 'addr' in its address space that branched to
> the address of 'bar' (again, in its address space). Is that right? That
> sounds super rare to me, particularly with ASLR: not only does the aliasing

AFAICS, yes. On top of that, the fetch must also miss "addr" in the
MOP cache and hit "bar" there before the I-cache invalidate is received.
This may cause "bar" to be fetched from the MOP cache (the fetch is not
cancelled even though the I-cache invalidate triggers a MOP flush after
the hit), while "addr" must miss in the I-cache, causing the updated
instruction to be fetched.

Also, this means that the new context must not have executed "addr"
(which would give a hit in the MOP cache) while "bar" was fetched. So
this adds more constraints to actually hitting it.
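
To make the combination explicit, the conditions discussed in this thread
can be written down as a single predicate. This is purely an illustrative
model of my reading of the thread: the struct, its field names and
stale_execution_possible() are hypothetical and do not correspond to any
real hardware state or kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the conditions listed in this thread. */
struct erratum_conditions {
	/* The new context runs with a recycled ASID or VMID. */
	bool asid_or_vmid_reused;
	/* The predictor still holds the old context's "addr -> bar" history. */
	bool stale_history_for_addr;
	/* The new context has not executed "addr", so it misses in the MOP cache. */
	bool addr_misses_mop_cache;
	/* "bar" hits in the MOP cache before the I-cache invalidate is received. */
	bool bar_hits_mop_before_invalidate;
	/* "addr" misses in the I-cache, so the updated branch is fetched. */
	bool addr_misses_icache;
};

/* Stale instructions can only be executed if every condition holds at once. */
static bool stale_execution_possible(const struct erratum_conditions *c)
{
	assert(c != NULL);

	return c->asid_or_vmid_reused &&
	       c->stale_history_for_addr &&
	       c->addr_misses_mop_cache &&
	       c->bar_hits_mop_before_invalidate &&
	       c->addr_misses_icache;
}
```

Clearing any one field makes the predicate false, which is why each
additional constraint narrows the window further.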

> branch need to exist, but it needs to be held in the branch predictor while
> we cycle through 64k ASIDs *and* the race with the writer needs to happen
> so that we get stale instructions from the MOP cache.
> 
> Is there something I'm missing that makes this remotely plausible?

No :-)

Cheers
Suzuki