linux-kernel - Re: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush


Message-ID: <369d1be2-d418-1bfb-bfc2-b25e4e542d76@bytedance.com>
Date:   Fri, 5 May 2023 17:48:55 +0800
From:   Gang Li <ligang.bdlg@...edance.com>
To:     Mark Rutland <mark.rutland@....com>,
        Gang Li <ligang.bdlg@...edance.com>
Cc:     Will Deacon <will@...nel.org>,
        Tomasz Nowicki <tomasz.nowicki@...aro.org>,
        Laura Abbott <lauraa@...eaurora.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Feiyang Chen <chenfeiyang@...ngson.cn>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [QUESTION FOR ARM64 TLB] performance issue and implementation
 difference of TLB flush

The Cc list was accidentally dropped from this thread, so I am forwarding
the missed emails to the mailing list.

On 2023/4/28 17:27, Mark Rutland wrote:
> 
> 
> Hi,
> 
> Just to check -- did you mean to drop the other Ccs? It would be good to keep
> this discussion on-list if possible.
> 
> On Fri, Apr 28, 2023 at 01:49:46PM +0800, Gang Li wrote:
>> On 2023/4/27 15:30, Mark Rutland wrote:
>>> On Thu, Apr 27, 2023 at 11:26:50AM +0800, Gang Li wrote:
>>>> 1. I am curious to know the reason behind the design choice of flushing
>>>> the TLB on all cores for ARM64's clear_fixmap, while AMD64 only flushes
>>>> the TLB on a single core. Are there any TLB design details that make a
>>>> difference here?
>>>
>>> I don't know why arm64 only clears this on a single CPU.
>>
>> Sorry, I'm a bit confused.
>>
>> Did you mean you don't know why *amd64* only clears this on a single
>> CPU?
> 
> Yes, sorry; I meant to say "amd64" rather than "arm64" here.
> 
>> Looks like I should ask amd64 guy 😉
> 
> 😉
> 
>>> On arm64 we *must* invalidate the TLB on all CPUs as the kernel page tables are
>>> shared by all CPUs, and the architectural Break-Before-Make rules require
>>> the TLB to be invalidated between two valid (but distinct) entries.
>>
>> ghes_unmap is protected by a spin_lock, so only one core can access this
>> memory area at a time. My understanding was that there would be no TLB
>> entries for this memory area on other cores.
>>
>> Is it because arm64 has speculative execution? Even if a core does not
>> hold the spin_lock, can its TLB still cache translations used in the
>> critical section?
> 
> The architecture allows a CPU to allocate TLB entries at any time for any
> reason, for any valid translation table entries reachable from the root in
> TTBR{0,1}_ELx. That can be due to speculation, prefetching, and/or other
> reasons.
> 
> Due to that, it doesn't matter whether or not a CPU explicitly accesses a
> memory location -- TLB entries can be allocated regardless. Consequently, the
> spinlock doesn't make any difference.
> 
> Thanks,
> Mark.
>
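
To make Mark's point concrete, here is a minimal toy model in C. It is only
a sketch, not kernel code: the names (shared_pte, speculative_fill,
flush_local, flush_all, remap, scenario) and NR_CPUS = 4 are hypothetical
and exist only for this illustration. It assumes one shared kernel
page-table entry and one cached translation per CPU, and lets every CPU
cache the entry whenever it is valid, regardless of who holds the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for the model */

/* One shared kernel page-table entry, e.g. a fixmap slot. */
struct pte { bool valid; unsigned long pfn; };

/* Each CPU's cached copy of that translation. */
struct tlb_entry { bool valid; unsigned long pfn; };

static struct pte shared_pte;
static struct tlb_entry tlb[NR_CPUS];

/*
 * Any CPU may cache a valid, reachable entry at any time, whether or
 * not it holds the spinlock or ever touches the address.
 */
static void speculative_fill(size_t cpu)
{
    assert(cpu < NR_CPUS);
    if (shared_pte.valid) {
        tlb[cpu].valid = true;
        tlb[cpu].pfn = shared_pte.pfn;
    }
}

/* Invalidate only the calling CPU's cached translation. */
static void flush_local(size_t cpu)
{
    assert(cpu < NR_CPUS);
    tlb[cpu].valid = false;
}

/* Invalidate the cached translation on every CPU. */
static void flush_all(void)
{
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        tlb[cpu].valid = false;
}

/* Count CPUs whose cached translation disagrees with the page table. */
static size_t count_stale(void)
{
    size_t stale = 0;

    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        if (tlb[cpu].valid &&
            (!shared_pte.valid || tlb[cpu].pfn != shared_pte.pfn))
            stale++;
    return stale;
}

/* Break-before-make style remap: invalidate, flush, then write new. */
static size_t remap(unsigned long new_pfn, bool broadcast, size_t owner)
{
    shared_pte.valid = false;               /* break */
    if (broadcast)
        flush_all();
    else
        flush_local(owner);
    shared_pte.pfn = new_pfn;               /* make */
    shared_pte.valid = true;
    return count_stale();
}

/* Map a page, let every CPU cache it, then remap from CPU 0. */
static size_t scenario(bool broadcast)
{
    flush_all();
    shared_pte.pfn = 0x100;
    shared_pte.valid = true;
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        speculative_fill(cpu);
    return remap(0x200, broadcast, 0);
}
```

In this model scenario(false) leaves the three CPUs that never took the
lock holding the old translation, while scenario(true) leaves none. Real
hardware is far more involved, but the reason the spinlock does not help
is the same: holding the lock does not stop other CPUs from filling their
TLBs.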