Message-ID: <fd8bda4b-32ee-d06d-af77-12e30e70c0bf@bytedance.com>
Date:   Tue, 16 May 2023 15:47:16 +0800
From:   Gang Li <ligang.bdlg@...edance.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Will Deacon <will@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Feiyang Chen <chenfeiyang@...ngson.cn>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [QUESTION FOR ARM64 TLB] performance issue and implementation
 difference of TLB flush

Hi,

On 2023/5/9 22:30, Mark Rutland wrote:
> For example, early in D8.13 we have the rule:
> 
> | R_SQBCS
> |
> |   When address translation is enabled, a translation table entry for an
> |   in-context translation regime that does not cause a Translation fault, an
> |   Address size fault, or an Access flag fault is permitted to be cached in a
> |   TLB or intermediate TLB caching structure as the result of an explicit or
> |   speculative access.
> 

Thanks a lot!

I looked up the x86 manual and found that the x86 TLB caching mechanism
is similar to arm64's (though the x86 folks haven't replied to me yet):

Intel® 64 and IA-32 Architectures Software Developer Manuals:
> 4.10.2.3 Details of TLB Use
> Subject to the limitations given in the previous paragraph, the
> processor may cache a translation for any linear address, even if that
> address is not used to access memory. For example, the processor may
> cache translations required for prefetches and for accesses that result
> from speculative execution that would never actually occur in the
> executed code path.

Both architectures have similar TLB caching policies, so why does arm64
flush the TLBs of all CPUs while x86 only flushes the local one in
ghes_map() and ghes_unmap()?
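
Concretely, my (possibly wrong) reading of the current call chains is:

        /* arm64: ghes_unmap() -> clear_fixmap() -> __set_fixmap() does */
        flush_tlb_kernel_range(addr, addr + PAGE_SIZE); /* TLBI ...IS: broadcast to all CPUs */

        /* x86: ghes_map()/ghes_unmap() -> __set_fixmap() -> set_pte_vaddr() does */
        flush_tlb_one_kernel(addr);                     /* local CPU only */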

I think the broadcast flush may be unnecessary:

1. Before accessing GHES data, each CPU has to call ghes_map(), which
creates the mapping and flushes the CPU's own TLB, making sure the
current CPU uses the latest mapping.

2. There is then no need for a broadcast flush in ghes_unmap(), because
every other CPU flushes its own TLB in ghes_map() before accessing the
memory (see the sketch below).
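
To make this concrete, here is a rough sketch of the scheme I have in
mind. This is not the actual kernel code: local_flush_tlb_kernel_page()
is a made-up name for a non-broadcast, single-page kernel TLB
invalidation (on arm64, roughly a "tlbi vaale1" without the IS
qualifier, plus the usual barriers; no such helper exists upstream
today), and I assume the fixmap helpers themselves do no flushing here:

static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
{
        phys_addr_t paddr = PFN_PHYS(pfn);
        unsigned long vaddr = __fix_to_virt(fixmap_idx);

        /* Install the translation in this context's fixmap slot. */
        __set_fixmap(fixmap_idx, paddr, arch_apei_get_mem_attribute(paddr));

        /*
         * Point 1: only the local TLB matters for the access we are
         * about to do, so drop whatever this CPU may have cached for
         * vaddr (including speculatively-filled entries, which both
         * R_SQBCS and SDM 4.10.2.3 permit).
         */
        local_flush_tlb_kernel_page(vaddr);

        return (void __iomem *)vaddr;
}

static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
{
        clear_fixmap(fixmap_idx);

        /*
         * Point 2: no broadcast here. A remote CPU may keep a stale
         * TLB entry for this slot, but that is harmless: it has to go
         * through ghes_map(), and thus through a local flush, before
         * it touches the slot again.
         */
        local_flush_tlb_kernel_page((unsigned long)vaddr);
}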

What do you think?

Thanks,
Gang Li.
