Message-ID: <fd8bda4b-32ee-d06d-af77-12e30e70c0bf@bytedance.com>
Date:   Tue, 16 May 2023 15:47:16 +0800
From:   Gang Li <ligang.bdlg@...edance.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Will Deacon <will@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Feiyang Chen <chenfeiyang@...ngson.cn>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [QUESTION FOR ARM64 TLB] performance issue and implementation
 difference of TLB flush

Hi,

On 2023/5/9 22:30, Mark Rutland wrote:
> For example, early in D8.13 we have the rule:
> 
> | R_SQBCS
> |
> |   When address translation is enabled, a translation table entry for an
> |   in-context translation regime that does not cause a Translation fault, an
> |   Address size fault, or an Access flag fault is permitted to be cached in a
> |   TLB or intermediate TLB caching structure as the result of an explicit or
> |   speculative access.
> 

Thanks a lot!

I looked up the x86 manual and found that the x86 TLB caching mechanism
is similar to arm64's (though the x86 folks haven't replied to me yet):

Intel® 64 and IA-32 Architectures Software Developer Manuals:
> 4.10.2.3 Details of TLB Use
> Subject to the limitations given in the previous paragraph, the
> processor may cache a translation for any linear address, even if that
> address is not used to access memory. For example, the processor may
> cache translations required for prefetches and for accesses that result
> from speculative execution that would never actually occur in the
> executed code path.

Both architectures have similar TLB caching policies, so why does arm64
flush the TLBs of all CPUs while x86 only flushes the local one in
ghes_map() and ghes_unmap()?
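
Concretely, my (possibly wrong) reading of the current call chains is:

        /* arm64: ghes_unmap() -> clear_fixmap() -> __set_fixmap() does */
        flush_tlb_kernel_range(addr, addr + PAGE_SIZE); /* TLBI ...IS: broadcast to all CPUs */

        /* x86: ghes_map()/ghes_unmap() -> __set_fixmap() -> set_pte_vaddr() does */
        flush_tlb_one_kernel(addr);                     /* local CPU only */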

I think the broadcast flush may be unnecessary:

1. Before accessing GHES data, each CPU has to call ghes_map(), which
creates the mapping and flushes the CPU's own TLB, making sure the
current CPU uses the latest mapping.

2. There is then no need for a broadcast flush in ghes_unmap(), because
every other CPU flushes its own TLB in ghes_map() before accessing the
memory (see the sketch below).
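
To make this concrete, here is a rough sketch of the scheme I have in
mind. This is not the actual kernel code: local_flush_tlb_kernel_page()
is a made-up name for a non-broadcast, single-page kernel TLB
invalidation (on arm64, roughly a "tlbi vaale1" without the IS
qualifier, plus the usual barriers; no such helper exists upstream
today), and I assume the fixmap helpers themselves do no flushing here:

static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
{
        phys_addr_t paddr = PFN_PHYS(pfn);
        unsigned long vaddr = __fix_to_virt(fixmap_idx);

        /* Install the translation in this context's fixmap slot. */
        __set_fixmap(fixmap_idx, paddr, arch_apei_get_mem_attribute(paddr));

        /*
         * Point 1: only the local TLB matters for the access we are
         * about to do, so drop whatever this CPU may have cached for
         * vaddr (including speculatively-filled entries, which both
         * R_SQBCS and SDM 4.10.2.3 permit).
         */
        local_flush_tlb_kernel_page(vaddr);

        return (void __iomem *)vaddr;
}

static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
{
        clear_fixmap(fixmap_idx);

        /*
         * Point 2: no broadcast here. A remote CPU may keep a stale
         * TLB entry for this slot, but that is harmless: it has to go
         * through ghes_map(), and thus through a local flush, before
         * it touches the slot again.
         */
        local_flush_tlb_kernel_page((unsigned long)vaddr);
}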

What do you think?

Thanks,
Gang Li.
