Message-ID: <369d1be2-d418-1bfb-bfc2-b25e4e542d76@bytedance.com>
Date: Fri, 5 May 2023 17:48:55 +0800
From: Gang Li <ligang.bdlg@...edance.com>
To: Mark Rutland <mark.rutland@....com>,
Gang Li <ligang.bdlg@...edance.com>
Cc: Will Deacon <will@...nel.org>,
Tomasz Nowicki <tomasz.nowicki@...aro.org>,
Laura Abbott <lauraa@...eaurora.org>,
Catalin Marinas <catalin.marinas@....com>,
Ard Biesheuvel <ardb@...nel.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Feiyang Chen <chenfeiyang@...ngson.cn>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [QUESTION FOR ARM64 TLB] performance issue and implementation
difference of TLB flush
This thread accidentally lost its Cc list, so I am forwarding the missing
emails to the mailing list.
On 2023/4/28 17:27, Mark Rutland wrote:
>
>
> Hi,
>
> Just to check -- did you mean to drop the other Ccs? It would be good to keep
> this discussion on-list if possible.
>
> On Fri, Apr 28, 2023 at 01:49:46PM +0800, Gang Li wrote:
>> On 2023/4/27 15:30, Mark Rutland wrote:
>>> On Thu, Apr 27, 2023 at 11:26:50AM +0800, Gang Li wrote:
>>>> 1. I am curious to know the reason behind the design choice of flushing
>>>> the TLB on all cores for ARM64's clear_fixmap, while AMD64 only flushes
>>>> the TLB on a single core. Are there any TLB design details that make a
>>>> difference here?
>>>
>>> I don't know why arm64 only clears this on a single CPU.
>>
>> Sorry, I'm a bit confused.
>>
>> Did you mean you don't know why *amd64* only clears this on a single
>> CPU?
>
> Yes, sorry; I meant to say "amd64" rather than "arm64" here.
>
>> Looks like I should ask amd64 guy 😉
>
> 😉
>
>>> On arm64 we *must* invalidate the TLB on all CPUs as the kernel page tables are
>>> shared by all CPUs, and the architectural Break-Before-Make rules require
>>> the TLB to be invalidated between two valid (but distinct) entries.
>>
>> ghes_unmap is protected by a spin_lock, so only one core can access this
>> mem area at a time. I understand that there will be no TLB entries for
>> this memory area on other cores.
>>
>> Is it because arm64 has speculative execution? Even if the core does not
>> hold the spin_lock, the TLB will still cache the critical section?
>
> The architecture allows a CPU to allocate TLB entries at any time for any
> reason, for any valid translation table entries reachable from the root in
> TTBR{0,1}_ELx. That can be due to speculation, prefetching, and/or other
> reasons.
>
> Due to that, it doesn't matter whether or not a CPU explicitly accesses a
> memory location -- TLB entries can be allocated regardless. Consequently, the
> spinlock doesn't make any difference.
>
> Thanks,
> Mark.
>
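To make Mark's point concrete, here is a toy model in plain C. It is only a
sketch and not kernel code: every name in it (toy_pte, toy_speculative_fill,
toy_remap, toy_scenario, and so on) is made up for illustration, and it models
a single shared page table entry with one cached translation per CPU. It
assumes, as described above, that any CPU may cache any valid entry at any
time, whether or not it holds the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Toy model only: one kernel PTE shared by all CPUs. */
struct toy_pte {
	bool valid;
	uint64_t phys;
};

/* Toy model only: one cached translation per CPU. */
struct toy_tlb_entry {
	bool valid;
	uint64_t phys;
};

static struct toy_pte shared_pte;
static struct toy_tlb_entry tlb[NR_CPUS];

/* A CPU may cache any valid entry at any time (speculation, prefetch). */
static void toy_speculative_fill(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (shared_pte.valid) {
		tlb[cpu].valid = true;
		tlb[cpu].phys = shared_pte.phys;
	}
}

static void toy_flush_local(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	tlb[cpu].valid = false;
}

static void toy_flush_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		tlb[cpu].valid = false;
}

/* Break-before-make: invalidate the entry, flush, then install the new one. */
static void toy_remap(int cpu, uint64_t new_phys, bool broadcast)
{
	shared_pte.valid = false;		/* break */
	if (broadcast)
		toy_flush_all();
	else
		toy_flush_local(cpu);
	shared_pte.phys = new_phys;		/* make */
	shared_pte.valid = true;
}

/* Count CPUs whose cached translation disagrees with the page table. */
static int toy_count_stale(void)
{
	int stale = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (tlb[cpu].valid &&
		    (!shared_pte.valid || tlb[cpu].phys != shared_pte.phys))
			stale++;
	}
	return stale;
}

/* CPU 0 holds the lock; CPU 1 never takes it but speculates anyway. */
static int toy_scenario(bool broadcast)
{
	shared_pte = (struct toy_pte){ .valid = true, .phys = 0x1000 };
	toy_flush_all();

	toy_speculative_fill(0);
	toy_speculative_fill(1);

	toy_remap(0, 0x2000, broadcast);
	return toy_count_stale();
}
```

In this model toy_scenario(false) leaves CPU 1 holding the old translation,
while toy_scenario(true) leaves no stale entries. The lock never appears in
the model because it has no influence on which CPUs may cache the entry.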