Message-ID: <871podlr8m.fsf@DESKTOP-5N7EMDA>
Date: Thu, 11 Sep 2025 09:20:25 +0800
From: "Huang, Ying" <ying.huang@...ux.alibaba.com>
To: Ryan Roberts <ryan.roberts@....com>, Yang Shi <yang@...amperecomputing.com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon
<will@...nel.org>, Mark Rutland <mark.rutland@....com>, James Morse
<james.morse@....com>, "Christoph Lameter (Ampere)" <cl@...two.org>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 2/2] arm64: tlbflush: Don't broadcast if mm was
only active on local cpu

Yang Shi <yang@...amperecomputing.com> writes:
> Hi Ryan,
>
>
> On 8/29/25 8:35 AM, Ryan Roberts wrote:
>> There are 3 variants of tlb flush that invalidate user mappings:
>> flush_tlb_mm(), flush_tlb_page() and __flush_tlb_range(). All of these
>> would previously unconditionally broadcast their tlbis to all cpus in
>> the inner shareable domain.
>>
>> But this is a waste of effort if we can prove that the mm for which we
>> are flushing the mappings has only ever been active on the local cpu. In
>> that case, it is safe to avoid the broadcast and simply invalidate the
>> current cpu.
>>
>> So let's track in mm_context_t::active_cpu whether the mm has never been
>> active on any cpu, has been active on more than 1 cpu, or has been
>> active on precisely 1 cpu - and in that case, which one. We update this
>> when switching context, being careful to ensure that it gets updated
>> *before* installing the mm's pgtables. On the reader side, we ensure we
>> read *after* the previous write(s) to the pgtable(s) that necessitated
>> the tlb flush have completed. This guarantees that if a cpu that is
>> doing a tlb flush sees its own id in active_cpu, then the old pgtable
>> entry cannot have been seen by any other cpu and we can flush only the
>> local cpu.
>>
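The reader-side decision described above amounts to something like the
sketch below; the helper name and the explicit barrier are illustrative
only, not the actual code from the patch:

	#include <linux/compiler.h>	/* READ_ONCE() */
	#include <linux/types.h>	/* bool */
	#include <linux/mm_types.h>	/* struct mm_struct */
	#include <linux/smp.h>		/* smp_processor_id() */
	#include <asm/barrier.h>	/* smp_mb() */

	/* Sketch only: is a local-only TLBI sufficient for this mm? */
	static inline bool tlb_flush_local_only(struct mm_struct *mm)
	{
		/*
		 * Order the pgtable write(s) that necessitated this flush
		 * before the read of active_cpu (how the patch actually
		 * enforces this ordering is not shown here).
		 */
		smp_mb();

		return READ_ONCE(mm->context.active_cpu) == smp_processor_id();
	}
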
>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>> ---
>> arch/arm64/include/asm/mmu.h | 12 ++++
>> arch/arm64/include/asm/mmu_context.h | 2 +
>> arch/arm64/include/asm/tlbflush.h | 90 +++++++++++++++++++++++++---
>> arch/arm64/mm/context.c | 30 +++++++++-
>> 4 files changed, 123 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
>> index 6e8aa8e72601..ca32fb860309 100644
>> --- a/arch/arm64/include/asm/mmu.h
>> +++ b/arch/arm64/include/asm/mmu.h
>> @@ -17,6 +17,17 @@
>> #include <linux/refcount.h>
>> #include <asm/cpufeature.h>
>> +/*
>> + * Sentinel values for mm_context_t::active_cpu. ACTIVE_CPU_NONE indicates the
>> + * mm has never been active on any CPU. ACTIVE_CPU_MULTIPLE indicates the mm
>> + * has been active on multiple CPUs. Any other value is the ID of the single
>> + * CPU that the mm has been active on.
>> + */
>> +enum active_cpu {
>> + ACTIVE_CPU_NONE = UINT_MAX,
>> + ACTIVE_CPU_MULTIPLE = UINT_MAX - 1,
>> +};
>> +
>> typedef struct {
>> atomic64_t id;
>> #ifdef CONFIG_COMPAT
>> @@ -26,6 +37,7 @@ typedef struct {
>> void *vdso;
>> unsigned long flags;
>> u8 pkey_allocation_map;
>> + unsigned int active_cpu;
>
> Any reason why you don't use a bit mask to mark the active CPUs?
> mm_struct already has cpu_bitmap to record the active CPUs that the
> process has run on. Why not just use it? x86 uses it to determine
> which CPUs the kernel should send TLB flush IPIs to. I understand this
> series only checks whether the local cpu is the active cpu or not, but
> a bit mask should not make things more complicated, and it also
> provides more flexibility. We could extend this, for example, to use
> IPIs to trigger local TLB flushes if the number of active cpus is
> quite low. AFAIK, x86 added TLBI broadcast support too, and falls back
> to IPI if the number of active cpus is <= 3. IIRC, Christoph's series
> did a similar thing. He should be interested in this series too, so
> I've cc'ed him.
Agree! One possible disadvantage of this series is that the benefit
will be gone after the process is migrated to another CPU. This is
quite common if the process isn't bound to a CPU on a system without
many CPUs. A cpumask would be helpful in this situation.
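
For example, with a cpumask the flush path could choose between a purely
local flush, IPIs to a handful of CPUs, or a broadcast TLBI. A rough
sketch of that kind of policy (the threshold of 3 follows the x86
behaviour mentioned above; local_flush_tlb_mm() and ipi_flush_tlb_mm()
are hypothetical helpers, not existing arm64 code):

	#include <linux/mm_types.h>	/* mm_cpumask(), struct mm_struct */
	#include <linux/cpumask.h>	/* cpumask_weight(), cpumask_test_cpu() */
	#include <linux/smp.h>		/* on_each_cpu_mask(), smp_processor_id() */
	#include <asm/tlbflush.h>	/* flush_tlb_mm() */

	/* Hypothetical helpers, for illustration only. */
	void local_flush_tlb_mm(struct mm_struct *mm);
	void ipi_flush_tlb_mm(void *info);

	/* Sketch only: choose a flush strategy based on mm_cpumask(). */
	static void flush_tlb_mm_by_cpumask(struct mm_struct *mm)
	{
		const struct cpumask *mask = mm_cpumask(mm);
		unsigned int nr = cpumask_weight(mask);

		if (nr == 1 && cpumask_test_cpu(smp_processor_id(), mask)) {
			/* The mm has only ever run here: local TLBI is enough. */
			local_flush_tlb_mm(mm);
		} else if (nr <= 3) {
			/* Few active CPUs: IPI them rather than broadcast. */
			on_each_cpu_mask(mask, ipi_flush_tlb_mm, mm, true);
		} else {
			/* Many active CPUs: broadcast TLBI as today. */
			flush_tlb_mm(mm);
		}
	}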
>
>> } mm_context_t;
>> /*
[snip]
---
Best Regards,
Huang, Ying