Message-ID: <90ed2574-74f8-47a0-ac46-4b9418c3242d@arm.com>
Date: Tue, 2 Sep 2025 17:54:07 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>,
James Morse <james.morse@....com>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 2/2] arm64: tlbflush: Don't broadcast if mm was
only active on local cpu
On 02/09/2025 17:23, Catalin Marinas wrote:
> On Fri, Aug 29, 2025 at 04:35:08PM +0100, Ryan Roberts wrote:
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index f66b8c4696d0..651440e0aff9 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -182,6 +182,52 @@ static inline unsigned long get_trans_granule(void)
>> (__pages >> (5 * (scale) + 1)) - 1; \
>> })
>>
>> +/*
>> + * Determines whether the user tlbi invalidation can be performed only on the
>> + * local CPU or whether it needs to be broadcast. (Returns true for local).
>> + * Additionally issues the appropriate barrier to ensure prior pgtable updates are
>> + * visible to the table walker. Must be paired with flush_tlb_user_post().
>> + */
>> +static inline bool flush_tlb_user_pre(struct mm_struct *mm)
>> +{
>> + unsigned int self, active;
>> + bool local;
>> +
>> + migrate_disable();
>> +
>> + self = smp_processor_id();
>> +
>> + /*
>> + * The load of mm->context.active_cpu must not be reordered before the
>> + * store to the pgtable that necessitated this flush. This ensures that
>> + * if the value read is our cpu id, then no other cpu can have seen the
>> + * old pgtable value and therefore does not need this old value to be
>> + * flushed from its tlb. But we don't want to upgrade the dsb(ishst),
>> + * needed to make the pgtable updates visible to the walker, to a
>> + * dsb(ish) by default. So speculatively load without a barrier and if
>> + * it indicates our cpu id, then upgrade the barrier and re-load.
>> + */
>> + active = READ_ONCE(mm->context.active_cpu);
>> + if (active == self) {
>> + dsb(ish);
>> + active = READ_ONCE(mm->context.active_cpu);
>> + } else {
>> + dsb(ishst);
>> + }
>
> Does the ISH vs ISHST make much difference in practice? I wonder whether
> we could keep this simple.
I don't know. I was being conservative - I'm a bit nervous about upgrading a
barrier unconditionally. I'll run some benchmarks with the simpler version and
compare.
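
Something like this, presumably (untested sketch; I'm assuming the elided
tail of flush_tlb_user_pre() just compares the loaded value against
smp_processor_id() and returns, per the "(Returns true for local)" comment):

/*
 * Simpler variant: pay for dsb(ish) unconditionally so that a single
 * load of mm->context.active_cpu suffices and the re-load dance goes
 * away entirely.
 */
static inline bool flush_tlb_user_pre(struct mm_struct *mm)
{
	migrate_disable();

	/*
	 * dsb(ish) makes the prior pgtable stores visible to the walker
	 * and orders them before the load of active_cpu below.
	 */
	dsb(ish);

	return READ_ONCE(mm->context.active_cpu) == smp_processor_id();
}
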
>
>> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
>> index b2ac06246327..adf4fc26ddb6 100644
>> --- a/arch/arm64/mm/context.c
>> +++ b/arch/arm64/mm/context.c
>> @@ -214,9 +214,10 @@ static u64 new_context(struct mm_struct *mm)
>>
>> void check_and_switch_context(struct mm_struct *mm)
>> {
>> - unsigned long flags;
>> - unsigned int cpu;
>> + unsigned int cpu = smp_processor_id();
>> u64 asid, old_active_asid;
>> + unsigned int active;
>> + unsigned long flags;
>>
>> if (system_supports_cnp())
>> cpu_set_reserved_ttbr0();
>> @@ -251,7 +252,6 @@ void check_and_switch_context(struct mm_struct *mm)
>> atomic64_set(&mm->context.id, asid);
>> }
>>
>> - cpu = smp_processor_id();
>> if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
>> local_flush_tlb_all();
>>
>> @@ -262,6 +262,30 @@ void check_and_switch_context(struct mm_struct *mm)
>>
>> arm64_apply_bp_hardening();
>>
>> + /*
>> + * Update mm->context.active_cpu in such a manner that we avoid cmpxchg
>> + * and dsb unless we definitely need it. If initially ACTIVE_CPU_NONE
>> + * then we are the first cpu to run so set it to our id. If initially
>> + * any id other than ours, we are the second cpu to run so set it to
>> + * ACTIVE_CPU_MULTIPLE. If we update the value then we must issue
>> + * dsb(ishst) to ensure stores to mm->context.active_cpu are ordered
>> + * against the TTBR0 write in cpu_switch_mm()/uaccess_enable(); the
>> + * store must be visible to another cpu before this cpu could have
>> + * populated any TLB entries based on the pgtables that will be
>> + * installed.
>> + */
>> + active = READ_ONCE(mm->context.active_cpu);
>> + if (active != cpu && active != ACTIVE_CPU_MULTIPLE) {
>> + if (active == ACTIVE_CPU_NONE)
>> + active = cmpxchg_relaxed(&mm->context.active_cpu,
>> + ACTIVE_CPU_NONE, cpu);
>> +
>> + if (active != ACTIVE_CPU_NONE)
>> + WRITE_ONCE(mm->context.active_cpu, ACTIVE_CPU_MULTIPLE);
>> +
>> + dsb(ishst);
>> + }
>
> I think this works. I recall James had a litmus test for the model
> checker confirming this.
>
> In a cut-down version, we'd have:
>
> P0:                       P1:
>
> set_pte();                WRITE_ONCE(active_cpu);
> dsb();                    dsb();
> READ_ONCE(active_cpu);    READ_ONCE(pte);
>
> The pte read on P1 is implied by the subsequent TTBR0 write. As you
> described, if P0 did not see the active_cpu update, then P1 must have
> seen the updated pte.
>
> So far I couldn't get this to fail, so:
>
> Reviewed-by: Catalin Marinas <catalin.marinas@....com>
Hmm... Rb at v1... are you feeling ok? :)
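
For the archive, here is a rough LKMM approximation of that cut-down test
(hypothetical sketch: James's actual litmus test would presumably target
the Arm model with real DSBs, so smp_mb() is only standing in for dsb()
here). Running it through herd7 with tools/memory-model should report the
exists clause as never matching:

C active-cpu-sb

(*
 * pte stands in for the pgtable store and active_cpu for
 * mm->context.active_cpu. The exists clause is the bad outcome:
 * P0 misses the active_cpu update *and* P1 misses the pte update.
 *)

{}

P0(int *pte, int *active_cpu)
{
	int r0;

	WRITE_ONCE(*pte, 1);		/* set_pte() */
	smp_mb();			/* dsb() */
	r0 = READ_ONCE(*active_cpu);
}

P1(int *pte, int *active_cpu)
{
	int r0;

	WRITE_ONCE(*active_cpu, 1);
	smp_mb();			/* dsb() */
	r0 = READ_ONCE(*pte);		/* implied by the TTBR0 write */
}

exists (0:r0=0 /\ 1:r0=0)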