Message-ID: <2adc4355-f1e2-4355-b04e-efae4425a3d3@linux.dev>
Date: Wed, 7 Jan 2026 10:47:50 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Dave Hansen <dave.hansen@...el.com>, david@...nel.org
Cc: dave.hansen@...ux.intel.com, will@...nel.org, aneesh.kumar@...nel.org,
npiggin@...il.com, peterz@...radead.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, x86@...nel.org, hpa@...or.com,
arnd@...db.de, lorenzo.stoakes@...cle.com, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com,
ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
shy828301@...il.com, riel@...riel.com, jannh@...gle.com,
linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, ioworker0@...il.com, akpm@...ux-foundation.org
Subject: Re: [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush
already synchronized
On 2026/1/7 00:24, Dave Hansen wrote:
> On 1/6/26 04:03, Lance Yang wrote:
>> From: Lance Yang <lance.yang@...ux.dev>
>>
>> When unsharing hugetlb PMD page tables, we currently send two IPIs: one
>> for TLB invalidation, and another to synchronize with concurrent GUP-fast
>> walkers via tlb_remove_table_sync_one().
>>
>> However, if the TLB flush already sent IPIs to all CPUs (when freed_tables
>> or unshared_tables is true), the second IPI is redundant. GUP-fast runs
>> with IRQs disabled, so when the TLB flush IPI completes, any concurrent
>> GUP-fast must have finished.
>>
>> To avoid the redundant IPI, we add a flag to mmu_gather to track whether
>> the TLB flush sent IPIs. We pass the mmu_gather pointer through the TLB
>> flush path via flush_tlb_info, so native_flush_tlb_multi() can set the
>> flag when it sends IPIs for freed_tables. We also set the flag for
>> local-only flushes, since disabling IRQs provides the same guarantee.
>
> The lack of imperative voice is killing me. :)
Oops.
>
>> diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
>> index 866ea78ba156..c5950a92058c 100644
>> --- a/arch/x86/include/asm/tlb.h
>> +++ b/arch/x86/include/asm/tlb.h
>> @@ -20,7 +20,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>> end = tlb->end;
>> }
>>
>> - flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
>> + flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
>> + tlb->freed_tables || tlb->unshared_tables, tlb);
>> }
>
> I think this hunk sums up v3 pretty well. Where there was a single boolean, now there are two. To add to that, the structure that contains the booleans is itself being passed in. The boolean is still named 'freed_tables', and is going from:
>
> tlb->freed_tables
>
> which is pretty obviously correct to:
>
> tlb->freed_tables || tlb->unshared_tables
>
> which is _far_ from obviously correct.
>
> I'm at a loss for why the patch wouldn't just do this:
>
> - flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
> + flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb);
>
> I suspect these were sent out in a bit of haste, which isn't the first time I've gotten that feeling with this series.
>
> Could we slow down, please?
Sorry, I went too fast ...
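
To make the simpler call shape suggested above concrete, here is a rough
userspace sketch, not kernel code. Every name in it (gather_model,
gather_needs_table_sync, flush_range_model) is hypothetical and only
stands in for the real structures. The idea is that the caller passes the
gather itself, and one helper owns the "freed_tables || unshared_tables"
decision instead of each call site open-coding it:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two mmu_gather bits discussed above. */
struct gather_model {
    bool freed_tables;
    bool unshared_tables;
};

/*
 * Single place that decides whether the flush has to synchronize with
 * lockless page-table walkers. Call sites never combine the bits.
 */
static bool gather_needs_table_sync(const struct gather_model *tlb)
{
    return tlb->freed_tables || tlb->unshared_tables;
}

/*
 * Model of a flush entry point that takes the gather rather than a
 * boolean. It returns true when the model would flush in "table sync"
 * mode; a real implementation would do the flush here.
 */
static bool flush_range_model(const struct gather_model *tlb)
{
    return gather_needs_table_sync(tlb);
}
```

With this shape the call site stays a single obvious argument, and the
meaning of the two bits is documented in one helper.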
>
>> static inline void invlpg(unsigned long addr)
>> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
>> index 00daedfefc1b..83c260c88b80 100644
>> --- a/arch/x86/include/asm/tlbflush.h
>> +++ b/arch/x86/include/asm/tlbflush.h
>> @@ -220,6 +220,7 @@ struct flush_tlb_info {
>> * will be zero.
>> */
>> struct mm_struct *mm;
>> + struct mmu_gather *tlb;
>> unsigned long start;
>> unsigned long end;
>> u64 new_tlb_gen;
>
> This also gives me pause.
>
> There is a *lot* of redundant information between 'struct mmu_gather' and 'struct flush_tlb_info'. There needs to at least be a description of what the relationship is and how these relate to each other. I would have naively thought that the right move here would be to pull the mmu_gather data out at one discrete time rather than store a pointer to it.
>
> What I see here is, I suspect, the most expedient way to do it. I'd _certainly_ have done this myself if I was just hacking something together to play with as quickly as possible.
>
> So, in the end, I don't hate the approach here (yet). But it is almost impossible to evaluate it because the series is taking some rather egregious shortcuts and is lacking any real semblance of a refactoring effort.
The flag lifetime issue David pointed out is real, and you're right
about the messy parameters :)
And, yeah, I need to think more about those. Maybe v3 can be fixed, or maybe
v2 is actually sufficient - it's conservative but safe (no false positives).
Will take more time, thanks!
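
For reference, the "pull the mmu_gather data out at one discrete time"
idea could look roughly like the userspace sketch below. This is only a
model under simplifying assumptions, not a proposal for the actual code:
the struct and function names are made up, and the flush is reduced to
"IPIs go out when a table sync was requested". The flush info holds a
copy of what it needs, and the "IPIs were sent" result is read back from
it instead of being written through a stored mmu_gather pointer:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant mmu_gather state. */
struct gather_model {
    bool freed_tables;
    bool unshared_tables;
};

/* Hypothetical stand-in for flush_tlb_info: a snapshot, no back pointer. */
struct flush_info_model {
    bool table_sync; /* copy of freed_tables || unshared_tables */
    bool sent_ipis;  /* result: did the flush interrupt the other CPUs? */
};

/* Copy the gather state once, at a single well-defined point. */
static struct flush_info_model flush_info_snapshot(const struct gather_model *tlb)
{
    struct flush_info_model info = {
        .table_sync = tlb->freed_tables || tlb->unshared_tables,
        .sent_ipis = false,
    };
    return info;
}

/* Model flush: assume IPIs reach all CPUs only for table-sync flushes. */
static void flush_model(struct flush_info_model *info)
{
    if (info->table_sync)
        info->sent_ipis = true;
}

/* The extra GUP-fast sync IPI is needed only if the flush sent none. */
static bool need_extra_sync_ipi(const struct flush_info_model *info)
{
    return !info->sent_ipis;
}
```

Since the result lives in the snapshot, its lifetime is tied to a single
flush, which sidesteps the question of when a flag on the gather would
have to be cleared.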