Message-ID: <878qmpwl0i.fsf@linux.dev>
Date: Wed, 21 May 2025 17:23:57 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Jann Horn <jannh@...gle.com>
Cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton
<akpm@...ux-foundation.org>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, Will Deacon
<will@...nel.org>, "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>, Nick
Piggin <npiggin@...il.com>, linux-arch@...r.kernel.org
Subject: Re: [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP
vmas into free_pgtables()
Jann Horn <jannh@...gle.com> writes:
> On Tue, May 20, 2025 at 11:50 PM Roman Gushchin
> <roman.gushchin@...ux.dev> wrote:
>> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
>> added a forced TLB flush to tlb_end_vma(), which is required to avoid a
>> race between munmap() and unmap_mapping_range(). However, it added some
>> overhead to other paths where tlb_end_vma() is used but vmas are not
>> removed, e.g. madvise(MADV_DONTNEED).
>>
>> Fix this by moving the tlb flush out of tlb_end_vma() into new
>> tlb_flush_vmas() called from free_pgtables(), somewhat similar to the
>> stable version of the original commit:
>> commit 895428ee124a ("mm: Force TLB flush for PFNMAP mappings before
>> unlink_file_vma()").
>>
>> Note that if tlb->fullmm is set, no flush is required, as the whole
>> mm is about to be destroyed.
>>
>> Signed-off-by: Roman Gushchin <roman.gushchin@...ux.dev>
>> Cc: Jann Horn <jannh@...gle.com>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Will Deacon <will@...nel.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Nick Piggin <npiggin@...il.com>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: linux-arch@...r.kernel.org
>> Cc: linux-mm@...ck.org
>>
>> ---
>> v5:
>> - tlb_free_vma() -> tlb_free_vmas() to avoid extra checks
>>
>> v4:
>> - naming/comments update (by Peter Z.)
>> - check vma->vm_flags in tlb_free_vma() (by Peter Z.)
>>
>> v3:
>> - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>>
>> v2:
>> - moved vma_pfn flag handling into tlb.h (by Peter Z.)
>> - added comments (by Peter Z.)
>> - fixed the vma_pfn flag setting (by Hugh D.)
>> ---
>> include/asm-generic/tlb.h | 49 +++++++++++++++++++++++++++++++--------
>> mm/memory.c | 2 ++
>> 2 files changed, 41 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
>> index 88a42973fa47..8a8b9535a930 100644
>> --- a/include/asm-generic/tlb.h
>> +++ b/include/asm-generic/tlb.h
>> @@ -58,6 +58,11 @@
>> * Defaults to flushing at tlb_end_vma() to reset the range; helps when
>> * there's large holes between the VMAs.
>> *
>> + * - tlb_free_vmas()
>> + *
>> + * tlb_free_vmas() marks the start of unlinking of one or more vmas
>> + * and freeing page-tables.
>> + *
>> * - tlb_remove_table()
>> *
>> * tlb_remove_table() is the basic primitive to free page-table directories
>> @@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>> * Do not reset mmu_gather::vma_* fields here, we do not
>> * call into tlb_start_vma() again to set them if there is an
>> * intermediate flush.
>> + *
>> + * Except for vma_pfn, which only cares whether there's a pending TLBI.
>> */
>> + tlb->vma_pfn = 0;
>
> This looks dodgy to me. Can you explain here in more detail why this
> is okay? Looking at current mainline, tlb->vma_pfn is only set to 1
> when tlb_start_vma() calls into tlb_update_vma_flags(); it is never
> set again after tlb_start_vma(), so I don't think it's legal to just
> clear it in the middle of a VMA.
>
> If we had something like this callgraph on a VM_MIXEDMAP mapping, with
> an intermediate TLB flush in the middle of the VM_MIXEDMAP mapping:
>
> tlb_start_vma()
>   [sets tlb->vma_pfn]
> zap_pte_range
>   tlb_flush_mmu_tlbonly
>     __tlb_reset_range
>       [clears tlb->vma_pfn]
> zap_pte_range
>   [zaps more PTEs and queues a pending TLB flush]
> tlb_end_vma()
> free_pgtables
>   tlb_free_vmas
>     [checks for tlb->vma_pfn]
>
> then tlb_free_vmas() will erroneously not do a flush when it should've
> done one, right?
Good catch!
>
> Why does it even matter to you whether tlb->vma_pfn ever gets reset?
> I think at worst you do one extra TLB flush, in a case involving a
> munmap() across multiple VMAs with a mix of VM_PFNMAP/VM_MIXEDMAP and
> non-VM_PFNMAP/VM_MIXEDMAP VMAs (already a fairly weird scenario on its
> own), where the region being operated on is large or complicated
> enough that you already did at least one TLB flush, and where the
> unmap is sufficiently large, or has sufficient address space around
> it, that page tables are getting freed. That seems like a scenario in
> which one more flush isn't going to be a big deal.
Agree.
>
> If you wanted to do this properly, I think you could do something like:
>
> - add another flag tlb->current_vma_pfn that tracks whether the
> current vma is pfnmap/mixedmap
> - reset tlb->vma_pfn on TLB flush
> - set tlb->vma_pfn again if a TLB flush is enqueued while
> tlb->current_vma_pfn is true
>
> But that seems way too complicated, so I would just delete these three
> lines from the patch.
Agree, except that tlb->vma_pfn needs to be initialized somewhere. That
is primarily why these lines were added, based on feedback on the
original version; however, we missed the race you pointed out.

I guess tlb->vma_pfn can instead be initialized in __tlb_gather_mmu().
I'll send out the fixed version shortly.
Thank you!