Message-ID: <CAG48ez39SaJe=cwq3JZ6UM0BLMHEj76Kdmd9=Ho1nr17fAco6Q@mail.gmail.com>
Date: Wed, 21 May 2025 02:38:20 +0200
From: Jann Horn <jannh@...gle.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>, Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Will Deacon <will@...nel.org>, "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
Nick Piggin <npiggin@...il.com>, linux-arch@...r.kernel.org
Subject: Re: [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP
vmas into free_pgtables()
On Tue, May 20, 2025 at 11:50 PM Roman Gushchin
<roman.gushchin@...ux.dev> wrote:
> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> added a forced TLB flush to tlb_vma_end(), which is required to avoid a
> race between munmap() and unmap_mapping_range(). However, it added some
> overhead to other paths where tlb_vma_end() is used but vmas are not
> removed, e.g. madvise(MADV_DONTNEED).
>
> Fix this by moving the tlb flush out of tlb_end_vma() into new
> tlb_flush_vmas() called from free_pgtables(), somewhat similar to the
> stable version of the original commit:
> commit 895428ee124a ("mm: Force TLB flush for PFNMAP mappings before
> unlink_file_vma()").
>
> Note that if tlb->fullmm is set, no flush is required, as the whole
> mm is about to be destroyed.
>
> Signed-off-by: Roman Gushchin <roman.gushchin@...ux.dev>
> Cc: Jann Horn <jannh@...gle.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Will Deacon <will@...nel.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Nick Piggin <npiggin@...il.com>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: linux-arch@...r.kernel.org
> Cc: linux-mm@...ck.org
>
> ---
> v5:
> - tlb_free_vma() -> tlb_free_vmas() to avoid extra checks
>
> v4:
> - naming/comments update (by Peter Z.)
> - check vma->vm_flags in tlb_free_vma() (by Peter Z.)
>
> v3:
> - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>
> v2:
> - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> - added comments (by Peter Z.)
> - fixed the vma_pfn flag setting (by Hugh D.)
> ---
> include/asm-generic/tlb.h | 49 +++++++++++++++++++++++++++++++--------
> mm/memory.c | 2 ++
> 2 files changed, 41 insertions(+), 10 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 88a42973fa47..8a8b9535a930 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -58,6 +58,11 @@
> * Defaults to flushing at tlb_end_vma() to reset the range; helps when
> * there's large holes between the VMAs.
> *
> + * - tlb_free_vmas()
> + *
> + * tlb_free_vmas() marks the start of unlinking of one or more vmas
> + * and freeing page-tables.
> + *
> * - tlb_remove_table()
> *
> * tlb_remove_table() is the basic primitive to free page-table directories
> @@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> * Do not reset mmu_gather::vma_* fields here, we do not
> * call into tlb_start_vma() again to set them if there is an
> * intermediate flush.
> + *
> + * Except for vma_pfn, that only cares if there's pending TLBI.
> */
> + tlb->vma_pfn = 0;
This looks dodgy to me. Can you explain here in more detail why this
is okay? Looking at current mainline, tlb->vma_pfn is only set to 1
when tlb_start_vma() calls into tlb_update_vma_flags(); it is never
set again after tlb_start_vma(), so I don't think it's legal to just
clear it in the middle of a VMA.
If we had something like this callgraph on a VM_MIXEDMAP mapping, with
an intermediate TLB flush in the middle of the VM_MIXEDMAP mapping:
tlb_start_vma()
  [sets tlb->vma_pfn]
zap_pte_range
  tlb_flush_mmu_tlbonly
    __tlb_reset_range
      [clears tlb->vma_pfn]
zap_pte_range
  [zaps more PTEs and queues a pending TLB flush]
tlb_end_vma()
free_pgtables
  tlb_free_vmas
    [checks for tlb->vma_pfn]
then tlb_free_vmas() will erroneously not do a flush when it should've
done one, right?
Why does it even matter to you whether tlb->vma_pfn ever gets reset? At
worst, I think, you do one extra TLB flush in a case involving a munmap()
across multiple VMAs with a mix of VM_PFNMAP/VM_MIXEDMAP and
non-VM_PFNMAP/VM_MIXEDMAP VMAs (which is already a fairly weird scenario
on its own), where the region being operated on is large or complicated
enough that you already did at least one TLB flush, and the unmap is
sufficiently large, or has sufficient address space around it, that page
tables are getting freed, or something like that. That seems like a
scenario in which one more flush isn't going to be a big deal.
If you wanted to do this properly, I think you could do something like:
- add another flag tlb->current_vma_pfn that tracks whether the
  current vma is pfnmap/mixedmap
- reset tlb->vma_pfn on TLB flush
- set tlb->vma_pfn again if a TLB flush is enqueued while
  tlb->current_vma_pfn is true
But that seems way too complicated, so I would just delete these three
lines from the patch.