Message-ID: <CAG48ez39-KgQ6wFy65huzaCPZfiuBQoPJR-D9peS++i7aaVMfA@mail.gmail.com>
Date: Mon, 6 Jan 2025 14:25:54 +0100
From: Jann Horn <jannh@...gle.com>
To: Rik van Riel <riel@...riel.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] x86/mm: Fix flush_tlb_range() when used for zapping normal PMDs

On Sat, Jan 4, 2025 at 4:09 AM Rik van Riel <riel@...riel.com> wrote:
> On Fri, 2025-01-03 at 23:11 +0100, Jann Horn wrote:
> > On Fri, Jan 3, 2025 at 10:55 PM Rik van Riel <riel@...riel.com>
> > wrote:
> > > On Fri, 2025-01-03 at 19:39 +0100, Jann Horn wrote:
> > > > 02fc2aa06e9e0ecdba3fe948cafe5892b72e86c0..3da645139748538daac70166618d8ad95116eb74 100644
> > > > --- a/arch/x86/include/asm/tlbflush.h
> > > > +++ b/arch/x86/include/asm/tlbflush.h
> > > > @@ -242,7 +242,7 @@ void flush_tlb_multi(const struct cpumask *cpumask,
> > > >         flush_tlb_mm_range((vma)->vm_mm, start, end, \
> > > >                            ((vma)->vm_flags & VM_HUGETLB) \
> > > >                                 ? huge_page_shift(hstate_vma(vma)) \
> > > > -                               : PAGE_SHIFT, false)
> > > > +                               : PAGE_SHIFT, true)
> > > >
> > > >
> > >
> > > The code looks good, but should this macro get
> > > a comment indicating that code that only frees
> > > pages, but not page tables, should be calling
> > > flush_tlb() instead?
> >
> > Documentation/core-api/cachetlb.rst seems to be the common place
> > that's supposed to document the rules - the macro I'm touching is
> > just
> > the x86 implementation. (The arm64 implementation also has some
> > fairly
> > extensive comments that say flush_tlb_range() "also invalidates any
> > walk-cache entries associated with translations for the specified
> > address range" while flush_tlb_page() "only invalidates a single,
> > last-level page-table entry and therefore does not affect any
> > walk-caches".) I wouldn't want to add yet more documentation for this
> > API inside the X86 code. I guess it would make sense to add pointers
> > from the x86 code to the documentation (and copy the details about
> > last-level TLBs from the arm64 code into the docs).
> >
> > I don't see a function flush_tlb() outside of some (non-x86) arch
> > code.
>
> I see zap_pte_range() calling tlb_flush_mmu(),
> which calls tlb_flush_mmu_tlbonly() in include/asm-generic/tlb.h,
> which in turn calls tlb_flush().
>
> The asm-generic version of tlb_flush() goes through
> flush_tlb_mm(), which on x86 would call flush_tlb_mm_range()
> with freed_tables = true.
>
> Luckily x86 seems to have its own implementation of
> tlb_flush(), which avoids that issue.

Aah, right. Yeah, I think the tlb_flush() infrastructure with "struct
mmu_gather" is probably one of the two really optimized TLB flushing
hotpaths (the other one being the reclaim path).

I think tlb_flush() is for somewhat different use cases though - my
understanding is that it is mainly for operations that need batching
and/or want to delay TLB flushes while dropping page table locks.
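
To make the batching idea concrete, here is a toy userspace model; it is
a sketch only, not kernel code. "struct toy_gather" and the toy_*()
helpers are hypothetical names invented for this illustration, and the
real "struct mmu_gather" tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mmu_gather; not the kernel's layout. */
struct toy_gather {
	unsigned long start;      /* lowest address queued so far */
	unsigned long end;        /* one past the highest address queued */
	bool freed_tables;        /* did any queued operation free a page table? */
	unsigned int flush_count; /* number of flushes actually issued */
};

static void toy_gather_init(struct toy_gather *tlb)
{
	tlb->start = ~0UL;
	tlb->end = 0;
	tlb->freed_tables = false;
	tlb->flush_count = 0;
}

/* Record a range instead of flushing right away. */
static void toy_gather_add(struct toy_gather *tlb, unsigned long start,
			   unsigned long end, bool freed_tables)
{
	assert(start < end);
	if (start < tlb->start)
		tlb->start = start;
	if (end > tlb->end)
		tlb->end = end;
	tlb->freed_tables |= freed_tables;
}

/* Issue one flush covering everything queued; false if nothing is pending. */
static bool toy_gather_flush(struct toy_gather *tlb)
{
	if (tlb->end == 0)
		return false;
	printf("flush [%#lx, %#lx) freed_tables=%d\n",
	       tlb->start, tlb->end, tlb->freed_tables);
	tlb->flush_count++;
	tlb->start = ~0UL;
	tlb->end = 0;
	tlb->freed_tables = false;
	return true;
}
```

In this model, several toy_gather_add() calls made while a lock is held
are followed by a single toy_gather_flush() afterwards, and the flush
only needs to cover page tables if one of the queued operations actually
freed any.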
> > I don't know if it makes sense to tell developers to not use
> > flush_tlb_range() for freeing pages. If the performance of
> > flush_tlb_range() actually is an issue, I guess one fix would be to
> > refactor this and add a parameter or something?
> >
>
> I don't know whether this is a real issue on
> architectures other than x86.

arm64 seems to have code specifically for doing flushes without
affecting cached higher-level entries - __flush_tlb_range_nosync()
receives a "last_level" parameter (which is plumbed through from the
arm64 version of tlb_flush()) and picks the "vale1is" TLBI operation
if it is set, or "vae1is" otherwise.
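
Roughly, the selection can be pictured with the toy userspace C below.
This is a sketch, not the arm64 implementation: the enum and the toy_*()
helpers are hypothetical names, and the walk-cache behaviour is only
what the arm64 comments quoted earlier in this thread describe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two TLBI operations named above. */
enum toy_tlbi_op {
	TOY_VAE1IS,  /* may also drop walk-cache (higher-level) entries */
	TOY_VALE1IS, /* last-level entries only */
};

/* last_level == true: only leaf entries changed, tables stay in place. */
static enum toy_tlbi_op toy_pick_tlbi_op(bool last_level)
{
	return last_level ? TOY_VALE1IS : TOY_VAE1IS;
}

/* Does the chosen operation also cover cached higher-level entries? */
static bool toy_covers_walk_cache(enum toy_tlbi_op op)
{
	assert(op == TOY_VAE1IS || op == TOY_VALE1IS);
	return op == TOY_VAE1IS;
}

static const char *toy_tlbi_name(enum toy_tlbi_op op)
{
	return op == TOY_VALE1IS ? "vale1is" : "vae1is";
}
```

A caller that zapped a page table (as in the PMD case this patch is
about) would pass last_level = false in this model, so that the chosen
operation also covers cached higher-level entries.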
> For now it looks like the code does the right
> thing when only pages are being freed, so we
> may not need that parameter.
>
> --
> All Rights Reversed.