[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171130124319.ovyierac7ywxzhjy@hirez.programming.kicks-ass.net>
Date: Thu, 30 Nov 2017 13:43:19 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, daniel.gruss@...k.tugraz.at,
hughd@...gle.com, keescook@...gle.com, linux-mm@...ck.org,
michael.schwarz@...k.tugraz.at, moritz.lipp@...k.tugraz.at,
richard.fellner@...dent.tugraz.at
Subject: Re: [PATCH 6/6] x86/mm/kaiser: Optimize __native_flush_tlb
On Wed, Nov 29, 2017 at 11:33:07AM +0100, Peter Zijlstra wrote:
> static inline void __native_flush_tlb(void)
> {
> + flush_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
>
> /*
> + * If current->mm == NULL then we borrow a mm
> + * which may change during a task switch and
> + * therefore we must not be preempted while we
> + * write CR3 back:
> */
> + preempt_disable();
> + native_write_cr3(__native_read_cr3());
> + preempt_enable();
> + /*
> + * Does not need tlb_flush_shared_nonglobals()
> + * since the CR3 write without PCIDs flushes all
> + * non-globals.
> + */
> + return;
OK, so seeing that comment today made me realize I had so far failed to
audit the whole flush user vs flush kernel thing.
In short the above comment is complete crap.
> }
The longer story is that:
flush_tlb_all()
flush_tlb_kernel_range()
need to flush kernel pages and thus flush _all_ the (kernel) ASIDs.
Whereas:
flush_tlb_mm()
flush_tlb_range()
flush_tlb_page()
Only flush user pages, and thus only need to flush the respective user
and kernel ASID.
The last 3 all map to flush_tlb_mm_range() which, through
flush_tlb_func_{local,remote} ends up in flush_tlb_func_common(), which
then uses either __flush_tlb() or __flush_tlb_single().
Both __flush_tlb() (the above function) and __flush_tlb_single() only
(need to) flush the 2 ASIDs that contain the user mapping.
Now the problem is that flush_tlb_kernel_range() is implemented using
either __flush_tlb_all() or __flush_tlb_single(), and it is that last
use that is buggered.
So at the very least we need the below to cure things, but there is
another inconsistency; do_flush_tlb_all() is used by both
flush_tlb_all() and flush_tlb_kernel_range() and increments NR_TLB_*,
do_kernel_range_flush() OTOH does not increment NR_TLB_*. I'm not fixing
that, but I'll leave a comment around or something, so we can later try
and figure out what exact statistics we want.
---
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 9587722162ee..ccaf6e126582 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -388,12 +388,6 @@ static inline void __native_flush_tlb(void)
preempt_disable();
native_write_cr3(__native_read_cr3());
preempt_enable();
- /*
- * Does not need tlb_flush_shared_nonglobals()
- * since the CR3 write without PCIDs flushes all
- * non-globals.
- */
- return;
}
static inline void __native_flush_tlb_global_irq_disabled(void)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 122c48fa6012..24bd86118b46 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -609,6 +609,8 @@ static void do_kernel_range_flush(void *info)
/* flush range by one by one 'invlpg' */
for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
__flush_tlb_single(addr);
+
+ tlb_flush_shared_nonglobals();
}
void flush_tlb_kernel_range(unsigned long start, unsigned long end)
Powered by blists - more mailing lists