Message-ID: <20160825150515.02c2d8ea@riellap.home.surriel.com>
Date: Thu, 25 Aug 2016 15:04:59 -0400
From: Rik van Riel <riel@...hat.com>
To: serebrin@...gle.com
Cc: mingo@...nel.org, peterz@...radead.org, hpa@...or.com,
linux-kernel@...r.kernel.org, luto@...nel.org, bp@...e.de,
mgorman@...e.de, tglx@...utronix.de
Subject: [PATCH RFC UGLY] x86,mm,sched: make lazy TLB mode even lazier
Lazy TLB mode can result in an idle CPU being woken up for a TLB
flush, when all it really needed to do was flush %cr3 before the
next context switch.
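
To make that concrete, here is a rough userspace model (made-up names
and structure, not the actual kernel code) of what the flush IPI
amounts to on a CPU in lazy TLB mode; the wakeup itself is the entire
cost:

/* Toy model: the lazy CPU takes an interrupt just to drop the old mm. */
#include <stdio.h>

enum tlbstate { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

static void flush_ipi(enum tlbstate *state)
{
    if (*state == TLBSTATE_LAZY) {
        /* roughly what leave_mm() amounts to: stop using the old mm,
         * no actual TLB flush is needed on this CPU */
        printf("lazy CPU woken up just to abandon the old mm\n");
        return;
    }
    printf("busy CPU actually flushes its TLB\n");
}

int main(void)
{
    enum tlbstate idle_cpu = TLBSTATE_LAZY;
    flush_ipi(&idle_cpu);    /* this wakeup is what we want to avoid */
    return 0;
}
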
This is mostly fine on bare metal, though sub-optimal from a power
saving point of view, and deeper C states could make TLB flushes
take a little longer than desired.
On virtual machines, the pain can be much worse, especially if a
currently non-running VCPU is woken up for a TLB invalidation
IPI, on a CPU that is busy running another task. It could take
a while before that IPI is handled, leading to performance issues.
This patch is still ugly, and the sched.h include needs to be cleaned
up a lot (how would the scheduler people like to see the context switch
blocking abstracted?)
This patch deals with the issue by introducing a third TLB state,
TLBSTATE_FLUSH, which causes %cr3 to be flushed at the next
context switch. A CPU is transitioned from TLBSTATE_LAZY to
TLBSTATE_FLUSH with the rq lock held, to prevent context switches.
Nothing is done for a CPU that is already in TLBSTATE_FLUSH mode.
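
In userspace terms, the idea looks roughly like this (a toy model,
with a pthread mutex standing in for the rq lock; the names are
illustrative, not the kernel code):

#include <pthread.h>
#include <stdio.h>

enum tlbstate { TLBSTATE_OK = 1, TLBSTATE_LAZY, TLBSTATE_FLUSH };

struct cpu {
    enum tlbstate state;
    pthread_mutex_t rq_lock;    /* stands in for the runqueue lock */
};

/* Flush side: mark a lazy CPU instead of sending it an IPI.  The
 * lock keeps the target from context switching underneath us. */
static void set_lazy_tlbstate_flush(struct cpu *c)
{
    if (c->state == TLBSTATE_LAZY) {
        pthread_mutex_lock(&c->rq_lock);
        if (c->state == TLBSTATE_LAZY)
            c->state = TLBSTATE_FLUSH;
        pthread_mutex_unlock(&c->rq_lock);
    }
}

/* Context-switch side: a marked CPU performs the deferred flush. */
static void context_switch(struct cpu *c)
{
    pthread_mutex_lock(&c->rq_lock);
    if (c->state == TLBSTATE_FLUSH)
        printf("deferred flush: this is where %%cr3 would be reloaded\n");
    c->state = TLBSTATE_OK;
    pthread_mutex_unlock(&c->rq_lock);
}

int main(void)
{
    struct cpu c = { .state = TLBSTATE_LAZY,
                     .rq_lock = PTHREAD_MUTEX_INITIALIZER };
    set_lazy_tlbstate_flush(&c);    /* no IPI, no wakeup */
    context_switch(&c);             /* the flush happens here, lazily */
    return 0;
}

The double check of the state around the lock keeps the common case
(nothing to mark) free of rq lock traffic.
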
This patch is totally untested, because I am at a conference right
now, and Benjamin has the test case :)
Signed-off-by: Rik van Riel <riel@...hat.com>
Reported-by: Benjamin Serebrin <serebrin@...gle.com>
---
arch/x86/include/asm/tlbflush.h | 1 +
arch/x86/mm/tlb.c | 39 ++++++++++++++++++++++++++++++++++++++++---
2 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4e5be94e079a..5ae8e4b174f8 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -310,6 +310,7 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
#define TLBSTATE_OK 1
#define TLBSTATE_LAZY 2
+#define TLBSTATE_FLUSH 3
static inline void reset_lazy_tlbstate(void)
{
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 5643fd0b1a7d..5b4cda49ac0c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,6 +6,7 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/cpu.h>
+#include "../../../kernel/sched/sched.h"
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
@@ -140,10 +141,12 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
}
#ifdef CONFIG_SMP
else {
+ int oldstate = this_cpu_read(cpu_tlbstate.state);
this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
- if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
+ if (oldstate == TLBSTATE_FLUSH ||
+ !cpumask_test_cpu(cpu, mm_cpumask(next))) {
/*
* On established mms, the mm_cpumask is only changed
* from irq context, from ptep_clear_flush() while in
@@ -242,11 +245,30 @@ static void flush_tlb_func(void *info)
}
+/*
+ * This function moves a CPU from TLBSTATE_LAZY to TLBSTATE_FLUSH, which
+ * will force it to flush %cr3 at the next context switch, effectively
+ * doing a delayed TLB flush for a CPU in lazy TLB mode.
+ * This takes the runqueue lock to protect against the race condition
+ * of the target CPU rescheduling while we change its TLB state.
+ * Do nothing if the TLB state is already set to TLBSTATE_FLUSH.
+ */
+static void set_lazy_tlbstate_flush(int cpu)
+{
+ if (per_cpu(cpu_tlbstate.state, cpu) == TLBSTATE_LAZY) {
+ raw_spin_lock(&cpu_rq(cpu)->lock);
+ if (per_cpu(cpu_tlbstate.state, cpu) == TLBSTATE_LAZY)
+ per_cpu(cpu_tlbstate.state, cpu) = TLBSTATE_FLUSH;
+ raw_spin_unlock(&cpu_rq(cpu)->lock);
+ }
+}
+
void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long start,
unsigned long end)
{
struct flush_tlb_info info;
+ unsigned int cpu;
if (end == 0)
end = start + PAGE_SIZE;
@@ -262,8 +284,6 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
(end - start) >> PAGE_SHIFT);
if (is_uv_system()) {
- unsigned int cpu;
-
cpu = smp_processor_id();
cpumask = uv_flush_tlb_others(cpumask, mm, start, end, cpu);
if (cpumask)
@@ -271,6 +291,19 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
&info, 1);
return;
}
+
+ /*
+ * Instead of sending an IPI to a CPU in lazy TLB mode, move that
+ * CPU's TLB state to TLBSTATE_FLUSH, causing its TLB to be flushed
+ * at the next context switch.
+ */
+ for_each_cpu(cpu, cpumask) {
+ if (per_cpu(cpu_tlbstate.state, cpu) != TLBSTATE_OK) {
+ set_lazy_tlbstate_flush(cpu);
+ cpumask_clear_cpu(cpu, (struct cpumask *)cpumask);
+ }
+ }
+
smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
}
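
Until Benjamin's test case is available, something along these lines
might exercise the path on a mostly idle machine (an untested sketch
of my own, not his test case): one thread parks itself so its CPU
idles in lazy TLB mode while the main thread generates munmap()
shootdowns.

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN (16 * 4096)

/* Park a thread on another CPU, then let that CPU go idle: with luck
 * it idles in lazy TLB mode while still holding our mm. */
static void *parked(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, parked, NULL);
    sleep(1);    /* give the thread a chance to run and go idle */

    for (int i = 0; i < 100000; i++) {
        char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        memset(p, 1, LEN);    /* populate, so munmap must flush */
        munmap(p, LEN);       /* fires flush_tlb_others() */
    }
    printf("done\n");
    return 0;
}

Build with gcc -pthread. Whether the sibling CPU really ends up in
lazy TLB mode depends on scheduler placement, so pinning the threads
to different CPUs (e.g. with pthread_setaffinity_np()) may make it
more deterministic.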