Message-ID: <CALCETrUMntRe_LYX5wj9YD4xZ+84QExK+ZNb3yxEBDEKa7nePQ@mail.gmail.com>
Date: Tue, 17 Jul 2018 13:04:07 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Rik van Riel <riel@...riel.com>
Cc: LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Andrew Lutomirski <luto@...nel.org>,
Mike Galbraith <efault@....de>,
kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...nel.org>,
Dave Hansen <dave.hansen@...el.com>
Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

On Mon, Jul 16, 2018 at 12:03 PM, Rik van Riel <riel@...riel.com> wrote:
> Lazy TLB mode can result in an idle CPU being woken up by a TLB flush,
> when all it really needs to do is reload %CR3 at the next context switch,
> assuming no page table pages got freed.
>
> Memory ordering is used to prevent race conditions between switch_mm_irqs_off,
> which checks whether .tlb_gen changed, and the TLB invalidation code, which
> increments .tlb_gen whenever page table entries get invalidated.
>
> The atomic increment in inc_mm_tlb_gen is its own barrier; the context
> switch code adds an explicit barrier between reading tlbstate.is_lazy and
> next->context.tlb_gen.
>
> Unlike the 2016 version of this patch, CPUs with cpu_tlbstate.is_lazy set
> are not removed from the mm_cpumask(mm), since that would prevent the TLB
> flush IPIs at page table free time from being sent to all the CPUs
> that need them.
>
> This patch reduces total CPU use in the system by about 1-2% for a
> memcache workload on two socket systems, and by about 1% for a heavily
> multi-process netperf between two systems.
>

I'm not 100% certain I'm replying to the right email, and I haven't
gotten the tip-bot notification at all, but:

I think you've introduced a minor-ish performance regression due to
changing the old (admittedly terribly documented) control flow a bit.
Before, if real_prev == next, we would skip:

	load_mm_cr4(next);
	switch_ldt(real_prev, next);

Now we don't any more. I think you should reinstate that
optimization. It's probably as simple as wrapping them in an
if (real_prev != next) with a comment like:

/*
 * Remote changes that would require a cr4 or ldt reload will
 * unconditionally send an IPI even to lazy CPUs, so if we aren't
 * changing our mm, we don't need to refresh cr4 or the ldt.
 */
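
Untested, but roughly this shape at the end of switch_mm_irqs_off()
(all of the names below are the ones already in that function):

	if (real_prev != next) {
		/*
		 * See the suggested comment above: same-mm switches
		 * don't need the cr4 or ldt refresh.
		 */
		load_mm_cr4(next);
		switch_ldt(real_prev, next);
	}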

Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
fix that part or should I?
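
Concretely, I'd expect that to be just an early return at the top of
load_mm_cr4() -- untested sketch, with the body below the check staying
whatever it is today:

	static inline void load_mm_cr4(struct mm_struct *mm)
	{
		/* Nothing to update when switching to the kernel's init_mm. */
		if (mm == &init_mm)
			return;

		/* ... existing cr4 update keyed off *mm ... */
	}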
--Andy