Message-Id: <B6566AA4-2E8A-43BD-A224-0F6D4747B8FB@surriel.com>
Date: Wed, 18 Jul 2018 16:58:28 -0400
From: Rik van Riel <riel@...riel.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Mike Galbraith <efault@....de>,
kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...nel.org>,
Dave Hansen <dave.hansen@...el.com>
Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
> On Jul 17, 2018, at 4:04 PM, Andy Lutomirski <luto@...nel.org> wrote:
>
>
> I think you've introduced a minor-ish performance regression due to
> changing the old (admittedly terribly documented) control flow a bit.
> Before, if real_prev == next, we would skip:
>
>         load_mm_cr4(next);
>         switch_ldt(real_prev, next);
>
> Now we don't any more. I think you should reinstate that
> optimization. It's probably as simple as wrapping them in an if
> (real_prev != next) with a comment like /* Remote changes that would
> require a cr4 or ldt reload will unconditionally send an IPI even to
> lazy CPUs. So, if we aren't changing our mm, we don't need to refresh
> cr4 or the ldt */
Looks like switch_ldt already skips reloading the LDT in the common
case, when neither prev nor next has an LDT:

	if (unlikely((unsigned long)prev->context.ldt |
		     (unsigned long)next->context.ldt))
		load_mm_ldt(next);

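To make the condition above concrete, here is a minimal user-space sketch of the same pointer-OR test. The names mm_model, ldt_model, switch_ldt_model and load_mm_ldt_model are hypothetical stand-ins, not the kernel's definitions, and the "reload" is just a counter. Going by the snippet alone, the reload is skipped only when both pointers are NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct ldt_model { int nr_entries; };
struct mm_model { struct ldt_model *ldt; };

/* Counts how many times the model "reloads" the LDT. */
static int ldt_loads;

static void load_mm_ldt_model(const struct mm_model *next)
{
	(void)next;	/* a real reload would program the LDT here */
	ldt_loads++;
}

static void switch_ldt_model(const struct mm_model *prev,
			     const struct mm_model *next)
{
	/* The OR is nonzero if at least one of the two pointers is non-NULL. */
	if ((uintptr_t)prev->ldt | (uintptr_t)next->ldt)
		load_mm_ldt_model(next);
}
```

In this model, switching between two mms without an LDT costs nothing, while any switch that involves an mm with an LDT (including prev == next) still takes the reload path.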
It appears that the cr4 bits have a similar optimization:

	static inline void cr4_set_bits(unsigned long mask)
	{
		unsigned long cr4, flags;

		local_irq_save(flags);
		cr4 = this_cpu_read(cpu_tlbstate.cr4);
		if ((cr4 | mask) != cr4)
			__cr4_set(cr4 | mask);
		local_irq_restore(flags);
	}

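The effect of that cached-value comparison can be sketched in user space as follows. This is only an illustration: cr4_shadow stands in for cpu_tlbstate.cr4, cr4_write_model for the actual register write, and interrupt masking is left out entirely. All of these names are assumptions made for the example.

```c
/* Models the per-CPU cached copy of CR4 (cpu_tlbstate.cr4 in the kernel). */
static unsigned long cr4_shadow;

/* Counts modelled register writes, i.e. the expensive operation. */
static int cr4_writes;

static void cr4_write_model(unsigned long val)
{
	cr4_shadow = val;	/* a real implementation would also write CR4 */
	cr4_writes++;
}

static void cr4_set_bits_model(unsigned long mask)
{
	unsigned long cr4 = cr4_shadow;

	/* Only write when the mask would actually change the cached value. */
	if ((cr4 | mask) != cr4)
		cr4_write_model(cr4 | mask);
}
```

Setting bits that are already set leaves cr4_writes unchanged, which is the property that makes a redundant call cheap.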
>
> Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
> fix that part or should I?
>
Looks like there might not be anything to do here, after all.
On to the lazy TLB mm_struct refcounting stuff :)