[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <F93A3335-4830-4E36-9179-D0FA5D640C8A@amacapital.net>
Date: Wed, 18 Jul 2018 13:13:13 -1000
From: Andy Lutomirski <luto@...capital.net>
To: Rik van Riel <riel@...riel.com>
Cc: Andy Lutomirski <luto@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Mike Galbraith <efault@....de>,
kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...nel.org>,
Dave Hansen <dave.hansen@...el.com>
Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
> On Jul 18, 2018, at 10:58 AM, Rik van Riel <riel@...riel.com> wrote:
>
>
>
>> On Jul 17, 2018, at 4:04 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>
>>
>> I think you've introduced a minor-ish performance regression due to
>> changing the old (admittedly terribly documented) control flow a bit.
>> Before, if real_prev == next, we would skip:
>>
>> load_mm_cr4(next);
>> switch_ldt(real_prev, next);
>>
>> Now we don't any more. I think you should reinstate that
>> optimization. It's probably as simple as wrapping them in an if
>> (real_priv != next) with a comment like /* Remote changes that would
>> require a cr4 or ldt reload will unconditionally send an IPI even to
>> lazy CPUs. So, if we aren't changing our mm, we don't need to refresh
>> cr4 or the ldt */
>
> Looks like switch_ldt already skips reloading the LDT when prev equals
> next, or when they simply have the same LDT values:
>
> if (unlikely((unsigned long)prev->context.ldt |
> (unsigned long)next->context.ldt))
> load_mm_ldt(next);
>
Read that again? It will reload if there’s an LDT, even if it’s the same one.
> It appears that the cr4 bits have a similar optimization:
>
> static inline void cr4_set_bits(unsigned long mask)
> {
> unsigned long cr4, flags;
>
> local_irq_save(flags);
> cr4 = this_cpu_read(cpu_tlbstate.cr4);
> if ((cr4 | mask) != cr4)
> __cr4_set(cr4 | mask);
> local_irq_restore(flags);
> }
>>
>> Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
>> fix that part or should I?
>>
> Looks like there might not be anything to do here, after all.
But if init_mm and the thread that just went idle have different selected cr4 values, we’ll still write it. With your lazy TLB work, it’s less of a big deal, but still.
I’m happy to fix this myself, though.
>
> On to the lazy TLB mm_struct refcounting stuff :)
>
Which refcount? mm_users shouldn’t be hot, so I assume you’re talking about mm_count. My suggestion is to get rid of mm_count instead of trying to optimize it.
Powered by blists - more mailing lists