Date:   Wed, 18 Jul 2018 13:13:13 -1000
From:   Andy Lutomirski <luto@...capital.net>
To:     Rik van Riel <riel@...riel.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Mike Galbraith <efault@....de>,
        kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>
Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier



> On Jul 18, 2018, at 10:58 AM, Rik van Riel <riel@...riel.com> wrote:
> 
> 
> 
>> On Jul 17, 2018, at 4:04 PM, Andy Lutomirski <luto@...nel.org> wrote:
>> 
>> 
>> I think you've introduced a minor-ish performance regression due to
>> changing the old (admittedly terribly documented) control flow a bit.
>> Before, if real_prev == next, we would skip:
>> 
>> load_mm_cr4(next);
>> switch_ldt(real_prev, next);
>> 
>> Now we don't any more.  I think you should reinstate that
>> optimization.  It's probably as simple as wrapping them in an if
>> (real_prev != next) with a comment like /* Remote changes that would
>> require a cr4 or ldt reload will unconditionally send an IPI even to
>> lazy CPUs.  So, if we aren't changing our mm, we don't need to refresh
>> cr4 or the ldt */
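
Concretely, what I'm suggesting is something like the following
untested sketch against switch_mm_irqs_off() (exact placement
approximate):

        /*
         * Remote changes that would require a cr4 or ldt reload
         * unconditionally send an IPI even to lazy CPUs, so if we
         * aren't changing our mm we don't need to refresh either.
         */
        if (real_prev != next) {
                load_mm_cr4(next);
                switch_ldt(real_prev, next);
        }
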
> 
> Looks like switch_ldt already skips reloading the LDT when prev equals
> next, or when they simply have the same LDT values:
> 
>        if (unlikely((unsigned long)prev->context.ldt |
>                     (unsigned long)next->context.ldt))
>                load_mm_ldt(next);
> 

Read that again?  It will reload if there’s an LDT, even if it’s the same one.
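
To spell it out: the test above only asks whether either mm has an
LDT at all.  A variant that also skips the same-LDT case would be
something like this (illustrative sketch, untested):

        /* Reload only when the two mms actually disagree; identical
         * pointers (including NULL/NULL) need nothing. */
        if (unlikely(prev->context.ldt != next->context.ldt))
                load_mm_ldt(next);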

> It appears that the cr4 bits have a similar optimization:
> 
> static inline void cr4_set_bits(unsigned long mask)
> {
>        unsigned long cr4, flags;
> 
>        local_irq_save(flags);
>        cr4 = this_cpu_read(cpu_tlbstate.cr4);
>        if ((cr4 | mask) != cr4)
>                __cr4_set(cr4 | mask);
>        local_irq_restore(flags);
> }
>> 
>> Hmm.  load_mm_cr4() should bypass itself when mm == &init_mm.  Want to
>> fix that part or should I?
>> 
> Looks like there might not be anything to do here, after all.

But if init_mm and the mm of the thread that just went idle select different cr4 values, we'll still write cr4.  With your lazy TLB work it's less of a big deal, but still.
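
A minimal sketch of the bypass I have in mind (written against the
current helper, which flips X86_CR4_PCE for rdpmc; the condition
below is from memory, so double-check it):

        static inline void load_mm_cr4(struct mm_struct *mm)
        {
                /* Lazy TLB / idle: a switch to init_mm should leave
                 * cr4 alone. */
                if (mm == &init_mm)
                        return;

                if (atomic_read(&mm->context.perf_rdpmc_allowed))
                        cr4_set_bits(X86_CR4_PCE);
                else
                        cr4_clear_bits(X86_CR4_PCE);
        }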

I’m happy to fix this myself, though.

> 
> On to the lazy TLB mm_struct refcounting stuff :)
> 

Which refcount?  mm_users shouldn’t be hot, so I assume you’re talking about mm_count. My suggestion is to get rid of mm_count instead of trying to optimize it.
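
For reference, the helpers in question (as in include/linux/sched/mm.h,
quoted from memory): mm_users counts real users of the address space,
while mm_count is the reference count that lazy TLB CPUs bounce on
every context switch:

        static inline void mmgrab(struct mm_struct *mm)
        {
                atomic_inc(&mm->mm_count);
        }

        static inline void mmdrop(struct mm_struct *mm)
        {
                if (unlikely(atomic_dec_and_test(&mm->mm_count)))
                        __mmdrop(mm);
        }

Getting rid of mm_count would mean lazy CPUs hold no reference at
all, so teardown would have to synchronize with them some other way.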
