Message-Id: <67F32577-24D8-4E9F-ADB1-927B3AC18B5A@amacapital.net>
Date:   Tue, 17 Jul 2018 12:27:53 -1000
From:   Andy Lutomirski <luto@...capital.net>
To:     Rik van Riel <riel@...riel.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Mike Galbraith <efault@....de>,
        kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>
Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier



> On Jul 17, 2018, at 12:05 PM, Rik van Riel <riel@...riel.com> wrote:
> 
> 
> 
>> On Jul 17, 2018, at 5:29 PM, Andy Lutomirski <luto@...nel.org> wrote:
>> 
>> On Tue, Jul 17, 2018 at 1:16 PM, Rik van Riel <riel@...riel.com> wrote:
>>> Can I skip both the cr4 and ldt switches when the TLB contents
>>> are no longer valid and got reloaded?
>>> 
>>> If the TLB contents are still valid, either because we never went
>>> into lazy TLB mode, or because no invalidates happened while
>>> we were lazy, we immediately return.
>>> 
>>> The cr4 and ldt reloads only happen if the TLB was invalidated
>>> while we were in lazy TLB mode.
>> 
>> Yes, since the only events that would change the LDT or the required
>> CR4 value will unconditionally broadcast to every CPU in mm_cpumask
>> regardless of whether they're lazy.  The interesting case is that you
>> go lazy, you miss an invalidation IPI because you were lazy, then you
>> go unlazy, notice the tlb_gen change, and flush.  If this happens, you
>> know that you only missed a page table update and not an LDT update or
>> a CR4 update, because the latter would have sent the IPI even though
>> you were lazy.  So you should skip the CR4 and LDT updates.
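
For illustration, the unlazy path would look roughly like this. This is a simplified sketch, not the actual patch; the flush call is a stand-in and the surrounding context-switch code is abbreviated:

	if (real_prev == next) {
		u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);

		if (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) !=
		    atomic64_read(&next->context.tlb_gen)) {
			/*
			 * We missed a flush IPI while lazy: bring the TLB
			 * up to date.  Stand-in for the targeted flush the
			 * real code would do.
			 */
			local_flush_tlb();
			this_cpu_write(cpu_tlbstate.ctxs[asid].tlb_gen,
				       atomic64_read(&next->context.tlb_gen));
		}

		/*
		 * No load_mm_cr4() or switch_ldt() here: LDT and CR4
		 * changes IPI every CPU in mm_cpumask(), lazy or not.
		 */
		return;
	}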
>> 
>> I suppose a different approach would be to fix the issue below and to
>> try to track when the LDT actually needs reloading.  But that latter
>> part seems a bit complicated for minimal gain.
>> 
>> (Do you believe me?  If not, please argue back!)
>> 
> I believe you :)
> 
>>>> Hmm.  load_mm_cr4() should bypass itself when mm == &init_mm.  Want to
>>>> fix that part or should I?
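
Roughly, the suggested bypass is just an early return at the top of the helper (sketch only, not the actual patch):

static inline void load_mm_cr4(struct mm_struct *mm)
{
	if (mm == &init_mm)
		return;		/* kernel threads: leave CR4 alone */

	/* ... existing CR4.PCE update based on mm->context ... */
}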
>>> 
>>> I would be happy to send in a patch for this, and one for
>>> the above optimization you pointed out.
>>> 
>> 
>> Yes please!
>> 
> There is a third optimization left to do. Currently every time
> we switch into lazy tlb mode, we take a refcount on the mm,
> even when switching from one kernel thread to another, or
> when repeatedly switching between the same mm and kernel
> threads.
> 
> We could keep that refcount (on a per cpu basis) from the time
> we first switch to that mm in lazy tlb mode, to when we switch
> the CPU to a different mm.
> 
> That would allow us to not bounce the cache line with the
> mm_struct reference count on every lazy TLB context switch.
> 
> Does that seem like a reasonable optimization?
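
For concreteness, I read the proposal as something like this per-CPU reference (illustrative sketch, names made up):

/*
 * Hold one mm_count reference per CPU, from the first lazy use of an
 * mm until this CPU switches to a different mm, instead of doing a
 * grab/drop on every lazy context switch.
 */
static DEFINE_PER_CPU(struct mm_struct *, lazy_mm_ref);

static void lazy_mm_grab(struct mm_struct *mm)
{
	struct mm_struct **ref = this_cpu_ptr(&lazy_mm_ref);

	if (*ref == mm)
		return;			/* already referenced on this CPU */
	if (*ref)
		mmdrop(*ref);		/* release the previous mm */
	mmgrab(mm);			/* one atomic op per mm change */
	*ref = mm;
}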

Are you referring to the core sched code that deals with mm_count and active_mm?  If so, last time I looked at it, I convinced myself that it was totally useless, at least on x86. I think my reasoning was that, when mm_users went to zero, we already waited for RCU before tearing down page tables.

Things may have changed, but I strongly suspect that it should be possible for at least x86 to opt out of mm_count and maybe even active_mm entirely.  If nothing else, you’re already shooting the mm out of CR3 on all CPUs whenever the page tables get freed, and more or less the same logic should be sufficient to kill the mm entirely, either synchronously or via an RCU callback, whenever mm_users hits zero.
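
Very roughly, I'm imagining something like the sketch below.  This is hypothetical, not existing kernel code: it assumes a new rcu_head field in mm_struct and a made-up mmput_sketch() helper.

static void mm_free_rcu(struct rcu_head *head)
{
	/* 'free_rcu' would be a new rcu_head field in mm_struct */
	struct mm_struct *mm = container_of(head, struct mm_struct, free_rcu);

	__mmdrop(mm);		/* final teardown, no mm_count involved */
}

static void mmput_sketch(struct mm_struct *mm)
{
	if (atomic_dec_and_test(&mm->mm_users)) {
		/*
		 * exit_mmap() already shoots the mm out of CR3 on all
		 * CPUs as the page tables are torn down.
		 */
		exit_mmap(mm);
		call_rcu(&mm->free_rcu, mm_free_rcu);
	}
}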

Want to take a look at that?

> 
> Am I overlooking anything?
> 
> I'll try to get all three optimizations working, and will run them
> through some testing here before posting upstream.
> 
