linux-kernel - Re: [PATCH v2 05/10] x86/mm: Rework lazy TLB mode and TLB freshness tracking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sun, 18 Jun 2017 01:06:15 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Rik van Riel <riel@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Banman <abanman@....com>, Mike Travis <travis@....com>,
        Dimitri Sivanich <sivanich@....com>,
        Juergen Gross <jgross@...e.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [PATCH v2 05/10] x86/mm: Rework lazy TLB mode and TLB freshness
 tracking


> On Jun 13, 2017, at 9:56 PM, Andy Lutomirski <luto@...nel.org> wrote:
> 
> x86's lazy TLB mode used to be fairly weak -- it would switch to
> init_mm the first time it tried to flush a lazy TLB.  This meant an
> unnecessary CR3 write and, if the flush was remote, an unnecessary
> IPI.
> 
> Rewrite it entirely.  When we enter lazy mode, we simply remove the
> cpu from mm_cpumask.  This means that we need a way to figure out
> whether we've missed a flush when we switch back out of lazy mode.
> I use the tlb_gen machinery to track whether a context is up to
> date.
> 
> Note to reviewers: this patch, my itself, looks a bit odd.  I'm
> using an array of length 1 containing (ctx_id, tlb_gen) rather than
> just storing tlb_gen, and making it at array isn't necessary yet.
> I'm doing this because the next few patches add PCID support, and,
> with PCID, we need ctx_id, and the array will end up with a length
> greater than 1.  Making it an array now means that there will be
> less churn and therefore less stress on your eyeballs.
> 
> NB: This is dubious but, AFAICT, still correct on Xen and UV.
> xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
> patch changes the way that mm_cpumask() works.  This should be okay,
> since Xen *also* iterates all online CPUs to find all the CPUs it
> needs to twiddle.
> 
> The UV tlbflush code is rather dated and should be changed.
> 
> Cc: Andrew Banman <abanman@....com>
> Cc: Mike Travis <travis@....com>
> Cc: Dimitri Sivanich <sivanich@....com>
> Cc: Juergen Gross <jgross@...e.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>
> Signed-off-by: Andy Lutomirski <luto@...nel.org>
> ---
> arch/x86/include/asm/mmu_context.h |   6 +-
> arch/x86/include/asm/tlbflush.h    |   4 -
> arch/x86/mm/init.c                 |   1 -
> arch/x86/mm/tlb.c                  | 242 +++++++++++++++++++------------------
> 4 files changed, 131 insertions(+), 122 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index e5295d485899..69a4f1ee86ac 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -125,8 +125,10 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
> 
> static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
> {
> -	if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
> -		this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
> +	int cpu = smp_processor_id();
> +
> +	if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
> +		cpumask_clear_cpu(cpu, mm_cpumask(mm));

The indication for laziness that was in cpu_tlbstate.state may be a better
indication whether the cpu needs to be cleared from the previous mm_cpumask().
If you kept this indication, you could have used this per-cpu information in
switch_mm_irqs_off() instead of "cpumask_test_cpu(cpu, mm_cpumask(next))”,
which might have been accessed by another core.