linux-kernel - Re: [PATCH v3 06/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking

Message-ID: <CALCETrXkRQDWQH6oZfg4-36i4sgxjhfXmfaatHmmgXKVwtX+qA@mail.gmail.com>
Date:   Wed, 21 Jun 2017 09:04:48 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Nadav Amit <nadav.amit@...il.com>,
        Rik van Riel <riel@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Banman <abanman@....com>, Mike Travis <travis@....com>,
        Dimitri Sivanich <sivanich@....com>,
        Juergen Gross <jgross@...e.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [PATCH v3 06/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking

On Wed, Jun 21, 2017 at 2:01 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Tue, 20 Jun 2017, Andy Lutomirski wrote:
>> -/*
>> - * The flush IPI assumes that a thread switch happens in this order:
>> - * [cpu0: the cpu that switches]
>> - * 1) switch_mm() either 1a) or 1b)
>> - * 1a) thread switch to a different mm
>> - * 1a1) set cpu_tlbstate to TLBSTATE_OK
>> - *   Now the tlb flush NMI handler flush_tlb_func won't call leave_mm
>> - *   if cpu0 was in lazy tlb mode.
>> - * 1a2) update cpu active_mm
>> - *   Now cpu0 accepts tlb flushes for the new mm.
>> - * 1a3) cpu_set(cpu, new_mm->cpu_vm_mask);
>> - *   Now the other cpus will send tlb flush ipis.
>> - * 1a4) change cr3.
>> - * 1a5) cpu_clear(cpu, old_mm->cpu_vm_mask);
>> - *   Stop ipi delivery for the old mm. This is not synchronized with
>> - *   the other cpus, but flush_tlb_func ignore flush ipis for the wrong
>> - *   mm, and in the worst case we perform a superfluous tlb flush.
>> - * 1b) thread switch without mm change
>> - *   cpu active_mm is correct, cpu0 already handles flush ipis.
>> - * 1b1) set cpu_tlbstate to TLBSTATE_OK
>> - * 1b2) test_and_set the cpu bit in cpu_vm_mask.
>> - *   Atomically set the bit [other cpus will start sending flush ipis],
>> - *   and test the bit.
>> - * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
>> - * 2) switch %%esp, ie current
>> - *
>> - * The interrupt must handle 2 special cases:
>> - * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
>> - * - the cpu performs speculative tlb reads, i.e. even if the cpu only
>> - *   runs in kernel space, the cpu could load tlb entries for user space
>> - *   pages.
>> - *
>> - * The good news is that cpu_tlbstate is local to each cpu, no
>> - * write/read ordering problems.
>
> While the new code is really well commented, it would be a good thing to
> have a single place where all of this including the ordering constraints
> are documented.

I'll look at the end of the whole series and see if I can come up with
something good.
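The thread-switch ordering in the removed comment can be sketched as a single-threaded, user-space C model. This is an illustration under stated assumptions, not kernel code. `struct model_mm`, `struct cpu_model` and the `model_*` functions are hypothetical names. `cr3` is only a pointer standing in for the loaded page tables, and a TLB flush is modelled as a counter increment. Because the model runs on one thread, it only mirrors the step order (1a1 to 1a5, 1b1 to 1b3). It does not exercise the cross-CPU races the comment is concerned with.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the removed comment's protocol; not kernel APIs. */
enum tlbstate { TLBSTATE_OK, TLBSTATE_LAZY };

struct model_mm {
	_Atomic unsigned long cpu_vm_mask; /* bit n set: CPU n gets flush IPIs */
};

struct cpu_model {
	int id;
	enum tlbstate state;
	struct model_mm *active_mm;
	struct model_mm *cr3;  /* stands in for the loaded page tables */
	int flushes;           /* local TLB flushes performed */
};

/* A lazy CPU stops receiving flush IPIs for its active_mm. The real
 * leave_mm() may do more; this model only clears the mask bit. */
static void model_leave_mm(struct cpu_model *cpu)
{
	assert(cpu->state != TLBSTATE_OK); /* only meaningful in lazy mode */
	atomic_fetch_and(&cpu->active_mm->cpu_vm_mask, ~(1UL << cpu->id));
}

static void model_switch_mm(struct cpu_model *cpu, struct model_mm *next)
{
	unsigned long bit = 1UL << cpu->id;
	struct model_mm *prev = cpu->active_mm;

	if (prev != next) {
		cpu->state = TLBSTATE_OK;                    /* 1a1 */
		cpu->active_mm = next;                       /* 1a2 */
		atomic_fetch_or(&next->cpu_vm_mask, bit);    /* 1a3 */
		cpu->cr3 = next;                             /* 1a4 */
		atomic_fetch_and(&prev->cpu_vm_mask, ~bit);  /* 1a5 */
	} else {
		cpu->state = TLBSTATE_OK;                    /* 1b1 */
		/* 1b2: set the bit and test its old value in one atomic op. */
		if (!(atomic_fetch_or(&next->cpu_vm_mask, bit) & bit))
			cpu->flushes++;  /* 1b3: leave_mm() ran, so flush */
	}
}

/* Flush IPI handler: ignore other mms; lazy CPUs leave instead of flushing. */
static void model_flush_ipi(struct cpu_model *cpu, struct model_mm *mm)
{
	if (cpu->active_mm != mm)
		return;  /* wrong mm: at worst a superfluous IPI */
	if (cpu->state != TLBSTATE_OK)
		model_leave_mm(cpu);
	else
		cpu->flushes++;
}
```

For example, suppose CPU 1 is lazily running on mm `a` when a flush IPI for `a` arrives. `model_flush_ipi()` then calls `model_leave_mm()`, which clears bit 1. A later `model_switch_mm(&cpu, &a)` finds that bit clear in step 1b2 and performs the flush from step 1b3.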

>
>> @@ -215,12 +200,13 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
>>       VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[0].ctx_id) !=
>>                  loaded_mm->context.ctx_id);
>>
>> -     if (this_cpu_read(cpu_tlbstate.state) != TLBSTATE_OK) {
>> +     if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(loaded_mm))) {
>>               /*
>> -              * leave_mm() is adequate to handle any type of flush, and
>> -              * we would prefer not to receive further IPIs.
>> +              * We're in lazy mode -- don't flush.  We can get here on
>> +              * remote flushes due to races and on local flushes if a
>> +              * kernel thread coincidentally flushes the mm it's lazily
>> +              * still using.
>
> Ok. That's more informative.
>
> Reviewed-by: Thomas Gleixner <tglx@...utronix.de>
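The new rule in this hunk treats a CPU as lazy exactly when it is absent from `mm_cpumask(loaded_mm)`. That rule can be sketched with a similar single-threaded model. Everything here is hypothetical. The `model_*` names are illustrative, and plain bit operations stand in for `cpumask_test_cpu()` and related helpers. The per-mm `tlb_gen` counter, compared against a per-CPU `seen_gen`, is an assumption about how the "TLB freshness tracking" named in the subject lets a lazy CPU catch up on flushes it skipped. It is not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; names are illustrative, not kernel
 * APIs. Assumes freshness is tracked with a per-mm generation counter. */
struct model_mm {
	unsigned long cpumask;  /* bit n set: CPU n is not lazy on this mm */
	unsigned long tlb_gen;  /* bumped every time a flush is requested */
};

struct model_cpu {
	int id;
	struct model_mm *loaded_mm;  /* page tables stay loaded while lazy */
	unsigned long seen_gen;      /* generation this CPU's TLB matches */
	int flushes;
};

static bool model_in_cpumask(const struct model_cpu *cpu)
{
	return cpu->loaded_mm->cpumask & (1UL << cpu->id);
}

/* Analogue of the new check: a CPU absent from its loaded mm's cpumask is
 * lazy and skips the flush, even if a racing request reaches it anyway. */
static void model_flush_common(struct model_cpu *cpu)
{
	if (!model_in_cpumask(cpu))
		return;
	cpu->seen_gen = cpu->loaded_mm->tlb_gen;
	cpu->flushes++;
}

/* Requester: bump the generation, then "IPI" the CPUs in the mask. */
static void model_request_flush(struct model_mm *mm, struct model_cpu *cpus,
				int ncpus)
{
	mm->tlb_gen++;
	for (int i = 0; i < ncpus; i++)
		if (mm->cpumask & (1UL << cpus[i].id))
			model_flush_common(&cpus[i]);
}

/* Entering lazy mode: keep the page tables, drop out of the cpumask. */
static void model_enter_lazy(struct model_cpu *cpu)
{
	cpu->loaded_mm->cpumask &= ~(1UL << cpu->id);
}

/* Leaving lazy mode on the same mm: rejoin the mask, then catch up. */
static void model_leave_lazy(struct model_cpu *cpu)
{
	if (model_in_cpumask(cpu))
		return;  /* was not lazy: nothing to do */
	cpu->loaded_mm->cpumask |= 1UL << cpu->id;
	if (cpu->seen_gen < cpu->loaded_mm->tlb_gen) {
		cpu->seen_gen = cpu->loaded_mm->tlb_gen;
		cpu->flushes++;  /* one flush covers everything missed */
	}
}
```

For instance, suppose CPU 1 enters lazy mode and a flush of the mm is then requested. Only CPU 0 flushes. A late call that still reaches `model_flush_common()` on CPU 1, as in the races the new comment mentions, is also skipped. When CPU 1 later calls `model_leave_lazy()`, it sees that `seen_gen` is behind `tlb_gen` and flushes once to catch up.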