linux-kernel - Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing


Message-ID: <CALCETrU0=BpGy5OQezQ7or33n-EFgBVDNe5g8prSUjL2SoRAwA@mail.gmail.com>
Date:   Fri, 31 May 2019 14:14:49 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Nadav Amit <namit@...are.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...el.com>,
        Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages
 for flushing

On Thu, May 30, 2019 at 11:37 PM Nadav Amit <namit@...are.com> wrote:
>
> When we flush userspace mappings, we can defer the TLB flushes, as long
> as the following conditions are met:
>
> 1. No tables are freed, since otherwise speculative page walks might
>    cause machine-checks.
>
> 2. No one would access userspace before flush takes place. Specifically,
>    NMI handlers and kprobes would avoid accessing userspace.
>

I think I need to ask the big picture question.  When someone calls
flush_tlb_mm_range() (or the other entry points), if no page tables
were freed, they want the guarantee that future accesses (initiated
observably after the flush returns) will not use paging entries that
were replaced by stores ordered before flush_tlb_mm_range().  We also
need the guarantee that any effects from any memory access using the
old paging entries will become globally visible before
flush_tlb_mm_range() returns.

I'm wondering if receipt of an IPI is enough to guarantee any of this.
If CPU 1 sets a dirty bit and CPU 2 writes to the APIC to send an IPI
to CPU 1, at what point is CPU 2 guaranteed to be able to observe the
dirty bit?  An interrupt entry today is fully serializing by the time
it finishes, but interrupt entries are epically slow, and I don't know
if the APIC waits long enough.  Heck, what if IRQs are off on the
remote CPU?  There are a handful of places where we touch user memory
with IRQs off, and it's (sadly) possible for user code to turn off
IRQs with iopl().
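
To make the dirty-bit question concrete, here is a rough userspace
analogy in C11 atomics. It is only a sketch: every name in it is
hypothetical, threads stand in for CPUs, and it models an explicit
acknowledgement with release/acquire ordering, which is an assumption
and not something the APIC is known to provide. It shows the ordering
we would *like* to get from IPI receipt, not what the hardware
guarantees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static atomic_int  pte_dirty;    /* models the dirty bit set by "CPU 1" */
static atomic_bool ipi_pending;  /* models the IPI sent by "CPU 2" */
static atomic_bool ipi_acked;    /* models an explicit acknowledgement */

/* "CPU 1": touches memory through the old entry, then takes the IPI. */
static void *cpu1(void *arg)
{
    (void)arg;

    /* The access through the old paging entry sets the dirty bit. */
    atomic_store_explicit(&pte_dirty, 1, memory_order_relaxed);

    /* Spin until the "IPI" is delivered. */
    while (!atomic_load_explicit(&ipi_pending, memory_order_acquire))
        ;

    /* Release: the dirty-bit store is visible to whoever acquires the ack. */
    atomic_store_explicit(&ipi_acked, true, memory_order_release);
    return NULL;
}

/* "CPU 2": sends the IPI, waits for the ack, then reads the dirty bit. */
int simulate_ipi_handshake(void)
{
    pthread_t t;
    int rc, dirty;

    atomic_store(&pte_dirty, 0);
    atomic_store(&ipi_pending, false);
    atomic_store(&ipi_acked, false);

    rc = pthread_create(&t, NULL, cpu1, NULL);
    assert(rc == 0);
    (void)rc;

    atomic_store_explicit(&ipi_pending, true, memory_order_release);

    /* Acquire pairs with the release store of the ack in cpu1(). */
    while (!atomic_load_explicit(&ipi_acked, memory_order_acquire))
        ;

    dirty = atomic_load_explicit(&pte_dirty, memory_order_relaxed);
    pthread_join(t, NULL);
    return dirty;
}
```

With the explicit release/acquire pair on the ack, simulate_ipi_handshake()
must observe the dirty bit. Drop the ack, or weaken it to relaxed, and
the C11 model no longer promises that. The open question is which of
those two cases an IPI that is sent but not waited on corresponds to.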

I *think* that Intel has stated recently that SMT siblings are
guaranteed to stop speculating when you write to the APIC ICR to poke
them, but SMT is very special.

My general conclusion is that the code needs to document what is
guaranteed and why.

--Andy