linux-kernel - Re: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame Ownership

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <31fe7522-0a59-94c8-663e-049e9ad2bff6@intel.com>
Date:   Thu, 10 Jan 2019 15:40:04 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Khalid Aziz <khalid.aziz@...cle.com>, juergh@...il.com,
        tycho@...ho.ws, jsteckli@...zon.de, ak@...ux.intel.com,
        torvalds@...ux-foundation.org, liran.alon@...cle.com,
        keescook@...gle.com, konrad.wilk@...cle.com
Cc:     deepa.srinivasan@...cle.com, chris.hyser@...cle.com,
        tyhicks@...onical.com, dwmw@...zon.co.uk,
        andrew.cooper3@...rix.com, jcm@...hat.com,
        boris.ostrovsky@...cle.com, kanth.ghatraju@...cle.com,
        joao.m.martins@...cle.com, jmattson@...gle.com,
        pradeep.vincent@...cle.com, john.haxby@...cle.com,
        tglx@...utronix.de, kirill.shutemov@...ux.intel.com, hch@....de,
        steven.sistare@...cle.com, kernel-hardening@...ts.openwall.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame
 Ownership

First of all, thanks for picking this back up.  It looks to be going in
a very positive direction!

On 1/10/19 1:09 PM, Khalid Aziz wrote:
> I implemented a solution to reduce performance penalty and
> that has had large impact. When XPFO code flushes stale TLB entries,
> it does so for all CPUs on the system which may include CPUs that
> may not have any matching TLB entries or may never be scheduled to
> run the userspace task causing TLB flush.
...
> A rogue process can launch a ret2dir attack only from a CPU that has 
> dual mapping for its pages in physmap in its TLB. We can hence defer 
> TLB flush on a CPU until a process that would have caused a TLB
> flush is scheduled on that CPU.

This logic is a bit suspect to me.  Imagine a situation where we have
two attacker processes: one which is causing page to go from
kernel->user (and be unmapped from the kernel) and a second process that
*was* accessing that page.

The second process could easily have the page's old TLB entry.  It could
abuse that entry as long as that CPU doesn't context switch
(switch_mm_irqs_off()) or otherwise flush the TLB entry.

As for where to flush the TLB...  As you know, using synchronous IPIs is
obviously the most bulletproof from a mitigation perspective.  If you
can batch the IPIs, you can get the overhead down, but you need to do
the flushes for a bunch of pages at once, which I think is what you were
exploring but haven't gotten working yet.

Anything else you do will have *some* reduced mitigation value, which
isn't a deal-breaker (to me at least).  Some ideas:

Take a look at the SWITCH_TO_KERNEL_CR3 in head_64.S.  Every time that
gets called, we've (potentially) just done a user->kernel transition and
might benefit from flushing the TLB.  We're always doing a CR3 write (on
Meltdown-vulnerable hardware) and it can do a full TLB flush based on if
X86_CR3_PCID_NOFLUSH_BIT is set.  So, when you need a TLB flush, you
would set a bit that ADJUST_KERNEL_CR3 would see on the next
user->kernel transition on *each* CPU.  Potentially, multiple TLB
flushes could be coalesced this way.  The downside of this is that
you're exposed to the old TLB entries if a flush is needed while you are
already *in* the kernel.

You could also potentially do this from C code, like in the syscall
entry code, or in sensitive places, like when you're returning from a
guest after a VMEXIT in the kvm code.