linux-kernel - Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170921000901.v7zo4g5edhqqfabm@docker>
Date:   Wed, 20 Sep 2017 18:09:01 -0600
From:   Tycho Andersen <tycho@...ker.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kernel-hardening@...ts.openwall.com,
        Marco Benatto <marco.antonio.780@...il.com>,
        Juerg Haefliger <juerg.haefliger@...onical.com>, x86@...nel.org
Subject: Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame
 Ownership (XPFO)

On Wed, Sep 20, 2017 at 04:21:15PM -0700, Dave Hansen wrote:
> On 09/20/2017 03:34 PM, Tycho Andersen wrote:
> >> I really have to wonder whether there are better ret2dir defenses than
> >> this.  The allocator just seems like the *wrong* place to be doing this
> >> because it's such a hot path.
> > 
> > This might be crazy, but what if we defer flushing of the kernel
> > ranges until just before we return to userspace? We'd still manipulate
> > the prot/xpfo bits for the pages, but then just keep a list of which
> > ranges need to be flushed, and do the right thing before we return.
> > This leaves a little window between the actual allocation and the
> > flush, but userspace would need another thread in its threadgroup to
> > predict the next allocation, write the bad stuff there, and do the
> > exploit all in that window.
> 
> I think the common case is still that you enter the kernel, allocate a
> single page (or very few) and then exit.  So, you don't really reduce
> the total number of flushes.
> 
> Just think of this in terms of IPIs to do the remote TLB flushes.  A CPU
> can do roughly 1 million page faults and allocations a second.  Say you
> have a 2-socket x 28-core x 2 hyperthead system = 112 CPU threads.
> That's 111M IPI interrupts/second, just for the TLB flushes, *ON* *EACH*
> *CPU*.

Since we only need to flush when something switches from a userspace
to a kernel page or back, hopefully it's not this bad, but point
taken.

> I think the only thing that will really help here is if you batch the
> allocations.  For instance, you could make sure that the per-cpu-pageset
> lists always contain either all kernel or all user data.  Then remap the
> entire list at once and do a single flush after the entire list is consumed.

Just so I understand, the idea would be that we only flush when the
type of allocation alternates, so:

kmalloc(..., GFP_KERNEL);
kmalloc(..., GFP_KERNEL);
/* remap+flush here */
kmalloc(..., GFP_HIGHUSER);
/* remap+flush here */
kmalloc(..., GFP_KERNEL);

?

Tycho