Message-ID: <20170921000901.v7zo4g5edhqqfabm@docker>
Date: Wed, 20 Sep 2017 18:09:01 -0600
From: Tycho Andersen <tycho@...ker.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kernel-hardening@...ts.openwall.com,
Marco Benatto <marco.antonio.780@...il.com>,
Juerg Haefliger <juerg.haefliger@...onical.com>, x86@...nel.org
Subject: Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame
Ownership (XPFO)

On Wed, Sep 20, 2017 at 04:21:15PM -0700, Dave Hansen wrote:
> On 09/20/2017 03:34 PM, Tycho Andersen wrote:
> >> I really have to wonder whether there are better ret2dir defenses than
> >> this. The allocator just seems like the *wrong* place to be doing this
> >> because it's such a hot path.
> >
> > This might be crazy, but what if we defer flushing of the kernel
> > ranges until just before we return to userspace? We'd still manipulate
> > the prot/xpfo bits for the pages, but then just keep a list of which
> > ranges need to be flushed, and do the right thing before we return.
> > This leaves a little window between the actual allocation and the
> > flush, but userspace would need another thread in its threadgroup to
> > predict the next allocation, write the bad stuff there, and do the
> > exploit all in that window.
>
> I think the common case is still that you enter the kernel, allocate a
> single page (or very few) and then exit. So, you don't really reduce
> the total number of flushes.
>
> Just think of this in terms of IPIs to do the remote TLB flushes. A CPU
> can do roughly 1 million page faults and allocations a second. Say you
> have a 2-socket x 28-core x 2-hyperthread system = 112 CPU threads.
> That's 111M IPI interrupts/second, just for the TLB flushes, *ON* *EACH*
> *CPU*.

Since we only need to flush when something switches from a userspace
page to a kernel page or back, hopefully it's not this bad, but point
taken.
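
To make that concrete, here's roughly what I was imagining -- every
name below is invented for illustration, none of it is in the current
series, and it assumes we're running with preemption disabled (the
page allocator and the return-to-user path):

#define XPFO_MAX_DEFERRED	16

struct xpfo_flush_batch {
        int count;
        struct {
                unsigned long start, end;
        } range[XPFO_MAX_DEFERRED];
};

static DEFINE_PER_CPU(struct xpfo_flush_batch, xpfo_flush_batch);

/* Called from the allocator instead of flushing immediately. */
static void xpfo_queue_flush(unsigned long start, unsigned long end)
{
        struct xpfo_flush_batch *b = this_cpu_ptr(&xpfo_flush_batch);

        if (b->count < XPFO_MAX_DEFERRED) {
                b->range[b->count].start = start;
                b->range[b->count].end = end;
                b->count++;
        } else {
                /* Batch is full; fall back to an immediate flush. */
                flush_tlb_kernel_range(start, end);
        }
}

/* Called once on the way back out to userspace. */
static void xpfo_flush_deferred(void)
{
        struct xpfo_flush_batch *b = this_cpu_ptr(&xpfo_flush_batch);
        int i;

        for (i = 0; i < b->count; i++)
                flush_tlb_kernel_range(b->range[i].start, b->range[i].end);

        b->count = 0;
}
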
> I think the only thing that will really help here is if you batch the
> allocations. For instance, you could make sure that the per-cpu-pageset
> lists always contain either all kernel or all user data. Then remap the
> entire list at once and do a single flush after the entire list is consumed.

Just so I understand, the idea would be that we only flush when the
type of allocation alternates, so:

kmalloc(..., GFP_KERNEL);
kmalloc(..., GFP_KERNEL);
/* remap+flush here */
kmalloc(..., GFP_HIGHUSER);
/* remap+flush here */
kmalloc(..., GFP_KERNEL);

?
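
Or, in terms of the pcp lists themselves, something like this (again
just a sketch to check my understanding -- the names are made up and
the actual PTE manipulation is hidden behind a hypothetical helper):

/*
 * The per-cpu pageset holds either all-kernel or all-user pages; when
 * it is refilled from the buddy allocator, fix up the direct map for
 * the whole batch and do one flush, so the per-allocation fast path
 * never has to flush at all.
 */
static void xpfo_prep_pcp_batch(struct list_head *batch, bool user)
{
        struct page *page;
        unsigned long lo = ULONG_MAX, hi = 0;

        list_for_each_entry(page, batch, lru) {
                unsigned long addr = (unsigned long)page_address(page);

                /* hypothetical helper: (un)map this page in the direct map */
                xpfo_set_direct_mapping(page, !user);

                lo = min(lo, addr);
                hi = max(hi, addr + PAGE_SIZE);
        }

        /* a single flush covering the whole batch */
        if (lo < hi)
                flush_tlb_kernel_range(lo, hi);
}
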
Tycho