linux-kernel - Re: [PATCH 0/5] Volatile Ranges (v12) & LSF-MM discussion fodder

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140402175852.GS14688@cmpxchg.org>
Date:	Wed, 2 Apr 2014 13:58:52 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	John Stultz <john.stultz@...aro.org>
Cc:	Dave Hansen <dave@...1.net>, "H. Peter Anvin" <hpa@...or.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Android Kernel Team <kernel-team@...roid.com>,
	Robert Love <rlove@...gle.com>, Mel Gorman <mel@....ul.ie>,
	Hugh Dickins <hughd@...gle.com>,
	Rik van Riel <riel@...hat.com>,
	Dmitry Adamushko <dmitry.adamushko@...il.com>,
	Neil Brown <neilb@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Mike Hommey <mh@...ndium.org>, Taras Glek <tglek@...illa.com>,
	Jan Kara <jack@...e.cz>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Michel Lespinasse <walken@...gle.com>,
	Minchan Kim <minchan@...nel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 0/5] Volatile Ranges (v12) & LSF-MM discussion fodder

On Wed, Apr 02, 2014 at 10:40:16AM -0700, John Stultz wrote:
> On Wed, Apr 2, 2014 at 9:36 AM, Johannes Weiner <hannes@...xchg.org> wrote:
> > On Tue, Apr 01, 2014 at 09:12:44PM -0700, John Stultz wrote:
> >> On 04/01/2014 04:01 PM, Dave Hansen wrote:
> >> > On 04/01/2014 02:35 PM, H. Peter Anvin wrote:
> >> >> On 04/01/2014 02:21 PM, Johannes Weiner wrote:
> >> > John, this was something that the Mozilla guys asked for, right?  Any
> >> > idea why this isn't ever a problem for them?
> >> So one of their use cases for it is for library text. Basically they
> >> want to decompress a compressed library file into memory. Then they plan
> >> to mark the uncompressed pages volatile, and then be able to call into
> >> it. Ideally for them, the kernel would only purge cold pages, leaving
> >> the hot pages in memory. When they traverse a purged page, they handle
> >> the SIGBUS and patch the page up.
> >
> > How big are these libraries compared to overall system size?
> 
> Mike or Taras would have to refresh my memory on this detail. My
> recollection is it mostly has to do with keeping the on-disk size of
> the library small, so it can load off of slow media very quickly.
> 
> >> Now.. this is not what I'd consider a normal use case, but was hoping to
> >> illustrate some of the more interesting uses and demonstrate the
> >> interfaces flexibility.
> >
> > I'm just dying to hear a "normal" use case then. :)
> 
> So the more "normal" use cause would be marking objects volatile and
> then non-volatile w/o accessing them in-between. In this case the
> zero-fill vs SIGBUS semantics don't really matter, its really just a
> trade off in how we handle applications deviating (intentionally or
> not) from this use case.
> 
> So to maybe flesh out the context here for folks who are following
> along (but weren't in the hallway at LSF :),  Johannes made a fairly
> interesting proposal (Johannes: Please correct me here where I'm maybe
> slightly off here) to use only the dirty bits of the ptes to mark a
> page as volatile. Then the kernel could reclaim these clean pages as
> it needed, and when we marked the range as non-volatile, the pages
> would be re-dirtied and if any of the pages were missing, we could
> return a flag with the purged state.  This had some different
> semantics then what I've been working with for awhile (for example,
> any writes to pages would implicitly clear volatility), so I wasn't
> completely comfortable with it, but figured I'd think about it to see
> if it could be done. Particularly since it would in some ways simplify
> tmpfs/shm shared volatility that I'd eventually like to do.
> 
> After thinking it over in the hallway, I talked some of the details w/
> Johnnes and there was one issue that while w/ anonymous memory, we can
> still add a VM_VOLATILE flag on the vma, so we can get SIGBUS
> semantics, but since on shared volatile ranges, we don't have anything
> to hang a volatile flag on w/o adding some new vma like structure to
> the address_space structure (much as we did in the past w/ earlier
> volatile range implementations). This would negate much of the point
> of using the dirty bits to simplify the shared volatility
> implementation.
> 
> Thus Johannes is reasonably questioning the need for SIGBUS semantics,
> since if it wasn't needed, the simpler page-cleaning based volatility
> could potentially be used.

Thanks for summarizing this again!

> Now, while for the case I'm personally most interested in (ashmem),
> zero-fill would technically be ok, since that's what Android does.
> Even so, I don't think its the best approach for the interface, since
> applications may end up quite surprised by the results when they
> accidentally don't follow the "don't touch volatile pages" rule.
> 
> That point beside, I think the other problem with the page-cleaning
> volatility approach is that there are other awkward side effects. For
> example: Say an application marks a range as volatile. One page in the
> range is then purged. The application, due to a bug or otherwise,
> reads the volatile range. This causes the page to be zero-filled in,
> and the application silently uses the corrupted data (which isn't
> great). More problematic though, is that by faulting the page in,
> they've in effect lost the purge state for that page. When the
> application then goes to mark the range as non-volatile, all pages are
> present, so we'd return that no pages were purged.  From an
> application perspective this is pretty ugly.
> 
> Johannes: Any thoughts on this potential issue with your proposal? Am
> I missing something else?

No, this is accurate.  However, I don't really see how this is
different than any other use-after-free bug.  If you access malloc
memory after free(), you might receive a SIGSEGV, you might see random
data, you might corrupt somebody else's data.  This certainly isn't
nice, but it's not exactly new behavior, is it?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/