linux-kernel - Re: [PATCH] mm/mincore: allow for making sys

Message-ID: <nycvar.YFH.7.76.1901091050560.16954@cbobk.fhfr.pm>
Date:   Wed, 9 Jan 2019 11:08:57 +0100 (CET)
From:   Jiri Kosina <jikos@...nel.org>
To:     Dave Chinner <david@...morbit.com>
cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Jann Horn <jannh@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Michal Hocko <mhocko@...e.com>, Linux-MM <linux-mm@...ck.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

On Wed, 9 Jan 2019, Dave Chinner wrote:

> FWIW, I just realised that the easiest, most reliable way to invalidate 
> the page cache over a file range is simply to do a O_DIRECT read on it. 

Neat, good catch indeed. Still, that covers only the invalidation part; the 
residency check is the crucial one.

> > Rationale has been provided by Daniel Gruss in this thread -- if the 
> > attacker is left with cache timing as the only available vector, he's 
> > going to be much more successful with mounting hardware cache timing 
> > attack anyway.
> 
> No, he said:
> 
> "Restricting mincore() is sufficient to fix the hardware-agnostic
> part."
> 
> That's not correct - preadv2(RWF_NOWAIT) is also hardware agnostic and 
> provides exactly the same information about the page cache as mincore.  

Yeah, preadv2(RWF_NOWAIT) is in the same territory as mincore(), it has 
"just" been overlooked. I can't speak for Daniel, but I believe he might 
be ok with rephrasing the above as "Restricting mincore() and RWF_NOWAIT 
is sufficient ...".
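
To make the comparison concrete, here is a minimal sketch of how both 
calls can answer "is this page of the file in the page cache?". The helper 
names resident_by_mincore() and resident_by_nowait() are made up for this 
illustration and are not taken from any patch in this thread. The result 
also depends on the kernel version, the filesystem (not all of them 
support RWF_NOWAIT) and on whatever restrictions end up being applied to 
either call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask mincore() whether the page at file offset
 * `off` is resident. `off` must be page aligned.
 * Returns 1 if resident, 0 if not, -1 on error.
 */
int resident_by_mincore(int fd, off_t off)
{
    long psz = sysconf(_SC_PAGESIZE);
    unsigned char vec = 0;
    void *p;
    int rc;

    assert(psz > 0 && off % psz == 0);   /* precondition, not an error path */

    p = mmap(NULL, (size_t)psz, PROT_READ, MAP_SHARED, fd, off);
    if (p == MAP_FAILED)
        return -1;

    rc = mincore(p, (size_t)psz, &vec);
    munmap(p, (size_t)psz);

    /* Only the least significant bit of each vector byte is defined. */
    return rc == 0 ? (vec & 1) : -1;
}

/*
 * Hypothetical helper: ask the same question with preadv2(RWF_NOWAIT).
 * A one-byte read succeeds if the data is immediately available and
 * fails with EAGAIN if it would have to wait for I/O.
 * Returns 1 if resident, 0 if not, -1 on EOF or any other error
 * (including EOPNOTSUPP where RWF_NOWAIT is not supported).
 */
int resident_by_nowait(int fd, off_t off)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    ssize_t n = preadv2(fd, &iov, 1, off, RWF_NOWAIT);

    if (n > 0)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}
```

Calling both helpers on the same open file descriptor and page-aligned 
offset should, where both interfaces are unrestricted and supported, give 
the same answer, which is why restricting only one of them leaves the 
other as an equivalent source of information.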

> Timed read/mmap access loops for cache observation are also hardware 
> agnostic, and on fast SSD based storage will only be marginally slower 
> bandwidth than preadv2(RWF_NOWAIT).
> 
> Attackers will pick whatever leak vector we don't fix, so we either fix 
> them all (which I think is probably impossible without removing caching 
> altogether) 

We can't really fix the fact that it's possible to do the timing on the HW 
caches though.

> or we start thinking about how we need to isolate the page cache so that 
> information isn't shared across important security boundaries (e.g. page 
> cache contents are per-mount namespace).

Umm, sorry for being dense, but how would that help that particular attack 
scenario on a system that doesn't really employ any namespacing? (which I 
still believe is a majority of the systems out there, but I might have 
just missed the containers train a long time ago :) ).

-- 
Jiri Kosina
SUSE Labs