linux-kernel - Re: Access to non-RAM pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a1980256cf3d2ea8e91707fcda7cf141b27a212d.camel@kernel.crashing.org>
Date:   Mon, 03 Sep 2018 10:48:05 +1000
From:   Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Jiri Kosina <jikos@...nel.org>
Cc:     Jürgen Groß <jgross@...e.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Will Deacon <will.deacon@....com>
Subject: Re: Access to non-RAM pages

On Sat, 2018-09-01 at 11:06 -0700, Linus Torvalds wrote:
> [ Adding a few new people the the cc.
> 
>   The issue is the worry about software-speculative accesses (ie
> things like CONFIG_DCACHE_WORD_ACCESS - not talking about the hw
> speculation now) accessing past RAM into possibly contiguous IO ]
> 
> On Sat, Sep 1, 2018 at 10:27 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> > 
> > If you have a machine with RAM that touches IO, you need to disable
> > the last page, exactly the same way we disable and marked reserved the
> > first page at zero.

So I missed the departure of that train ... stupid question, with
CONFIG_DCACHE_WORD_ACCESS, if that can be unaligned (I assume it can),
what prevents it from crossing into a non-mapped page (not even IO) and
causing an oops ? Looking at a random user in fs/dcache.c its not a
uaccess-style read with recovery.... Or am I missing somethign obvious
here ?

IE, should we "reserve" the last page of any memory region (maybe mark
it read-only) to avoid this along with avoiding leakage into IO space ?

> > I thought we already did that.
> 
> We don't seem to do that.
> 
> And it's not just the last page, it's _any_ last page in a region that
> bumps up to IO. That's actually much more common in the low 4G area on
> PC's, I suspect, although the reserved BIOS ranges always tend to be
> there.

What makes IO more "wrong" than oopsing due to the page not being
mapped ?

> I suspect it should be trivial to do - maybe in
> e820__memblock_setup()? That's where we already trim partial pages
> etc.
>
> In fact, I think this might be done as an extension of commit
> 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
> memblock.reserved"), except making sure that non-RAM regions mark one
> page _previous_ as reserved too.
> 
> I assume memory hotplug might have the same issue, and checking
> whether ARM64 and powerpc perhaps might have already done something
> like this (or might need to add it).
> 
> We discussed long ago the case of user space mapping IO in user space,
> and decided we didn't care. But the kernel should probably explicitly
> make sure we don't either, even if I can't recall having ever seen a
> machine that actually maps IO contiguously to RAM. The layout always
> tends to end up having holes anyway.

Can't we put the safety in generic memblock ? IE, don't hand out an
allocation that contain the last page of a "block" and handle that last
page in the memblock->buddy transition rather than in arch specific
code ?

Cheers,
Ben.