linux-kernel - Re: [PATCH] mm: Add new vma flag VM_LOCAL


Message-ID: <20180515120750.lro2qbskw5cptc5o@lakrids.cambridge.arm.com>
Date:   Tue, 15 May 2018 13:07:51 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Boaz Harrosh <boazh@...app.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Jeff Moyer <jmoyer@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        Amit Golander <Amit.Golander@...app.com>
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU

On Tue, May 15, 2018 at 01:43:23PM +0300, Boaz Harrosh wrote:
> On 15/05/18 03:41, Matthew Wilcox wrote:
> > On Mon, May 14, 2018 at 10:37:38PM +0300, Boaz Harrosh wrote:
> >> On 14/05/18 22:15, Matthew Wilcox wrote:
> >>> On Mon, May 14, 2018 at 08:28:01PM +0300, Boaz Harrosh wrote:
> >>>> On a call to mmap an mmap provider (like an FS) can put
> >>>> this flag on vma->vm_flags.
> >>>>
> >>>> The VM_LOCAL_CPU flag tells the Kernel that the vma will be used
> >>>> from a single-core only, and therefore invalidation (flush_tlb) of
> >>>> PTE(s) need not be a wide CPU scheduling.
> >>>
> >>> I still don't get this.  You're opening the kernel up to being exploited
> >>> by any application which can persuade it to set this flag on a VMA.
> >>>
> >>
> >> No No this is not an application accessible flag this can only be set
> >> by the mmap implementor at ->mmap() time (say, the same as VM_MIXEDMAP).
> >>
> >> Please see the zuf patches for usage (Again apologise for pushing before
> >> a user)
> >>
> >> The mmap provider has all the facilities to know that this can not be
> >> abused, not even by a trusted Server.
> > 
> > I don't think page tables work the way you think they work.
> > 
> > +               err = vm_insert_pfn_prot(zt->vma, zt_addr, pfn, prot);
> > 
> > That doesn't just insert it into the local CPU's page table.  Any CPU
> > which directly accesses or even prefetches that address will also get
> > the translation into its cache.
> > 
> 
> Yes I know, but that is exactly the point of this flag. I know that this
> address is only ever accessed from a single core. Because it is an mmap (vma)
> of an O_TMPFILE-exclusive file created in a core-pinned thread and I allow
> only that thread any kind of access to this vma. Both the filehandle and the
> mmaped pointer are kept on the thread stack and have no access from outside.

Even if (in the specific context of your application) software on other
cores might not explicitly access this area, that does not prevent
allocations into TLBs, and TLB maintenance *cannot* be elided.

Even the assumption that software *never* explicitly accesses an address
which it has not mapped is insufficient, because the hardware can
allocate TLB entries without any explicit access.

For example, imagine you have two threads, each pinned to a CPU, and
some local_cpu_{mmap,munmap} which uses your new flag:

	CPU0				CPU1
	x = local_cpu_mmap(...);
	do_things_with(x);
					// speculatively allocates TLB
					// entries for x.

	// only invalidates local TLBs
	local_cpu_munmap(x);

					// TLB entries for x still live
	
					y = local_cpu_mmap(...);

					// if y == x, we can hit the
					// stale TLB entry, and access
					// the wrong page
					do_things_with(y);

Consider that after we free x, the kernel could reuse the page for any
purpose (e.g. kernel page tables), so this is a major risk.
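
To make the timeline above concrete, here is a minimal userspace sketch
of it. This is only a toy model, not kernel code: it assumes one virtual
address, one page-table slot, and one TLB slot per CPU, and every name
in it (toy_machine, toy_map, toy_unmap, toy_replay, the page numbers 100
and 200) is hypothetical and exists purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { NCPUS = 2, NO_PAGE = -1 };

/* Toy model: a single virtual address x with one PTE and one TLB slot per CPU. */
struct toy_machine {
	int pte;		/* page currently mapped at x, or NO_PAGE */
	int tlb[NCPUS];		/* per-CPU cached translation for x, or NO_PAGE */
};

static void toy_init(struct toy_machine *m)
{
	m->pte = NO_PAGE;
	for (int i = 0; i < NCPUS; i++)
		m->tlb[i] = NO_PAGE;
}

static void toy_map(struct toy_machine *m, int page)
{
	m->pte = page;
}

/* Hardware may cache a valid PTE at any time, e.g. speculatively. */
static void toy_tlb_fill(struct toy_machine *m, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (m->pte != NO_PAGE)
		m->tlb[cpu] = m->pte;
}

/* Clear the PTE, then invalidate either the local TLB only or all TLBs. */
static void toy_unmap(struct toy_machine *m, int cpu, bool local_only)
{
	assert(cpu >= 0 && cpu < NCPUS);
	m->pte = NO_PAGE;
	for (int i = 0; i < NCPUS; i++)
		if (!local_only || i == cpu)
			m->tlb[i] = NO_PAGE;
}

/* An access uses a cached TLB entry if present, else walks the page table. */
static int toy_access(struct toy_machine *m, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (m->tlb[cpu] == NO_PAGE)
		m->tlb[cpu] = m->pte;
	return m->tlb[cpu];
}

/* Replay the timeline; return the page CPU1 reaches through y. */
static int toy_replay(bool local_only)
{
	struct toy_machine m;

	toy_init(&m);
	toy_map(&m, 100);		/* CPU0: x = local_cpu_mmap(...) */
	toy_access(&m, 0);		/* CPU0: do_things_with(x) */
	toy_tlb_fill(&m, 1);		/* CPU1: speculative TLB allocation */
	toy_unmap(&m, 0, local_only);	/* CPU0: local_cpu_munmap(x) */
	toy_map(&m, 200);		/* CPU1: y = local_cpu_mmap(...), y == x */
	return toy_access(&m, 1);	/* CPU1: do_things_with(y) */
}
```

In this model, toy_replay(true) returns the old page (100) because
CPU1's TLB entry survived the local-only invalidate, whereas
toy_replay(false) returns the newly mapped page (200).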

This flag simply is not safe, unless the *entire* mm is only ever
accessed from a single CPU. In that case, we don't need the flag anyway,
as the mm already has a cpumask.
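
As a rough illustration of that last point, the decision can be derived
from a per-mm CPU mask alone. The sketch below is a simplified userspace
model under stated assumptions: toy_cpumask_t is a hypothetical 64-bit
stand-in for the mm's cpumask, and the helpers are made up for this
example. The kernel's real cpumask handling and flush paths are more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an mm's cpumask: bit n set means CPU n may
 * hold TLB entries for this mm. */
typedef uint64_t toy_cpumask_t;

/* CPUs other than this_cpu that would need a remote invalidate. */
static toy_cpumask_t toy_remote_flush_targets(toy_cpumask_t mm_mask,
					      unsigned int this_cpu)
{
	assert(this_cpu < 64);
	return mm_mask & ~((toy_cpumask_t)1 << this_cpu);
}

/* True if any CPU besides this_cpu has ever run this mm. */
static bool toy_needs_remote_flush(toy_cpumask_t mm_mask,
				   unsigned int this_cpu)
{
	return toy_remote_flush_targets(mm_mask, this_cpu) != 0;
}
```

If only the local CPU's bit is set, the target set is empty and a local
invalidate suffices, with no per-vma flag involved.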

Thanks,
Mark.