Message-ID: <20191030084005.GC20624@rapoport-lnx>
Date:   Wed, 30 Oct 2019 09:40:06 +0100
From:   Mike Rapoport <rppt@...nel.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Arnd Bergmann <arnd@...db.de>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        James Bottomley <jejb@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Linux API <linux-api@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, X86 ML <x86@...nel.org>,
        Mike Rapoport <rppt@...ux.ibm.com>
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user
 mappings

On Tue, Oct 29, 2019 at 10:00:55AM -0700, Andy Lutomirski wrote:
> On Tue, Oct 29, 2019 at 2:33 AM Mike Rapoport <rppt@...nel.org> wrote:
> >
> > On Mon, Oct 28, 2019 at 02:44:23PM -0600, Andy Lutomirski wrote:
> > >
> > > > On Oct 27, 2019, at 4:17 AM, Mike Rapoport <rppt@...nel.org> wrote:
> > > >
> > > > From: Mike Rapoport <rppt@...ux.ibm.com>
> > > >
> > > > Hi,
> > > >
> > > > The patch below aims to allow applications to create mappings that have
> > > > pages visible only to the owning process. Such mappings could be used to
> > > > store secrets so that these secrets are visible neither to other
> > > > processes nor to the kernel.
> > > >
> > > > I've only tested the basic functionality, the changes should be verified
> > > > against THP/migration/compaction. Yet, I'd appreciate early feedback.
> > >
> > > I’ve contemplated the concept a fair amount, and I think you should
> > > consider a change to the API. In particular, rather than having it be a
> > > MAP_ flag, make it a chardev.  You can, at least at first, allow only
> > > MAP_SHARED, and admins can decide who gets to use it.  It might also play
> > > better with the VM overall, and you won’t need a VM_ flag for it — you
> > > can just wire up .fault to do the right thing.
> >
> > I think mmap()/mprotect()/madvise() are the natural APIs for such
> > interface.
> 
> Then you have a whole bunch of questions to answer.  For example:
> 
> What happens if you mprotect() or similar when the mapping is already
> in use in a way that's incompatible with MAP_EXCLUSIVE?

Then we refuse to mprotect()? Just like in any other case when the vm_flags
are not compatible with the requested madvise()/mprotect() operation.
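
For reference, one existing case of such a refusal, as a rough userspace
sketch: asking for PROT_WRITE on a MAP_SHARED mapping of a file that was
opened read-only fails with EACCES. The helper name and the temporary file
path are made up for the illustration, and this shows today's behaviour
only, not anything specific to MAP_EXCLUSIVE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the errno reported by mprotect() when it
 * refuses to add PROT_WRITE to a shared mapping of a read-only fd,
 * 0 if mprotect() unexpectedly succeeds, or -1 if the setup fails.
 */
int mprotect_refusal_errno(void)
{
    char path[] = "/tmp/excl-demo-XXXXXX"; /* illustrative scratch file */
    long page = sysconf(_SC_PAGESIZE);
    int fd, err = 0;
    void *p;

    if (page <= 0)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    /* Reopen read-only so the shared mapping can never become writable. */
    fd = open(path, O_RDONLY);
    unlink(path);
    if (fd < 0)
        return -1;

    p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* The kernel refuses here: the protection is incompatible with the VMA. */
    if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) != 0)
        err = errno;

    munmap(p, (size_t)page);
    return err;
}
```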

> Is it actually reasonable to malloc() some memory and then make it exclusive?
> 
> Are you permitted to map a file MAP_EXCLUSIVE?  What does it mean?

I'd limit MAP_EXCLUSIVE only to anonymous memory.

> What does MAP_PRIVATE | MAP_EXCLUSIVE do?

My preference is to have only mmap(), and then the semantics are clearer:

MAP_PRIVATE | MAP_EXCLUSIVE creates a pre-populated region, marks it locked
and drops the pages in this region from the direct map.
The pages are returned to the direct map on munmap().
Then there is no way to change an existing area to be exclusive or vice
versa.
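
From the application side, the mmap()-only flavour could look roughly like
the sketch below. MAP_EXCLUSIVE is not in any released uapi header, so it is
defined here as a no-op placeholder (an assumption, not the value from the
patch), and MAP_POPULATE | MAP_LOCKED only approximate the "pre-populated
and locked" part with flags that exist today. The helper names are made up.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_EXCLUSIVE
/* Placeholder only: the real flag value would come from the patched headers. */
#define MAP_EXCLUSIVE 0
#endif

/*
 * Hypothetical helper: create an anonymous, private, pre-populated and
 * locked region for a secret. Returns NULL on failure (for example when
 * RLIMIT_MEMLOCK does not allow locking len bytes).
 */
void *secret_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE |
                   MAP_LOCKED | MAP_EXCLUSIVE,
                   -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: tear the region down. With the proposed semantics
 * this is the point where the pages would go back to the direct map.
 */
void secret_free(void *p, size_t len)
{
    if (p)
        munmap(p, len);
}
```

Since the exclusive property is fixed at mmap() time, there is no
mprotect()/madvise() step in this usage at all.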

> How does one pass exclusive memory via SCM_RIGHTS?  (If it's a
> memfd-like or chardev interface, it's trivial.  mmap(), not so much.)

Why would passing such memory via SCM_RIGHTS be useful?
 
> And finally, there's my personal giant pet peeve: a major use of this
> will be for virtualization.  I suspect that a lot of people would like
> the majority of KVM guest memory to be unmapped from the host
> pagetables.  But people might also like for guest memory to be
> unmapped in *QEMU's* pagetables, and mmap() is a basically worthless
> interface for this.  Getting fd-backed memory into a guest will take
> some possibly major work in the kernel, but getting vma-backed memory
> into a guest without mapping it in the host user address space seems
> much, much worse.

Well, in my view, MAP_EXCLUSIVE is intended for keeping small secrets
rather than for the entire guest memory. I even considered adding a
limit on the mapping size, but then I decided that since RLIMIT_MEMLOCK is
enforced anyway, there is no need for a new one.
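
To illustrate the limit an application would already be subject to, here is
a small sketch that compares a requested size against RLIMIT_MEMLOCK. The
helper names are made up, and the check is simplified: it ignores memory the
process has already locked, and a process with CAP_IPC_LOCK is not bound by
the limit.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Pure comparison: does len fit under the given limit? */
int len_within_limit(size_t len, rlim_t limit)
{
    return limit == RLIM_INFINITY || len <= limit;
}

/*
 * Hypothetical helper: returns 1 if len fits under the current soft
 * RLIMIT_MEMLOCK, 0 if it does not, and -1 if the limit cannot be read.
 */
int fits_memlock_limit(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    return len_within_limit(len, rl.rlim_cur);
}
```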

I agree that getting fd-backed memory into a guest would be less painful
than VMA-backed memory, but KVM can already use memory outside the control
of the kernel via /dev/map [1].

So unless I'm missing something here, there is no need to use MAP_EXCLUSIVE
for the guest memory.

[1] https://lwn.net/Articles/778240/

> > Switching to a chardev doesn't solve the major problem of direct
> > map fragmentation and defeats the ability to use exclusive memory mappings
> > with the existing allocators, while mprotect() and madvise() do not.
> >
> 
> Will people really want to do malloc() and then remap it exclusive?
> This sounds dubiously useful at best.

Again, my preference is to have mmap() only, but I see value in this use
case as well. Application developers allocate memory and then sometimes
change its properties rather than mmap() something new. For such usage
mprotect() may be useful.


-- 
Sincerely yours,
Mike.
