Message-ID: <20210208212605.GX242749@kernel.org>
Date: Mon, 8 Feb 2021 23:26:05 +0200
From: Mike Rapoport <rppt@...nel.org>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Andy Lutomirski <luto@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Borislav Petkov <bp@...en8.de>,
Catalin Marinas <catalin.marinas@....com>,
Christopher Lameter <cl@...ux.com>,
Dan Williams <dan.j.williams@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
David Hildenbrand <david@...hat.com>,
Elena Reshetova <elena.reshetova@...el.com>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
James Bottomley <jejb@...ux.ibm.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Matthew Wilcox <willy@...radead.org>,
Mark Rutland <mark.rutland@....com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Palmer Dabbelt <palmer@...belt.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Peter Zijlstra <peterz@...radead.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Roman Gushchin <guro@...com>,
Shakeel Butt <shakeelb@...gle.com>,
Shuah Khan <shuah@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Tycho Andersen <tycho@...ho.ws>, Will Deacon <will@...nel.org>,
linux-api@...r.kernel.org, linux-arch@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-nvdimm@...ts.01.org, linux-riscv@...ts.infradead.org,
x86@...nel.org, Hagen Paul Pfeifer <hagen@...u.net>,
Palmer Dabbelt <palmerdabbelt@...gle.com>
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to
create "secret" memory areas

On Mon, Feb 08, 2021 at 11:49:22AM +0100, Michal Hocko wrote:
> On Mon 08-02-21 10:49:17, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@...ux.ibm.com>
> >
> > Introduce "memfd_secret" system call with the ability to create memory
> > areas visible only in the context of the owning process and mapped
> > neither in other processes nor in the kernel page tables.
> >
> > The secretmem feature is off by default and the user must explicitly enable
> > it at boot time.
> >
> > Once secretmem is enabled, the user will be able to create a file
> > descriptor using the memfd_secret() system call. The memory areas created
> > by mmap() calls from this file descriptor will be unmapped from the kernel
> > direct map and will be mapped only in the page table of the owning mm.
>
> Is this really true? I guess you meant to say that the memory will be
> visible only via page tables to anybody who can mmap the respective file
> descriptor. There is nothing like an owning mm as the fd is inherently a
> shareable resource and the ownership becomes a very vague and hard to
> define term.

Hmm, it seems I've been dragging this paragraph from the very first
mmap(MAP_EXCLUSIVE) RFC and nobody (including myself) noticed the
inconsistency.

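For readers following the thread, here is a minimal userspace sketch of the
flow the quoted paragraph describes: create the descriptor, size it, and
mmap() it. This is an illustration under assumptions, not code from the patch
set. The helper names memfd_secret_fd() and secret_alloc() are hypothetical,
the fallback syscall number 447 is the value mainline later assigned on x86-64
and in asm-generic and may not match the kernel you run, and the call fails
(for example with ENOSYS) when secretmem is unavailable or not enabled at boot.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate memfd_secret. */
#ifndef __NR_memfd_secret
#define __NR_memfd_secret 447
#endif

/* Hypothetical thin wrapper; there may be no libc wrapper available. */
int memfd_secret_fd(unsigned int flags)
{
    return (int)syscall(__NR_memfd_secret, flags);
}

/*
 * Hypothetical helper: create a secretmem fd, size it to len bytes and
 * map it shared. Returns the mapping and stores the fd in *fd_out on
 * success; returns NULL with errno set (and *fd_out untouched) on failure.
 */
void *secret_alloc(size_t len, int *fd_out)
{
    int saved_errno;
    void *p;
    int fd = memfd_secret_fd(0);

    if (fd < 0)
        return NULL;            /* e.g. ENOSYS: secretmem not available */

    if (ftruncate(fd, (off_t)len) < 0)
        goto fail;

    /* The mapping is created with MAP_SHARED on the secretmem fd. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;

    *fd_out = fd;
    return p;

fail:
    saved_errno = errno;        /* close() must not clobber the real error */
    close(fd);
    errno = saved_errno;
    return NULL;
}
```

Since the descriptor can be inherited or passed to another process, anybody
who can mmap() it gets the same view of the memory, which is the point Michal
raises above.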
> > The file descriptor based memory has several advantages over the
> > "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). It
> > paves the way for VMMs to remove the secret memory range from the process;
>
> I do not understand how it helps to remove the memory from the process
> as the interface explicitly allows adding memory that is removed from
> all other processes via the direct map.

The current implementation does not help to remove the memory from the
process, but using fd-backed memory seems a better interface to remove
guest memory from host mappings than mmap. As Andy nicely put it:

"Getting fd-backed memory into a guest will take some possibly major work in
the kernel, but getting vma-backed memory into a guest without mapping it
in the host user address space seems much, much worse."

> > As secret memory implementation is not an extension of tmpfs or hugetlbfs,
> > usage of a dedicated system call rather than hooking new functionality into
> > memfd_create(2) emphasises that memfd_secret(2) has different semantics and
> > allows better upwards compatibility.
>
> What is this supposed to mean? What are the differences?

Well, the phrasing could indeed be better. That is supposed to mean that
they differ in the semantics behind the file descriptor: memfd_create()
implements sealing for shmem and hugetlbfs, while memfd_secret() implements
memory hidden from the kernel.

> > The secretmem mappings are locked in memory so they cannot exceed
> > RLIMIT_MEMLOCK. Since these mappings are already locked, an attempt to
> > mlock() a secretmem range would fail and mlockall() will ignore secretmem
> > mappings.
>
> What about munlock?

Isn't this implied? ;-)
I'll add a sentence about it.

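The RLIMIT_MEMLOCK constraint from the quoted paragraph can be checked from
userspace before attempting a mapping. A rough sketch, assuming a hypothetical
helper fits_memlock_limit(): it compares only the requested length with the
soft limit and ignores memory the process has already locked, so it is a
pre-check, not a guarantee that a later mmap() of a secretmem fd succeeds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: report whether a mapping of len bytes could fit
 * under the soft RLIMIT_MEMLOCK limit of the calling process. Memory
 * that is already locked is not accounted for here.
 */
bool fits_memlock_limit(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;           /* be conservative if the limit is unknown */

    if (rl.rlim_cur == RLIM_INFINITY)
        return true;

    return (rlim_t)len <= rl.rlim_cur;
}
```

A caller would typically run this check before sizing and mapping the secret
area and fall back to a smaller size or ordinary memory when it returns false.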
--
Sincerely yours,
Mike.