linux-kernel - Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210208212605.GX242749@kernel.org>
Date:   Mon, 8 Feb 2021 23:26:05 +0200
From:   Mike Rapoport <rppt@...nel.org>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andy Lutomirski <luto@...nel.org>,
        Arnd Bergmann <arnd@...db.de>, Borislav Petkov <bp@...en8.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Christopher Lameter <cl@...ux.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        Elena Reshetova <elena.reshetova@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        James Bottomley <jejb@...ux.ibm.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Matthew Wilcox <willy@...radead.org>,
        Mark Rutland <mark.rutland@....com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Roman Gushchin <guro@...com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Shuah Khan <shuah@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tycho Andersen <tycho@...ho.ws>, Will Deacon <will@...nel.org>,
        linux-api@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-nvdimm@...ts.01.org, linux-riscv@...ts.infradead.org,
        x86@...nel.org, Hagen Paul Pfeifer <hagen@...u.net>,
        Palmer Dabbelt <palmerdabbelt@...gle.com>
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to
 create "secret" memory areas

On Mon, Feb 08, 2021 at 11:49:22AM +0100, Michal Hocko wrote:
> On Mon 08-02-21 10:49:17, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@...ux.ibm.com>
> > 
> > Introduce "memfd_secret" system call with the ability to create memory
> > areas visible only in the context of the owning process and not mapped not
> > only to other processes but in the kernel page tables as well.
> > 
> > The secretmem feature is off by default and the user must explicitly enable
> > it at the boot time.
> > 
> > Once secretmem is enabled, the user will be able to create a file
> > descriptor using the memfd_secret() system call. The memory areas created
> > by mmap() calls from this file descriptor will be unmapped from the kernel
> > direct map and they will be only mapped in the page table of the owning mm.
> 
> Is this really true? I guess you meant to say that the memory will
> visible only via page tables to anybody who can mmap the respective file
> descriptor. There is nothing like an owning mm as the fd is inherently a
> shareable resource and the ownership becomes a very vague and hard to
> define term.

Hmm, it seems I've been dragging this paragraph from the very first
mmap(MAP_EXCLUSIVE) rfc and nobody (including myself) noticed the
inconsistency.
 
> > The file descriptor based memory has several advantages over the
> > "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). It
> > paves the way for VMMs to remove the secret memory range from the process;
> 
> I do not understand how it helps to remove the memory from the process
> as the interface explicitly allows to add a memory that is removed from
> all other processes via direct map.

The current implementation does not help to remove the memory from the
process, but using fd-backed memory seems a better interface to remove
guest memory from host mappings than mmap. As Andy nicely put it:

"Getting fd-backed memory into a guest will take some possibly major work in
the kernel, but getting vma-backed memory into a guest without mapping it
in the host user address space seems much, much worse."
 
> > As secret memory implementation is not an extension of tmpfs or hugetlbfs,
> > usage of a dedicated system call rather than hooking new functionality into
> > memfd_create(2) emphasises that memfd_secret(2) has different semantics and
> > allows better upwards compatibility.
> 
> What is this supposed to mean? What are differences?

Well, the phrasing could be better indeed. That supposed to mean that
they differ in the semantics behind the file descriptor: memfd_create
implements sealing for shmem and hugetlbfs while memfd_secret implements
memory hidden from the kernel.
 
> > The secretmem mappings are locked in memory so they cannot exceed
> > RLIMIT_MEMLOCK. Since these mappings are already locked an attempt to
> > mlock() secretmem range would fail and mlockall() will ignore secretmem
> > mappings.
> 
> What about munlock?

Isn't this implied? ;-)
I'll add a sentence about it.
 
-- 
Sincerely yours,
Mike.