Date:   Tue, 26 Jan 2021 10:53:08 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andy Lutomirski <luto@...nel.org>,
        Arnd Bergmann <arnd@...db.de>, Borislav Petkov <bp@...en8.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Christopher Lameter <cl@...ux.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Elena Reshetova <elena.reshetova@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        James Bottomley <jejb@...ux.ibm.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Matthew Wilcox <willy@...radead.org>,
        Mark Rutland <mark.rutland@....com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Roman Gushchin <guro@...com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Shuah Khan <shuah@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tycho Andersen <tycho@...ho.ws>, Will Deacon <will@...nel.org>,
        linux-api@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-nvdimm@...ts.01.org, linux-riscv@...ts.infradead.org,
        x86@...nel.org, Hagen Paul Pfeifer <hagen@...u.net>,
        Palmer Dabbelt <palmerdabbelt@...gle.com>
Subject: Re: [PATCH v16 06/11] mm: introduce memfd_secret system call to
 create "secret" memory areas

On 26.01.21 10:49, Michal Hocko wrote:
> On Tue 26-01-21 11:20:11, Mike Rapoport wrote:
>> On Tue, Jan 26, 2021 at 10:00:13AM +0100, Michal Hocko wrote:
>>> On Tue 26-01-21 10:33:11, Mike Rapoport wrote:
>>>> On Tue, Jan 26, 2021 at 08:16:14AM +0100, Michal Hocko wrote:
>>>>> On Mon 25-01-21 23:36:18, Mike Rapoport wrote:
>>>>>> On Mon, Jan 25, 2021 at 06:01:22PM +0100, Michal Hocko wrote:
>>>>>>> On Thu 21-01-21 14:27:18, Mike Rapoport wrote:
>>>>>>>> From: Mike Rapoport <rppt@...ux.ibm.com>
>>>>>>>>
>>>>>>>> Introduce "memfd_secret" system call with the ability to create memory
>>>>>>>> areas visible only in the context of the owning process and mapped neither
>>>>>>>> into other processes nor into the kernel page tables.
>>>>>>>>
>>>>>>>> The user will create a file descriptor using the memfd_secret() system
>>>>>>>> call. The memory areas created by mmap() calls from this file descriptor
>>>>>>>> will be unmapped from the kernel direct map and they will be only mapped in
>>>>>>>> the page table of the owning mm.
>>>>>>>>
>>>>>>>> The secret memory remains accessible in the process context using uaccess
>>>>>>>> primitives, but it is not accessible using direct/linear map addresses.
>>>>>>>>
>>>>>>>> Functions in the follow_page()/get_user_page() family will refuse to return
>>>>>>>> a page that belongs to the secret memory area.
>>>>>>>>
>>>>>>>> A page that was a part of the secret memory area is cleared when it is
>>>>>>>> freed.
>>>>>>>>
>>>>>>>> The following example demonstrates creation of a secret mapping (error
>>>>>>>> handling is omitted):
>>>>>>>>
>>>>>>>> 	fd = memfd_secret(0);
>>>>>>>> 	ftruncate(fd, MAP_SIZE);
>>>>>>>> 	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>>>>>>
>>>>>>> I do not see any access control or permission model for this feature.
>>>>>>> Is this feature generally safe to anybody?
>>>>>>
>>>>>> The mappings obey memlock limit. Besides, this feature should be enabled
>>>>>> explicitly at boot with the kernel parameter that says what is the maximal
>>>>>> memory size secretmem can consume.
>>>>>
>>>>> Why is such a model sufficient and future proof? I mean even when it has
>>>>> to be enabled by an admin it is still an all-or-nothing approach. The mlock
>>>>> limit is not really useful because it is per mm rather than per user.
>>>>>
>>>>> Is there any reason why this is allowed for non-privileged processes?
>>>>> Maybe this has been discussed in the past but is there any reason why
>>>>> this cannot be done by a special device which would allow at least
>>>>> some permission policy to be provided?
>>>>  
>>>> Why should this not be allowed for non-privileged processes? This behaves
>>>> similarly to mlocked memory, so I don't see a reason why secretmem should
>>>> have a different permission model.
>>>
>>> Because apart from the reclaim aspect it fragments the direct mapping
>>> IIUC. That might have an impact on all others, right?
>>
>> It does fragment the direct map, but first, it only splits 1G pages into 2M
>> pages, and second, as was discussed several times already, it is not clear
>> which page size in the direct map is best; this is very much workload
>> dependent.
> 
> I do appreciate this has been discussed, but this changelog does not
> spell out any of that reasoning, and I am pretty sure nobody will
> remember the details in a few years. Also, some numbers would be
> appropriate.
> 
>> These are the results of the benchmarks I've run with the default direct
>> mapping covered with 1G pages, with disabled 1G pages using "nogbpages" in
>> the kernel command line and with the entire direct map forced to use 4K
>> pages using a simple patch to arch/x86/mm/init.c.
>>
>> https://docs.google.com/spreadsheets/d/1tdD-cu8e93vnfGsTFxZ5YdaEfs2E1GELlvWNOGkJV2U/edit?usp=sharing
> 
> A good start for the data I am asking for above.

I assume you've seen the benchmark results provided by Xing Zhengjun:

https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
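
For anyone following the thread without the patch at hand, the example
quoted above can be fleshed out with the error handling it omits. This is
only a sketch, not code from the series: secret_map() is a hypothetical
helper, and the fallback syscall number 447 is an assumption based on the
number the call was given when it was later merged, so it may not match a
kernel carrying v16.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 447 is the number assigned when memfd_secret was merged.
 * A tree carrying the v16 patch may use a different number. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper: map len bytes of secret memory.
 * Returns the mapping, or NULL with errno set by the failing call.
 */
static void *secret_map(size_t len)
{
	void *ptr;
	int fd, saved;

	assert(len > 0);

	/* glibc may not provide a wrapper, so use the raw syscall. */
	fd = (int)syscall(SYS_memfd_secret, 0u);
	if (fd < 0)
		return NULL;	/* e.g. ENOSYS: unsupported or not enabled */

	if (ftruncate(fd, (off_t)len) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return NULL;
	}

	/* Same call as in the changelog example. */
	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* an established mapping outlives the fd */
	if (ptr == MAP_FAILED) {
		errno = saved;	/* may reflect the memlock limit */
		return NULL;
	}
	return ptr;
}
```

A caller would treat a NULL return as "secretmem unavailable" and could
fall back to ordinary mlock()ed memory; a successful mapping is released
with munmap(ptr, len).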

-- 
Thanks,

David / dhildenb
