linux-kernel - Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c6c1c10f-a572-bdda-fe5d-5c28ce1c7a11@oracle.com>
Date:   Fri, 3 Nov 2017 16:31:06 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     David Herrmann <dh.herrmann@...il.com>
Cc:     Marc-André Lureau <marcandre.lureau@...hat.com>,
        linux-mm <linux-mm@...ck.org>,
        linux-kernel <linux-kernel@...r.kernel.org>, aarcange@...hat.com,
        Hugh Dickins <hughd@...gle.com>, nyc@...omorphy.com
Subject: Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

On 11/03/2017 10:56 AM, Mike Kravetz wrote:
> On 11/03/2017 10:41 AM, David Herrmann wrote:
>> Hi
>>
>> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz <mike.kravetz@...cle.com> wrote:
>>> On 11/03/2017 10:03 AM, David Herrmann wrote:
>>>> Hi
>>>>
>>>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau
>>>> <marcandre.lureau@...hat.com> wrote:
>>>>> Implements memfd sealing, similar to shmem:
>>>>> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
>>>>>   memfd_add_seals(). write() doesn't exist for hugetlbfs.
>>>>> - SHRINK: added similar check as shmem_setattr()
>>>>> - GROW: added similar check as shmem_setattr() & shmem_fallocate()
>>>>>
>>>>> Except write() operation that doesn't exist with hugetlbfs, that
>>>>> should make sealing as close as it can be to shmem support.
>>>>
>>>> SEAL, SHRINK, and GROW look fine to me.
>>>>
>>>> Regarding WRITE
>>>
>>> The commit message may not be clear.  However, hugetlbfs does not support
>>> the write system call (or aio).  The only way to modify contents of a
>>> hugetlbfs file is via mmap or hole punch/truncate.  So, we do not really
>>> need to worry about those special (a)io cases for hugetlbfs.
>>
>> This is not about the write(2) syscall. Please consider this scenario
>> about shmem:
>>
>> You create a memfd via memfd_create() and map it writable. You now
>> call another kernel syscall that takes as input _any mapped page
>> range_. You pass your mapped memfd-addresses to it. Those syscalls
>> tend to use get_user_pages() to pin arbitrary user-mapped pages, as
>> such this also affects shmem. In this case, those pages might stay
>> mapped even if you munmap() your memfd!
>>
>> One example of this is using AIO-read() on any other file that
>> supports it, passing your mapped memfd as buffer to _read into_. The
>> operations supported on the memfd are irrelevant here.
>> The selftests contain a FUSE-based test for this, since FUSE allows
>> user-space to GUP pages for an arbitrary amount of time.
>>
>> The original fix for this is:
>>
>>     commit 05f65b5c70909ef686f865f0a85406d74d75f70f
>>     Author: David Herrmann <dh.herrmann@...il.com>
>>     Date:   Fri Aug 8 14:25:36 2014 -0700
>>
>>         shm: wait for pins to be released when sealing
>>
>> Please have a look at this. Your patches use shmem_add_seals() almost
>> unchanged, and as such you call into shmem_wait_for_pins() on
>> hugetlbfs. I would really like to see an explicit ACK that this works
>> on hugetlbfs.
> 
> Thanks for the explanation.  I missed that in your first reply.  I'll
> look into this for hugetlbfs.

I reviewed the routines in the above commit and did not see anything that
would prevent them from working properly with hugetlbfs.  I modified the
fuse test to use hugetlbfs based mapping.  I also instrumented the above
routines and verified that tags were set/checked/cleared as designed for
hugetlb pages.  So, that is an ACK on working with hugetlbfs.

This does bring up the point that the fuse seals test should also be
modified to work with hugetlbfs as part of this series.

-- 
Mike Kravetz