linux-kernel - Re: [RFC PATCH 0/3] Add hugetlb MADV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <21242c94-6748-b76d-f38e-5ac140c6117b@oracle.com>
Date:   Thu, 27 Jan 2022 09:55:02 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Cc:     Michal Hocko <mhocko@...e.com>,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Shuah Khan <shuah@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support

On 1/27/22 03:57, David Hildenbrand wrote:
> On 13.01.22 19:03, Mike Kravetz wrote:
>> Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
>> testing.  However, mremap support was recently added in commit
>> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
>> vma").  While attempting to enable mremap support in the test, it was
>> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>>
>> hugetlb does not support MADV_DONTNEED.  However, the only thing
>> preventing support is a check in can_madv_lru_vma().  Simply removing
>> the check will enable support.
>>
>> This is sent as a RFC because there is no existing use case calling
>> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
>> However, adding support makes sense as it is fairly trivial and brings
>> hugetlb functionality more in line with 'normal' memory.
>>
> 
> Just a note:
> 
> QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
> but instead always goes either via hugetlbfs or via memfd. 
> 
> For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
> to get the job done (IOW: also discards private anon pages). See the
> comments in the QEMU code below. I remember that that is somewhat
> inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
> always need fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
> to make sure
> 
> a) All file pages are removed
> b) All private anon pages are removed
> 
> IIRC hugetlbfs really is different in that regard, but maybe other fs
> behave similarly.

Yes it is really different.  And, some might even consider that a bug?
Imagine if those private anon (COW) pages contain important data.  They
could be unmapped/freed by some other process that has write access to
the hugetlb file on which the private mapping is based.

I believe this same issue exists for hugetlbfs ftruncate.  When fallocate
hole punch support was added, it was based on the ftruncate functionality.

I am hesitant to change the behavior of hugetlb hole punch or truncate
as people may be relying on that behavior today.  Your QEMU example is
one such case.

Thanks,
-- 
Mike Kravetz