Message-ID: <21242c94-6748-b76d-f38e-5ac140c6117b@oracle.com>
Date: Thu, 27 Jan 2022 09:55:02 -0800
From: Mike Kravetz <mike.kravetz@...cle.com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: Michal Hocko <mhocko@...e.com>,
Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Peter Xu <peterx@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Mina Almasry <almasrymina@...gle.com>,
Shuah Khan <shuah@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support

On 1/27/22 03:57, David Hildenbrand wrote:
> On 13.01.22 19:03, Mike Kravetz wrote:
>> Userfaultfd selftests for hugetlb do not perform UFFD_EVENT_REMAP
>> testing. However, mremap support was recently added in commit
>> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
>> vma"). While attempting to enable mremap support in the test, it was
>> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>>
>> hugetlb does not support MADV_DONTNEED. However, the only thing
>> preventing support is a check in can_madv_lru_vma(). Simply removing
>> the check will enable support.
>>
>> This is sent as an RFC because there is no existing use case calling
>> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
>> However, adding support makes sense as it is fairly trivial and brings
>> hugetlb functionality more in line with 'normal' memory.
>>
>
> Just a note:
>
> QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
> but instead always goes either via hugetlbfs or via memfd.
>
> For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
> to get the job done (IOW: also discards private anon pages). See the
> comments in the QEMU code below. I remember that that is somewhat
> inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
> always need fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
> to make sure
>
> a) All file pages are removed
> b) All private anon pages are removed
>
> IIRC hugetlbfs really is different in that regard, but maybe other fs
> behave similarly.

Yes, it really is different, and some might even consider that a bug.
Imagine that those private anon (COW) pages contain important data. They
could be unmapped/freed by some other process that has write access to
the hugetlb file on which the private mapping is based.

I believe the same issue exists for hugetlbfs ftruncate. When fallocate
hole punch support was added, it was based on the ftruncate functionality.

I am hesitant to change the behavior of hugetlb hole punch or truncate
as people may be relying on that behavior today. Your QEMU example is
one such case.
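
For comparison, the "ordinary file" half of the behavior David describes
can be sketched roughly as below. This is only an illustration, not code
from QEMU or the kernel: it assumes a memfd (shmem) file stands in for an
ordinary file, and the helper name punch_then_dontneed() is made up for
this example. The expectation is that the private COW page survives the
hole punch and only goes away after the additional MADV_DONTNEED. Per the
discussion above, a hugetlbfs file would already lose the private page at
the hole punch step.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* fallocate() */
#include <linux/falloc.h> /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/mman.h>     /* mmap(), madvise(), memfd_create() */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 *
 * Returns 0 on success, -1 if any system call fails.
 *   *after_punch    = first byte of the mapping after the hole punch
 *   *after_dontneed = first byte after the additional MADV_DONTNEED
 */
int punch_then_dontneed(unsigned char *after_punch,
                        unsigned char *after_dontneed)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;
    int ret = -1;

    /* Assumption: a memfd (shmem) file behaves like an ordinary file here. */
    int fd = memfd_create("punch-demo", 0);
    if (fd < 0)
        return -1;

    /* Size the file to one page and put one byte of file data in it. */
    if (ftruncate(fd, (off_t)len) != 0 || pwrite(fd, "F", 1, 0) != 1)
        goto out_close;

    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    /* Writing through the MAP_PRIVATE mapping creates a private anon page. */
    p[0] = 'P';

    /* Step a) remove the file page. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)len) != 0)
        goto out_unmap;
    *after_punch = p[0];

    /* Step b) remove the private anon page. */
    if (madvise((void *)p, len, MADV_DONTNEED) != 0)
        goto out_unmap;
    *after_dontneed = p[0];   /* refaults from the (now empty) file */

    ret = 0;
out_unmap:
    munmap((void *)p, len);
out_close:
    close(fd);
    return ret;
}
```

On a kernel where shmem supports hole punch, I would expect *after_punch
to still be 'P' and *after_dontneed to be 0.
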
Thanks,
--
Mike Kravetz