Date:   Tue, 30 Aug 2022 14:33:42 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Rik van Riel <riel@...riel.com>, alexlzhu@...com,
        linux-mm@...ck.org
Cc:     willy@...radead.org, hannes@...xchg.org, akpm@...ux-foundation.org,
        kernel-team@...com, linux-kernel@...r.kernel.org
Subject: Re: [RFC 2/3] mm: changes to split_huge_page() to free zero filled
 tail pages

On 29.08.22 15:17, Rik van Riel wrote:
> On Mon, 2022-08-29 at 12:02 +0200, David Hildenbrand wrote:
>> On 26.08.22 23:18, Rik van Riel wrote:
>>> On Fri, 2022-08-26 at 12:18 +0200, David Hildenbrand wrote:
>>>> On 25.08.22 23:30, alexlzhu@...com wrote:
>>>>> From: Alexander Zhu <alexlzhu@...com>
>>>
>>> I could see wanting to maybe consolidate the scanning between
>>> KSM and this thing at some point, if it could be done without
>>> too much complexity, but keeping this change to split_huge_page
>>> looks like it might make sense even when KSM is enabled, since
>>> it will get rid of the unnecessary memory much faster than KSM
>>> could.
>>>
>>> Keeping a hundred MB of unnecessary memory around for longer
>>> would simply result in more THPs getting split up, and more
>>> memory pressure for a longer time than we need.
>>
>> Right. I was wondering if we want to map the shared zeropage instead
>> of
>> the "detected to be zero" page, similar to how KSM would do it. For
>> example, with userfaultfd there would be an observable difference.
>>
>> (maybe that's already done in this patch set)
>>
> The patch does not currently do that, but I suppose it could?
> 

It would be interesting to know why KSM decided to replace the mapped
page with the shared zeropage instead of dropping the page and letting
the next read fault populate the shared zeropage. That code predates
userfaultfd IIRC.
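
For context, when KSM "merges" a page with the shared zeropage (the
use_zero_pages case), replace_page() installs a special PTE pointing at
the zeropage rather than just clearing the entry. Roughly (paraphrased
from memory, not a verbatim copy of mm/ksm.c):

	if (is_zero_pfn(page_to_pfn(kpage))) {
		/* Map the shared zeropage instead of leaving the PTE empty. */
		newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
					       vma->vm_page_prot));
		/*
		 * An anonymous page gets replaced by the (non-anonymous)
		 * zeropage, so the anon rss counter has to be adjusted.
		 */
		dec_mm_counter(mm, MM_ANONPAGES);
	}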

> What exactly are the userfaultfd differences here, and how does
> dropping 4kB pages break things vs. using the shared zeropage?

Once userfaultfd (missing mode) is enabled on a VMA:

1) khugepaged will no longer collapse pte_none(pteval) entries,
independent of the khugepaged_max_ptes_none setting -- see
__collapse_huge_page_isolate (rough sketch below, after 2)).
[it will also not collapse zeropages, but IIRC that part is not
actually required]

So it will not close holes, because the user space fault handler is in
charge of deciding when something gets mapped there and with which
content.


2) Page faults will no longer populate a THP -- the user space handler
is notified instead and has to decide how the fault will be resolved
(place pages).
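
Coming back to 1), the relevant check in __collapse_huge_page_isolate
looks roughly like this (paraphrased from memory, not verbatim):

	if (pte_none(pteval) || (pte_present(pteval) &&
				 is_zero_pfn(pte_pfn(pteval)))) {
		/*
		 * Holes and zeropages only count towards
		 * khugepaged_max_ptes_none when userfaultfd is not armed;
		 * with uffd we must not conjure up content behind the
		 * fault handler's back.
		 */
		if (!userfaultfd_armed(vma) &&
		    ++none_or_zero <= khugepaged_max_ptes_none) {
			continue;
		} else {
			result = SCAN_EXCEED_NONE_PTE;
			goto out;
		}
	}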


If you unmap something (resulting in pte_none()) where previously
something used to be mapped in a page table, you might suddenly inform
the user space fault handler about a page fault that it doesn't expect,
because it previously placed a page and did not zap that page itself
(MADV_DONTNEED).
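
The reason is that a read fault on a pte_none() entry in an anonymous
VMA goes through do_anonymous_page(), which hands the fault to the uffd
handler before it would map the shared zeropage -- heavily trimmed,
from memory:

	/* do_anonymous_page(), read fault on an empty PTE */
	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
	    !mm_forbids_zeropage(vma->vm_mm)) {
		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
					      vma->vm_page_prot));
		...
		/* Deliver the fault to user space, don't map the zeropage. */
		if (userfaultfd_missing(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_MISSING);
		}
		...
	}

If the shared zeropage is already mapped, reads never reach this path,
so the handler is not notified at all.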

So at least with userfaultfd I think we have to be careful. Not sure if
there are other corner cases (again, the KSM behavior is interesting).

-- 
Thanks,

David / dhildenb
