Date:   Tue, 30 Aug 2022 14:33:42 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Rik van Riel <riel@...riel.com>, alexlzhu@...com,
        linux-mm@...ck.org
Cc:     willy@...radead.org, hannes@...xchg.org, akpm@...ux-foundation.org,
        kernel-team@...com, linux-kernel@...r.kernel.org
Subject: Re: [RFC 2/3] mm: changes to split_huge_page() to free zero filled
 tail pages

On 29.08.22 15:17, Rik van Riel wrote:
> On Mon, 2022-08-29 at 12:02 +0200, David Hildenbrand wrote:
>> On 26.08.22 23:18, Rik van Riel wrote:
>>> On Fri, 2022-08-26 at 12:18 +0200, David Hildenbrand wrote:
>>>> On 25.08.22 23:30, alexlzhu@...com wrote:
>>>>> From: Alexander Zhu <alexlzhu@...com>
>>>
>>> I could see wanting to maybe consolidate the scanning between
>>> KSM and this thing at some point, if it could be done without
>>> too much complexity, but keeping this change to split_huge_page
>>> looks like it might make sense even when KSM is enabled, since
>>> it will get rid of the unnecessary memory much faster than KSM
>>> could.
>>>
>>> Keeping a hundred MB of unnecessary memory around for longer
>>> would simply result in more THPs getting split up, and more
>>> memory pressure for a longer time than we need.
>>
>> Right. I was wondering if we want to map the shared zeropage instead
>> of
>> the "detected to be zero" page, similar to how KSM would do it. For
>> example, with userfaultfd there would be an observable difference.
>>
>> (maybe that's already done in this patch set)
>>
> The patch does not currently do that, but I suppose it could?
> 

It would be interesting to know why KSM decided to replace the mapped
page with the shared zeropage instead of dropping the page and letting
the next read fault populate the shared zeropage. That code predates
userfaultfd IIRC.
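
For context, when KSM "merges" a page with the shared zeropage (the
use_zero_pages case), replace_page() installs a special PTE pointing at
the zeropage rather than just clearing the entry. Roughly (paraphrased
from memory, not a verbatim copy of mm/ksm.c):

	if (is_zero_pfn(page_to_pfn(kpage))) {
		/* Map the shared zeropage instead of leaving the PTE empty. */
		newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
					       vma->vm_page_prot));
		/*
		 * An anonymous page gets replaced by the (non-anonymous)
		 * zeropage, so the anon rss counter has to be adjusted.
		 */
		dec_mm_counter(mm, MM_ANONPAGES);
	}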

> What exactly are the userfaultfd differences here, and how does
> dropping 4kB pages break things vs. using the shared zeropage?

Once userfaultfd (missing mode) is enabled on a VMA:

1) khugepaged will no longer collapse pte_none(pteval) entries,
independent of the khugepaged_max_ptes_none setting -- see
__collapse_huge_page_isolate (rough sketch below, after 2)).
[it will also not collapse zeropages, but IIRC that part is not
actually required]

So it will not close holes, because the user space fault handler is in
charge of deciding when something gets mapped there and with which
content.


2) Page faults will no longer populate a THP -- the user space handler
is notified instead and has to decide how the fault will be resolved
(place pages).
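
Coming back to 1), the relevant check in __collapse_huge_page_isolate
looks roughly like this (paraphrased from memory, not verbatim):

	if (pte_none(pteval) || (pte_present(pteval) &&
				 is_zero_pfn(pte_pfn(pteval)))) {
		/*
		 * Holes and zeropages only count towards
		 * khugepaged_max_ptes_none when userfaultfd is not armed;
		 * with uffd we must not conjure up content behind the
		 * fault handler's back.
		 */
		if (!userfaultfd_armed(vma) &&
		    ++none_or_zero <= khugepaged_max_ptes_none) {
			continue;
		} else {
			result = SCAN_EXCEED_NONE_PTE;
			goto out;
		}
	}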


If you unmap something (resulting in pte_none()) where previously
something used to be mapped in a page table, you might suddenly inform
the user space fault handler about a page fault that it doesn't expect,
because it previously placed a page and did not zap that page itself
(MADV_DONTNEED).
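
The reason is that a read fault on a pte_none() entry in an anonymous
VMA goes through do_anonymous_page(), which hands the fault to the uffd
handler before it would map the shared zeropage -- heavily trimmed,
from memory:

	/* do_anonymous_page(), read fault on an empty PTE */
	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
	    !mm_forbids_zeropage(vma->vm_mm)) {
		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
					      vma->vm_page_prot));
		...
		/* Deliver the fault to user space, don't map the zeropage. */
		if (userfaultfd_missing(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_MISSING);
		}
		...
	}

If the shared zeropage is already mapped, reads never reach this path,
so the handler is not notified at all.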

So at least with userfaultfd I think we have to be careful. Not sure if
there are other corner cases (again, the KSM behavior is interesting).

-- 
Thanks,

David / dhildenb
