linux-kernel - Re: [PATCH v2 1/1] mm: skip mlocked THPs that are underused early in deferred_split


Message-ID: <f58a472f-4a36-40e7-94d2-229125ae7373@redhat.com>
Date: Mon, 8 Sep 2025 14:04:05 +0200
From: David Hildenbrand <david@...hat.com>
To: Kiryl Shutsemau <kirill@...temov.name>
Cc: Lance Yang <lance.yang@...ux.dev>, akpm@...ux-foundation.org,
 Liam.Howlett@...cle.com, baohua@...nel.org, baolin.wang@...ux.alibaba.com,
 dev.jain@....com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 lorenzo.stoakes@...cle.com, npache@...hat.com, ryan.roberts@....com,
 usamaarif642@...il.com, ziy@...dia.com
Subject: Re: [PATCH v2 1/1] mm: skip mlocked THPs that are underused early in
 deferred_split_scan()

On 08.09.25 13:44, Kiryl Shutsemau wrote:
> On Mon, Sep 08, 2025 at 01:32:05PM +0200, David Hildenbrand wrote:
>> On 08.09.25 12:38, Kiryl Shutsemau wrote:
>>> On Mon, Sep 08, 2025 at 05:07:41PM +0800, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@...ux.dev>
>>>>
>>>> When we stumble over a fully-mapped mlocked THP in the deferred shrinker,
>>>> it does not make sense to try to detect whether it is underused, because
>>>> try_to_map_unused_to_zeropage(), called while splitting the folio, will not
>>>> actually replace any zeroed pages by the shared zeropage.
>>>
>>> It makes me think, does KSM follow the same logic as
>>> try_to_map_unused_to_zeropage()?
>>>
>>> I cannot immediately find what prevents KSM from replacing a zeroed
>>> mlocked folio with ZERO_PAGE().
>>>
>>> Hm?
>>
>> I assume if you're using mlock and at the same time enable KSM for a
>> process/VMA, you're doing something wrong.
>>
>> In contrast, THP is supposed to be transparent (yeah, I know ...).
> 
> Yeah, I guess it is user error.
> 
> Maybe we should make ksm_compatible() return false for VM_LOCKED?
> KSM breaks the mlock() contract.

I was thinking the same and wrongly remembered that we were already
checking for that.
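
To make the suggestion concrete, here is a minimal userspace model of a
compatibility check that also rejects locked VMAs. It is only a sketch: the
MODEL_VM_* flag names, their bit values, and model_ksm_compatible() are
hypothetical and do not mirror the kernel's definitions or the real check in
mm/ksm.c.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; NOT the kernel's values. */
enum model_vm_flags {
    MODEL_VM_SHARED  = 1u << 0,
    MODEL_VM_HUGETLB = 1u << 1,
    MODEL_VM_LOCKED  = 1u << 2,
};

/*
 * Model of a "may this VMA be merged?" predicate.
 * The first test stands in for whatever exclusions already exist;
 * the second test is the addition proposed in the thread.
 */
bool model_ksm_compatible(unsigned int vm_flags)
{
    if (vm_flags & (MODEL_VM_SHARED | MODEL_VM_HUGETLB))
        return false;
    if (vm_flags & MODEL_VM_LOCKED)
        return false;   /* proposed: never merge mlocked memory */
    return true;
}
```

With such a check, a request to merge a locked VMA would simply be ignored,
which is also where the compatibility risk mentioned below comes from.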

> 
> But it can be risky if someone already relies on this broken behaviour.

Could be.

Staring at QEMU, we have the following parameters:

	mem-merge=on|off

	"Enables or disables memory merge support. This feature, when
	 supported by the host, de-duplicates identical memory pages
	 among VM instances (enabled by default)."

And

	-overcommit mem-lock=on|off|on-fault

	"Run qemu with hints about host resource overcommit. The default
	 is to assume that host overcommits all resources."


Now, I would assume that anybody who sets "-overcommit mem-lock=on" either

(a) has KSM disabled on that machine, or

(b) sets mem-merge=off as well.

But QEMU would still allow configuring both together.
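
The same combination can be requested by any process, independent of QEMU.
The following sketch locks one anonymous page with mlock() and then marks it
mergeable with madvise(MADV_MERGEABLE), recording the errno of each call.
The helper name lock_and_mark_mergeable() and the result struct are made up
for this example. Whether each call succeeds depends on the environment
(RLIMIT_MEMLOCK, and whether the kernel was built with KSM support), so the
code only reports the outcome and assumes nothing about it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result type for this example. */
struct lock_merge_result {
    int mapped;        /* 1 if the anonymous mapping was created */
    int mlock_errno;   /* 0 on success, else errno from mlock() */
    int madvise_errno; /* 0 on success, else errno from madvise() */
};

struct lock_merge_result lock_and_mark_mergeable(void)
{
    struct lock_merge_result res = {0, 0, 0};
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return res;
    res.mapped = 1;

    /* Step 1: pin the page, as an mlock() user would. */
    if (mlock(p, len) != 0)
        res.mlock_errno = errno;

    /* Step 2: opt the very same range in to KSM merging. */
#ifdef MADV_MERGEABLE
    if (madvise(p, len, MADV_MERGEABLE) != 0)
        res.madvise_errno = errno;
#else
    res.madvise_errno = ENOSYS; /* header does not know MADV_MERGEABLE */
#endif

    munmap(p, len);
    return res;
}
```

If both errno fields come back as 0 on a given system, the kernel accepted a
range that is locked and mergeable at the same time, which is the
configuration questioned above.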


Interestingly, mm_populate()->populate_vma_page_range() wants to break 
COW. [*]

But if the app later calls fork(), we still allow COW-sharing of pages
with the child (another case of "don't do it", like KSM, I guess).
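
A small userspace sketch of the fork() case, for illustration: the parent
locks and writes a page, then forks; the child first sees the parent's data
and only gets its own copy when it writes. The function name is made up for
this example. Whether the page is physically shared between fork() and the
child's write is not visible from here; the code only checks the resulting
COW semantics. Note that, per mlock(2), memory locks are not inherited by
the child.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the child saw the parent's data and the parent's page was
 * unaffected by the child's write, 0 if not, -1 on setup failure.
 */
int fork_after_mlock_keeps_private_copies(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int status, child_ok, parent_ok;
    pid_t pid;

    if (buf == MAP_FAILED)
        return -1;

    (void)mlock(buf, len);  /* may fail under a low RLIMIT_MEMLOCK */
    buf[0] = 'P';           /* parent's content, written before fork() */

    pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: the page still holds the parent's data ... */
        int saw_parent = (buf[0] == 'P');
        /* ... and this write triggers copy-on-write in the child. */
        buf[0] = 'C';
        _exit(saw_parent && buf[0] == 'C' ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0) {
        munmap(buf, len);
        return -1;
    }
    child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    parent_ok = (buf[0] == 'P');  /* child's write must not leak back */

    munmap(buf, len);
    return child_ok && parent_ok;
}
```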


[*] It doesn't do that for mappings that start out R/O. I think we might
end up with shared zeropages in that case, but I'm not sure it's worth fixing.

-- 
Cheers

David / dhildenb